Are you able to compare, say, West's results to those of a pure Google text search on the keyword terms?
[ To do that you'd need some example of large legal texts fully online and thus indexed by Google - I don't know if that exists ]
It's sometimes hard to discern the value of the tech versus the quality of the implementation + usability factors - but your observations are interesting. I wonder how search on medical information compares...
gord.
I haven't thought of a fair test to run yet. The two engines do different things. My West can search case decisions, statutes, administrative codes, briefs filed to the court, secondary sources (sort of the research paper of the legal field) and the news.
So I tried doing a search on the news only. I searched "ycombinator" and the results returned are news articles only, whereas on Google someone probably wants and gets the YC home page, this site, or the actual function. None of those show up on the West site.
Then I ran a search of these terms on each (I didn't enter quotes on the actual searches): "massachusetts custody modification"
On Westlaw, I get cases and statutes on point. With extra terms I will easily get to cases that deal with my specific issue. On Google, the first link is a divorce resource site and the rest are for lawyers.
Searching statutes might work. But the main reason statutes search well on Google is the Cornell Law site. The quality of results for statutes is probably a bigger testament to them and their cataloging efforts.
I would say both search engines hit their target markets well. Most people searching "massachusetts custody modification" don't want 20 decisions of the Mass SJC. And people searching the same on Westlaw don't want attorneys. Google is much, much faster though. It returns in a fraction of a second. Westlaw took about 12 seconds to return 10,000 hits. The first three hits were decided yesterday, which is pretty cool.
There is a group of people creating an open legal database. I can't remember its name. I think they are based in the San Fran area. I think it was started by some hacker who worked on opening up some other government data and is now working on the court system. I have the bookmark buried somewhere and of course can't find it. Does anyone know which one I am talking about? We could maybe test that database versus the commercial West one.
I'm surprised the big G hasn't just paid some money to get that data, given their plan to scan all the world's books.
I wonder what percent of all text is legal or medical.
There certainly is a lot of legal text. Lawyers certainly are good at creating volumes of paper. For example, the Supreme Court just decided a case, Wyeth v. Levine. It will be recorded in volume 555. So to date the Supreme Court's decisions have filled 554 volumes of 1000 pages each. And that is just one court. Every state court, state appeals court, state supreme court, federal court, land court, etc. has similar volumes and page counts.
And all of this is just the primary sources. Once you add secondary sources, aka books and papers written by learned scholars on individual topics or cases, the number of books and pages increases by orders of magnitude. And we still haven't counted any statutes (those go on forever, for each state) or any administrative law. And each one of those has comment sections that go on for pages, whereas the actual rule is only a paragraph.
I wonder what percentage this is, too. I bet it is still extremely small compared to what the rest of the world has produced. There are so few law writers when compared to all other writers.