After futzing for hours with XSLT and writing scripts to submit content via the REST API, I found out about FTS4 in SQLite, and was impressed by its relative simplicity. I had something working in under an hour in Python.
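For anyone curious what that looks like, here is a minimal sketch of FTS4 from Python's built-in sqlite3 module. It assumes your SQLite build has FTS3/FTS4 compiled in (most do). The table name `docs`, its columns, the sample rows, and the `search` helper are all made up for illustration, not taken from my actual project.

```python
import sqlite3

# In-memory database; assumes the linked SQLite was built with FTS3/FTS4.
conn = sqlite3.connect(":memory:")

# Hypothetical table: an FTS4 virtual table indexes every column it declares.
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Riak Fast Track", "A quick intro to Riak theory, installation, and usage."),
        ("Lucy overview", "Lucy is a full-text search library written in C."),
        ("SQLite notes", "FTS4 adds full-text search to SQLite."),
    ],
)


def search(query):
    """Return titles of rows whose indexed text matches the FTS query."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY docid", (query,)
    )
    return [title for (title,) in rows]


print(search("search"))
```

MATCH against the table name searches all columns at once, and the default tokenizer is case-insensitive for ASCII, so `search("riak")` finds the first row.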
- http://haystacksearch.org/
- http://code.google.com/p/pysolr/
http://wiki.basho.com/The-Riak-Fast-Track.html
It's a quick, interesting, and to-the-point intro to Riak, including theory, installation, and usage. The Basho guys have seriously great documentation; I've enjoyed browsing through the Riak wiki despite not really having any live Riak deployments.
I'm wondering what these unique capabilities are. Speed? Smaller memory footprint? And I wonder what the reason is behind doing this. I'm all for a C project, but very curious as to why, when Lucene was very well done.
We use mmap() heavily, and when running on 64-bit systems take liberal advantage of the giant address space. Using the system to do more of the buffering also allows us to have lightweight processes that can start quickly. We think there will be both speed and memory advantages in the long run. The other main difference is that C is much easier to integrate with other languages than Java. We're starting out with Perl bindings, but have plans for Ruby, Python, Lua, Tcl, and others. The goal is to offer a truly native interface from the language of your choice.
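To illustrate the general idea (this is not Lucy's code, just a rough sketch of the technique using Python's standard mmap module), here is what letting the OS do the buffering looks like. The file contents and the `find_in_file` helper are invented for the example.

```python
import mmap
import os
import tempfile


def find_in_file(path, needle):
    """Return the byte offset of needle in the file, or -1 if absent."""
    with open(path, "rb") as f:
        # Map the whole file read-only. The kernel pages data in on demand
        # and keeps it in the shared page cache, so there is no
        # application-level read buffer to manage.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)


# Write a small stand-in "index" file (made-up contents).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"term:lucy\nterm:lucene\nterm:sqlite\n")

offset = find_in_file(path, b"lucene")
print(offset)

os.remove(path)
```

Because the mapping is backed by the page cache, a second process mapping the same file can reuse pages that are already resident, which is a big part of why short-lived processes can start quickly.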
The degree of host language integration is wild. You'll be able to seamlessly subclass just about any part of the C library in any supported language. Nothing is ready beyond C and Perl, but eventually you'll be able to have your indexer in one language, and your customized searchers in a couple more, all sharing the same system cache.
As for why? Marvin, the main developer, started the project as KinoSearch at a time when Lucene wasn't really ready for prime time. He's been very interested in real-time indexing, and at the time Lucene didn't handle this well. I got interested because I was looking for something lighter-weight than Lucene, where I could try to blend the boundaries between search and database retrieval. Lucene had too many layers of abstraction for my purposes. A parallel might be SQLite and Postgres. Both have their place, but Lucy is more on the SQLite side of things.
Easy bindings for every dynamic language. They're starting with Perl.
If Lucy can deliver the latest advances in Java Lucene as a usable C library, that would be very good news for me. Lucene is still the best choice for large data indexing and searching solutions.
Some parts will be leading Lucy, and some will be catching up. There's already increasing cross-pollination between the two. It's a very loose port at this point.
For everyone else: Lucy's apparently a full-text search library written in C targeting dynamic languages, with Perl bindings to start with.
The synopsis is quite elucidative. Just cpanm installed it and in 10 minutes had a program that indexes and searches a collection of files with highlighting. Looks promising!