What does "real-time" mean in this context? Is it indexing database content in real-time? Is it in reference to the look-ahead, predictive query completion LinkedIn has?
What would compel someone like me -- a dev who owns the very significant search piece of my company's primary product -- to give this a serious evaluation?
The biggest conceptual difference seemed to be that IndexTank was specifically written to autoscale - it was designed from the ground up to run on cloud providers, and to instantiate new resources as needed. It also has no central point of failure.
SolrCloud (and projects like Solandra) bring some of this functionality to Solr.
Lucene (which Solr uses as its index) cannot expose newly indexed data immediately after it's added.
Lucene exposes IndexReaders for searches, and each one offers a snapshot view of the index. To search across new documents, IndexReaders need to be re-opened, which is a somewhat expensive operation: expensive enough that you can't do it after each document is added, especially if documents are added frequently.
The latest version of Lucene supports "near real time" search, but afaik it's not widely used (with Solr).
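To make the snapshot behaviour concrete, here's a toy Python model. It is not Lucene's API: `ToyIndex`, `ToyReader` and `open_reader` are made-up names, and rebuilding the whole reader on every reopen exaggerates the cost for the sake of illustration. It only shows why a document added after a reader was opened stays invisible until the reader is re-opened.

```python
class ToyReader:
    """Point-in-time view: sees only documents present when it was opened."""

    def __init__(self, docs):
        # Building the postings is the "expensive" step in this toy model:
        # it scans every document that exists at open time.
        self._postings = {}
        for doc_id, text in enumerate(docs):
            for term in text.lower().split():
                self._postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


class ToyIndex:
    """Hypothetical stand-in for an index; only models snapshot semantics."""

    def __init__(self):
        self._docs = []  # every document added so far

    def add(self, text):
        self._docs.append(text)

    def open_reader(self):
        # The reader copies what it needs now; later adds do not affect it.
        return ToyReader(self._docs)


index = ToyIndex()
index.add("real time search")
reader = index.open_reader()

index.add("search at scale")        # added after the reader was opened
stale = reader.search("search")     # old snapshot: new document not visible

reader = index.open_reader()        # re-open to pick up the new document
fresh = reader.search("search")
```

With frequent adds you end up choosing between re-opening constantly (paying that cost each time) and re-opening on a timer (accepting some staleness), which is the gap "near real time" search is meant to narrow.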
Definitely considering replacing our search backend at TwitchTV with this...
If you want to hear more, just ping me at emil@helpjuice.com
"IndexEngine: a real-time fulltext search-and-indexing system designed to separate relevance signals from document text. This is because the life cycle of these signals is different from the text itself, especially in the context of user-generated social inputs (shares, likes, +1, RTs)."
ADD: also, does it support non-English languages at all?
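On the "separate relevance signals from document text" part of that quote, here's how I read it, as a rough Python sketch. This is my own toy model and not IndexTank's actual code or API: `SignalSeparatedIndex`, `index_text`, `update_signal` and the `likes` signal are all hypothetical names. The idea shown is that text goes into the inverted index once, while fast-changing signals live in a separate store that ranking reads at query time.

```python
class SignalSeparatedIndex:
    """Toy model: text postings and relevance signals are stored separately."""

    def __init__(self):
        self._postings = {}          # term -> set of doc ids
        self._signals = {}           # doc id -> {signal name: value}
        self.text_index_writes = 0   # counts how often text was (re)indexed

    def index_text(self, doc_id, text):
        # Slow path: tokenise the text and update the inverted index.
        self.text_index_writes += 1
        for term in text.lower().split():
            self._postings.setdefault(term, set()).add(doc_id)
        self._signals.setdefault(doc_id, {})

    def update_signal(self, doc_id, name, value):
        # Fast path: a like/share/RT changes one number, not the postings.
        self._signals.setdefault(doc_id, {})[name] = value

    def search(self, term, signal="likes"):
        hits = self._postings.get(term.lower(), set())
        # Rank by the current signal value (highest first), then by doc id.
        return sorted(
            hits,
            key=lambda d: (-self._signals.get(d, {}).get(signal, 0), d),
        )


idx = SignalSeparatedIndex()
idx.index_text("a", "open source search")
idx.index_text("b", "hosted search service")

before = idx.search("search")         # no signals yet, ordered by doc id
idx.update_signal("b", "likes", 10)   # "b" gets likes; text is untouched
after = idx.search("search")
```

The ranking changes without the text ever being re-indexed (`text_index_writes` stays at 2), which is presumably why the different life cycles matter for likes, shares and RTs.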
For anyone looking for a job at LinkedIn, making impactful contributions to this project could be a way in.
The indextank repo proper is interesting (and useful) enough, but indextank-service (https://github.com/linkedin/indextank-service) made my jaw drop a little. It's a full administrative stack for deploying indextank as a service.