I'm able to host a service on a Pi at home with full-text search and a regular peak load of a few rps (not much, admittedly), with a CPU that barely spikes above a few percent. I've load tested searches on the Pi up to ~100rps and it held up. I keep thinking I should write up my experiences with it. It was pretty much a drop-in, super-useful library and the team was very responsive with bug reports, of which there were very few.
If you want to see how responsive the search is on such a small device, try clicking the labels on each story -- it's virtually instantaneous to query, and this is hitting up to 10 years * 12 months of search shards! https://progscrape.com/?search=javascript
I'd recommend looking at it over Lucene for modern projects. I am a big fan, as you might be able to tell. Given how well it scales on a tiny little ARM64, I'd wager your experiences on bigger iron will be even more fantastic.
I wanted users to be able to search their backups. As I'm using Rust, Tantivy looked like just the right thing for the job. Indexing a single email happens so fast that I didn't bother to move the work to a separate thread. And searching across thousands of emails seems to be no problem.
If anyone wants full-text search for their Rust application, they should take a look at Tantivy.
I basically just need full-text search.
https://www.postgresql.org/docs/current/textsearch.html
https://www.crunchydata.com/blog/postgres-full-text-search-a...
https://github.com/paradedb/paradedb/blob/dev/pg_search/Carg...
After listening to "Extending Postgres for High Performance Analytics (with Philippe Noël)": https://www.youtube.com/watch?v=NbOAEJrsbaM
And Tantivy sits inside the main thing, Quickwit (logs, traces, and soon metrics): https://github.com/quickwit-oss/quickwit
I had a surprisingly good experience with the combined power of Quickwit and ClickHouse for a multilingual search pet project. Finally something usable for Chinese, Japanese, and Korean.
https://quickwit.io/docs/guides/add-full-text-search-to-your...
to_tsvector in PG never worked well for my use cases
SELECT * FROM dump WHERE to_tsvector('english'::regconfig, hh_fullname) @@ to_tsquery('english'::regconfig, 'query');
I wish them success. I'll automatically upvote any post with Tantivy as a keyword.
Perhaps most importantly, separation of compute and storage has proven invaluable. Being able to spin up a new search service over a few billion objects in object storage (complete with complex aggregations) without having to pay for long-running beefy servers has enabled some new use cases that otherwise would have been quite expensive. If/when the use case justifies beefy servers, Quickwit also provides an option to improve performance by caching data on each server.
Huge bonus: the team is very responsive and helpful on Discord.
There are different use cases for alternatives to Lucene, depending on your needs.
The only way to add fields is to reindex all data into a different search index.
The Java SDK for Meilisearch was also nice, same thing: no need for a CLI or manual configuration. I just pointed it at a DB entity and indexed whole tables...
I'd love to have that for Tantivy.
Yes, that's how you use Tantivy normally; I'm not sure which JSON config you mean.
tantivy-cli is more of a showcase; https://github.com/quickwit-oss/tantivy is the actual project.
Some of us have specific principles of which things like opt-out telemetry might run afoul.
OP will choose their software, I choose mine and you choose yours; none of us need to call each other petty or otherwise cast such negative judgement; a free market is a free market.
Mad props to the whole team! I'm a firm believer that they will succeed in their quest!
There are cases where this will probably never be possible (fields with lossy indexing where the datatype's indexing algorithm changed), but in many cases all the information is there, and it would be really nice if such indexes could be identified and upgraded.
Beyond its runtime characteristics, the codebase is well organized and a great resource for learning about information retrieval.
You have a big list of separate libraries supporting a variety of languages? Great. Unfortunately, that doesn't help me build a real multi-language app. Doing that work right now, with multiple indexes and query routing, seems very difficult.
- Their website :)
Often you take the results from both vector search and lexical search and merge them with an algorithm like Reciprocal Rank Fusion (RRF).
Lexical search works great when you want to retrieve documents that actually contain the specific keywords in your query, as opposed to using embeddings to find something roughly in the same semantic ballpark.
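To illustrate the merge step, here's a minimal sketch of RRF in Rust: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the commonly used constant. The document IDs and rankings below are toy values:

```rust
use std::collections::HashMap;

/// Merge several ranked result lists with Reciprocal Rank Fusion.
/// Returns (doc_id, fused_score) pairs, best first.
fn rrf_merge(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            // Ranks are 1-based in the RRF formula: score += 1 / (k + rank).
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by fused score, highest first.
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    let lexical = vec!["a", "b", "c"]; // keyword-based ranking
    let vector = vec!["b", "d", "a"]; // embedding-based ranking
    let merged = rrf_merge(&[lexical, vector], 60.0);
    // "b" and "a" appear in both lists, so they outrank "c" and "d".
    assert_eq!(merged[0].0, "b");
    assert_eq!(merged[1].0, "a");
    println!("{:?}", merged);
}
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.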