4 months ago: "Meilisearch, open-source alternative to Algolia in Rust lands a $15M Series A"
It's not the first time I see, there are at least 2-3 daily submissions reaching the FP in this manner so I'm curious: "built in Rust" = marketing these days?
That expectation includes a few things such as stability and operational UX (ie how easy it is to run and maintain).
And these (in my experience as a Rust developer) stem from the fact that it's much easier to get the MVP and business logic taken care of because I'm not bogged down by the drudgery of menial tasks that C++ imposes.
There's also a much lower "devtime" cost to adding UX in Rust than C++
Of course, this all holds equally true when comparing Rust to a higher level language like TypeScript and its rich ecosystem, but it does come at higher resource utilisation for the same task too (on average, maybe not always, especially after the code gets JITed).
Knowing Meilisearch is written in Rust makes me confident I can probably just run `./meilisearch` and get something working. I can also guess it'll be more resource efficient (CPU, memory) than ElasticSearch. I also *hate* ElasticSearch developer experience, and have had extremely good DX with Rust tools, so I can guess maybe their query language is saner. Maybe all this is wrong, but this is what I'm feeling when I see "written in Rust". So yeah, writing it conveys some meaning.
For me, "built in Rust" can be a real marketing argument. Indeed, Rust is a language that has proved its safety in the past. Building a technical product in Rust guarantees stability and safety (no memory issues in general) and performance (no garbage collector issue), so it brings more trust to the users.
Also I would recommend not conflating no GC and performance. There are lots of reasons for Rust being fast and many have nothing to do with no GC. The main reasons a lot of languages with GC are slower is due to allocating on the heap as opposed to the stack, and in general Rust does a lot of static linking and the compiler has the full amount of information to optimize calls without needing to move stuff to the heap. That's the main perf win.
Actually there are times when GC is more efficient than automatically freeing memory because GC can batch cleanup work.
There are a number of technical people with decision-making power who pay attention, and some of them prioritize Rust-written projects.
It's a sound (literally) and safe investment.
I don't get the people getting ticked off by the "written in Rust" clarification. Can we finally stop pretending that all programming languages are equal? They absolutely are not.
I do assume any post with "written in rust" does better on hacker news.
Rust is currently in the process of trying to eat some of C++'s cake (as well as that of Java, C# or Go). The usual response from C++ (Java, etc.) devotees is that Rust hasn't been tried on large projects so it cannot be compared. Which absolutely makes sense.
Each large scale project that demonstrates that Rust can be used successfully in a domain where C++ (Java, etc.) traditionally rules is a step forward for the Rust community.
Also, as with every language, there is a hype period. We're currently in the Rust honeymoon. My personal honeymoon ended a while ago, but Rust remains my favorite language for the foreseeable future.
Based on historical data, a good lower bound for its future could be Ruby. According to TIOBE, Rust overtook Ruby in popularity, while Ruby has maintained roughly the same popularity for years. At worst, I expect Rust to stay about as relevant as Ruby on Rails. But it doesn't look that way...
If a set of users are using a product only because it is built in X, that user base is most likely the early adopter audience for X and it dangerously masks whether that product has product-market fit or not.
So if a product markets itself as built in X, it is appealing to early adopters of X.
The long-tail of users on the other hand, care more about what painful problem the product is solving for them.
Now, some of the features of X might provide benefits to end users, but the long tail of users care more about those benefits they get rather than the fact that X provides those benefits, and that the product uses X.
It’s an open-source search engine which one could self-host. Language and tooling matter a lot to me and are often the deciding factor.
First of all, Rust is relatively new so this tells me that the codebase is likely new.
Secondly, I think Rust tends to attract smart people who like programs to be small and fast. Case in point: Meilisearch is a simple, single binary download.
Both of these together indicate that a project has a higher chance of being freshly written code, by smart people, that is small and fast.
Before I get a bunch of “well, actually”s, I’m not saying these things are true 100% of the time.
- It's fast.
- It's safe. Or more specifically, memory safe, which implies that it will be harder to compromise than similar products written in a different language.
Also, these two points are not hype.
Seems unnecessary to jump right to cult following. Not "cult" but RSS following is often the case where a keyword in the title makes the difference. I wonder why it bothers you if it is not relevant for you. What is your problem? Why can't people let others be?
> It's not the first time I see
Obviously, and your comment is not the first to complain that a title contains "implemented in X".
More broadly, if there had been two headlines on the front page today and the other said "Open source search engine written in Node / JS" I would make assumptions about the 7 million dependencies and endless security updates in every single one of them that I'd have to monitor. Obviously I would also skip straight past that one. So yes, the technology choice is important.
There is ongoing work to strengthen this. I do not know the status.
Built in Rust means no annoying GC pause, and that's important for a database.
And it also hopefully means less on-heap abuse.
https://docs.meilisearch.com/learn/what_is_meilisearch/compa...
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...
Also, we have to keep in mind that every comparison written by a company is always oriented.
My test data set is 1.5M docs * 3-10 fields * 10-50 characters. Meilisearch has slightly better multi-language support, but Typesense is much better on batch reindex speed and RAM usage while a bit shy on supporting Asian languages. The query speed is similar under light to medium load; I didn't stress test queries.
That being said, our cluster is much smaller than other ones I’ve worked with in the past, so I can’t comment on its reliability at massive scale. I’ve also been very impressed with how active contributors are on GitHub and in their Discord. Everyone seems like good people, and it’s a project I’m excited to keep using.
80% of ElasticSearch's value add (wrt search anyway) is all the clustering and framework that allows you to span the search over tens or hundreds of machines "easily".
I think the same is true here. Probably the comparison should be with the underlying search libraries that ES sits on.
I suppose this comparison makes sense in a world where most people don't run their own servers much any more since the clustering etc would be a problem for the cloud offering and not the consumer.
Or configurability. I looked at this again now that 1.0 is out, but besides the .NET client still being in an alpha state, it’s also very zero-configuration. There seems to be no configurability regarding tokenization strategies, for example.
Now, I certainly see the appeal, I barely understand my own ES code and meilisearch replicates probably 70% of it with no configuration at all, that’s impressive, but it also means that switching would mean giving up on those 30%.
https://docs.meilisearch.com/learn/advanced/storage.html#lmd...
>For the best performance, it is recommended to provide the same amount of RAM as the size the database takes on disk, so all the data structures can fit in memory.
> [...]
>It is important to note that there is no reliable way to predict the final size of a database. This is true for just about any search engine on the market—we're just the only ones saying it out loud.
Looks like a 10MB document is taking ~200MB, from their docs. I don't think that scales linearly though, since it's a reverse index it is going to scale based on the number of unique words it finds, with each document adding a bit on top of that. You'd expect it to have a pretty big index to cover common english words, and then each document adds a bit on top of that.
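The non-linear growth described above can be seen in a toy inverted index. This is only a minimal sketch, assuming whitespace tokenization and an in-memory dictionary; it is not how Meilisearch stores data, and `build_inverted_index` is a made-up helper name. It just shows that the vocabulary (the keys) grows quickly at first and then slows down, while each new document mostly adds ids to existing entries.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps",
]

# Vocabulary size after indexing the first 1, 2 and 3 documents.
vocabulary_sizes = [
    len(build_inverted_index(docs[:n])) for n in range(1, len(docs) + 1)
]
print(vocabulary_sizes)
```

The third document shares most of its words with the first two, so it adds only one new key; a real engine keeps extra structures (positions, prefixes, typo tolerance data), so its on-disk growth will not follow this toy curve exactly.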
Definitely seems like somewhere they could make some improvements though. Some transparent compression could probably help, and with zstd's dictionary feature it can be fine tuned to the data they're actually seeing.
Not about to replace xapian in kiwix (offline wikipedia reader) any time soon, I think.
With manticore, we've tried to run into these issues in benchmarks, but the only problem we got was temporary high IO load when indexes need to be re-indexed with new or changed documents. In total it's at 50-70% of the RAM usage compared to Meili.
We'd be happy to re-visit, but looking at the docs - it seems to be about the same as it was back then (a year ago).
Would be nice if you could check a query and then start the instance with an appropriate memory configuration.
I was hoping the cloud version would be more appealing, granted there seems to be a generous free tier but the next option is $1200 a month?!
just noticed you don't get high availability on free tier which sucks, but I guess if search is mission critical to the point you need it, you would be willing to pay. Most of these database type companies start off targeting enterprise and then roll out self-serve solutions as they scale.
Maybe I will try out the cloud version then even though I expect my site would probably be well in the free tier limit, like I said it seems like a very generous tier.
I'd love to use Meilisearch as you describe, but their so-called SDKs are just for clients, so you still need the Meilisearch server listening on localhost.
I would love to see something like SQLite based on Meilisearch (i.e. a fully self-contained search library like https://github.com/mchaput/whoosh). Do you know if such a thing exists?
My use case is that I want to start creating some indexes that are "per-user" and some "per-company" where a company (customer) might have many users. This is to do some sort of double tenant isolation. I will create different keys that have permission to specific indexes and deliver those to the user somehow. My current solution does hacky things with Elasticsearch like adding query filters by user/company-id attributes in the background automatically. But since Meilisearch would be customer facing, I need stronger guarantees around permissions per index.
I tried this out a year ago on Meilisearch locally, but haven't stress tested it by creating thousands of them as in production.
Or is there a better way to do this? This is also a reason why memory-only systems like Typesense didn't make sense to me. I'm fine with taking a performance hit by going to disk to pull the right index. Not every index will be used all the time. I might also look at sharding/partitioning features if present.
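For readers unfamiliar with the "add filters in the background" approach mentioned above, here is a rough sketch of what it usually looks like. Everything in it is an assumption for illustration: the `company_id` field name, the `scoped_filter` helper, and the `field = value AND (...)` filter syntax are hypothetical and not taken from any particular engine's API.

```python
def scoped_filter(company_id, user_filter=None):
    """Return a filter string that always restricts results to one company.

    `company_id` is a hypothetical tenant attribute stored on every document.
    `user_filter` is an optional, already-validated filter expression.
    """
    if not isinstance(company_id, int):
        raise TypeError("company_id must be an int")
    tenant_clause = f"company_id = {company_id}"
    if not user_filter:
        return tenant_clause
    # Parenthesize the caller's filter so an OR inside it cannot widen the scope.
    return f"{tenant_clause} AND ({user_filter})"


print(scoped_filter(42))
print(scoped_filter(42, "status = active OR status = trial"))
```

This only holds up if the server builds the filter and `user_filter` is never raw user input (a crafted string could close the parenthesis early), which is why key-level or token-level restrictions enforced by the engine are the stronger guarantee.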
Unfortunately the performance of indexing (constantly changing records) wasn’t great and Meilisearch would fall behind on indexing records for hours.
Meilisearch has been amazingly great for projects where records don’t change all that much (eg docs, or even a customer database), but if you have for example a fast paced ecommerce system with 50k records constantly changing (eg product inventory), it falls over pretty quick. We had to transition over to Elastic for this aspect of our app.
The other issue we faced is their Rails gems falling out of step with the server, and when fixes came out, the Rails gem was incompatible for a while.
I really really hope 1.0 increases performance to the point where it becomes production ready, because the initial out of the box performance (before getting bogged down with indexing) was pretty amazing. Better than Elastic and on par with Algolia.
I recommend keeping Meilisearch on your radar. It is going to be great.
I wish the best for the Meili team and hope they succeed!
We made a lot of improvements to the indexing part of the engine and can now auto-batch updates, which gave incredible gains. We will continue to work on this in 2023. Can I know the version you were using?
Other than that, it is simply great. Ranking stuff is great, simple, I only need custom weights there, some additional functions (not just asc/desc) and it would be perfect.
There are also many search libraries if you want to embed search more deeply into your app. I have a list of modern search systems and libraries here: https://manigandham.com/post/search-systems-libraries
If Algolia would offer an instance based pricing on cpu, ram and storage they would be the clear winner imho.
The vast majority of small/medium customers would rather pay-as-you-go than maintain a fixed cost instance, and it allows Algolia to efficiently pack them into a multitenant architecture instead of wasting resource overhead.
Been following along for a while and it's a great project. ElasticSearch needs some competition.
For us, there are two things missing before we could make the switch:
1. Multi-index search; Standard use-case is searching across e.g. users and companies. Common in many SaaS-applications, where you want a single search field with type-ahead for e.g. contacts/organisations/tasks/events.
2. Decay functions; Basically to gradually phase out results for things based on age, distance or something similar. ElasticSearch has pretty good support for these. https://www.elastic.co/guide/en/elasticsearch/reference/curr...
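To make point 2 concrete, a decay function multiplies a hit's base relevance by a factor that shrinks with age or distance. The sketch below is a generic exponential decay, with parameter names (`scale`, `decay`, `offset`) borrowed loosely from the usual terminology; it is an illustration under those assumptions, not the implementation of any engine, and the sample hits are made up.

```python
import math


def exp_decay(distance, scale, decay=0.5, offset=0.0):
    """Multiplier in (0, 1]: 1.0 within `offset`, `decay` at offset + scale."""
    if scale <= 0 or not 0 < decay < 1:
        raise ValueError("scale must be > 0 and decay in (0, 1)")
    effective = max(0.0, abs(distance) - offset)
    return math.exp(math.log(decay) / scale * effective)


def rank_by_decayed_score(hits, scale_days):
    """Sort (doc_id, base_score, age_days) tuples by decayed score, best first."""
    return sorted(
        hits,
        key=lambda hit: hit[1] * exp_decay(hit[2], scale_days),
        reverse=True,
    )


# Hypothetical hits: (doc_id, base relevance score, age in days).
hits = [("old", 1.0, 60), ("new", 0.8, 0), ("mid", 0.9, 30)]
ranked = [doc_id for doc_id, _, _ in rank_by_decayed_score(hits, scale_days=30)]
print(ranked)
```

With a 30-day scale, the 60-day-old document loses three quarters of its score, so a slightly less relevant but fresh document outranks it.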
Are points (2) through (4) true? Have any of the points been an issue for you in practice?
About (2) we will work on exposing two new ranking rules to be able to control that.
For (3) I thought it was fixed.
We decided to implement (4), the PUT and POST, this way after looking at how others were doing it.
The only real thing I am missing is a typeahead feature.
wow your project looks very interesting. How do you handle things like the filesystem changing while your indexer is offline? Do you reindex from scratch at startup?
Regarding typeahead, is this what we call "query suggestions"[1]? At the moment, we think that this is something that frontends and SDK can provide rather than the engine, so that means you wouldn't find it at the Milli level. We think you could maybe build an ancillary suggestion index and make two queries instead of one when typing, so as to get both results and suggestions at once.
Here's a chat link[2] to our latest discussions on the topic; feel free to come and weigh in if you're interested!
[1]: https://roadmap.meilisearch.com/c/31-query-suggestions
[2]: https://discord.com/channels/1006923006964154428/10685073658...
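The "ancillary suggestion index plus two queries" idea above could look roughly like the following. This is a minimal in-memory sketch: `SuggestionIndex`, `search_documents` and `search_as_you_type` are hypothetical names, the main search is a naive substring match standing in for a real engine call, and the sample data is invented.

```python
from bisect import bisect_left


class SuggestionIndex:
    """Sorted list of known phrases, queried by prefix."""

    def __init__(self, phrases):
        self._phrases = sorted({p.lower() for p in phrases})

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # All phrases sharing the prefix are contiguous in the sorted list.
        start = bisect_left(self._phrases, prefix)
        out = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix) or len(out) == limit:
                break
            out.append(phrase)
        return out


def search_documents(docs, query):
    """Stand-in for the main search query: naive substring match."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


def search_as_you_type(docs, suggestions, typed):
    """Run both queries for one keystroke and return them together."""
    return {
        "hits": search_documents(docs, typed),
        "suggestions": suggestions.suggest(typed),
    }


docs = ["Rust in Action", "Programming Rust", "Ruby on Rails Tutorial"]
suggestions = SuggestionIndex(["rust", "rust book", "ruby", "rails"])
result = search_as_you_type(docs, suggestions, "ru")
print(result)
```

In a real frontend the two lookups would be two requests (ideally sent together), and the suggestion list would typically be built from popular past queries or document titles.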
And yes, query suggestions are exactly what I mean. Thank you for informing me, I guess I will have to look into how I can make it myself :-)
The "Comparisons" page says there is no limit for number of indices (https://docs.meilisearch.com/learn/what_is_meilisearch/compa...)
However, the "Limitations" page says there is a limit of ~180 indices (https://docs.meilisearch.com/learn/what_is_meilisearch/compa...)
Can you clarify what, if any, are the limitations of # indices?
I would like to know the use case for needing more than 200 indexes. We have handled multi-tenant with a single index and multi-tenant tokens. https://docs.meilisearch.com/learn/security/tenant_tokens.ht...
But I'm not sure what's behind the claim that "it supports all languages", aside from handling unicode? Does it support stemming at all? Does it have customized stop words per language?
I'm excited to see all the things they'll build in the future.
SQL queries asking for records based on something like "field a has to contain b" are easy to formalize and easy for an RDBMS to fulfill. But the SQL queries get hairier and hairier when the query involves multiple fields or even multiple unrelated tables. Or free-form text. And those queries are harder to index.
On top of all of that, humans often want things sorted in an order that isn't straightforward to express in SQL. What is "relevancy"? All of that can be done in SQL, but it's not what RDBMS engines shine at.
Meilisearch and Typesense are really different regarding resource consumption and performance. I would say that where Typesense would have better indexing performance (Meilisearch has recently improved indexing speed), Meilisearch will guarantee much faster search performance while keeping impressive relevancy. Regarding consumption, as Typesense is entirely in RAM and Meilisearch is using memory mapping, Meilisearch would take more disk space but less RAM.
https://pulpflakes.com/fmisearch/
It's a search over an index of fiction in the English language, first published in periodicals. Searchable by author, artist, magazine name and specific issue. Biggest index has about 200K documents, doc sizes are tiny.
Integrated with my WordPress site by handwritten PHP. Which was fun.
Performance is great. I didn't run into too many issues, and those I did I could resolve. What I remember:
1. The rules for text searches are too strict by default and if the order of words is different, will result in no matches. A, B will not return a result if B A is in the database.
2. Creating an index, uploading documents and changing settings required quite a bit of work. A week's worth of coding, almost. Would have loved to have a reasonably robust shell script that could take a JSON file with metadata on index and do the grunt work.
3. I have multiple types of documents, would have liked search to cover all of them so I don't have to change search type manually each time.
4. The default number of documents and max uploaded file size are too low. 200K and 200 MB or something. But it fails even on smaller file sizes.
The above sound like complaints. They're problems I ran into and others might. I love how productive Meilisearch made me. Thank you.
This feature was a student project, and I'm not sure if it will find its usage. If you are using Meilisearch with ClickHouse, or if you think this feature is worth something, please let me know.
I found this issue which tracks crates.io publication: https://github.com/meilisearch/meilisearch/issues/3367
Would be nice to see that made a priority. Having a powerful search engine that can be embedded in a larger application and made portable (like being able to deploy to WASM) would be extremely novel and valuable. Given Rust is already in use, I think it may not necessarily demand too much effort. When search becomes a focus for what I’m working on, perhaps I will make that happen if not already done yet.
Thanks for making this available to people.
P.S. great to see your documentation search is powered by your own product (!)