Clustered ElasticSearch has been rock-solid for me and I've used it in anger many times. The level of maintenance needed is close to zero, both initially and long-term. Compare that with the abysmal experience of setting up a sharded MongoDB cluster for example...
Please enlighten me how ElasticSearch is "a lot of work to operate" (heard that one multiple times), and what you're comparing it to.
There are so many switches and dials to tune, and unless you really learn it in depth, you won't know which ones you need. It's difficult even to determine what hardware requirements you have. And it's a hard sell to tell your business guys "I think elasticsearch will work better if we give it more... CPU? Memory? Disk speed? I'm not really sure." when you can't provide any concrete metrics to back that up.
Another place where footguns abound is upgrading from one version to another, especially if you've got plugins installed. There are tricks that you have to learn the hard way.
At this point, I think long and hard before reaching for a solution like elasticsearch. If I've got a DBA whose entire job it is to master the tech and wield it expertly, that's one thing. But if I'm part of an early stage startup, I just can't justify the lost time and potential for catastrophe.
But that's true for any data store. This isn't any different. Nor is an RDBMS. They all need HA/replication. And that is rarely trivial.
Honestly, I think this is why managed/hosted solutions (AWS RDS for example) are so popular - they remove a large part of the complexity for you.
Personally I don't understand why there are so few search libraries/systems to choose from, given that "search" is one of the fundamental pillars of CS.
The first time it happened to me, I was pretty angry about it.
Here is a short comparison that should answer your questions: https://docs.meilisearch.com/resources/comparison_to_alterna....
[1] https://vespa.ai
I run a few 10TiB ES clusters (which, to be fair, is not much), but now and then I find that I have to reindex or reshard the cluster because I can’t just add another node. There’s something to be said for understanding index rotation and access patterns, too.
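One reason growing an index can mean resharding rather than just adding a node: ES assigns each document to a primary shard with a formula of the form `shard = hash(_routing) % number_of_primary_shards` (newer versions add a routing factor, and ES uses Murmur3; `zlib.crc32` stands in for it here). A minimal sketch, assuming that formula, of why the primary shard count is fixed at index creation: changing the modulus relocates most documents. New nodes only receive existing shards.

```python
import zlib

def route(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(_routing) % number_of_primary_shards.
    zlib.crc32 is a stand-in for the Murmur3 hash ES actually uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(1 for d in doc_ids if route(d, old_shards) != route(d, new_shards))
    return moved / len(doc_ids)

if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    # Going from 5 to 6 primary shards relocates most documents,
    # so the index has to be rebuilt rather than simply grown.
    print(f"{fraction_moved(ids, 5, 6):.0%} of documents would move")
```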
It’s easy to make an ES cluster, it’s difficult to maintain one, it’s nearly impossible to debug one.
At least if you consider “it’s slow” to be the thing you have to debug.
- Hey, a node died!
- Run Terraform to stand up a whole new cluster and restore it from a snapshot.
- Update the app to point at the new cluster.
- Run Terraform to delete the old cluster.
I'm pretty happy with this arrangement.
This is exactly it. This is a problem you encounter with every database engine, but in most of them you can quickly find the bottleneck and fix it. With elasticsearch... it's a frustrating and expensive game of trial and error.
Out of curiosity, what does "10TiB" refer to in this context?
Is it the amount of RAM ES uses, the size of ES' index, or the total size of the corpus ES must index? Or corpus size plus index?
In all there were about 50k documents, and we mostly cared about the title field. Elasticsearch would randomly bloat up to occupy a huge amount of RAM. Restarting it would make it work for a few days. It would also occasionally crash.
We got rid of it and switched to a Levenshtein-distance-based database query.
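The comment doesn't show the actual query, so here is a hedged stand-in that does the same thing in application code: rank titles by Levenshtein edit distance and keep those within a threshold. (In PostgreSQL, the `fuzzystrmatch` extension provides a `levenshtein()` function for doing this in SQL.) `search_titles` and its `max_distance` threshold are illustrative choices. Because it compares the query with the whole title, it suits short title fields rather than long text.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def search_titles(query, titles, max_distance=2, limit=10):
    """Return titles within max_distance edits of the query, closest first."""
    q = query.lower()
    hits = [(levenshtein(q, t.lower()), t) for t in titles]
    hits = [(d, t) for d, t in hits if d <= max_distance]
    return [t for _, t in sorted(hits)[:limit]]
```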
I'd love to use it again sometime, but the experience was not good, and Googling for information brought up all kinds of very complex use cases shared by others.
It will take you something like 10 minutes to start and populate MeiliSearch, and you can test it right away just by opening the server's HTTP URL!
I implemented the student-facing course catalog web interface for a single org. One of the funnest (most fun) parts was the heuristics in the query parser, like patterns for recognizing course numbers and boosting those exact-match results. It really helps you appreciate all the fit and finish that goes into proper search engines.
This was the olden days, when we just used Lucene directly.
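A sketch of the kind of query-parser heuristic described, written in plain Python rather than against Lucene. The course-number pattern (2 to 4 letters, an optional space or hyphen, then a 3-digit number) and the `course_number`/`score` result fields are hypothetical.

```python
import re

# Hypothetical pattern: "CS 101", "math220", "PHYS-101".
COURSE_NUMBER = re.compile(r"\b([A-Za-z]{2,4})[\s-]?(\d{3})\b")

def parse_query(raw: str):
    """Split a raw query into normalized course numbers and the remaining free text."""
    courses = [f"{dept.upper()} {num}" for dept, num in COURSE_NUMBER.findall(raw)]
    free_text = " ".join(COURSE_NUMBER.sub(" ", raw).split())
    return courses, free_text

def rank(results, courses):
    """Put exact course-number matches first, then order by descending score."""
    wanted = set(courses)
    return sorted(results, key=lambda r: (r["course_number"] not in wanted, -r["score"]))
```

For example, `parse_query("cs101 intro")` separates the course number from the free text, which can then go to the full-text search as usual.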
The problem is that ES is deceptively simple to operate, as millions of people who have found things like their medical records shared with the world can attest.
I have a database with 15k documents, each with around 70 pages of text, HTML formatted.
I'm using ElasticSearch currently, with the Searchkick gem.
I've spent 30 minutes playing with MeiliSearch. So far:
- Blazing fast to index, like 10x more performant than using ElasticSearch / Searchkick;
- Blazing fast to search, at least 3x faster in all my random tests so far;
- Literally zero config;
- Uses 140MB of RAM currently, while in my experience ElasticSearch would crash with anything less than 1GB, and needs at least 1.5GB to be usable in production.
- The docs state that `Only a single filter is supported in a query`. This is kind of a dealbreaker for my use case, since I need at least a `user_id` and a `status` filter. ElasticSearch can work with multiple filters. Also, I don't understand why you call it `filters` instead of `filter`, then. Are multiple filters on the roadmap?
- My search UI has a sort-by `<select>` where you can choose, for instance, `last updated asc` or `last updated desc`, amongst others. As I understand it, that would be cumbersome with MeiliSearch, since it would require either (1) a settings change to alter the ranking rules order beforehand [0], which would not even work in production due to race conditions, or (2) maintaining multiple indexes, each with a predefined ranking rule order, and switching between them depending on the UI criteria. Is that right?
- As an extension of the last question, I see that a lot of what you call "search settings" are query parameters in ElasticSearch. For instance, I can easily restrict an ES query to the title or description fields just by passing them as a parameter. In MeiliSearch that would require changing the index settings beforehand, right?
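For comparison, here is roughly how the three points above map onto per-request Elasticsearch query DSL: two `term` clauses in a `bool` filter, a `sort` chosen from the UI's select, and a `multi_match` limited to chosen fields. The `updated_at` field name and the select values are hypothetical, and the `term` filters assume `user_id` and `status` are mapped as numeric/keyword fields.

```python
import json

# Hypothetical mapping from the UI's sort-by <select> values to ES sort clauses.
SORT_OPTIONS = {
    "last_updated_asc": [{"updated_at": {"order": "asc"}}],
    "last_updated_desc": [{"updated_at": {"order": "desc"}}],
}

def build_es_query(text, user_id, status, sort_key, fields=("title", "description")):
    """Build a request body whose filters, sort and searched fields are all per request."""
    return {
        "query": {
            "bool": {
                # Only the listed fields are searched for the free text.
                "must": [{"multi_match": {"query": text, "fields": list(fields)}}],
                # Filter clauses are ANDed together and do not affect scoring.
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"term": {"status": status}},
                ],
            }
        },
        "sort": SORT_OPTIONS[sort_key],
    }

body = build_es_query("invoice", user_id=3, status="published", sort_key="last_updated_desc")
print(json.dumps(body, indent=2))
```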
PS: The docs, especially in the Ruby SDK, could use some work in the filters section. It took me a while to understand that I should pass a string, like `index.search("query", filters: "user_id:3")`. I was trying a hash like `filters: { user_id: 3 }`.
[0] https://docs.meilisearch.com/references/ranking_rules.html#u...
- Currently, we only support single filters. The multiple filters option is coming soon. https://github.com/meilisearch/MeiliSearch/issues/425
- Custom ranking rules on the fly are something we can imagine supporting. We haven't done it yet because it complicates the search query parameters. We are waiting for feedback like yours before implementing this kind of feature.
- Returning only the fields you need is already possible at search time: https://docs.meilisearch.com/guides/advanced_guides/search_p.... As for restricting which attributes are searched at query time, we had this feature in a previous version, but as with the last answer, no one used it, and it complicated the search query.
I went looking, but found nothing regarding any operations management.
* How does this scale?
* How is it monitored? Where do I get the metrics for it? (indexing performance, search performance, etc.. Stuff not found in the OS)
* Are there any kind of throttling or queueing capabilities?
* What's the redundancy/HA approach?
* I'll ask about backups, though it's the least of my worries, since indexes like this one and ES should be rebuildable from source. However, snapshots may be faster to restore than reindexing.
This might be a nice local dev tool for something, but I'm not sure how you'd run a business-critical application with it. Am I missing something?
Edit: formatting
Edit 2: I'm also wondering about security.
* Scaling comes in two parts:
- Vertical scaling: we use LMDB as our key-value store. It relies on memory mapping, so our search engine works mainly from disk and does not need a machine with terabytes of RAM.
- Horizontal scaling: we are working on sharding and replication (Raft). Development is progressing well, and the feature should come out soon.
* Currently, it is not monitored at all. This feature is planned. https://github.com/meilisearch/MeiliSearch/issues/523
* We use a queue for updates. You can find the complete guide here: https://docs.meilisearch.com/guides/advanced_guides/asynchro...
* As I said previously, we are working on HA with a raft consensus.
* We will add snapshots very soon (the on-disk data folder saved to S3). Backups (version-agnostic, requiring reindexing) will take a little longer.
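The answer only says that updates go through a queue. The toy below is not MeiliSearch's code; it just shows the general pattern: a write returns an update id immediately, a single background worker applies updates in order, and the client polls for status. `UpdateQueue`, `add_documents` and `status` are hypothetical names.

```python
import itertools
import queue
import threading

class UpdateQueue:
    """Toy asynchronous update queue: writes are accepted at once, given an id,
    and applied one at a time, in order, by a background worker thread."""

    def __init__(self):
        self._pending = queue.Queue()
        self._ids = itertools.count(1)
        self._status = {}
        self._lock = threading.Lock()
        self.documents = {}
        threading.Thread(target=self._worker, daemon=True).start()

    def add_documents(self, docs):
        with self._lock:
            update_id = next(self._ids)
            self._status[update_id] = "enqueued"
        self._pending.put((update_id, docs))
        return update_id  # the caller polls status() instead of blocking

    def status(self, update_id):
        with self._lock:
            return self._status.get(update_id, "unknown")

    def wait(self):
        """Block until every enqueued update has been processed."""
        self._pending.join()

    def _worker(self):
        while True:
            update_id, docs = self._pending.get()
            for doc in docs:
                self.documents[doc["id"]] = doc  # stand-in for the real indexing work
            with self._lock:
                self._status[update_id] = "processed"
            self._pending.task_done()
```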
We are already working with Louis Vuitton on an application in production. The app has been in production for 9 months without a single problem.
https://github.com/typesense/typesense
It supports multiple filters and has HA for reads as well.
You’re always bounded by max single shard latency AND by coordination latency.
Ignoring how expensive it would be, over-sharding and over-scaling (i.e. little data per shard and few shards per host) could reduce max single-shard/host latency; however, it increases coordination latency and also memory usage (which, directly or indirectly, causes more coordination latency).
Perfect data-per-shard and shards-per-host numbers are currently an unsolved problem. They depend heavily on the domain, i.e. data types, data volume, data ingest, mappings, query types, and query load.
:) if anyone has found a way to consistently add hosts to reduce latency, please let me know!
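A toy scatter-gather model of the trade-off above, with made-up numbers: a fixed amount of work is split across `n` shards, the coordinator waits for the slowest shard, and then pays a coordination cost assumed to grow linearly with `n`. It is not calibrated to Elasticsearch; it only shows why latency first falls and then rises again as shards are added.

```python
import random

def query_latency_ms(num_shards, per_shard_ms, coordination_ms_per_shard, jitter_ms=5.0, seed=0):
    """Wait for the slowest shard, then pay a coordination cost linear in shard count."""
    rng = random.Random(seed)
    slowest = max(per_shard_ms + rng.uniform(0, jitter_ms) for _ in range(num_shards))
    return slowest + coordination_ms_per_shard * num_shards

def sweep(total_work_ms=400.0, coordination_ms_per_shard=2.0, max_shards=64):
    """Split a fixed amount of work over 1..max_shards shards and model each latency."""
    return {
        n: query_latency_ms(n, total_work_ms / n, coordination_ms_per_shard)
        for n in range(1, max_shards + 1)
    }

latencies = sweep()
best = min(latencies, key=latencies.get)
print(f"lowest modeled latency at {best} shards: {latencies[best]:.1f} ms")
```

With these parameters the minimum lands near the point where `400 / n` equals `2 * n` (about 14 shards), give or take the jitter. Change the coordination cost or the workload and the sweet spot moves, which is the point about domain dependence.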
But probably 99% of users using ES don't need sharding.
Here's a public list of search projects (in rust, c, go): https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...
Or try Seq which is a log-focused system: https://datalust.co/seq
Not that I'm complaining - I love LMDB, and it's been rock solid and bug free in my experience (thanks, Howard!) - but it's low-level C, not Rust, and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks, be aware that you are not completely getting it.
But other than that: Thanks! This looks like a great project!
> and safety implied by "made in Rust" is not, in fact, guaranteed
Just about every Rust program depends on some C code, usually at least in the form of a libc. So you could lodge this criticism against almost every Rust project.
> and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks
Rust's safety story covers neither race conditions nor leaks.
It covers a type of race condition, namely unsynchronized concurrent access to memory.
that would be a first
In this respect MeiliSearch is comparable to ES, especially for site search and app search, which work out of the box with its HTTP API.
MeiliSearch does not offer distribution yet, but it is something the team is working on :)
Also for bonus points there is a distributed version of Solr called Solr Cloud.
When I first read the title, I thought it might be a Rust-based Lucene engine or something, and thought that would be pretty cool, though I have no idea how that would work. On its own, this is a pretty nifty little tool; however, I think framing it as an ES alternative is what feels wrong to me, and apparently to others in the comments as well.
So it's not "an alternative to ES in general", it's "a thing designed to be an alternative for a subset of ES use cases", and the comparison document is pretty clear about this.
https://docs.meilisearch.com/resources/comparison_to_alterna...
This overhyped description coupled with on-by-default analytics suggests to me MeiliSearch should be dismissed regardless of potential usefulness or technical merit.
"We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch. We only send the platform on which the server runs once by day. No other information is sent. If you do not want us to send events, you can disable these analytics by using the MEILI_NO_ANALYTICS env variable."
Do you have any information on security topics like using TLS, client authentication, etc?
I've no issue with offloading SSL to a different process though, I tend to prefer doing that anyway a lot of the time.
1. exact match this nested JSON field (with support for lists of values)
2. negative match this nested JSON field (with support for lists of values)
coupled with the ability to filter by "timeframe", then pump it through to visualizations (tables/graphs) in Kibana
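A sketch of those requirements in Elasticsearch query DSL, assuming the nested JSON fields are ordinary object fields addressed by dotted path and mapped as `keyword` (a field mapped with the `nested` type would need a `nested` query instead). A `terms` clause matches any of the listed values, `must_not` gives the negative match, and a `range` on `@timestamp` with date math such as `now-24h` gives the timeframe. The field names in the usage are hypothetical.

```python
def build_log_query(include, exclude, since="now-24h", until="now", time_field="@timestamp"):
    """include/exclude map dotted field paths to a single value or a list of values."""
    def as_list(value):
        return value if isinstance(value, list) else [value]

    filters = [{"terms": {field: as_list(v)}} for field, v in include.items()]
    filters.append({"range": {time_field: {"gte": since, "lte": until}}})  # the timeframe
    return {
        "query": {
            "bool": {
                "filter": filters,  # exact match: any of the listed values
                "must_not": [  # negative match: none of the listed values
                    {"terms": {field: as_list(v)}} for field, v in exclude.items()
                ],
            }
        }
    }
```

For example, `build_log_query({"kubernetes.labels.app": ["api", "worker"]}, {"log.level": "debug"})` keeps the last 24 hours of non-debug logs from two apps.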
MeiliSearch would be cool if it spoke the API Kibana expects from Elasticsearch
There are tons of easy-setup examples, but they lack access control and encryption. All of my servers must write logs. When one of them gets cracked, the attacker must not be able to read all the other servers' logs and steal all the PII. An attacker can use an ARP attack to MITM server connections to Elasticsearch. Without encryption, that attack yields all the PII.
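Against the ARP-spoofing MITM described, the client side of the fix is to verify the server's certificate and hostname, and ideally to give each log shipper its own client certificate so the server can scope what it may read. A minimal sketch with Python's `ssl` module; the file paths are placeholders, and the mutual-TLS part only helps if the server is configured to require client certificates.

```python
import ssl

def make_client_context(ca_file=None, client_cert=None, client_key=None):
    """TLS settings for a log shipper talking to a search/log server.

    create_default_context() requires a valid certificate and checks the hostname,
    which is what stops a MITM that has redirected the traffic.
    client_cert/client_key add mutual TLS so the server can tell shippers apart.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    if client_cert:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

The context can then be passed to an HTTPS client, e.g. `http.client.HTTPSConnection(host, port, context=make_client_context("ca.pem", "shipper.crt", "shipper.key"))`, with host and file names of your own.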
I hope Meilisearch can someday help fill this gap in the free DevOps toolset.
We're currently leaning towards Manticore Search[1], which is a fork of Sphinx Search[2].
It would be really nice to be able to point tools like MeiliSearch or ElasticSearch at a data location and have them index all the data without me writing code to send individual records to the API.
We will probably make MeiliSearch accept different indexable formats (i.e. CSV, JSON, JSON-lines) in a future version.
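Until such formats are accepted directly, the glue code is small. A sketch, with hypothetical function names, that turns CSV into JSON documents and JSON-lines text and splits them into batches; sending each batch to the documents endpoint is left out rather than guessing at the exact route. Note that `csv.DictReader` yields every value as a string.

```python
import csv
import io
import json

def csv_to_documents(csv_text, id_field="id"):
    """Parse CSV text into one dict per row, skipping rows with an empty id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(id_field)]

def to_json_lines(docs):
    """Serialize documents as JSON lines: one JSON object per line."""
    return "\n".join(json.dumps(doc, ensure_ascii=False) for doc in docs)

def batches(docs, size=1000):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]
```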
Is there anything similar here? Otherwise all the queries need to go through our servers first to ensure the filter is present.
We will work on a more feature-full API key system including the one you are talking about. This is on our roadmap IIRC.
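One common way to let the browser query directly while still forcing a filter is a key derived from a server-side secret: the backend signs the filter once, and the search service verifies the signature and applies the embedded filter. This is a hypothetical scheme sketched with Python's `hmac`, not a MeiliSearch feature; a real design would also want expiry and key rotation.

```python
import base64
import hashlib
import hmac
import json

def make_scoped_key(parent_key: bytes, filters: str) -> str:
    """Sign the forced filter so a browser can hold a key that only searches with it."""
    payload = json.dumps({"filters": filters}, sort_keys=True).encode()
    sig = hmac.new(parent_key, payload, hashlib.sha256).hexdigest().encode()
    # Hex digits never contain ".", so the first "." separates signature and payload.
    return base64.urlsafe_b64encode(sig + b"." + payload).decode()

def verify_scoped_key(parent_key: bytes, scoped_key: str) -> str:
    """Return the embedded filter if the signature is valid, else raise PermissionError."""
    raw = base64.urlsafe_b64decode(scoped_key.encode())
    sig, payload = raw.split(b".", 1)
    expected = hmac.new(parent_key, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid scoped key")
    return json.loads(payload)["filters"]
```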
Note that we are reworking the js library and there will probably be React integration too!
Can this run on top of my postgres database?
For that you will need to also define the different attributes your document is composed of.
We thought about providing a simple tool to extract the documents from an SQL table directly into MeiliSearch.
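A sketch of that extraction step, assuming any DB-API 2.0 connection (psycopg2 for Postgres should work the same way; the demo uses the standard library's `sqlite3` so it runs anywhere). Each row becomes a dict keyed by column name, ready to serialize as a JSON document. The table and key names are interpolated into the SQL, so they must come from trusted configuration, never from user input.

```python
import sqlite3

def table_to_documents(conn, table, primary_key="id"):
    """Read a whole table into dicts keyed by column name (any DB-API 2.0 connection)."""
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table} ORDER BY {primary_key}")  # trusted names only
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO books (title) VALUES (?)", [("Dune",), ("Emma",)])
    print(table_to_documents(conn, "books"))
```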
You will find that most tools provide document level permissions to some degree by storing user/group IDs on the document and adding filters to the query. However, it generally requires custom implementation work to integrate it into your systems and prevent spoofing of the filters.
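A sketch of the server-side half of that pattern in Elasticsearch query DSL: the permission clause is derived from the authenticated user and always appended to the `bool` filter list, so client-supplied filters can only narrow results, never widen them. The `allowed_groups` and `body` field names are hypothetical.

```python
def secure_search_body(user_query, user_groups, client_filters=None):
    """Build the request body on the server for an authenticated user.

    Client filters are kept, but the permission clause is always appended;
    bool filter clauses are ANDed, so a spoofed filter cannot grant access.
    """
    filters = list(client_filters or [])
    filters.append({"terms": {"allowed_groups": sorted(user_groups)}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"body": user_query}}],
                "filter": filters,
            }
        }
    }
```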
Hope you're doing well!
[1] https://github.com/valeriansaliou/sonic
[2] https://docs.meilisearch.com/resources/comparison_to_alterna...
ElasticSearch often picks the wrong type for a field because the programmer made a mistake in the data format. If you then fix that mistake, your data no longer fits the type ElasticSearch chose for it. (I actually don't know if this is still a problem, since it has been years since I last ran without mapping all my fields first, but I don't see how it couldn't be.)
So in theory, if you didn't want to go through the trouble of defining a mapping, you could just reindex all your data, fixed so that ElasticSearch chooses a better type for each field, but why would you do this?
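A toy model of the typing problem described above: the first value seen for a field fixes its type, and data fixed later may no longer fit. Real dynamic mapping has more rules (date detection, coercion, keyword subfields), so this only illustrates the failure mode and why an explicit mapping up front avoids it. The `sku` field in the usage is hypothetical.

```python
def infer_type(value):
    """Rough imitation of dynamic mapping: pick a type from the first value seen."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"

def accepts(field_type, value):
    """Text takes anything; booleans take bools; numbers take numbers or numeric strings."""
    if field_type == "text":
        return True
    if field_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

class ToyIndex:
    def __init__(self, explicit_mapping=None):
        self.mapping = dict(explicit_mapping or {})

    def index(self, doc):
        for field, value in doc.items():
            # Without an explicit mapping, the first value decides the type forever.
            field_type = self.mapping.setdefault(field, infer_type(value))
            if not accepts(field_type, value):
                raise TypeError(f"{field!r} is mapped as {field_type}, cannot store {value!r}")
```

Indexing `{"sku": 12345}` first maps `sku` as a number, so a later, corrected `{"sku": "A-12345"}` is rejected; declaring `ToyIndex({"sku": "text"})` up front accepts both.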
And what does MeiliSearch do? I wonder, because looking through this code, https://github.com/meilisearch/MeiliSearch/blob/master/meili... (and not being a Rust guy, my understanding of it is probably off), it seems like maybe it needs no configuration because it expects you to follow its semantics. To be fair, lots of things do: at the base level, everything has a title, a description, and a date.
But if I have a domain with different or probably more advanced semantics what happens?
Search engines are generally configurable because you want to add other fields and rank hits in those fields higher than other things, or maybe run a specific search that targets only those fields, say a brand-based search.
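In Elasticsearch, this kind of per-field ranking is a query-time option: `multi_match` accepts `field^boost` entries, and a plain `match` on one field gives the brand-only search. A small sketch; the `brand`, `title` and `description` names and the weights are hypothetical.

```python
def boosted_search(text, boosts):
    """multi_match over several fields; "field^N" multiplies that field's score by N."""
    fields = [name if weight == 1 else f"{name}^{weight}" for name, weight in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

def brand_only_search(text):
    """Search a single field and nothing else."""
    return {"query": {"match": {"brand": text}}}
```

For example, `boosted_search("trail shoe", {"brand": 3, "title": 2, "description": 1})` ranks a hit in `brand` above the same hit in `description`.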
On preview: it seems lots of other people have similar views. I maybe got a bit ranty, just because the title sets me off when it is so wrong it even seems like lying.
It is blindingly fast and easy to setup.
https://github.com/meilisearch/MeiliSearch#create-an-index-a...
Indexes are config. This is not really zero-config if you require API calls before it can receive data.
Also, there's nothing about TLS or access control. These will be required for any production deployment. At the minimum, let us specify a TLS key.pem and cert.pem file and create write-only and read-only access tokens.
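A sketch of the two minimums requested, in Python: a server TLS context loaded from `cert.pem` and `key.pem`, and separate read-only and write-only tokens checked per operation. The token values and operation names are hypothetical, and this is not how MeiliSearch itself works.

```python
import hmac
import ssl

def make_server_context(cert_file="cert.pem", key_file="key.pem"):
    """Server-side TLS built from the certificate and private key files."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

# Hypothetical tokens and scopes: one token may only search, the other may only write.
TOKENS = {
    "read-only-token-example": {"search"},
    "write-only-token-example": {"add_documents"},
}

def authorize(token: str, operation: str) -> bool:
    """Return True if `token` is known and its scope includes `operation`."""
    for known, scopes in TOKENS.items():
        # compare_digest avoids leaking, via timing, how much of the token matched.
        if hmac.compare_digest(token.encode(), known.encode()):
            return operation in scopes
    return False
```

A listener would wrap its socket with `make_server_context().wrap_socket(sock, server_side=True)` and call `authorize()` on each request's token before doing any work.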
sphinxsearch.com/