MeiliSearch: Zero-config alternative to Elasticsearch, made in Rust

(github.com)

437 points · qdequelen · 6y ago · 113 comments

93 comments · 28 top-level

heipei6y ago· 19 in thread

I know the project doesn't claim it, but the title somewhat implies this: I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales. If anything, ElasticSearch for me has been one of the easiest pieces of infrastructure to operate, for me pretty much "zero-config". Let me elaborate: You can run ElasticSearch via Docker command-line, if you want a cluster you just supply IPs of the other nodes. Then you start indexing documents with simple HTTP calls. You can add or remove nodes at any time and don't have to do anything but to start another ElasticSearch instance. If you run out of space or performance just start another node. Everything needed for management, indexing, search is available through HTTP APIs, no tools needed.

Clustered ElasticSearch has been rock-solid for me and I've used it in anger many times. The level of maintenance needed is close to zero, both initially and long-term. Compare that with the abysmal experience of setting up a sharded MongoDB cluster for example...

Please enlighten me how ElasticSearch is "a lot of work to operate" (heard that one multiple times), and what you're comparing it to.

jniedrauer6y ago

I've been bitten by elasticsearch twice in my career, and I've seen others bitten by it as well. Once you put it in production, you can't just run it from docker on your workstation. You have to set up a cluster with enough capacity for whatever load you're going to throw at it, gracefully handle failures, updates, scaling up as load increases, etc.

There are so many switches and dials to tune, and unless you really learn it in depth, you won't know which ones you need. It's difficult to even determine what hardware requirements you have. And it's a hard sell to tell your business guys "I think elasticsearch will work better if we give it more... CPU? Memory? Disk speed? I'm not really sure." and can't provide any concrete metrics to back that up.

Another place where footguns abound is upgrading from one version to another, especially if you've got plugins installed. There are tricks that you have to learn the hard way.

At this point, I think long and hard before reaching for a solution like elasticsearch. If I've got a DBA whose entire job it is to master the tech and wield it expertly, that's one thing. But if I'm part of an early stage startup, I just can't justify the lost time and potential for catastrophe.

natefox6y ago

> Once you put it in production, you can't just run it from docker on your workstation.

But that's true for any data store. This isn't any different. Nor is an RDBMS. They all need HA/replication. And that is rarely trivial.

Honestly, I think this is why managed/hosted solutions (AWS RDS for example) are so popular - they remove a large part of the complexity for you.

cormacrelf6y ago

I once wasted a whole day trying to get two instances up on GKE. Permissions problems, about ten configs for the JVM alone, many more for ElasticSearch. You would fix one, restart, wait ten minutes, browse 50 pages of logs, google for half an hour, add a config, and goto 1. Never got it going in the end.

amelius6y ago

My experience is roughly the same, unfortunately.

Personally I don't understand why there are so few search libraries/systems to choose from, given that "search" is one of the fundamental pillars of CS.

JeanMarcS6y ago

Just been bitten by the plugin issue after an apt upgrade.

First time it happened for me and I was pretty angry at it

Kerollmops6y ago

MeiliSearch is "zero-config" compared to ElasticSearch in terms of setup to make it work for end-user instant and relevant search engine. Our engine follows the Algolia engine in terms of typo-tolerance, relevancy, and speed.

Here is a little comparison to enlighten your questions: https://docs.meilisearch.com/resources/comparison_to_alterna....

heipei6y ago

Thanks, hadn't seen that, that makes a lot more sense. I agree that ElasticSearch is definitely not "zero-config" when it comes to building certain bespoke applications on it that go beyond simple filtering or query-relevance document search.

ksec6y ago

Maybe add Vespa [1] to the comparison?

[1] https://vespa.ai

dijit6y ago

Elasticsearch is easier than mongo in some ways and harder in others.

I run a few 10TiB ES clusters (which, is not much to be fair) but infrequently find that I have to reindex or reshard the cluster because I can’t just add another node. There’s something to be said for understanding the index rotation too, and access patterns.

It’s easy to make an ES cluster, it’s difficult to maintain one, it’s nearly impossible to debug one.

- if you consider that “it’s slow” is what you have to debug.

kstrauser6y ago

That's approximately how large our clusters are. Fortunately, ours are read-only, so our admin story is:

- Hey, a node died!

- Run Terraform to stand up a whole new cluster and restore it from a snapshot.

- Update the app to point at the new cluster.

- Run Terraform to delete the old cluster.

I'm pretty happy with this arrangement.

jniedrauer6y ago

> if you consider that “it’s slow” is what you have to debug.

This is exactly it. This is a problem you encounter with every database engine, but in most of them you can quickly find the bottleneck and fix it. With elasticsearch... it's a frustrating and expensive game of trial and error.

antpls6y ago

> I run a few 10TiB ES clusters

For information, what does "10TiB" refer to in this context?

Is it the size of what ES takes in RAM, or the size of ES' index, or is it the total size of the corpus that ES must index? Or corpus size + index ?

heipei6y ago

Would be interested to hear why you can't add another node in some cases.

fareesh6y ago

We built a "Yelp for Colleges" product several years ago. The product needed a unified search where students could search for either a course or a college or a question from the forums with typeahead / autocomplete to get them to where they wanted to go quickly, with support for misspellings.

In all there were about 50k documents, and we mostly cared about the title field. Elasticsearch would randomly bloat up to occupy a huge amount of RAM. Restarting it would make it work for a few days. It would also occasionally crash.

We got rid of it and went with some levenshtein distance based database query

I'd love to use it again sometime but the experience was not good, and Googling for information brought up all kinds of very complex use-cases shared by others
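
For readers wondering what a Levenshtein-based fallback might look like, here is a minimal sketch in Python. It is not the query described above (the comment does not show one); the function names and the `max_distance` threshold are assumptions made for illustration, and it runs in application code rather than inside a database.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(query: str, titles: list[str], max_distance: int = 2) -> list[str]:
    """Return titles within max_distance edits of query, closest first."""
    q = query.lower()
    scored = [(levenshtein(q, t.lower()), t) for t in titles]
    return [t for d, t in sorted(scored) if d <= max_distance]
```

At roughly 50k short titles a linear scan like this is plausible, but it compares whole strings only, so it does not give typeahead or prefix matching by itself.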

Kerollmops6y ago

Go on and try MeiliSearch, 50k documents are easily handled by the engine and with not much RAM usage.

It will take you something like 10 minutes to start and populate MeiliSearch, you will be able to test it just by going to the server HTTP url in no time!

specialist6y ago

Neat.

I implemented the student facing course catalog web interface for a single org. One of the funnests (most fun) parts was the heuristics in the query parser. Like patterns for recognizing course numbers and boosting those exact match results. Really helps you appreciate all the fit & finish that goes into proper search engines.

This was the olden days, when we just used Lucene directly.
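
A rough sketch of the kind of query-parser heuristic described above, in Python. The course-number pattern (2 to 4 letters followed by 3 digits) and the boost values are assumptions for illustration, not what the original catalog used.

```python
import re

# Hypothetical pattern: 2-4 letters, optional space, 3 digits (e.g. "CS 101").
COURSE_NUMBER = re.compile(r"^\s*([A-Za-z]{2,4})\s*(\d{3})\s*$")


def parse_query(raw: str) -> dict:
    """Classify a query as an exact course-number lookup or free text."""
    match = COURSE_NUMBER.match(raw)
    if match:
        dept, number = match.groups()
        # Normalize to one canonical form and boost exact matches.
        return {"kind": "course_number", "value": f"{dept.upper()} {number}", "boost": 10.0}
    return {"kind": "text", "value": raw.strip(), "boost": 1.0}
```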

atombender6y ago

Elasticsearch is very memory-intensive, and it's difficult to know exactly how much memory it will actually use, so you just have to throw a lot of RAM at it to avoid OOMing, then monitor it carefully, and hope your query concurrency won't accidentally blow the limits. Understanding why Elasticsearch is caving unpredictably under load is difficult, and GC pausing can be a significant performance sink.

rodgerd6y ago

> I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales.

The problem is that ES is deceptively simple to operate. As millions of people who have found things like their medical records shared with the world can attest.

winrid6y ago

I've had issues scaling writes to it. You can get around it, but maybe this would be better in a high write environment.

pqdbr6y ago· 12 in thread

I'm impressed.

I have a database with 15k documents, each with around 70 pages of text, HTML formatted.

I'm using ElasticSearch currently, with the Searchkick gem.

30 min playing with MeiliSearch. So far:

- Blazing fast to index, like 10x more performant than using ElasticSearch / Searchkick;

- Blazing fast to search, at least 3x faster in all my random tests so far;

- Literally zero config;

- Uses 140MB of RAM currently, while in my experience ElasticSearch would crash with anything less than 1GB, and needs at least 1.5GB to be usable in production.

pqdbr6y ago

Since this got upvoted and I see the devs are replying to questions, here are some! I'm also going to point how ElasticSearch works for comparison.

- The docs state that `Only a single filter is supported in a query`. This is kind of a dealbreaker for my use case, since I need at least a `user_id` and a `status` filter. ElasticSearch can work with multiple filters. Also, don't understand why you call it `filters` instead of `filter` then. Are multiple filters in the roadmap?

- My search UI has a sort by `<select>`, where you can choose, for instance, `last updated asc` or `last updated desc`, amongst others. In my understanding, that would be cumbersome with MeiliSearch, since it would require (1) a settings change to alter the ranking rules order beforehand [0], which would not even work in production due to race conditions or (2) maintain multiple indexes each with a pre-defined ranking rule order and switch between them depending on the UI criteria?

- As an extension of the last question, I see that a lot of what you call "search settings" are considered by ElasticSearch query parameters. For instance, I can easily query ES for the title or description fields just by setting that as a parameter. In MeiliSearch that would require a change in the index settings beforehand, right?

PS: The docs, especially in the Ruby SDK, could use some work in the filters section. It took me a while to understand I should pass a string, like `index.search("query", filters: "user_id:3")`. I was trying a hash like `filters: { user_id: 3 }`.

[0] https://docs.meilisearch.com/references/ranking_rules.html#u...
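
To illustrate the string-versus-hash confusion in the PS, here is a small hypothetical helper, written in Python rather than Ruby, that turns a one-entry mapping into the `field:value` string form mentioned above. `to_filter_string` is not part of any MeiliSearch SDK; it is only a sketch, and it rejects more than one entry because the docs quoted above say only a single filter is supported.

```python
def to_filter_string(filters: dict) -> str:
    """Turn {"user_id": 3} into "user_id:3" (the string form mentioned above)."""
    if len(filters) != 1:
        # Assumption taken from the quoted docs: one filter per query.
        raise ValueError("exactly one filter is supported")
    (field, value), = filters.items()
    return f"{field}:{value}"
```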

qdequelenOP6y ago

Hi, many answers to these questions. But first, I'll put you on the link to the public roadmap. A lot of the stuff we're working on is in there. If you need/love a feature, please add a heart emoji on it. https://github.com/orgs/meilisearch/projects/2

- Currently, we only support single filters. The multiple filters option is coming soon. https://github.com/meilisearch/MeiliSearch/issues/425

- Custom ranking rules on the fly is something imaginable on our solution. We didn't do it yet because it complexifies the search query parameters. We are waiting for feedback like yours to implement this kind of feature.

- To return only the fields you need, it's already possible during the search https://docs.meilisearch.com/guides/advanced_guides/search_p.... As for restricting the attributes to search in at query time, we had this feature in a previous version. But like the last answer, no one used it, and it complexifies the search query.

Kerollmops6y ago

Just to add a little note here, we are currently working on the functionality of multi-filter queries, because we are aware of our community!

rubyn00bie6y ago

But... what happens if I need more than one instance? I'm genuinely curious. I hope this doesn't come off as an asshole comment. Isn't the whole point of ES versus just plain ol' lucene or solr the horizontal scalability of it?

qdequelenOP6y ago

We are currently working on the sharding and the replication (Raft). Development is progressing well and the functionality should come out soon.

hashhar6y ago

I agree with all your points but a minor nit pick that Solr has been horizontally scalable for quite a while now.

natefox6y ago

> in production.

I went looking, but found nothing regarding any operations management.

* How does this scale?

* How is it monitored? Where do I get the metrics for it? (indexing performance, search performance, etc.. Stuff not found in the OS)

* Are there any kind of throttling or queueing capabilities?

* What's the redundancy/HA approach?

* I'll ask about backups, though its the least of my worries as indexing databases like this and ES should be able to be rehydrated from source. However, snapshots may be faster to restore than reindexing.

This might be a nice local dev tool for something, but I'm not sure how you run a business critical application with it? I'm wondering if I'm missing something.

Edit: formatting

Edit2: also wondering about security too

qdequelenOP6y ago

Hi, to answer your questions.

* 2 parts.

- Vertical scale: We use LMDB as a key-value store, which relies on memory mapping. This lets our search engine work mainly from disk, so it does not need a machine with terabytes of RAM.

- Horizontal scale. We are working on sharding and replications (Raft). Development is progressing well, and the functionality should come out soon.

* Currently, it is not monitored at all. This feature is planned. https://github.com/meilisearch/MeiliSearch/issues/523

* We use a queue for updates. You can find here the complete guide https://docs.meilisearch.com/guides/advanced_guides/asynchro...

* As I said previously, we are working on HA with a raft consensus.

* We will add snapshots in no time (disk folder saved in s3). A little more time for backups (version agnostic, need indexing).

We are already working with Louis Vuitton on an application in production. The app has been in production for 9 months, and there hasn't been a single problem.

karterk6y ago

If you are looking for alternatives, check out Typesense as well:

https://github.com/typesense/typesense

It supports multiple filters and has HA for reads as well.

_8j506y ago

You can add more nodes to scale search speed with ES, can you do the same with this?

yazaddaruvala6y ago

More nodes is more throughput, not lower latency.

You’re always bounded by max single shard latency AND by coordination latency.

Ignoring how expensive it would be, over-sharding and over scaling (I.e. low volumes of data per shard and low shards per host) could reduce max single shard/host latency, however it’ll increase coordination latency but also memory (which directly or indirectly will cause more coordination latency).

Perfect data per shard and perfect shard per host numbers are currently an unsolved problem. They heavily depend on the domain, I.e. data types, data volume, data ingest, mappings, query types, query load.

:) if anyone has found a way to consistently add hosts to reduce latency, please let me know!
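
The point about being bounded by the slowest shard plus coordination can be sketched as a toy model in Python. The function name and the numbers are made up for illustration; real clusters have many more factors (queueing, GC pauses, cache state).

```python
def fanout_latency(shard_latencies_ms: list[float], coordination_ms: float) -> float:
    """Latency of one scatter-gather query: slowest shard plus coordination."""
    return max(shard_latencies_ms) + coordination_ms


# Adding a fast shard does not help while the slowest shard is unchanged.
before = fanout_latency([12.0, 30.0, 18.0], coordination_ms=5.0)
after = fanout_latency([12.0, 30.0, 18.0, 9.0], coordination_ms=5.0)
```

In this model, more hosts only reduce latency if they shrink the slowest shard's work by more than they add in coordination cost.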

fnord1236y ago

That's marked for Q3: https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aiss...

But probably 99% of users using ES don't need sharding.

manigandham6y ago· 5 in thread

Awesome, glad to see all the competition in the search space now. There are other projects like Sonic, Tantivy, Toshi and more that have more functionality if you need alternatives.

Here's a public list of search projects (in rust, c, go): https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...

qdequelenOP6y ago

Take a look at our comparison page: https://docs.meilisearch.com/resources/comparison_to_alterna...

manigandham6y ago

That's a good overview of the alternatives. Nice work on this.

tmzt6y ago

Are there any that fit the log searching use case, apart from loki which doesn't do full text searching?

manigandham6y ago

Do you mean you want full-text search against logs? In that case they all work, you just have to ingest the logs as documents in each one.

Or try Seq which is a log-focused system: https://datalust.co/seq

DarkCrusader26y ago

I think vector[0] does what you are asking for.

[0] https://github.com/timberio/vector

beagle36y ago· 4 in thread

Mostly "made in Rust", but from the github readme[0] "MeiliSearch uses LMDB as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads."; so a lot of the credit goes to LMDB, and safety implied by "made in Rust" is not, in fact, guaranteed.

Not that I'm complaining - I love LMDB, and it's been rock solid and bug free in my experience (thanks, Howard!) - but it's low level C, not Rust, and if you expect the certainty that Rust provides w.r.t. security, race conditions and leaks, be aware that you are not completely getting it.

But other than that: Thanks! This looks like a great project!

[0] https://github.com/meilisearch/MeiliSearch#how-it-works

burntsushi6y ago

True, but there are significant components in pure Rust, such as `fst` (full disclaimer, I wrote it). Which is written in purely safe Rust.

> and safety implied by "made in Rust" is not, in fact, guaranteed

Just about every Rust program depends on some C code, usually at least in the form of a libc. So you could lodge this criticism against almost every Rust project.

> and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks

Rust's safety story covers neither race conditions nor leaks.

comex6y ago

> Rust's safety story covers neither race conditions nor leaks.

It covers a type of race condition, namely unsynchronized concurrent access to memory.

hyc_symas6y ago

(I find this whole post ironic. We made LMDB as a standalone library so other projects could use it, of course. But for applications like this - scalable, fulltext search, nothing comes anywhere close to OpenLDAP. Somebody else in this thread mentioned "triple-digit queries per second" as if that was a difficult achievement. OpenLDAP handles queries with complex filters at millions of queries per second. It also has a complete security model, providing fine grained access control, something none of these newer projects have even begun to think about. You guys all need to study existing tech better before starting to write your own solutions...)

pixiemaster6y ago

> You guys all need to study existing tech better before starting to write your own solutions

that would be a first

ghayes6y ago· 3 in thread

The goal of ElasticSearch, I always thought, was that it scales horizontally and can handle the loss of multiple nodes without availability- or data-loss. It's interesting to build a single-server replacement, and this can likely work for many use-cases, but it's definitely a different approach from ElasticSearch itself.

tpayet6y ago

Replication for MeiliSearch is on its way :) The main differentiator is that MeiliSearch algorithms are made for end-user search, not for complex queries. MeiliSearch focuses on site search or app search, not analytics on hyper large datasets.

rjammala6y ago

what is the size of the largest dataset that you have indexed with MeiliSearch?

nodesocket6y ago

Is it just replication (can sustain node failures) or also sharding the data?

otterley6y ago· 3 in thread

MeiliSearch appears to be more of an alternative to Lucene than it is to Elasticsearch. Lucene is the search engine that runs on a single instance; ES is the horizontally-scalable distribution and aggregation layer atop the instances. Absent a similar aggregation layer, MeiliSearch isn't "elastic" as the comparison implies.

tpayet6y ago

Actually Lucene is the library for search that Elastic uses under the hood. Lucene does not provide any HTTP API, which Elastic does. Before using Lucene, you have to build the interface around it.

In this way MeiliSearch is comparable to ES, especially for site search and app search working out of the box as standard with its http api.

MeiliSearch does not offer distribution yet, but it is something the team is working on :)

otterley6y ago

My concern is that by comparing it to Elasticsearch, you implicitly minimize the amount of engineering effort required to go from single-node to a distributed system. It is a non-trivial exercise that you will undoubtedly realize once you get into the dirty details.

jpgvm6y ago

You might be thinking of Solr. Which is the server developed by the Lucene team. Lucene is used in most full-text search systems written in Java.

Also for bonus points there is a distributed version of Solr called Solr Cloud.

Bedon2926y ago· 3 in thread

While this might be an alternative for that one specific use case (search bar), it does not feel like a viable alternative to ES. I am sure it is great at that specific case, and don't want to knock them on that. But I have never used ES for a simple search like they do. When I use ES, I want to store billions of records redundantly and search them by text, time, and/or location. And then create visualizations with the results.

When I first read the title I thought it might be a Rust based Lucene engine or something, and thought that would be pretty cool. Though no idea how that would work. On its own, this is a pretty nifty little tool, however I think the framing as an ES alternative is what feels wrong to me, and apparently others in the comments as well.

nicoburns6y ago

https://github.com/tantivy-search/tantivy is a Rust based Lucene-alike.

mst6y ago

I've seen ES used for meilisearch's precise use case quite a few times before now.

So it's not "an alternative to ES in general", it's "a thing designed to be an alternative for a subset of ES use cases", and the comparison document is pretty clear about this.

https://docs.meilisearch.com/resources/comparison_to_alterna...

ellimilial6y ago

Seconding. Text searching is a horribly hairy problem. I know 2 businesses for which the main source of income is tuning ES/Solr to particular user needs. Starting from performance, through templating case-specific queries to custom plugins.

eliseumds6y ago· 3 in thread

Pretty heavy user of ES here, and one cannot compare the two products.

the_arun6y ago

What is the rationale for not comparing the two?

wolco6y ago

Only one filter. Very fast limited search. Not great for anything remotely complex like searching with two conditions.

beastman826y ago

But Rust! Lol

seemslegit6y ago· 2 in thread

Hardly an "alternative to Elastic search" if only because the latter is scalable beyond a single machine.

This overhyped description coupled with on-by-default analytics suggests to me MeiliSearch should be dismissed regardless of potential usefulness or technical merit.

greendave6y ago

The analytics seem pretty benign.

"We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch. We only send the platform on which the server runs once by day. No other information is sent. If you do not want us to send events, you can disable these analytics by using the MEILI_NO_ANALYTICS env variable."

seemslegit6y ago

The practice itself is malignant, either explicitly ask upon first run or require a MEILI_YES_ANALYTICS env variable to enable it.
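
The difference between the two defaults being argued over can be sketched in a few lines of Python. `MEILI_NO_ANALYTICS` comes from the quoted docs; `MEILI_YES_ANALYTICS` is the hypothetical variable proposed above. Treating the mere presence of the variable as the switch is an assumption, since the thread does not say how the value is interpreted.

```python
import os
from typing import Mapping


def analytics_enabled_opt_out(env: Mapping[str, str] = os.environ) -> bool:
    """Behaviour as quoted: analytics on unless MEILI_NO_ANALYTICS is set."""
    return "MEILI_NO_ANALYTICS" not in env


def analytics_enabled_opt_in(env: Mapping[str, str] = os.environ) -> bool:
    """Proposed behaviour: analytics off unless MEILI_YES_ANALYTICS is set."""
    return "MEILI_YES_ANALYTICS" in env
```

With an empty environment the first returns `True` and the second `False`, which is the whole disagreement: what happens when the operator does nothing.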

time0ut6y ago· 2 in thread

Nice. This looks promising. Very clean API. I like the focus on a narrow use case.

Do you have any information on security topics like using TLS, client authentication, etc?

Kerollmops6y ago

Currently we think this kind of security can be enabled by a simple nginx setup, allowing auto-refresh of certificates easily (e.g. certbot). But in the future we will probably handle that in the engine itself.

mst6y ago

I was thinking it might be nice to be able to have an HMACed token with an expiry as an option - so e.g. my main http-serving thing could provide one of those to allow the frontend to read for a bit but kick the user off after half an hour or whatever if the token isn't refreshed.

I've no issue with offloading SSL to a different process though, I tend to prefer doing that anyway a lot of the time.
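
A minimal sketch of the HMACed-token-with-expiry idea, using only the Python standard library. The token layout (`<expiry>.<signature>`) and the function names are assumptions for illustration; nothing here is an existing MeiliSearch feature.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def make_token(secret: bytes, expires_at: int) -> str:
    """Return "<expiry>.<signature>" where the signature covers the expiry."""
    payload = str(expires_at).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{expires_at}.{base64.urlsafe_b64encode(sig).decode()}"


def verify_token(secret: bytes, token: str, now: Optional[int] = None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    try:
        expiry_text, sig_text = token.split(".", 1)
        expires_at = int(expiry_text)
        sig = base64.urlsafe_b64decode(sig_text)
    except ValueError:  # malformed token, bad integer, or bad base64
        return False
    expected = hmac.new(secret, expiry_text.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return False
    current = int(time.time()) if now is None else now
    return current < expires_at
```

The backend that holds `secret` would hand out `make_token(secret, int(time.time()) + 1800)` for a half-hour window, and the search server would call `verify_token` on each request.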

MuffinFlavored6y ago· 1 in thread

The real power of Elasticsearch for me is the ability to filter logs by:

1. exact match this nested JSON field (with support for lists of values)

2. negative match this nested JSON field (with support for lists of values)

coupled with the ability to filter by "timeframe", then pump it through to visualizations (tables/graphs) in Kibana

MeiliSearch would be cool if it spoke the API Kibana expects from Elasticsearch

mleonhard6y ago

If only one could set up Elasticsearch and Kibana using infrastructure-as-code (IaC). I spent several days trying and still haven't succeeded. Elasticsearch config is full of foot-guns.

There are tons of easy setup examples but they lack access control and encryption. All of my servers must write logs. When one of them gets cracked, the attacker must not be able to read all the other servers' logs and steal all the PII. An attacker can use an ARP attack to MITM server connections to Elasticsearch. Without encryption, that attack yields all the PII.

I hope Meilisearch can someday help fill this gap in the free DevOps toolset.

nreece6y ago· 1 in thread

Looks pretty good. The single filter approach is restrictive though.

We're currently leaning towards Manticore Search[1], which is a fork of Sphinx Search[2].

[1] https://manticoresearch.com

[2] http://sphinxsearch.com

qdequelenOP6y ago

We are working on multi filters and faceting. https://github.com/meilisearch/MeiliSearch/issues/424 https://github.com/meilisearch/MeiliSearch/issues/425

throw031720196y ago· 1 in thread

Are the documents stored on disk or only in memory?

tpayet6y ago

We are using LMDB as the key/value store, so the documents are memory-mapped (usually on disk, and in memory when needed)

niyazpk6y ago· 1 in thread

Does anyone know if this supports bulk indexing? My team has a lot of data in S3 in parquet format. (We could change the format to something else if that helps).

It would be really nice to be able to point tools like MeiliSearch or ElasticSearch to a data location and have them index all the data without me writing code to send individual records to the API.

Kerollmops6y ago

This is not something that MeiliSearch supports currently but I am working on making the engine be able to index other formats than JSON, I saw great performance improvements when indexing simple CSVs.

We will probably make MeiliSearch accept different indexable formats (i.e. CSV, JSON, JSON-lines) in a future version.
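
Until other formats are accepted, converting on the client side is straightforward. This Python sketch turns CSV with a header row into JSON-lines; the function name is made up, and whether the engine accepts JSON-lines at all is, per the comment above, still a future plan.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV with a header row into one JSON object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # All values stay strings; type conversion is left to the caller.
    return "\n".join(json.dumps(row) for row in reader)
```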

bradrobertson6y ago· 1 in thread

Looks promising! Are there any docs coming on a production ready setup? Reading below it looks like you're working on high availability, but even in the single machine scenario, do you have recommendations for persistence, fault tolerance etc?

Kerollmops6y ago

I would say that you must add your own nginx (or something else) in front of our HTTP-only engine. In terms of fault tolerance, we are working on high availability.

throw031720196y ago· 1 in thread

We use Algolia and use the public API keys with search filters encoded so they can only search their data (I.e. account_id:123)

Is there anything similar here? Otherwise all the queries need to go through our servers first to ensure the filter is present.

Kerollmops6y ago

The current API key system is a simple and temporary solution.

We will work on a more feature-full API key system including the one you are talking about. This is on our roadmap IIRC.

kvz6y ago· 1 in thread

Is there already a browser library that can talk to MeiliSearch?

Kerollmops6y ago

Yes, there is, you can find all clients on this documentation page: https://docs.meilisearch.com/resources/sdks.html

Note that we are reworking the js library and there will probably be React integration too!

dhruvkar6y ago· 1 in thread

I've never used elasticsearch and only had a brief toy project with Algolia. The demo on the github repo looks awesome.

Can this run on top of my postgres database?

Kerollmops6y ago

To make MeiliSearch expose the documents that are stored in your PostgreSQL (or any other database) you must extract them and store them in our engine using the HTTP API we provide to you. https://docs.meilisearch.com/references/documents.html#add-o...

For that you will need to also define the different attributes your document is composed of.

We thought about providing a simple tool to extract the documents from an SQL table into the MeiliSearch directly.
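
The extraction step can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so the example is self-contained; the table, column names, and `rows_to_documents` helper are all hypothetical. The resulting JSON array is what you would then send to the documents endpoint linked above.

```python
import json
import sqlite3


def rows_to_documents(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Read every row of `table` and return it as a list of JSON-ready dicts."""
    conn.row_factory = sqlite3.Row
    # The table name is interpolated, so only pass trusted identifiers here.
    rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
    return [dict(row) for row in rows]


# Stand-in database with a hypothetical `movies` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO movies (title) VALUES ('Carol'), ('Alien')")
payload = json.dumps(rows_to_documents(conn, "movies"))
```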

bberenberg6y ago· 1 in thread

Does any tool in this category (This, Elastic, or whatever else) support something like permissions on a per document level?

jschumacher6y ago

Hey B! Funny seeing you here. I'm now running product at http://sajari.com

You will find that most tools provide document level permissions to some degree by storing user/group IDs on the document and adding filters to the query. However, it generally requires custom implementation work to integrate it into your systems and prevent spoofing of the filters.

Hope you're doing well!
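
As a sketch of the approach described above (group IDs stored on each document, plus a filter added to the query), here is the filtering logic in plain Python. The `allowed_groups` field name is an assumption, and a real engine would apply this as a query filter rather than by post-filtering a list.

```python
def visible_documents(documents: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep documents whose allowed_groups overlap the caller's groups."""
    return [
        doc for doc in documents
        if user_groups & set(doc.get("allowed_groups", []))
    ]
```

The anti-spoofing part is that `user_groups` must come from the server-side session, never from a parameter the client can edit.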

ghh6y ago

I wanted to mention Sonic [1] as another lightweight document indexing alternative written in rust, when I found MeiliSearch to provide a thoughtful comparison page [2]

[1] https://github.com/valeriansaliou/sonic

[2] https://docs.meilisearch.com/resources/comparison_to_alterna...

bryanrasmussen6y ago

ok I just looked through things a bit but the phrase 0 config worries me - first off I could conceivably run ElasticSearch with 0 configuration but then it needs to make decisions as to what types things are, and how things should be analyzed, and sometimes those decisions are not what I want.

Often ElasticSearch makes a mistake in typing because the programmer has made a mistake in data format, and if you fixed that mistake your data would no longer fit the format that ElasticSearch has chosen for it (I actually don't know if this is still a problem, because it has been years since I have run without all my fields being mapped first, but I don't see how it couldn't be a problem).

so theoretically if you didn't want to go through the trouble of defining a mapping you could just reindex all your data fixed in such a way that ElasticSearch will choose a better type for individual fields but why would you do this?

And I mean what does MeiliSearch do? I wonder - because looking through this code here https://github.com/meilisearch/MeiliSearch/blob/master/meili... (and not being a rust guy my understanding of it is probably off) but it seems like maybe it is no configuration because it expects you to follow its semantics. Which to be fair lots of things do, at the base level, everything has a title, description, date.

But if I have a domain with different or probably more advanced semantics what happens?

Search Engines are generally configurable because you want to add other fields and rank hits in those fields higher than other things, or maybe do a specific search that only targets those fields - like say Brands based search.

on preview: lots of other people with similar views it seems, I got maybe a bit ranty just because the title sets me off when it just is so wrong it even seems like lying.

dalore6y ago

Wow, to see this pop up is strange as I was just implementing this yesterday.

It is blindingly fast and easy to set up.

mleonhard6y ago

> MeiliSearch can serve multiple indexes, with different kinds of documents, therefore, it is required to create the index before sending documents to it.

https://github.com/meilisearch/MeiliSearch#create-an-index-a...

Indexes are config. This is not really zero-config if you require API calls before it can receive data.

Also, there's nothing about TLS or access control. These will be required for any production deployment. At the minimum, let us specify a TLS key.pem and cert.pem file and create write-only and read-only access tokens.

karterk6y ago

If you are looking for alternatives, check out Typesense as well:

https://github.com/typesense/typesense

maxpert6y ago

How does it compare to Sonic? https://github.com/valeriansaliou/sonic

dzonga6y ago

looks really easy to use. will use this instead of resorting to Postgres full text search for my next app(s)

udfalkso6y ago

Sounds more like a potential alternative to Sphinx than Elastic Search.

sphinxsearch.com/

social_quotient6y ago

Thanks for including performance metrics right up front!

heipei6y ago· 19 in thread

Please enlighten me how ElasticSearch is "a lot of work to operate" (heard that one multiple times), and what you're comparing it to.

jniedrauer6y ago

Another place where footguns abound is upgrading from one version to another, especially if you've got plugins installed. There are tricks that you have to learn the hard way.

natefox6y ago

> Once you put it in production, you can't just run it from docker on your workstation.

But that's true for any data store. This isn't any different. Nor is an RDBMS. They all need HA/replication. And that is rarely trivial.

Honestly, I think this is why managed/hosted solutions (AWS RDS for example) are so popular - they remove a large part of the complexity for you.

cormacrelf6y ago

amelius6y ago

My experience is roughly the same, unfortunately.

Personally I don't understand why there are so few search libraries/systems to choose from, given that "search" is one of the fundamental pillars of CS.

JeanMarcS6y ago

Just been bitten by the plugin issue after an apt upgrade.

First time it happened for me and I was pretty angry at it

Kerollmops6y ago

Here is a little comparison to help answer your questions: https://docs.meilisearch.com/resources/comparison_to_alterna....

heipei6y ago

ksec6y ago

Maybe add Vespa [1] to the comparison?

[1] https://vespa.ai

dijit6y ago

Elasticsearch is easier than mongo in some ways and harder in others.

It’s easy to make an ES cluster, it’s difficult to maintain one, it’s nearly impossible to debug one.

- if you consider that “it’s slow” is what you have to debug.

kstrauser6y ago

That's approximately how large our clusters are. Fortunately, ours are read-only, so our admin story is:

- Hey, a node died!

- Run Terraform to stand up a whole new cluster and restore it from a snapshot.

- Update the app to point at the new cluster.

- Run Terraform to delete the old cluster.

I'm pretty happy with this arrangement.

jniedrauer6y ago

> if you consider that “it’s slow” is what you have to debug.

antpls6y ago

> I run a few 10TiB ES clusters

For information, what does "10TiB" refer to in this context?

Is it the size of what ES takes in RAM, or the size of ES' index, or is it the total size of the corpus that ES must index? Or corpus size + index?

heipei6y ago

Would be interested to hear why you can't add another node in some cases.

fareesh6y ago

We got rid of it and went with a Levenshtein-distance-based database query.

I'd love to use it again sometime but the experience was not good, and Googling for information brought up all kinds of very complex use-cases shared by others
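For readers unfamiliar with the approach mentioned above, here is a minimal sketch of Levenshtein distance and a naive fuzzy filter built on it. This is not fareesh's actual query (which is not shown in the thread); the function names and the `max_distance` threshold are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, candidates: list, max_distance: int = 2) -> list:
    """Return candidates within max_distance edits of the query (case-insensitive)."""
    return [
        candidate for candidate in candidates
        if levenshtein(query.lower(), candidate.lower()) <= max_distance
    ]
```

A scan like this compares the query against every candidate, so it only suits small datasets; a search engine avoids that cost with an index.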

Kerollmops6y ago

Go on and try MeiliSearch: the engine easily handles 50k documents without much RAM usage.

It will take you something like 10 minutes to start and populate MeiliSearch, and you will be able to test it just by going to the server's HTTP URL in no time!

specialist6y ago

Neat.

This was the olden days, when we just used Lucene directly.

atombender6y ago

rodgerd6y ago

> I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales.

The problem is that ES is deceptively simple to operate. As millions of people who have found things like their medical records shared with the world can attest.

winrid6y ago

I've had issues scaling writes to it. You can get around it, but maybe this would be better in a high write environment.

pqdbr6y ago· 12 in thread

I'm impressed.

I have a database with 15k documents, each with around 70 pages of text, HTML formatted.

I'm using ElasticSearch currently, with the Searchkick gem.

30 min playing with MeiliSearch. So far:

- Blazing fast to index, like 10x more performant than using ElasticSearch / Searchkick;

- Blazing fast to search, at least 3x faster in all my random tests so far;

- Literally zero config;

- Uses 140MB of RAM currently, while in my experience ElasticSearch would crash with anything less than 1GB, and needs at least 1.5GB to be usable in production.

pqdbr6y ago

Since this got upvoted and I see the devs are replying to questions, here are some! I'm also going to point out how ElasticSearch works for comparison.

[0] https://docs.meilisearch.com/references/ranking_rules.html#u...

qdequelenOP6y ago

- Currently, we only support single filters. The multiple filters option is coming soon. https://github.com/meilisearch/MeiliSearch/issues/425

Kerollmops6y ago

Just to add a little note here, we are currently working on the functionality of multi-filter queries, because we are aware of our community!

rubyn00bie6y ago

qdequelenOP6y ago

We are currently working on the sharding and the replication (Raft). Development is progressing well and the functionality should come out soon.

hashhar6y ago

I agree with all your points, but one minor nitpick: Solr has been horizontally scalable for quite a while now.

natefox6y ago

> in production.

I went looking, but found nothing regarding any operations management.

* How does this scale?

* How is it monitored? Where do I get the metrics for it? (indexing performance, search performance, etc.. Stuff not found in the OS)

* Are there any kind of throttling or queueing capabilities?

* What's the redundancy/HA approach?

This might be a nice local dev tool for something, but I'm not sure how you run a business critical application with it? I'm wondering if I'm missing something.

Edit: formatting

Edit2: also wondering about security too

qdequelenOP6y ago

Hi, to answer your questions.

* 2 parts.

- Vertical scale: We use LMDB as a key-value store, which relies on memory mapping. This lets our search engine work mainly from disk, so it does not need a machine with terabytes of RAM.

- Horizontal scale: We are working on sharding and replication (Raft). Development is progressing well, and the functionality should come out soon.

* Currently, it is not monitored at all. This feature is planned. https://github.com/meilisearch/MeiliSearch/issues/523

* We use a queue for updates. You can find here the complete guide https://docs.meilisearch.com/guides/advanced_guides/asynchro...

* As I said previously, we are working on HA with Raft consensus.

* We will add snapshots in no time (disk folder saved in S3). Backups (version agnostic, need indexing) will take a little more time.

We are already working with Louis Vuitton on an application in production. The app has been in production for 9 months, and there hasn't been a single problem.

karterk6y ago

If you are looking for alternatives, check out Typesense as well:

https://github.com/typesense/typesense

It supports multiple filters and has HA for reads as well.

_8j506y ago

You can add more nodes to scale search speed with ES, can you do the same with this?

yazaddaruvala6y ago

More nodes is more throughput, not lower latency.

You’re always bounded by max single shard latency AND by coordination latency.

:) if anyone has found a way to consistently add hosts to reduce latency, please let me know!
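The throughput-versus-latency point above can be put into a toy model. This is a deliberately simplified sketch: the function names, the linear throughput scaling, and the numbers are assumptions for illustration, not measurements of any engine.

```python
def query_latency_ms(shard_latencies_ms: list, coordination_ms: float) -> float:
    """A fan-out query waits for its slowest shard, then pays coordination cost."""
    return max(shard_latencies_ms) + coordination_ms


def cluster_throughput_qps(replica_count: int, per_replica_qps: float) -> float:
    """Idealized model: each extra replica serves its own share of queries."""
    return replica_count * per_replica_qps


# Hypothetical numbers: three shards, 5 ms of coordination overhead.
latency = query_latency_ms([12.0, 40.0, 18.0], coordination_ms=5.0)
throughput_with_two = cluster_throughput_qps(2, per_replica_qps=100.0)
throughput_with_four = cluster_throughput_qps(4, per_replica_qps=100.0)
```

In this model, doubling the replicas doubles throughput, while the latency of a single query stays pinned to the slowest shard plus coordination.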

fnord1236y ago

That's marked for Q3: https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aiss...

But probably 99% of users using ES don't need sharding.

manigandham6y ago· 5 in thread

Awesome, glad to see all the competition in the search space now. There are other projects like Sonic, Tantivy, Toshi and more that have more functionality if you need alternatives.

Here's a public list of search projects (in Rust, C, Go): https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...

qdequelenOP6y ago

Take a look at our comparison page: https://docs.meilisearch.com/resources/comparison_to_alterna...

manigandham6y ago

That's a good overview of the alternatives. Nice work on this.

tmzt6y ago

Are there any that fit the log searching use case, apart from Loki, which doesn't do full-text searching?

manigandham6y ago

Do you mean you want full-text search against logs? In that case they all work, you just have to ingest the logs as documents in each one.

Or try Seq which is a log-focused system: https://datalust.co/seq

DarkCrusader26y ago

I think vector[0] does what you are asking for.

[0] https://github.com/timberio/vector

beagle36y ago· 4 in thread

But other than that: Thanks! This looks like a great project!

[0] https://github.com/meilisearch/MeiliSearch#how-it-works

burntsushi6y ago

True, but there are significant components in pure Rust, such as `fst` (full disclosure: I wrote it), which is written in purely safe Rust.

> and safety implied by "made in Rust" is not, in fact, guaranteed

Just about every Rust program depends on some C code, usually at least in the form of a libc. So you could lodge this criticism against almost every Rust project.

> and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks

Rust's safety story covers neither race conditions nor leaks.

comex6y ago

> Rust's safety story covers neither race conditions nor leaks.

It covers a type of race condition, namely unsynchronized concurrent access to memory.

hyc_symas6y ago

pixiemaster6y ago

> You guys all need to study existing tech better before starting to write your own solutions

that would be a first

ghayes6y ago· 3 in thread

tpayet6y ago

rjammala6y ago

what is the size of the largest dataset that you have indexed with MeiliSearch?

nodesocket6y ago

Is it just replication (can sustain node failures) or also sharding the data?

otterley6y ago· 3 in thread

tpayet6y ago

Actually Lucene is the library for search that Elastic uses under the hood. Lucene does not provide any HTTP API, which Elastic does. Before using Lucene, you have to build the interface around it.

In this way MeiliSearch is comparable to ES, especially for site search and app search working out of the box as standard with its HTTP API.

MeiliSearch does not offer distribution yet, but it is something the team is working on :)

otterley6y ago

jpgvm6y ago

You might be thinking of Solr, which is the server developed by the Lucene team. Lucene is used in most full-text search systems written in Java.

Also for bonus points there is a distributed version of Solr called Solr Cloud.

Bedon2926y ago· 3 in thread

nicoburns6y ago

https://github.com/tantivy-search/tantivy is a Rust based Lucene-alike.

mst6y ago

I've seen ES used for MeiliSearch's precise use case quite a few times before now.

So it's not "an alternative to ES in general", it's "a thing designed to be an alternative for a subset of ES use cases", and the comparison document is pretty clear about this.

https://docs.meilisearch.com/resources/comparison_to_alterna...

ellimilial6y ago

eliseumds6y ago· 3 in thread

Pretty heavy user of ES here, and one cannot compare the two products.

the_arun6y ago

What is the rationale for not comparing the two?

wolco6y ago

Only one filter. Very fast limited search. Not great for anything remotely complex like searching with two conditions.

beastman826y ago

But Rust! Lol

seemslegit6y ago· 2 in thread

Hardly an "alternative to Elastic search", if only because the latter is scalable beyond a single machine.

This overhyped description coupled with on-by-default analytics suggests to me MeiliSearch should be dismissed regardless of potential usefulness or technical merit.

greendave6y ago

The analytics seem pretty benign.

seemslegit6y ago

The practice itself is malignant, either explicitly ask upon first run or require a MEILI_YES_ANALYTICS env variable to enable it.

time0ut6y ago· 2 in thread

Nice. This looks promising. Very clean API. I like the focus on a narrow use case.

Do you have any information on security topics like using TLS, client authentication, etc?

Kerollmops6y ago

mst6y ago

I've no issue with offloading SSL to a different process though, I tend to prefer doing that anyway a lot of the time.

MuffinFlavored6y ago· 1 in thread

The real power of Elasticsearch for me is the ability to filter logs by:

1. exact match this nested JSON field (with support for lists of values)

2. negative match this nested JSON field (with support for lists of values)

coupled with the ability to filter by "timeframe", then pump it through to visualizations (tables/graphs) in Kibana

MeiliSearch would be cool if it spoke the API Kibana expects from Elasticsearch
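The kind of log filter described above maps onto an Elasticsearch `bool` query with `terms`, `must_not`, and `range` clauses. The sketch below only builds the request body as a Python dict; the helper name and the field names in the usage example are hypothetical, and `@timestamp` is just the conventional time field, so adjust it to your mapping.

```python
import json


def build_log_query(include: dict, exclude: dict,
                    since: str = "now-15m", until: str = "now") -> dict:
    """Build an Elasticsearch bool query body.

    include / exclude map a (possibly nested, dot-separated) field name
    to the list of exact values to match or to reject.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match against any value in the list.
                    *({"terms": {field: values}} for field, values in include.items()),
                    # Restrict results to the requested timeframe.
                    {"range": {"@timestamp": {"gte": since, "lte": until}}},
                ],
                # Negative match: drop documents holding any of these values.
                "must_not": [
                    {"terms": {field: values}} for field, values in exclude.items()
                ],
            }
        }
    }


# Hypothetical field names, for illustration only.
body = build_log_query(
    include={"http.response.status_code": [500, 502]},
    exclude={"service.name": ["healthcheck"]},
)
print(json.dumps(body, indent=2))
```

Exact matching with `terms` assumes the fields are indexed as keyword or numeric types rather than analyzed text.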

mleonhard6y ago

If only one could set up Elasticsearch and Kibana using infrastructure-as-code (IaC). I spent several days trying and still haven't succeeded. Elasticsearch config is full of foot-guns.

I hope Meilisearch can someday help fill this gap in the free DevOps toolset.

nreece6y ago· 1 in thread

Looks pretty good. The single filter approach is restrictive though.

We're currently leaning towards Manticore Search[1], which is a fork of Sphinx Search[2].

[1] https://manticoresearch.com

[2] http://sphinxsearch.com

qdequelenOP6y ago

We are working on multi filters and faceting. https://github.com/meilisearch/MeiliSearch/issues/424 https://github.com/meilisearch/MeiliSearch/issues/425

throw031720196y ago· 1 in thread

Are the documents stored on disk or only in memory?

tpayet6y ago

We are using LMDB as the key/value store, so the documents are memory-mapped (usually on disk, and in memory when needed)

niyazpk6y ago· 1 in thread

Does anyone know if this supports bulk indexing? My team has a lot of data in S3 in parquet format. (We could change the format to something else if that helps).

It would be really nice to be able to point tools like MeiliSearch or ElasticSearch to a data location and have them index all the data without me writing code to send individual records to the API.
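Until an engine can read a data location directly, the usual workaround is to stream records and send them in batches rather than one by one. A minimal sketch of the batching step follows; the `batched` helper and the batch size are assumptions for illustration, and reading Parquet itself would need a third-party library that is out of scope here.

```python
from itertools import islice


def batched(records, size: int):
    """Yield lists of up to `size` records from any iterable, lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical records; each batch would become one request to the
# engine's document-ingestion endpoint.
records = ({"id": i, "title": f"doc {i}"} for i in range(5))
batches = list(batched(records, size=2))
```

Because the helper is lazy, the full dataset never has to sit in memory at once.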

Kerollmops6y ago

We will probably make MeiliSearch accept different indexable formats (i.e. CSV, JSON, JSON-lines) in a future version.

bradrobertson6y ago· 1 in thread

Kerollmops6y ago

I would say that you must put your own nginx (or something similar) in front of our HTTP-only engine. In terms of fault tolerance, we are working on high availability.

throw031720196y ago· 1 in thread

We use Algolia with public API keys that have search filters encoded in them, so users can only search their own data (i.e. account_id:123).

Is there anything similar here? Otherwise all the queries need to go through our servers first to ensure the filter is present.
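The idea behind such filter-carrying keys can be sketched with an HMAC: the server signs the filter with a secret parent key, and the search backend later verifies the signature before applying the filter. This is a generic illustration, not Algolia's exact wire format and not a MeiliSearch feature; all function names are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode

SIGNATURE_LENGTH = 64  # length of a hex-encoded SHA-256 digest


def make_scoped_key(parent_key: str, filters: str) -> str:
    """Sign the filter with the secret parent key and pack both into one token."""
    params = urlencode({"filters": filters})
    signature = hmac.new(parent_key.encode(), params.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{signature}{params}".encode()).decode()


def verify_scoped_key(parent_key: str, scoped_key: str):
    """Return the embedded filter if the signature checks out, else None."""
    decoded = base64.urlsafe_b64decode(scoped_key.encode()).decode()
    signature, params = decoded[:SIGNATURE_LENGTH], decoded[SIGNATURE_LENGTH:]
    expected = hmac.new(parent_key.encode(), params.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return parse_qs(params)["filters"][0]


# Hypothetical secret and filter, for illustration only.
token = make_scoped_key("server-side-secret", "account_id:123")
```

The client can hold the token publicly: it cannot change the filter without invalidating the signature, so queries no longer have to pass through the application server.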

Kerollmops6y ago

The current API key system is a simple and temporary solution.

We will work on a more full-featured API key system, including the one you are talking about. This is on our roadmap IIRC.

kvz6y ago· 1 in thread

Is there already a browser library that can talk to MeiliSearch?

Kerollmops6y ago

Yes, there is, you can find all clients on this documentation page: https://docs.meilisearch.com/resources/sdks.html

Note that we are reworking the js library and there will probably be React integration too!

dhruvkar6y ago· 1 in thread

I've never used elasticsearch and only had a brief toy project with Algolia. The demo on the github repo looks awesome.

Can this run on top of my postgres database?

Kerollmops6y ago

For that you will need to also define the different attributes your document is composed of.

We thought about providing a simple tool to extract the documents from an SQL table directly into MeiliSearch.

bberenberg6y ago· 1 in thread

Does any tool in this category (This, Elastic, or whatever else) support something like permissions on a per document level?

jschumacher6y ago

Hey B! Funny seeing you here. I'm now running product at http://sajari.com

Hope you're doing well!

ghh6y ago

I wanted to mention Sonic [1] as another lightweight document indexing alternative written in Rust, and then found that MeiliSearch provides a thoughtful comparison page [2].

[1] https://github.com/valeriansaliou/sonic

[2] https://docs.meilisearch.com/resources/comparison_to_alterna...
