If you only need very basic word search, ES is probably not worth the complexity in your stack, especially if you're already running a SQL database with decent plaintext search.
Where Elasticsearch shines is in complex queries: "Show me every match where this field contains 'extinction' within 10 words of 'impact crater' but NOT containing 'oceanic', where the publish date is within the last month, and one of the subjects is anthropology."
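A query like that maps fairly directly onto ES's query DSL. A rough sketch as a Python dict (the field names `body`, `published`, and `subjects` are made up for illustration):

```python
# Hedged sketch of the example query above; field names are hypothetical.
query = {
    "query": {
        "bool": {
            "must": [
                {
                    # 'extinction' within 10 words of 'impact crater'
                    "intervals": {
                        "body": {
                            "all_of": {
                                "max_gaps": 10,
                                "intervals": [
                                    {"match": {"query": "extinction"}},
                                    {"match": {"query": "impact crater"}},
                                ],
                            }
                        }
                    }
                },
                {"range": {"published": {"gte": "now-1M/d"}}},  # newer than a month
                {"term": {"subjects": "anthropology"}},
            ],
            "must_not": [{"match": {"body": "oceanic"}}],
        }
    }
}
```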
One application I worked on indexes a Postgres database into Elasticsearch for live front-end queries. We index every single field, sometimes hundreds of fields in a single index. ES does this easily. Thanks to Lucene's quasi-columnar/quasi-LSM tree storage, new indexed fields aren't very expensive, and searches -- even fairly complicated ones -- are very fast.
ES is also extremely fast at aggregations. Even complex multi-level aggregations (e.g. group by date, then multiple nested buckets by different fields with "top k" results for each) take just a few hundred milliseconds on large, million-document datasets.
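The nested-bucket example above might look roughly like this in the aggregations DSL (again as a Python dict; the `published` and `subjects` field names are assumptions):

```python
# Sketch: date buckets, then terms sub-buckets, then top-k hits per bucket.
agg = {
    "size": 0,  # we only want the aggregation results, not raw hits
    "aggs": {
        "per_month": {
            "date_histogram": {"field": "published", "calendar_interval": "month"},
            "aggs": {
                "by_subject": {
                    "terms": {"field": "subjects", "size": 5},
                    "aggs": {
                        "top_docs": {"top_hits": {"size": 3}}  # top 3 per bucket
                    },
                }
            },
        }
    },
}
```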
Where ES has problems is in areas like replication, consistency and memory usage. It's very hard to tune ES; between JVM GC and caches, it's basically impossible to predict how much RAM ES will need, and OOMs are common. There's also still no way to ask for a consistent view of the index at query time; the best you can do is use "refresh=wait_for" on indexing, which is the wrong end to apply it at. I'd love a consistent, Raft-based ES.
1. Does it return relevant results?
2. Can it handle complex queries?
2) is only required in specific use cases, but when it's needed it's _really_ needed. 1) is the main measure users care about, and in my experience it's best evaluated by building a search in each system with the same corpus and giving it to subject-matter experts.
Without a good search engine you might get the results you needed, but mixed in with lots of other results. If you have to scroll to page 20 of the results to see the one you actually wanted, the search had recall but wasn't very precise.
Think of internet search engines pre-Google. With e.g. AltaVista you had great recall but extremely poor precision. You'd often be scrolling through multiple pages of results. Google turned that around by having great precision with similar recall. They made it so good that they could ship an "I'm Feeling Lucky" button.
The trick with search is to have great precision and still good enough recall. That's super hard because what is precise is very subjective and highly dependent on your usecases, data, languages, etc.
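To make the two measures concrete, here's a toy calculation for a single query, assuming we know which documents are truly relevant (the document IDs are invented):

```python
# Toy precision/recall for one query; relevance judgments are made up.
relevant = {"d1", "d2", "d3", "d4"}          # documents a human marked relevant
returned = ["d1", "d9", "d2", "d7", "d8"]    # the engine's top 5 results

hits = [d for d in returned if d in relevant]
precision = len(hits) / len(returned)  # fraction of results that are relevant
recall = len(hits) / len(relevant)     # fraction of relevant docs we surfaced
```

Here precision is 2/5 and recall is 2/4 — a real evaluation averages this over many queries, and the judgments themselves are the subjective, expensive part.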
This is why Elasticsearch is such a hugely complicated product: it includes a lot of solutions for essentially any use case you can imagine around search.
I have no experience with Redisearch; so I'll reserve my judgment. But this article is not doing it any favors.
There are competing options out there for Elasticsearch. Most of the serious ones also use Apache Lucene (e.g. Solr). Some of the upcoming ones are attempting to rebuild what Lucene does, and may or may not be good enough depending on your use case. There have been some Lucene ports over the years, including a C port; most of those have fallen behind or are no longer maintained.

The Java implementation is actually pretty good as-is and has had a lot of performance and optimization work done on it over the years. You'd be hard pressed to build something as good and as fast without essentially using the same algorithms and reinventing a lot of the same wheels.
IMHO the current effort to build a search engine in Rust makes a lot of sense. The language is uniquely suited to doing the kinds of things Lucene does and they seem to be pretty serious about doing things properly.
That's why some of these benchmarks (redis and the go search engine posted last week) seem a little apples/oranges to me.
I was under the impression that if you want to do auto-complete you need to handle misspellings, and that Elasticsearch is one of the best options for this.
I have a search use case. I want to create a simple language model where each token in the lexicon gets a unique ID (or ordinal). From that I can build a more sophisticated model where each document is represented as a vector as wide as the number of unique tokens. Then I can cluster those vectors and give each cluster a unique ID (or ordinal), producing an even more sophisticated language model, one with built-in semantic understanding. A natural-language data structure, if you will, with multiple layers. I want to store the entire WWW in such a model. So I'm building a language model framework that is not built on Lucene, because I'm not obliged to use ES in that capacity.
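The first two layers of that idea (token ordinals, then document vectors as wide as the vocabulary) can be sketched in a few lines; the corpus and the whitespace tokenizer here are toy assumptions:

```python
# Minimal sketch: tokens -> unique ordinals -> document count-vectors.
docs = ["impact crater extinction", "oceanic crater", "impact impact"]

vocab = {}                           # token -> unique ordinal
for doc in docs:
    for tok in doc.split():          # toy whitespace tokenizer
        vocab.setdefault(tok, len(vocab))

def vectorize(doc):
    vec = [0] * len(vocab)           # one slot per unique token
    for tok in doc.split():
        vec[vocab[tok]] += 1
    return vec

vectors = [vectorize(d) for d in docs]
# The next layer would cluster `vectors` and assign each cluster an ordinal.
```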
I feel you are wrong to call my use case simple and small scale.
If the "Multi-tenant indexing benchmark" is accurate it seems like it might be a robustness concern for ES. "Elasticsearch crashed after 921 indices and just couldn’t cope with this load." -- does that mean memory exhaustion or some other crash? If it's the latter, it seems like a quality problem more than a performance one.
This benchmark used 4605 shards (5 per index) on a single node, which is way above the recommended number.
Also, to prevent oversharding, the default number of shards per index has been changed to 1 in 7.0.
I think we can all agree that misusing a tool, after appropriate documentation has been published, shouldn't be considered a fault of the tool.
[0] https://www.elastic.co/guide/en/elasticsearch/guide/current/...
Agree with the general claim that this benchmark is poor though. A real study of complex searches with faceting, ranking and ordering against both databases in a distributed setup would be much more interesting.
...and then aggregate into time-based buckets, and within each bucket split the results by this field, and then...
I experimented with RediSearch using 20 GB of Reddit posts and I was very underwhelmed.
First, 20 GB of raw data explodes into 75 GB once it's in RediSearch with zero fault tolerance. While I'd expect some expansion with inverted indexes and word frequencies by document, a 3.75 multiple seems high.
And since this is Redis, it's all in RAM, including indexes and raw documents, all uncompressed. That's not cheap. Add replicas for fault tolerance and the RAM needed for a decent sized cluster could be 10x the size of the raw data.
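Rough arithmetic behind that 10x estimate (the single-replica setup is an assumption):

```python
# Back-of-the-envelope RAM math for the 20 GB Reddit corpus above.
raw_gb = 20
indexed_gb = 75                               # observed size after loading into RediSearch
expansion = indexed_gb / raw_gb               # 3.75x, all of it resident in RAM
replicas = 1                                  # assume one replica for fault tolerance
cluster_ram_gb = indexed_gb * (1 + replicas)  # 150 GB before operational headroom
```

150 GB for 20 GB of raw data is already 7.5x before any headroom, so ~10x for a comfortably provisioned cluster is plausible.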
Then the tooling and documentation is very limited. Redis Labs provides a Python client, but it doesn't support basic features like returning the score with each document, even though RediSearch provides this capability if you query it directly.
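For what it's worth, you can get scores back by bypassing the client and issuing the raw command yourself; the index name and query below are hypothetical, and with redis-py you'd hand the args to `execute_command`:

```python
# Raw RediSearch FT.SEARCH command including per-document scores.
# WITHSCORES asks the engine to return each document's score.
args = ["FT.SEARCH", "posts", "impact crater", "WITHSCORES", "LIMIT", "0", "10"]
# e.g. redis.Redis().execute_command(*args)  # requires a running RediSearch node
```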
Finally, I found stability issues with Redis when the RediSearch module is installed. Using the Python client provided by RedisLabs, certain queries would predictably crash every node in the cluster.
Redis itself is rock solid, but Redis with the RediSearch module feels fragile.
Overall, interesting concept but not ready for production use by any means.
- Show the code that runs the benchmark
- Give opportunities for everyone to recreate the benchmark
- Give opportunities for every technology to 'respond' and point out where the benchmark/tech configuration is wrong (ie "PRs welcome")
Otherwise, this just looks like cherry-picked data points, and even those I won't trust. Nor would I show this to any of my clients (whom I help select search engine technology). I dearly hope nobody makes real decisions based on this blog post until the code and configuration are opened up.
>RediSearch: Dedicated engine based on modern and optimized data-structures
>ElasticSearch: 20 years old Lucene engine
The implications made here make me actually angry.
RediSearch: new shiny thing built on top of Redis that is used in a couple of niche places.
I’ll take Lucene please
I've seen a lot of posts like this easily make it to the front page only because a lot of HN-ers are Redis fanboys (rightfully so: Redis is great). But then you read the post and it _appears_ to be marketing garbage.
No sane Elasticsearch engineer would make a new index for each product. They would just have a single index with a product_id field on each sub-item. If you needed product-level information, you would create a second index for that. You'd use two indices, not O(#products) indices.
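A sketch of that single-index design (field names and the example product ID are made up), with the per-product lookup done as a filter instead of a separate index:

```python
# One index for all sub-items, keyed by product_id (hypothetical mapping).
items_mapping = {
    "mappings": {
        "properties": {
            "product_id": {"type": "keyword"},  # exact-match filter/group key
            "name": {"type": "text"},
            "price": {"type": "float"},
        }
    }
}

# Searching within one product is then just a filtered query:
query = {
    "query": {
        "bool": {
            "filter": [{"term": {"product_id": "p-123"}}],
            "must": [{"match": {"name": "widget"}}],
        }
    }
}
```

The `filter` clause is cached and doesn't affect scoring, so the per-tenant restriction is cheap — no need for a tenant-per-index layout at all.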
They just created a botched benchmark by using ES incorrectly. It's like driving a car backwards and then complaining it has poor max speed. ES could easily handle this type of problem if done correctly.
It's hard to mistake documents for indices. Both original and the currently edited statement sound strongly suspect and make me question the benchmarking methodology used. What caused the ES to crash after indexing 921 documents? Why is comparing indexing speeds on a 1-node setup even a legit benchmarking test?
"wikidump" links to https://dumps.wikimedia.org/enwiki/latest/ , which has thousands of files, none of which are 5GB and make sense. That's a very poor corpus link!
It says "Feb 7, 2019", so it probably means https://dumps.wikimedia.org/enwiki/20190120/ or https://dumps.wikimedia.org/enwiki/20190201/ ... maybe. They don't have any obvious 5.3GB files.
Note: clustering is only available in RediSearch’s Enterprise version
https://redislabs.com/redis-enterprise/technology/redis-sear...
At least with ES I can build and play with the clustering of the nodes. This is probably why they only benchmarked a one-node ES: they would have to push their Enterprise software to make a cluster of RediSearch. Maybe I am wrong.
Raw latency is usually not the primary concern, and having everything in RAM can be a major cost problem, further compounded by the lack of the compression available in other persistent stores. The RESP protocol is also overloaded and hard to work with when dealing with JSON and search queries.
With that, it's just two points in space, which gives us little information from which to deduce "58% faster at X" or whatever.
This is why the only reliable benchmark is the one you do on your data.
PS: Crashes are never good though...
This is a massive misconfiguration of an elastic search cluster. 50k indices? 500 documents per index?
500 records per index at 5 shards/index is 100 records per shard.
Yeah, let's shard our data so much that we introduce tremendous amounts of disk i/o overhead!!!
Author should learn how to properly configure an ES cluster before posting ridiculous benchmarks like this.
What an utter pile of garbage benchmark this is.
To expand a little bit, the whole point of using multiple shards per index in an ES cluster is so that the shards spread across multiple nodes (servers) and distribute the load (disk i/o) and handle redundancy. ES automatically scales and reshuffles its shards across multiple nodes in the cluster to handle fault-tolerance as well. If one or more nodes go down, the cluster still has all of the data through replica shards etc...
Either way, in this particular case the data is so small that having 5 shards per index across 50k indices results in 250k shards for 5 GB of data.
5 GB / 250k shards = ~20 KB per shard.
Shards of ~20 KB each: a total cluster misconfiguration.
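The arithmetic above, spelled out (using decimal gigabytes):

```python
# Shard-size math for the benchmark's configuration.
total_bytes = 5 * 10**9            # ~5 GB corpus
indices = 50_000
shards_per_index = 5
total_shards = indices * shards_per_index     # 250,000 shards
bytes_per_shard = total_bytes / total_shards  # ~20 KB per shard
```

For comparison, Elastic's own guidance has historically suggested shards in the tens-of-gigabytes range — roughly six orders of magnitude larger than what this benchmark produced.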
The specific test deployment was multi-tenant anyway -- you can't account or optimize for what tenants are going to index.
So in other words:
"If your specific use case is supporting 50,000 customers each having around 500 documents and only needing basic text search queries and relevance is not a major concern, RedisLabs Search might give you better performance than ElasticSearch!"
(This is assuming there isn't a different way to configure ElasticSearch to work for this scenario, that gives similar performance.)
When you point out the flawed methodology you come across like a luddite or sour grapes or whatever else.
It's about minimizing the effort needed to find what you're looking for. Index construction speed, unless we're talking orders of magnitude, isn't really meaningful. I don't know if this is just a really clumsy attempt at "marketing" or what, but I can't imagine this is going to convince anyone to drop ES for this thing.
Lucene is a pretty rock-solid open source project that has been battle tested over those 20 years and had some of the best engineers in the world improve over a long time frame. That's an asset for Lucene!
Damn I feel old. I remember when Lucene was hot new kid in the block.