Ask HN: Why should I use Elasticsearch instead of building from scratch

9 points · xkbd · 8y ago · 38 comments

32 comments · 15 top-level

dewey · 8y ago · 5 in thread

With that little information from the OP the answer is probably: Use ES or if it's a small side project use your DB's included full text search if it's good enough.

brightball · 8y ago

You can go far using purely Postgres full text search. I ported a site from Solr to PG full text because of strange syncing issues and it was just as fast.

Afterwards I never really saw the point of any of the search systems other than Elasticsearch, because of the streaming capabilities that it gives you.

mistrial9 · 8y ago

agree, and add your own caching. Lots of pop in the web world comes from layers of well-built caching.

danielecook · 8y ago

When is a full text search within a DB not good enough? Is ES usually used alongside a typical RDS or is it a replacement?

dewey · 8y ago

I don't have a lot of experience with PG full text search in production or at a bigger scale. I'd just suspect that it doesn't perform that well if you need a lot of filters, range queries, etc. Maybe someone with more experience can chime in.

At work we just materialize the data from PG into ES and take advantage of the powerful ES queries and redundancy. Scaling up by just adding nodes is easier.

rjkennedy98 · 8y ago

Depends on the use case. We use it alongside Cassandra and MySQL, but we also deal with tens of billions of records. If you just have a weblog, you could just use Elastic.

ctvo · 8y ago · 3 in thread

When did this place become such a low effort brain dump? I can't believe I'm so annoyed by this question. Google. Make some educated trade-off decisions based on your context.

Use HN to poll for opinions and experiences from others, not for things that take 30 minutes to resolve.

dvdhnt · 8y ago

I tend to agree with you, however Google isn’t as straightforward as it used to be. Every developer and their mom has a blog these days, or a brand, whatever. Sometimes just cross-referencing fundamentals and bona fides is a time sink.

That said, this question probably works better somewhere like Reddit programming or some IRC channel where Elasticsearch hangs out. After going that route, it’s probably fine to poll here based on that research.

elorm · 8y ago

As much as OP did lazy work by posting this question, I don’t think your response was fair or helpful in any way. In the end you didn’t add anything to the conversation either.

ctvo · 8y ago

What can you add to this discussion? 0 context is provided. It's not even a properly formed question. As someone who's familiar with Lucene, Solr and Elasticsearch, even if I wanted to help, I can't.

Sometimes posts are shit, and it's OK to call it out to hopefully improve this site collectively.

tedmiston · 8y ago · 3 in thread

You should write one from scratch to get a deeper understanding of how hard it is to return highly relevant results quickly. Tokenizing, stemming, bag of words, and tf-idf for ranking get you to an MVP, but then you realize how good production grade search engines are today.

Solr is good. I've been wanting to try Lunr [1] for small sites.

[1]: https://github.com/olivernn/lunr.js
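
For anyone curious what the MVP described above (tokenizing, stemming, bag of words, tf-idf ranking) looks like in practice, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than how any production engine works: the `stem` function is a toy suffix stripper standing in for a real stemmer, and `TinyIndex` and its smoothed idf formula are hypothetical choices made to keep the example short.

```python
import math
import re
from collections import Counter


def stem(token):
    # Toy suffix stripper (an assumption for illustration only);
    # a real engine would use a proper stemmer such as Porter's.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    # Lowercase, split on non-alphanumerics, then stem each token.
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


class TinyIndex:
    """Hypothetical in-memory index: one bag of words per document."""

    def __init__(self, docs):
        self.bags = [Counter(tokenize(d)) for d in docs]
        self.n = len(docs)
        # Document frequency: in how many documents each term appears.
        self.df = Counter()
        for bag in self.bags:
            self.df.update(bag.keys())

    def idf(self, term):
        # Smoothed inverse document frequency (one common variant).
        return math.log((1 + self.n) / (1 + self.df[term])) + 1.0

    def search(self, query, k=3):
        terms = tokenize(query)
        scored = []
        for doc_id, bag in enumerate(self.bags):
            length = sum(bag.values()) or 1
            # Sum of (term frequency * idf) over the query terms present.
            score = sum((bag[t] / length) * self.idf(t) for t in terms if t in bag)
            if score > 0:
                scored.append((score, doc_id))
        # Highest score first; ties broken by document id.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [doc_id for _, doc_id in scored[:k]]


docs = [
    "Searching logs with grep",
    "Elasticsearch scales search across nodes",
    "Cooking recipes for chefs",
]
index = TinyIndex(docs)
print(index.search("search"))        # ids of documents mentioning "search"
print(index.search("cooking chef"))  # ids of documents about cooking
```

Note that this scans every document on each query. The gap between a sketch like this and a production engine (inverted indexes, relevance tuning, language handling, scale) is exactly the point the comment is making.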

mlthoughts2018 · 8y ago

I previously worked at a company where Solr was used to scale the business, and it was not performant for us after a while.

We wrote our own search engine at that point. You are right that there are a lot of little “devil in the details” issues. But overall it was a fun experience.

This was needed to support some specific machine learning workflows in the search ranking process — which could not be used if we paid the high latency cost to first get preliminary results in Solr.

So we took a “create your own index data structures” approach with index data (both the normalized bag of words vectors and companion data like boolean filters), which allowed us to highly optimize the initial broad ranking query. Latency was low enough that it allowed the time cost of calling follow-on machine learning services.

This was for a fairly high-traffic product search engine at an online retailer. It ended up working very well and over a span of about two years we eventually rolled all search traffic onto the in-house platform, even the parts not needing the machine learning services, and our query latencies went down across all our traffic, and we retired the original Solr implementation.

Wouldn’t be the right choice for everyone, but it informs my opinion a lot about the worthwhileness of creating an in-house search engine to specifically replace Solr. I’d suspect a lot of medium-sized or large companies running Solr should seriously consider it.
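
To make the "create your own index data structures" idea above a bit more concrete, here is a rough Python sketch of a broad first-pass ranking over normalized bag-of-words vectors with companion boolean filters. It is only a guess at the general shape, not the system described in this comment: `ProductIndex`, the `IN_STOCK` and `ON_SALE` flags, and the bitmask encoding are all hypothetical, and the follow-on machine learning re-ranking stage is not shown.

```python
import math
from collections import Counter

# Hypothetical boolean attributes, packed into one integer bitmask per document.
IN_STOCK = 1 << 0
ON_SALE = 1 << 1


def unit_vector(text):
    # Bag-of-words counts scaled to unit length, stored as a sparse dict.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


class ProductIndex:
    """Hypothetical index: parallel lists of term vectors and filter bitmasks."""

    def __init__(self):
        self.vectors = []
        self.flags = []

    def add(self, text, flags=0):
        self.vectors.append(unit_vector(text))
        self.flags.append(flags)
        return len(self.vectors) - 1

    def search(self, query, required_flags=0, k=10):
        q = unit_vector(query)
        hits = []
        for doc_id, (vec, flags) in enumerate(zip(self.vectors, self.flags)):
            # Cheap boolean filter first: skip documents missing a required flag.
            if (flags & required_flags) != required_flags:
                continue
            # Cosine similarity reduces to a dot product on unit vectors.
            score = sum(w * vec.get(term, 0.0) for term, w in q.items())
            if score > 0:
                hits.append((score, doc_id))
        hits.sort(key=lambda h: (-h[0], h[1]))
        return [doc_id for _, doc_id in hits[:k]]


index = ProductIndex()
index.add("red running shoes", IN_STOCK)
index.add("red shoes", IN_STOCK | ON_SALE)
index.add("blue jacket", ON_SALE)
print(index.search("red shoes"))
print(index.search("red shoes", required_flags=ON_SALE))
```

The candidates returned by a cheap pass like this would then be handed to slower, more expensive ranking services, which is where keeping first-pass latency low pays off.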

tedmiston · 8y ago

I realize my comment was a little vague around Solr: my experience is specifically more with Lucene than Solr itself, and I just think of Solr as an easier entry point to Lucene. Did you consider, or end up, building your custom search engine on top of Lucene or something else? What about using a service like Algolia if something like that were available at the time?

sova · 8y ago

Could you tell us more about the kinds of indices you used, and what you mean by boolean filters? Thank you!

33degrees · 8y ago · 2 in thread

Elasticsearch is incredibly deep, and highly performant. If all you need is simple full text search then rolling your own can be an interesting exercise, but I can't imagine the number of hours it would take to replicate the features I use on a daily basis.

dozzie · 8y ago

> Elasticsearch is incredibly deep, and highly performant

For some value of "highly performant". I remember its search (exact substring match) being significantly slower than simply running grep on the same data (JSON documents produced from syslog logs) stored in flat files.

It did have several advantages over grep in that scenario (e.g. having a structured language and being accessible for other programs through network), but performance was not one of them.

33degrees · 8y ago

Right: my experience is with far more complex scenarios, and in comparison to an RDBMS. Things that would take multiple queries, like aggregations, can be done in a single, fast call. It does require a proper indexing setup and some fine-tuning though.

mlevental · 8y ago · 2 in thread

what a weird question: who looks at a search engine and thinks yea hm that's trivial enough i could do it myself in a weekend?

rjkennedy98 · 8y ago

Seriously, creating an efficient, scalable search engine is among the most difficult computer science problems. From stemming, to combined queries, to word tokenization, to handling various string collations, and language issues, and caching, and parallelization of work, and handling huge numbers of writes, there are so many tricky parts. I used to work for a search startup and I can answer the OP's question: do not try to write your own search engine. That work should be done by someone with a PhD and decades of experience. Even Elastic, which is great software, has issues, such as not being transactional, having issues handling huge numbers of writes, etc.

sova · 8y ago

It depends on the complexity, which language(s) you are using, and how you will parse the search string (which Context Free Grammar it will obey). Just recently, this was a task for me that ended up landing me an interview. A take-home problem. Even a simple search engine needs some clever reverse-indexing for speed. Add any sort of logic like And or OR that not even Google implements and now your parser has to work, and you have to be able to translate from tokenized parse tree with operators to results. It's a great learning exercise for someone with experience, initiative, and enough background with CFG parsing, building a reverse index, and set logic -- but without some key Computer Science building blocks it would end up being quite a challenge.
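
As a rough illustration of the pieces named above (a reverse index, a small grammar for AND/OR, and set logic to turn the parse into results), here is a minimal Python sketch. The grammar is an assumption chosen for brevity: uppercase `AND` binds tighter than uppercase `OR`, parentheses group, and every other word is a search term. The names `build_index` and `evaluate` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    # Reverse (inverted) index: term -> set of ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def evaluate(query, index):
    """Recursive-descent evaluation of the assumed grammar:
         expr := conj ("OR" conj)*
         conj := atom ("AND" atom)*
         atom := WORD | "(" expr ")"
    """
    tokens = re.findall(r"\(|\)|[A-Za-z0-9]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            advance()
            result = result | parse_and()  # set union
        return result

    def parse_and():
        result = parse_atom()
        while peek() == "AND":
            advance()
            result = result & parse_atom()  # set intersection
        return result

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        advance()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        # Plain word: look up its posting set (empty if the term is unknown).
        return set(index.get(tok.lower(), set()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return sorted(result)


docs = [
    "solr and lucene",
    "elasticsearch uses lucene",
    "postgres full text search",
]
index = build_index(docs)
print(evaluate("lucene AND (solr OR postgres)", index))
```

Even at this size the parser is the larger half of the code, which fits the comment's point that the query language is where much of the effort goes.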

ian1321 · 8y ago · 2 in thread

Tough to answer w/o more info. FWIW, I've used Lucene, Solr, and Elasticsearch and have ended up settling on Lucene being the best interface for me.

dajohnson89 · 8y ago

I thought Lucene was the underlying query language, whereas Solr & ES just utilized both...

rjkennedy98 · 8y ago

Yeah, I'm not sure what the OP is talking about. Lucene is the Java search library that Elastic uses. Elastic is a full clustered search engine with HA, sharding, and a REST API. They aren't exactly interchangeable.

jakelazaroff · 8y ago

Is this for work, and is search a competitive advantage for you? If not, snap in Elasticsearch and spend your time on your differentiators.

mikece · 8y ago

Have you written a search engine from scratch before? There's a reason this is still a primary field for PhD work.

smilesnd · 8y ago

The simple answer is that Elasticsearch has had thousands of hours already put into writing its code base. The real question should be: why shouldn't you use Elasticsearch? Is the code base too large to fit where you need it to be? Will it be able to scale with your project? Is it efficient enough for your requirements? When looking to use a piece of technology, the requirements and long-term effects are what matter. Roll your own if that is what is required for you to reach your end goal.

anonfunction · 8y ago

To leverage the thousands of hours that went into it.

based2 · 8y ago

What do you want?

https://db-engines.com/en/ranking

wallflower · 8y ago

Actually, the origin story of Elasticsearch started with Shay Banon attempting to build a cooking app for his wife who was a chef.

> JAXenter: You started Compass, your first Lucene-based technology, in 2004. Do you remember how and why you became interested in Lucene in the first place?

> Shay Banon: Reminiscing on Compass's birth always puts a smile on my face. Compass, and my involvement with Lucene, started by chance. At the time, I was a newlywed that just moved to London to support my wife with her dream of becoming a chef. I was unemployed, and desperately in need of a job, so I decided to play around with “new age” technologies in order to get my skills more up to date. Playing around with new technologies only works when you are actually trying to build something, so I decided to build an app that my wife could use to capture all the cooking knowledge she was gathering during her chef lessons.

> I picked many different technologies for this cooking app, but at the core of it, in my mind, was a single search box where the cooking knowledge experience would start: a single box where typing a concept, a thought, or an ingredient would start the path towards exploring what was possible.

> This quickly led me to Lucene, which was the de facto search library available for Java at the time. I got immersed in it, and Compass was born out of the effort of trying to simplify using Lucene in your typical Java applications (conceptually, it simply started as a “Hibernate” (Java ORM library) for Lucene).

> I got completely hooked with the project, and was working on it more than the cooking app itself, up to a point where it was taking most of my time. I decided to open source it a few months afterwards, and it immediately took off. Compass basically allowed users to easily map their domain model (the code that maps app/business concepts in a typical program) to Lucene, easily index them, and then easily search them.

> That freedom caused people to start to use Compass, and Lucene, in situations that were wonderfully unexpected. Imagine already having the model of a Trade in your financial app, one could easily index that Trade using Compass into Lucene, and then search for it. The freedom of searching across any aspect of a Trade allowed users to convey this freedom to their users, which proved to be an extremely powerful concept.

> Effectively, this allowed me to be in the front seat of talking and working with actual users that were discovering, as was I, the amazing power that search can have when it comes to delivering business value to their users. Oh, and btw, my wife is still waiting for that cooking app. Now, 10 years later, it is the basis of Elasticsearch.

https://jaxenter.com/elasticsearch-founder-interview-112677....

xellisx · 8y ago

Sphinx is another full text search engine.

hiroshi3110 · 8y ago

How about implementing a search engine on top of key-value store like FoundationDB?

courtneycouch0 · 8y ago

Definitely build it completely from scratch. You should roll your own TCP libraries as well. Don't trust anything you didn't write yourself. Come to think of it, I'm not sure you should rely on someone else's hardware either.
