Elasticsearch 1.0.0 released

(elasticsearch.org)

471 points · dakrone · 12y ago · 136 comments

85 comments · 30 top-level

RyanZAG12y ago· 22 in thread

Elasticsearch is really awesome for searching, but what most people don't realize is that it makes a better MongoDB than MongoDB while giving you that searching too.

bilbo0s12y ago

This. A THOUSAND TIMES "This".

The one drawback ES had in the bad old days was that backup and restore was a nightmare... ESPECIALLY on AWS. The new system they introduced was so simple I was concerned about updating to it because I was SURE something would go south.

But it all just worked.

I still have the Couch to ES replication running because I'm anal like that... but really... yeah... you can do without Couchbase, Mongo et al... ES will probably do everything you need PLUS everything you can't do in the others.

diminish12y ago

As a proud user of Elasticsearch since the early days, I'm happy to see so much progress. Never mind the *search part of the name; it's really a database for all practical purposes, especially for web data.

rjzzleep12y ago

to be fair, the main selling point of mongodb is that developers can access it more easily. i haven't really touched mongodb in over a year and then only for playing, but have you tried the elasticsearch filter query syntax? have you compared mongodbs syntax?

also, i have the exact opposite nitpick. people want to use it to do everything, mail indexers, file system indexers. what's the matter with web developer folks? why is it that when the next database comes around they want to use it for everything?

1 more reply

AznHisoka12y ago

Just curious, if I'm using say version 0.92, how would I go about backing up my ElasticSearch instance. Besides creating a replica in a server, then "freezing" it by disconnecting the server?

1 more reply

kainosnoema12y ago

I'm surprised so many people miss this. Out of the box, Elasticsearch is a distributed NoSQL store with better write consistency (and arguably performance) than MongoDB offers in its default configuration. The major missing feature was backup snapshots and restores, which 1.0 delivers, along with aggregations that more than rival MongoDB's. The team has intentionally avoided marketing themselves as a NoSQL store (was told this directly by an employee), but they're aware of the potential and have customers using it as such.
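For readers who have not seen the 1.0 snapshot/restore feature mentioned above, here is a minimal sketch of the HTTP requests involved, assuming a shared-filesystem (`fs`) repository. The repository name, snapshot name, and location are made-up placeholders, and the helper only builds the requests; it does not talk to a cluster.

```python
import json


def snapshot_requests(repo, snapshot, location):
    """Return (method, path, body) tuples for a snapshot/restore round trip.

    repo, snapshot and location are hypothetical names chosen by the caller.
    """
    # 1. Register a shared-filesystem repository.
    register = (
        "PUT",
        f"/_snapshot/{repo}",
        json.dumps({"type": "fs", "settings": {"location": location}}),
    )
    # 2. Take a snapshot and block until it has finished.
    take = ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", "")
    # 3. Restore that snapshot later on.
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", "")
    return [register, take, restore]


if __name__ == "__main__":
    for method, path, body in snapshot_requests(
        "my_backup", "snapshot_1", "/mount/backups/my_backup"
    ):
        print(method, path, body)
```

Each tuple could be sent with any HTTP client; the repository location has to be reachable from every node for an `fs` repository to work.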

nkoren12y ago

It's easy to miss. On the front page, the word "store" only occurs once, buried three page-scrolls down in the body text. Otherwise it very much gives the impression of being some kind of analytics dashboard for third-party datastores. And I didn't notice that until after I'd visited the website, clicked through a few links trying to figure out what the fuss was about, then given up and decided to read the comments here.

1 more reply

gibrown12y ago

While I agree with the sentiment, I think Shay (lead ES developer) has explicitly said that he does not consider ES to be a data store... yet. I think this is mostly due to maturity.

I help run a large ES cluster (with canonical data in MySQL), and I consider this cautious attitude by the ES developers to be a good thing.

1 more reply

camus212y ago

did not know all that stuff, could Elasticsearch be the holy grail of document stores ?

3 more replies

sandGorgon12y ago

I had a live production logistics system running on top of Elasticsearch 0.6 (as a NoSQL database ) back in 2012. This powered one of India's largest ecommerce systems (at that time).

Elasticsearch is brilliant as a NoSQL store - and if you are already using Elasticsearch as a search system, you don't need to introduce yet another component into your stack.

axefrog12y ago

What limitations should one be aware of that would make ElasticSearch not a viable candidate where something like MongoDB would be a better fit?

RyanZAG12y ago

When running a search, ES by default will not show items that have been indexed in the last 1 second. Directly getting an item by its ID doesn't have that limit though, and you can optionally set a search to force a re-index and show all items.

Other than that (which is just performance tuning, really), ES matches mongodb feature for feature, and obviously has a lot of extra power from its search heritage such as facets and percolate.

So I can't actually think of any limitations, and it's why I said ES makes a better MongoDB than MongoDB.

alisson12y ago

On ElasticSearch you have to update the whole document, no commands to manipulate them. You don't have commands like: $set, $addToSet, $pop, etc..

You need to have a good understanding of how tokenizers and analyzers work to be able to create good results for your data. I have difficulties matching documents with the exact title being searched for. On MongoDB that just works, on ElasticSearch you need to configure it.

ElasticSearch has some advantages and MongoDB others. I think they are great together. One for storage and the other for searching.

3 more replies

brasetvik12y ago

I can't comment much on MongoDB, but I've written a bit about things to keep in mind when considering Elasticsearch as a NoSQL store here: https://www.found.no/foundation/elasticsearch-as-nosql/

1 more reply

sjs38212y ago

I'm not sure if ElasticSearch does anything like this, but I make use of MongoDB's GeoJSON queries, namely the $geoIntersects operator.

http://docs.mongodb.org/manual/applications/geospatial-index...

1 more reply

abhirama12y ago

When I played around with it, I could not figure out a way to get the exact count of events in the datastore when the data was distributed across replicas. In fact, there was a ticket open for this, but I'm not able to fish it out now.

ddorian4312y ago

presharding

You create a number of shards for each index(database) that you can't later expand.
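A rough illustration of why the shard count is hard to change afterwards: if a document's shard is picked as hash(id) modulo the number of shards, changing that number sends most documents to a different shard. The sketch below assumes that scheme and uses `zlib.crc32` purely as a stand-in hash; it is not Elasticsearch's actual routing function.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard by hashing the id (crc32 is a stand-in, not the real hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    # Going from 5 to 6 shards relocates the large majority of documents,
    # which is why the index has to be rebuilt rather than resized in place.
    print(f"moved: {moved_fraction(ids, 5, 6):.0%}")
```

The usual workaround is to create more shards than you need up front, or to reindex into a new index with a different shard count.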

2 more replies

ddorian4312y ago

also changing indexed-fields on the go

mtrn12y ago

True. I evaluated Mongo, Couch and a couple of similar solutions, but ES being a search engine from the start really convinced me, that it can be a viable database for loosely structured data.

g9yuayon12y ago

I don't know much about MongoDB, but it's true that Elasticsearch is a great NoSQL db with support of boolean search. Netflix has a number of use cases that use Elasticsearch as such NoSQL db: http://www.slideshare.net/g9yuayon/elasticsearch-in-netflix

ErrantX12y ago

Definitely! We are using it in production for storing monitoring data (via sensu, if anyone is interested). It's fantastic because you can shove data into the index with a ttl of 1 year. And have a x month archival strategy for cold storage.

Its search capabilities and scalability are fantastic - we're throwing GB of data into it weekly and it just soaks it up.

tracker112y ago

I would suggest that everyone who is considering one look at both... When I looked into both, about a year and a half ago, I found that geospatial searches worked better in MongoDB at the time, and shaping my data to fit was more awkward with ElasticSearch.

That said, it's definitely worth looking into both, depending on what your needs are.

obastemur12y ago

"most people don't realize is that it makes a better MongoDB than MongoDB "

(IMHO) Unfortunately for most of the people, old habits to be made. Indeed a nice project and great release.

m0th8712y ago· 6 in thread

It was two weeks ago, and our startup was on the precipice of a major launch. We had completely rewritten our online publication site, which drives the bulk of our traffic. The product had to be shipped on-time - we had press releases, eager investors and a launch party dependent on it.

A few days before launch, things were not looking good. As admins manipulated articles in preparation for the launch, the servers kept crashing.

In a time-constrained major launch like this, a lot of nasty little hacks build up in the codebase. Our search system for admins was a complete mess. It was a custom solution that worked fine when admins managed a handful of database records, but now that they were managing thousands of articles, it was not scaling at all.

At the 11th hour, we dropped elasticsearch into our infrastructure. It worked like a charm. The servers stopped crapping out, and we launched on time.

Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr), or build anything on top to interface between the index and the queries themselves (Lucene). Thanks elasticsearch, you saved us!

dc244712y ago

> Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr)

If you were using Solr there are a few operational modes to run in: config file based or SolrCloud[0]. The latter is more akin to ES in terms of cluster management.

I agree, though, that from a simplicity-of-deployment perspective at scale, ES has a much gentler learning curve.

[0] https://cwiki.apache.org/confluence/display/solr/SolrCloud

acdha12y ago

SolrCloud is nothing like ES in terms of management: you end up running a separate zookeeper service with even more files which all have to be configured correctly just to get it running and you have to micromanage shard allocation to ensure that you can add nodes in the future but also not have it intentionally deadlock when a server fails and you no longer have enough nodes for a quorum. All of this happens with the usual contempt for sysadmins where things you need to know (“refusing to process requests”) won't be logged but a bunch of startup boilerplate will be, and simply configuring logging correctly requires (IIRC) editing two XML files and a properties file.

`java -jar elasticsearch.jar` does a better job and that's basically all it takes. I'm planning to switch as soon as https://github.com/elasticsearch/elasticsearch/issues/256 lands.

1 more reply

troels12y ago

Did you try/consider Sphinx? It's simple and it's quite fast. I'm using that and I'm pretty happy with it, but I might investigate ES at some point to see if I can squeeze a bit more speed out of it.

rch12y ago

You might also take a look at the search functionality in Riak. I've run both Solr and ES, the latter at significant scale, and I'm leaning more towards Riak going forward. The difference is mainly convenience, so not a reason to switch off something that's working already.

1 more reply

m0th8712y ago

As far as I can tell, Sphinx has a more involved setup process. Also our search runs against JSON documents, which seems to suit Elasticsearch better than Sphinx. I might be wrong on both counts though, we really didn't look into Sphinx enough to give it a fair appraisal.

nasalgoat12y ago

Sphinx is a bit too 1:1 - it only works as a single server, not a cluster.

1 more reply

bryanh12y ago· 4 in thread

The thing that worried me the most about Elasticsearch was how fragile it got around the limits of its performance. Run out of memory because of a nasty query? Boom, data corrupted. I hope you weren't using it as your primary persistence layer...

Otherwise, we love ES. The other comment about it being a better Mongo than Mongo rings true. With the backup/restore API and some of the circuit breakers, I'm hopeful that my fears will be abated.

polyfractal12y ago

FWIW, this is a place ES devs are spending a lot of time thinking about. For example, 1.0 introduces a new "Circuit Breaker" [1] feature which will help prevent over-eager facets from blowing out the heap. It's just one part of a very large effort to make ES handle exceptional events more gracefully (in particular, memory related).

Another example is disk-based doc values [2], which are essentially pre-computed field data structures that are stored on disk. This moves Field Data off heap and allows the OS to manage memory evictions, to help minimize GCs and OOM blowouts.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

[2] http://www.elasticsearch.org/blog/disk-based-field-data-a-k-...

nzadrozny12y ago

Ditto open file handles, which is easy to push when aggressively over-sharding. Not an uncommon mistake for the enthusiastic newbie.

Having supported Solr/ES/Lucene in production for 4+ years now (websolr.com / bonsai.io) I would be pretty hesitant to trust Lucene in general as a primary data store. Beautiful for secondary indexing, but otherwise, Why Not Postgres?™ ;)

RyanZAG12y ago

Complexity. Having two copies of the data means more dev time, more resources required to shift the data around, etc. Having just 1 data store that can also handle all your searching is like the holy grail. As you say, not sure if Solr/ES/Lucene are there yet - but they're definitely very very close. There is no theoretical barrier either - it just comes down to closing bugs, and the ES/Lucene team are very good at closing bugs.

EDIT: I don't think MongoDB is there yet either. There are definite benefits and drawbacks between Postgres and ES, tipping heavily towards Postgres for structured heavy write data. But for ES and MongoDB? I think MongoDB falls a bit short there.

2 more replies

wikyd12y ago

I think the CSS for bonsai.io is not loading.

1 more reply

NDizzle12y ago· 4 in thread

I also took a few days a few weeks ago to setup elastic search after my mysql full text search fell apart.

What I'm doing is slamming the full text output of OCRed PDFs into a MyISAM table, the entire document in a text field.

What I'm afraid I'm not doing right is creating the web interface to search elasticsearch. I'm using filters with the query string syntax[1] in the search box, pointing directly at that fulltext column. I'm also using the highlight functionality so that I can specify how many highlight blurbs to return with the result. The query string syntax works great with the OCR'd text, because most of it is near-garbage (as most OCR is), so you can search for something like "net sales"~50 to find those two terms within 50 words of each other. I think the results were something like:

net sales: 15,000 results
"net sales": 120 results
"net sales"~50: 550 results

Can anyone point me at a good web based search implementation using elasticsearch that explains how they're doing it?

What I have works pretty good, I just want to... check my work, I guess.

[1]: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
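For anyone wanting to try the same combination, here is a small sketch of a search body that pairs a `query_string` proximity query with highlighting. The field name `fulltext` and the fragment count are assumptions for illustration, not the poster's actual setup.

```python
import json


def proximity_search_body(field, phrase, slop, fragments):
    """Build a search body: phrase within `slop` positions, plus highlights.

    field, phrase, slop and fragments are caller-supplied example values.
    """
    # "net sales"~50 means: both terms within 50 positions of each other.
    query = f'"{phrase}"~{slop}'
    return {
        "query": {"query_string": {"default_field": field, "query": query}},
        # number_of_fragments caps how many highlight snippets come back per hit.
        "highlight": {"fields": {field: {"number_of_fragments": fragments}}},
    }


if __name__ == "__main__":
    body = proximity_search_body("fulltext", "net sales", 50, 3)
    print(json.dumps(body, indent=2))
```

The printed JSON could be posted to an index's `_search` endpoint; passing user input straight into `query_string` exposes the full query syntax, which may or may not be what you want for a public search box.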

nzadrozny12y ago

I host and support websolr.com and bonsai.io and have seen a lot of search implementations.

The main thing for good stability and performance is to be very good at batching your updates. You don't want to sling a ton of highly-parallel single-document updates at Lucene, lest you thrash the JVM and start garbage collecting like crazy.

From there, on the query side, you'll want to get a good working knowledge of the different tokenization and analysis options. There are a lot of subtle and interesting combinations to be had in there that influence performance and relevance of your search results.
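To make the batching advice concrete, here is a hedged sketch that groups documents into chunks and formats each chunk as a bulk API payload (one action line followed by one source line per document, newline-terminated). The index name, type name, and batch size are arbitrary examples.

```python
import json


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(index, doc_type, docs):
    """Format (doc_id, source) pairs as a newline-delimited bulk payload."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [(str(i), {"text": f"page {i}"}) for i in range(5)]
    for chunk in batches(docs, 2):
        # Each payload would be POSTed to /_bulk in one request.
        print(bulk_body("docs", "page", chunk), end="")
```

A good batch size depends on document size and cluster capacity, so it is worth measuring rather than guessing.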

NDizzle12y ago

Do you have a demo on either of those sites where I can input terms into a search box and look at results? What explanation do you give to users as to the options available when formatting the query?

1 more reply

dclara12y ago

May I ask what you meant about "web based search implementation using elasticsearch"?

Do you mean that you use ES to do indexing on the backend of your documents and make it available on the web? Or do you mean that you use ES to index documents available on the web and let people search for them?

NDizzle12y ago

Sure. Your first guess is correct - I do indexing of backend documents.

I fetch a steady stream of FOIA documents, close to the maximum possible each week, and PDF/OCR them. I expose a web interface to the analysts I work with, to help them gather up documents for further analysis.

The second guess would probably be more interesting to most people.

1 more reply

karterk12y ago· 4 in thread

Elasticsearch mostly "just works". The latest version of Solr has made clustering easier (requires managing Zookeeper), but before that, it was either ES or nightmare.

Lucene is one of those projects which hardly has any real competition. That's surprising given how many real world software projects have a search requirement. While Lucene is excellent, it's not without flaws and competition is always great.

m0th8712y ago

FWIW, Elasticsearch builds on Lucene. It's just working at a much higher level of abstraction.

dclara12y ago

I agree with you, almost every website needs a search server on the backend for people to search their document base, especially for enterprise intranet. Maybe enterprises are using commercial products, such as SharePoint. How about the rest of the small businesses and websites? Maybe the learning curve is steep for every website to adopt so far.

swah12y ago

Hmm, could that be because they have to compete with free?

malaporte12y ago

Lucene does have competition, mostly in the commercial world. I know, since I work for one of those companies :p

Solr, ElasticSearch, etc. are mostly concerned about the index/search features, and they do quite a good job there. But this still leaves a huge amount of space for commercial offerings, as core search is only a part of the problem. I'm thinking about connectivity with complex enterprise systems, support for the specific security models of those systems, integration in other systems, etc. Believe me, those problems are not easy to solve.

So, even if we have an index that can most probably match Lucene's feature for feature and quite a lot of things beside, we typically won't go after deals where simple search is the only requirement. Instead we focus on larger deals with more complex requirements. And we're doing quite well, thank you :)

alecco12y ago· 3 in thread

Why is it awesome? Why "it just works"? Is it just a mongodb-kind document store over Hadoop+Lucene?

What makes it so special to have hundreds of votes and tweets all around within 2 hours?

I don't understand. A DB engine engineer.

gibrown12y ago

There are a lot of features thoughtfully combined that make ES great. Top of my list would be:

1. It handles human written language. Any language. The same technology that lets it handle strings written in human language provides a lot of flexibility in handling strings in other applications, particularly when handling logs.

2. Non-string data it also handles very fast and cleanly (numbers, dates, geo).

3. Lucene has an inverted index that has been optimized over many years. ES scales that pretty seamlessly across many servers. All decisions in the project seem to be made around whether a feature can scale to 100s of nodes.

The devs have also been really smart to focus on the "out of box experience". Very well thought out defaults.

More on our experience with ES at scale: http://gibrown.wordpress.com/2014/01/09/scaling-elasticsearc...

buckbova12y ago

Is this accurate for Elasticsearch, since it is built on Lucene?

https://lucene.apache.org/core/

"index size roughly 20-30% the size of text indexed"

That seems excessive for an index.

1 more reply

ddorian4312y ago

distributed/full-text-search(many-many-options)/highlighter/compressed/geo-queries/searching on multiple indexes(databases)|types(tables)/distributed-aggregation/distributed faceting/very-fast-in-memory-suggester/inverse-query(percolator)where you register queries(like rows), and then test documents if they match queries

and many other stuff

Zilog12y ago· 3 in thread

Too bad they have yet to address the split brain issue.

chriscareycode12y ago

I haven't had a split brain on my 15 node cluster in over 6 months even though the cluster is split among multiple data centers which do drop connectivity from time to time. When the setting was wrong, it happened constantly. Tune it properly and it won't happen. n/2+1
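The "n/2+1" rule above is the usual majority quorum: a master can only be elected when more than half of the master-eligible nodes can see each other, so two halves of a partitioned cluster cannot both elect one. A tiny sketch of the arithmetic follows; as far as I know the result is what goes into the `discovery.zen.minimum_master_nodes` setting in Elasticsearch of this era.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: floor(n / 2) + 1 master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # For the 15-node cluster described above (assuming all 15 are
    # master-eligible), the quorum is 8.
    print(minimum_master_nodes(15))
```

Note that with only two master-eligible nodes the quorum is 2, so losing either node blocks elections; three is the practical minimum for this to help.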

r00fus12y ago

Link for the curious: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-br...

AznHisoka12y ago

True, that's a valid issue. For me, it's not as I end up indexing the same document multiple times over the course of 2-3 days.

Argorak12y ago· 2 in thread

Beyond the technology, Elasticsearch has a very mature, active and helpful community with users groups all over the world. We're well connected.

Pick your favourite users group here: http://elasticsearch.meetup.com/

Full disclosure: I started and run the Berlin UG. We set ourselves apart by always providing a small introduction into ES for those that are completely new and would have a hard time following the main talk.

shurane12y ago

Intros to ES and other technologies are useful.

I don't see many tutorials covering usage of ES here: http://www.elasticsearch.org/tutorials/

Could you maybe provide a link to yours?

Argorak12y ago

The introduction is in person, at the users group.

Yep, tutorials is a huge problem, but there are people working on that.

axionike12y ago· 2 in thread

ES has performed very well for us as the backbone for the solution we deployed for a large government-sector customer. Had some GC issues initially, and were worried about user concurrency, especially since we were not restricting queries (i.e. users can do full-scale wildcard searches against the entire data set of 1BN+ records). But ES continues to shine.

Congrats to the ElasticSearch team, and all the supporters around it. Once I get back into more of a coding role, I'll definitely be contributing back to the ES project.

room27112y ago

This may require a bit more lengthy answer than makes sense here, but I'm curious about what was causing your GC issues and how you fixed them (we have GC issues at the moment).

polyfractal12y ago

Not the OP, but GC issues in Elasticsearch basically boil down to memory pressure (obviously), which is usually caused by facets. Facets eat a lot of memory, especially if you are faceting high-cardinality fields - think fields like "tags" or any analyzed field. High cardinality, analyzed strings is the easiest way to blow out the heap.

There are other reasons, but that is like 90% of GC issues. To solve it, you need to make sure your faceted fields are configured well (usually not_analyzed) and assess how much memory is available. You may be able to index and even full-text search ten billion docs on a single machine, but faceting it may just be too much to ask for a single node.

Omitting norms, disabling bloom filters on old indices and enabling doc values are other ways to help alleviate field-data pressure.

Other GC culprits can be: too large bulk requests, unbounded threadpool queues, or something like parent/child/scripts/filter cache keys eating all your memory. Also don't go above 30gb heaps, the JVM becomes unhappy :)
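As an illustration of the "configured well (usually not_analyzed)" advice, here is a sketch of an index-creation body that maps a faceted field as an unanalyzed string, so each value is kept as a single term instead of being tokenized. The type name `doc` and field name `tags` are placeholders, and the syntax shown is the 1.x-era string mapping.

```python
import json


def not_analyzed_mapping(doc_type, field):
    """Build a create-index body mapping `field` as a not_analyzed string.

    doc_type and field are hypothetical names supplied by the caller.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # One term per value keeps field data for faceting small.
                    field: {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(not_analyzed_mapping("doc", "tags"), indent=2))
```

The body would be sent when creating the index; changing how an existing field is indexed generally means reindexing.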

mavelikara12y ago· 1 in thread

ES seems to have the ability to run analytic queries. I have read about people using it as an OLAP solution [1], although I have not yet read anyone describe their experience. In that respect, how do ES's analytics capabilities compare against:

1) Dremel clones [2] like Impala & Presto (for near real-time, ad hoc analytic queries over large datasets)

2) Lambda Architecture [3] systems (where queries are known up-front, but need to run against a large dataset)

Does anyone here have experience with ES in such use cases, beyond the free text searching ES is well-known for?

[1]: https://groups.google.com/forum/#!topic/elasticsearch/iTy9IY...

[2]: http://static.googleusercontent.com/media/research.google.co...

[3]: http://jameskinley.tumblr.com/post/37398560534/the-lambda-ar...

zcrar7012y ago

I would also be interested in this.

sandstrom12y ago· 1 in thread

This gem is from the 'breaking changes' list:

  “Geo queries used to use miles as the default unit. And we 
  all know what happened at NASA because of that decision. The
  new default unit is meters.”

I like this release already.

roryokane12y ago

Link to that page: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

xutopia12y ago· 1 in thread

I love when something I've been using in production for what seems like years just announces now that they've reached 1.0.

brickcap12y ago

Well does it not make you feel glad that you took the risk? After all version is just a number :)

pron12y ago· 1 in thread

What does Elasticsearch add on top of Lucene?

lobster_johnson12y ago

A lot. Lucene is basically the inverted indexes, providing on-disk structures and a mechanism to query, as well as assorted bits like tokenization.

ES adds distribution (multimaster-replicated cluster of nodes connected via a gossip protocol), sharding, defines a document model and schema (the mapping of arbitrary JSON documents to index structures), faceting, aggregation (ie., roll-up-type calculations), various types of scoring (eg., geographic distance), ETL ("rivers"), backup/restore, performance metrics, a plugin system (eg., for indexing different file formats) and a bunch of other things -- and of course a REST-based API on top of the whole thing.
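For readers unfamiliar with the term, an inverted index maps each term to the set of documents containing it, so a query becomes a few set lookups instead of a scan. The toy version below is only meant to convey the idea; Lucene's real on-disk structures, analysis chain, and scoring are far more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "Net sales rose", 2: "Sales fell", 3: "Net income"}
    index = build_index(docs)
    print(search_all(index, "net sales"))
```

Everything ES layers on top (sharding, replication, the JSON document model) is about distributing and managing many such indexes.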

vhost-12y ago· 1 in thread

I'd be curious to see how well Elastic Search holds up to Endeca. I'm currently stuck maintaining some Endeca instances and it's a nightmare. I wish I could go back to ES.

At my last place of work, ES was beautiful and required little work to get a very fast, workable search in place.

quicksilver0312y ago

FYI, at my shop we use Oracle Commerce (ATG) and we've seen Oracle's salespeople pushing Endeca to all current and new customers.

For our current project we went with ElasticSearch and we're quite happy. One of the contributing factors was that one of our most experienced guys was unable to get the damn thing installed, even with the help of one Endeca consultant.

lflux12y ago

> Easy to read, console-based insight into what is happening in your cluster. Particularly useful to the sysadmin when the alarm goes off at 3am and JSON is too difficult to read.

It's these little details I love, when a project actually cares about operations and not just "well here's the API"

I've been using ElasticSearch only for Logstash, but I've been blown away so far by how easy it is to deal with.

dabeeeenster12y ago

ES is a fantastic project. Thank you thank you thank you for your contribution; truly standing on the shoulders...

jonhmchan12y ago

Congrats to the team - absolutely love elasticsearch. Having a lot of fun with it here at Stack Overflow.

buckbova12y ago

I didn't know what this was and looking at this link it was tough to tell.

The github lays it out well.

https://github.com/elasticsearch/elasticsearch

philfreo12y ago

We wrote a tutorial about how we wrote our search for Close.io using elasticsearch and pyparsing:

"Sales data search: Writing a query parser / AST using pyparsing + elasticsearch"

Part 1: http://blog.close.io/sales-data-search-writing-a-query-parse...

Part 2: http://blog.close.io/sales-data-search-writing-a-query-parse...

hungryblank12y ago

At Contentful in Berlin (Germany) we're looking for an elasticsearch/lucene expert, if you're excited by this tool and want to work full time with it get in touch.

https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7I...

capkutay12y ago

I was vetting ES for a business critical search platform, and had some concerns about write/read performance and how the Lucene indexes are handled on disk. I read that it doesn't really perform as well as Splunk... Instead of ES, I'm considering a solution using HBase to shard Lucene indexes on HDFS.

gane5h12y ago

Really impressed with the pace of innovation in the last few months: cat api, aggregations, snapshots. The unfortunate side effect is that books and stack overflow posts written before 1.0 are outdated.

Disclaimer: I’m the founder of a hosted Search As A Service and we use ES in a few critical parts of our infrastructure.

mtrn12y ago

Elasticsearch is a really great piece of software because it makes the simple easy and the complicated possible.

pyotrgalois12y ago

Great news. In every new project that we create (in general REST JSON APIs made with nodejs, erlang or rails that are consumed by iOS and android clients) we always finish using postgresql, redis and elasticsearch. Great tools.

kailuowang12y ago

Congratulations to the team. This is a great library that we really appreciate.

willcodeforfoo12y ago

Congrats! Elasticsearch is one of my favorite recent pieces of technology.

rartichoke12y ago

ES is one of the few techs that I seriously love.

The rails support for it is amazing too. The guy creating the rails integration lib is really talented and active.

elchief12y ago

Anybody know if elasticsearch does multiword synonyms properly? (Solr doesn't). Thx

skarnik12y ago

congrats to the team!

dreamdu5t12y ago

We recently switched from using MixPanel + Crittercism + Sphinx to using qbox.io (hosted elasticsearch) and Kibana to do all our analytics, crash reporting, and search.

I can't recommend qbox.io enough! Point-and-click scaling of managed elasticsearch clusters + Kibana == bliss.

j / k navigate · click thread line to collapse

136 comments

85 comments · 30 top-level

RyanZAG12y ago· 22 in thread

Elasticsearch is really awesome for searching, but what most people don't realize is that it makes a better MongoDB than MongoDB while giving you that searching too.

bilbo0s12y ago

This. A THOUSAND TIMES "This".

But it all just worked.

diminish12y ago

rjzzleep12y ago

1 more reply

AznHisoka12y ago

Just curious, if I'm using say version 0.92, how would I go about backing up my ElasticSearch instance. Besides creating a replica in a server, then "freezing" it by disconnecting the server?

1 more reply

kainosnoema12y ago

nkoren12y ago

1 more reply

gibrown12y ago

While I agree with the sentiment, I think Shay (lead ES developer) has explicitly said that he does not consider ES to be a data store... yet. I think this is mostly due to maturity.

I help run a large ES cluster (with canonical data in MySQL), and I consider this cautious attitude by the ES developers to be a good thing.

1 more reply

camus212y ago

did not know all that stuff, could Elasticsearch be the holy grail of document stores ?

3 more replies

sandGorgon12y ago

I had a live production logistics system running on top of Elasticsearch 0.6 (as a NoSQL database ) back in 2012. This powered one of India's largest ecommerce systems (at that time).

Elasticsearch is brilliant as a NoSQL - and if you were already using elasticsearch as a search system, you dont need to introduce yet another component into your stack.

axefrog12y ago

What limitations should one be aware of that would make ElasticSearch not a viable candidate where something like MongoDB would be a better fit?

RyanZAG12y ago

Other than that (which is just performance tuning, really), ES matches mongodb feature for feature, and obviously has a lot of extra power from its search heritage such as facets and percolate.

So I can't actually think of any limitations, and it's why I said ES makes a better MongoDB than MongoDB.

alisson12y ago

On ElasticSearch you have to update the whole document, no commands to manipulate them. You don't have commands like: $set, $addToSet, $pop, etc..

ElasticSearch has some advantages and MongoDB others. I think they are great together. One for storage and the other for searching.

3 more replies

brasetvik12y ago

I can't comment much on MongoDB, but I've written a bit things to keep in mind when considering Elasticsearch as a NoSQL store here: https://www.found.no/foundation/elasticsearch-as-nosql/

1 more reply

sjs38212y ago

I'm not sure if ElasticSearch does anything like this, but I make use of MongoDB's GeoJSON queries, namely the $geoIntersects operator.

http://docs.mongodb.org/manual/applications/geospatial-index...

1 more reply

abhirama12y ago

ddorian4312y ago

presharding

You create a number of shards for each index(database) that you can't later expand.

ddorian4312y ago

also changing indexed-fields on the go

mtrn12y ago

True. I evaluated Mongo, Couch and a couple of similar solutions, but ES being a search engine from the start really convinced me that it can be a viable database for loosely structured data.

g9yuayon12y ago

ErrantX12y ago

Its search capabilities and scalability are fantastic - we're throwing GBs of data into it weekly and it just soaks it up.

tracker112y ago

That said, it's definitely worth looking into both, depending on what your needs are.

obastemur12y ago

"most people don't realize is that it makes a better MongoDB than MongoDB "

(IMHO) Unfortunately, for most people, old habits die hard. Indeed a nice project and a great release.

m0th8712y ago· 6 in thread

A few days before launch, things were not looking good. As admins manipulated articles in preparation for the launch, the servers kept crashing.

At the 11th hour, we dropped elasticsearch into our infrastructure. It worked like a charm. The servers stopped crapping out, and we launched on time.

dc244712y ago

> Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr)

If you were using Solr, there are a few operational modes to run in: config-file based or SolrCloud[0]. The latter is more akin to ES in terms of cluster management.

I agree, though, that from a simplicity-of-deployment perspective at scale, ES has a much lighter learning curve.

[0] https://cwiki.apache.org/confluence/display/solr/SolrCloud

acdha12y ago

`java -jar elasticsearch.jar` does a better job and that's basically all it takes. I'm planning to switch as soon as https://github.com/elasticsearch/elasticsearch/issues/256 lands.

troels12y ago

Did you try/consider Sphinx? It's simple and it's quite fast. I'm using that and I'm pretty happy with it, but I might investigate ES at some point to see if I can squeeze a bit more speed out of it.

rch12y ago

m0th8712y ago

nasalgoat12y ago

Sphinx is a bit too 1:1 - it only works as a single server, not a cluster.

bryanh12y ago· 4 in thread

polyfractal12y ago

[1] http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

[2] http://www.elasticsearch.org/blog/disk-based-field-data-a-k-...

nzadrozny12y ago

Ditto open file handles, which is easy to push when aggressively over-sharding. Not an uncommon mistake for the enthusiastic newbie.

RyanZAG12y ago

wikyd12y ago

I think the CSS for bonsai.io is not loading.

NDizzle12y ago· 4 in thread

I also took a few days a few weeks ago to set up Elasticsearch after my MySQL full-text search fell apart.

What I'm doing is slamming the full text output of OCRed PDFs into a MyISAM table, the entire document in a text field.

Can anyone point me at a good web based search implementation using elasticsearch that explains how they're doing it?

What I have works pretty good, I just want to... check my work, I guess.

[1]: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

nzadrozny12y ago

I host and support websolr.com and bonsai.io and have seen a lot of search implementations.

NDizzle12y ago

Do you have a demo on either of those sites where I can input terms into a search box and look at results? What explanation do you give to users as to the options available when formatting the query?

dclara12y ago

May I ask what you meant about "web based search implementation using elasticsearch"?

NDizzle12y ago

Sure. Your first guess is correct - I do indexing of backend documents.

The second guess would probably be more interesting to most people.

karterk12y ago· 4 in thread

Elasticsearch mostly "just works". The latest version of Solr has made clustering easier (requires managing Zookeeper), but before that, it was either ES or a nightmare.

m0th8712y ago

FWIW, Elasticsearch builds on Lucene. It's just working at a much higher level of abstraction.

dclara12y ago

swah12y ago

Hmm, could that be because they have to compete with free?

malaporte12y ago

Lucene does have competition, mostly in the commercial world. I know, since I work for one of those companies :p

alecco12y ago· 3 in thread

Why is it awesome? Why "it just works"? Is it just a mongodb-kind document store over Hadoop+Lucene?

What makes it so special to have hundreds of votes and tweets all around within 2 hours?

I don't understand. A DB engine engineer.

gibrown12y ago

There are a lot of features thoughtfully combined that make ES great. Top of my list would be:

2. Non-string data it also handles very fast and cleanly (numbers, dates, geo).

The devs have also been really smart to focus on the "out of box experience". Very well thought out defaults.

More on our experience with ES at scale: http://gibrown.wordpress.com/2014/01/09/scaling-elasticsearc...

buckbova12y ago

Is this accurate for Elasticsearch, since it is built on Lucene?

https://lucene.apache.org/core/

"index size roughly 20-30% the size of text indexed"

That seems excessive for an index.

ddorian4312y ago

and many other stuff

Zilog12y ago· 3 in thread

Too bad they have yet to address the split brain issue.

chriscareycode12y ago

r00fus12y ago

Link for the curious: http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-br...
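
For context, the commonly documented mitigation is to set `discovery.zen.minimum_master_nodes` to a majority of the master-eligible nodes, so a minority partition cannot elect its own master. The sketch below only shows the quorum arithmetic; the helper names are invented, and this is a mitigation rather than a fix for the issue raised above.

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum: floor(n / 2) + 1 master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(partition_size, master_eligible):
    """A partition may elect a master only if it can see a quorum of nodes."""
    return partition_size >= minimum_master_nodes(master_eligible)


# Three master-eligible nodes split 2/1: only the larger side may elect a master.
print(minimum_master_nodes(3))
print(can_elect_master(2, 3), can_elect_master(1, 3))

# Two nodes split 1/1: neither side has a quorum, so no second master appears,
# but the cluster also has no master until the partition heals.
print(minimum_master_nodes(2))
print(can_elect_master(1, 2))
```

This is why an odd number of master-eligible nodes (three or more) is usually recommended: with only two nodes, any partition costs you either safety or availability.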

AznHisoka12y ago

True, that's a valid issue. For me, it's not as I end up indexing the same document multiple times over the course of 2-3 days.

Argorak12y ago· 2 in thread

Beyond the technology, Elasticsearch has a very mature, active and helpful community with users groups all over the world. We're well connected.

Pick your favourite users group here: http://elasticsearch.meetup.com/

shurane12y ago

Intros to ES and other technologies are useful.

I don't see many tutorials covering usage of ES here: http://www.elasticsearch.org/tutorials/

Could you maybe provide a link to yours?

Argorak12y ago

The introduction is in person, at the users group.

Yep, the lack of tutorials is a huge problem, but there are people working on that.

axionike12y ago· 2 in thread

Congrats to the ElasticSearch team, and all the supporters around it. Once I get back into more of a coding role, I'll definitely be contributing back to the ES project.

room27112y ago

This may require a bit more lengthy answer than makes sense here, but I'm curious about what was causing your GC issues and how you fixed them (we have GC issues at the moment).

polyfractal12y ago

Omitting norms, disabling bloom filters on old indices and enabling doc values are other ways to help alleviate field-data pressure.

mavelikara12y ago· 1 in thread

1) Dremel clones [2] like Impala & Presto (for near real-time, ad hoc analytic queries over large datasets)

2) Lambda Architecture [3] systems (where queries are known up-front, but need to run against a large dataset)

Does anyone here have experience with ES in such use cases, beyond the free-text searching ES is well known for?

[1]: https://groups.google.com/forum/#!topic/elasticsearch/iTy9IY...

[2]: http://static.googleusercontent.com/media/research.google.co...

[3]: http://jameskinley.tumblr.com/post/37398560534/the-lambda-ar...

zcrar7012y ago

I would also be interested in this.

sandstrom12y ago· 1 in thread

This gem is from the 'breaking changes' list:

  “Geo queries used to use miles as the default unit. And we 
  all know what happened at NASA because of that decision. The
  new default unit is meters.”

I like this release already.
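
To make the impact concrete, here is a small sketch of what the change means for a query that passes a bare number as the distance: the same `10` that used to mean ten miles now means ten meters. Spelling out the unit in the distance string (for example `"10mi"` or `"2.5km"`) avoids depending on the default. The field name `location` and the coordinates are invented for illustration.

```python
METERS_PER_MILE = 1609.344  # international mile


def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE


def explicit_distance(value, unit="mi"):
    """Format a distance with an explicit unit suffix, e.g. '10mi'."""
    return "%g%s" % (value, unit)


def geo_distance_filter(lat, lon, distance):
    """Body of a geo_distance filter on a hypothetical 'location' field."""
    return {"geo_distance": {"distance": distance, "location": {"lat": lat, "lon": lon}}}


# A bare 10 covered this many meters under the old default (miles).
print(miles_to_meters(10))

# Same intent, stated explicitly so the default unit no longer matters.
print(geo_distance_filter(52.52, 13.405, explicit_distance(10, "mi")))
```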

roryokane12y ago

Link to that page: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

xutopia12y ago· 1 in thread

I love when something I've been using in production for what seems like years just announces now that they've reached 1.0.

brickcap12y ago

Well does it not make you feel glad that you took the risk? After all version is just a number :)

pron12y ago· 1 in thread

What does Elasticsearch add on top of Lucene?

lobster_johnson12y ago

A lot. Lucene is basically the inverted indexes, providing on-disk structures and a mechanism to query, as well as assorted bits like tokenization.
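
For anyone unfamiliar with the term, a toy sketch of an inverted index follows: each term maps to the set of documents containing it, and a multi-term query intersects those sets. This is only an in-memory illustration with a naive tokenizer; Lucene's actual on-disk structures, analysis chain, and scoring are far more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Elasticsearch builds on Lucene",
    2: "Lucene is a search library",
    3: "Sphinx is fast",
}
index = build_index(docs)

print(search(index, "lucene"))
print(search(index, "lucene library"))
```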

vhost-12y ago· 1 in thread

I'd be curious to see how well Elastic Search holds up to Endeca. I'm currently stuck maintaining some Endeca instances and it's a nightmare. I wish I could go back to ES.

At my last place of work, ES was beautiful and required little work to get a very fast, workable search in place.

quicksilver0312y ago

FYI, at my shop we use Oracle Commerce (ATG) and we've seen Oracle's salespeople pushing Endeca to all current and new customers.

lflux12y ago

> Easy to read, console-based insight into what is happening in your cluster. Particularly useful to the sysadmin when the alarm goes off at 3am and JSON is too difficult to read.

It's these little details I love, when a project actually cares about operations and not just "well here's the API"

I've been using ElasticSearch only for Logstash, but I've been blown away so far by how easy it is to deal with.

dabeeeenster12y ago

ES is a fantastic project. Thank you thank you thank you for your contribution; truly standing on the shoulders...

jonhmchan12y ago

Congrats to the team - absolutely love elasticsearch. Having a lot of fun with it here at Stack Overflow.

buckbova12y ago

I didn't know what this was and looking at this link it was tough to tell.

The github lays it out well.

https://github.com/elasticsearch/elasticsearch

philfreo12y ago

We wrote a tutorial about how we wrote our search for Close.io using elasticsearch and pyparsing:

"Sales data search: Writing a query parser / AST using pyparsing + elasticsearch"

Part 1: http://blog.close.io/sales-data-search-writing-a-query-parse...

Part 2: http://blog.close.io/sales-data-search-writing-a-query-parse...

hungryblank12y ago

At Contentful in Berlin (Germany) we're looking for an elasticsearch/lucene expert, if you're excited by this tool and want to work full time with it get in touch.

https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7I...

capkutay12y ago

gane5h12y ago

Disclaimer: I’m the founder of a hosted Search As A Service and we use ES in a few critical parts of our infrastructure.

mtrn12y ago

Elasticsearch is a really great piece of software because it makes the simple easy and the complicated possible.

pyotrgalois12y ago

kailuowang12y ago

Congratulations to the team. This is a great library that we really appreciate.

willcodeforfoo12y ago

Congrats! Elasticsearch is one of my favorite recent pieces of technology.

rartichoke12y ago

ES is one of the few techs that I seriously love.

The rails support for it is amazing too. The guy creating the rails integration lib is really talented and active.

elchief12y ago

Anybody know if elasticsearch does multiword synonyms properly? (Solr doesn't). Thx

skarnik12y ago

congrats to the team!

dreamdu5t12y ago

We recently switched from using MixPanel + Crittercism + Sphinx to using qbox.io (hosted elasticsearch) and Kibana to do all our analytics, crash reporting, and search.

I can't recommend qbox.io enough! Point-and-click scaling of managed elasticsearch clusters + Kibana == bliss.
