RethinkDB 2.0 is now production ready

(rethinkdb.com)

407 points · hopeless · 11y ago · 152 comments

115 comments · 37 top-level

richardwigley11y ago· 15 in thread

Is Rethink going to stay in the community? Or is there a chance that it could be bought out? I don't want to spend time learning something and have it go private like FoundationDB. I'm assuming GNU and Apache is a good thing?

How is RethinkDB licensed?

The RethinkDB server is licensed under the GNU Affero General Public License v3.0. The client drivers are licensed under the Apache License v2.0. http://rethinkdb.com/faq/

coffeemug11y ago

Slava, CEO @ Rethink here. There are two aspects that you should consider.

Firstly, as Daniel pointed out, RethinkDB is licensed under AGPL. An acquirer wouldn't have the legal means to close the source code, and with over 700 forks on GitHub they also couldn't do it practically.

But beyond licensing, consider our personal motivations. We've been working on RethinkDB for five years, and had quite a few opportunities to sell the company. We turned them all down because we really believe in the product. The world is clearly moving towards realtime apps, and we feel it's extremely important for open realtime infrastructure to exist. It's easy for people to make promises about the future, but consider this from a game-theoretic point of view. If we wanted to sell, we could have done it long ago. I know it's not a guarantee, but hopefully it's a strong signal to help with your decision.

(Also, there are lots of really interesting companies building products on RethinkDB that we can't talk publicly about yet. It would be silly to sell given that momentum)

jwr11y ago

I hear you talk a lot about real-time — I guess that's a niche that you noticed. But let me add to that: being "distributed" without major pain is also big. There is a niche to be filled on the (loosely defined) "distributed" spectrum between, say, Redis and Cassandra, and so far you seem to be heading right for that place. I like that a lot and plan to use RethinkDB for a number of projects in the near future.

exelius11y ago

What about Couchbase? Specifically for the niche on the spectrum between Redis and Cassandra.

2 more replies

TkTech11y ago

In the unlikely chance that you can talk about future plans, is the idea to commercialize by offering "enterprise" support on top of Rethink, or an extended-feature closed source version, etc...?

At some point people need bread and butter, so I'm curious where that's going to come from :)

<3 Rethink.

coffeemug11y ago

Our business plans are all about a subscription support model (see http://rethinkdb.com/services/) and enterprise services on top of RethinkDB. The product will always be open source (hopefully the OSS community at large is past the world of closed source "extensions").

ifcologne11y ago

FoundationDB was a closed-source database. It was never open-source.

They had only open-sourced the SQL-Layer on top of their key/value store, and it's still available on GitHub. The reason: they built it on open-source code.

When someone deletes a public repository on Github, one fork remains as the new master. (Here's FoundationDB's SQL-Layer: https://github.com/louisrli/sql-layer)

So: RethinkDB will stay, even if someone tries to pull the plug. Just fork them on Github. :)

danielmewes11y ago

Daniel @ RethinkDB here. As you mention, RethinkDB is fully open source so RethinkDB is always going to remain freely available.

vezzy-fnord11y ago

This does not discount the possibility of a license change after a hypothetical acquisition, however. Though in that case you'll likely get a community fork branching off the upstream proprietary base.

Shamanmuni11y ago

No, the AGPL uses strong copyleft, so any future derivative work must be released under the same terms (and the same license or later versions if I'm not mistaken). The only possibility is to start a closed source clone that doesn't use any of the original code from zero.

The cases in which the community forks a project licensed with a copyleft license (like LibreOffice) have to do with dissatisfaction with the direction in which the company that owns the original trademarks is leading said project. There's no risk of closing the source code.

1 more reply

richardwigley11y ago

Authoritative source of information, thanks ;-) I will evaluate it on my next project.

Apofis11y ago

Quick question: How would RethinkDB benefit an E-Commerce shop?

coffeemug11y ago

It depends on what you're trying to accomplish. Shoot me an email to slava@rethinkdb.com -- I'd be happy to help!

benatkin11y ago

I don't consider the AGPL to be fully open source. To me, it isn't in the spirit of open source.

(BTW the Open Source Initiative was unable to get a trademark for "open source" so it doesn't matter that they approved it.)

e12e11y ago

I hesitate to comment (we've had a few copyleft vs BSD etc. discussions...) -- still, I think the best way to look at the AGPL is as the GPL patched to work around the move from software distribution to software as a service: the end user no longer gets a copy of the software, and so the GPL doesn't protect the end user any more. The end user is who the GPL is for; that the end user might also be a developer is incidental. The first freedom (Freedom 0) is the freedom to run code. You don't have that freedom with SaaS -- if the service provider goes away, so does your ability to run the software.

Now, one can be in agreement with the idea that the four freedoms are important, especially as we increasingly live in a world where software is not only convenient, but necessary in our daily lives -- but the idea with the AGPL, and why it is needed for server software -- is pretty clear.

eternalban11y ago

What does "fully open source" mean?

mping11y ago· 11 in thread

Does anyone have some numbers on performance? I tried RethinkDB 1.x and the performance wasn't quite there yet, especially for bulk import and aggregations.

coffeemug11y ago

We'll be publishing a performance report soon (we didn't manage to get it out today).

Rough numbers you can expect for 1KB size documents, 25M document database: 40K reads/sec/server, 5K writes/sec/server, roughly linear scalability across nodes.

We should be able to get the report out in a couple of days.

lobster_johnson11y ago

Any work done in 2.0 for improving aggregation performance?

The last time I tried with 1.16, I gave up my testing when even the simplest aggregation query (count + group by with what should be a sequential, streaming scan) took literally minutes with RethinkDB, compared to <1s with PostgreSQL. Rethink coredumped before I gave it enough RAM, after which it blew up to around 7GB, whereas Postgres uses virtually no RAM, mostly OS buffers.
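For reference, the kind of query described above (a count grouped by one field) can in principle be computed in a single sequential pass that keeps only one counter per group. Below is a minimal Python sketch of that idea. It is not how RethinkDB or PostgreSQL implement it, and the `rows` data and the `status` field are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Iterable


def streaming_group_count(rows: Iterable[Dict[str, Any]], field: str) -> Dict[Hashable, int]:
    """Count rows per distinct value of `field` in one pass.

    Memory use grows with the number of distinct groups, not the number of rows.
    """
    counts: Counter = Counter()
    for row in rows:  # rows may be a generator, so nothing is materialized
        counts[row.get(field)] += 1
    return dict(counts)


# Hypothetical sample data standing in for a table scan.
sample_rows = ({"id": i, "status": "open" if i % 3 else "closed"} for i in range(9))
result = streaming_group_count(sample_rows, "status")
print(result)  # {'closed': 3, 'open': 6}
```

Because the input is consumed as an iterator, the same function works whether the rows come from a list, a file, or a database cursor.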

danielmewes11y ago

We did a couple of scalability improvements in 2.0, but didn't optimize groups and counts specifically.

Would you mind writing me an email with your query or opening an issue at https://github.com/rethinkdb/rethinkdb/issues (unless you have already?)? I'd like to look into it to see how we can best improve this.

We're planning to implement a faster count algorithm that might help with this (https://github.com/rethinkdb/rethinkdb/issues/3949), but it's not completely trivial and will take us slightly longer to implement.

1 more reply

e12e11y ago

Rough numbers indeed - you forgot to define what a "server" is -- dedicated hw 16 core xeon with 4xssd in hw raid0 or a Digital Ocean vps with 512MB ram? ;-)

danielmewes11y ago

Daniel @ RethinkDB here. We'll release the details shortly. This was running on 12 core Xeon servers with 2 SSDs each in software RAID 0. There were also additional read queries running at the same time as the write queries, and the read throughput that coffeemug posted is the sustainable increase in reads/s that you get when adding an additional server to a cluster. Single-server performance is much higher due to missing network / message encoding overhead.

I realize these numbers alone are still not very meaningful and there are many remaining questions (size and structure of the data set, exact queries performed etc). Rest assured that all of these details will be mentioned in the actual performance report that should be up soon.
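As a back-of-the-envelope reading of the figures quoted in this thread (40K reads/sec and 5K writes/sec per server, roughly linear scaling), a rough capacity estimate might look like the sketch below. The function name and the strictly linear model are assumptions made for illustration; the actual performance report may show different behavior.

```python
READS_PER_SERVER = 40_000   # reads/sec per added server, as quoted in this thread
WRITES_PER_SERVER = 5_000   # writes/sec per server, as quoted in this thread


def estimate_cluster_throughput(servers: int) -> dict:
    """Estimate aggregate throughput assuming perfectly linear scaling."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return {
        "reads_per_sec": servers * READS_PER_SERVER,
        "writes_per_sec": servers * WRITES_PER_SERVER,
    }


estimate = estimate_cluster_throughput(4)
print(estimate)  # {'reads_per_sec': 160000, 'writes_per_sec': 20000}
```

Since the quoted read figure is described as the sustainable increase per added server, and single-server performance is said to be higher, a linear estimate like this is probably on the conservative side for very small clusters.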

1 more reply

javajosh11y ago

Thank you for providing some quick back-of-the-envelope numbers here, which is exactly what most people are looking for at a first pass. One question though - do those numbers change considerably between disk vs SSD?

coffeemug11y ago

The numbers were measured on SSD. If the active/hot dataset fits into RAM, the numbers between SSD and rotational don't change much. If the active dataset doesn't fit into RAM, RethinkDB performs significantly worse on rotational.

cachvico11y ago

Can you give a ballpark on how many nodes Rethink can scale up to, and any future roadmap in that direction?

Thank you for the fantastic product, by the way! :)

coffeemug11y ago

We often run tests scaling up to ~40 nodes without problems. You could probably push Rethink quite a bit further than that, but I think over 100 nodes would be hard. The goal is to keep pushing the boundary indefinitely.

1 more reply

GordyMD11y ago

Looking forward to seeing the report!

jfolkins11y ago

I contributed the benchmarks to Dan's gorethink driver. Dan is great to collaborate with so if you want to hack on Go and contribute to OSS, consider giving his project a look.

One way to improve writes is to batch them, an example is here.

https://github.com/dancannon/gorethink/blob/master/benchmark...

I believe rethinkdb docs state that 200 is the optimum batch size.

Another way is to enable the soft durability mode.

http://rethinkdb.com/api/javascript/insert/

"In soft durability mode RethinkDB will acknowledge the write immediately after receiving and caching it, but before the write has been committed to disk."

https://github.com/dancannon/gorethink/blob/master/benchmark...

Obviously your business requirements come into play. I prefer the Hard writes because my data is important to me but I do insert debug messages using soft writes in one application I have.

*Edit: Heh I forgot to mention, on my Macbook Pro I was getting 20k w/s while batching and using soft writes.

Individual writes for me are hovering around 10k w/s on the 8 CPU, 24 GB instance I have. But yeah, define your business reqs, then write your own benchmarks and see if the need is met.

Many devs write benchmarks in order to be the fastest and not the correctest. Super lame.
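A minimal sketch of the batching idea from this comment, assuming a batch size of 200 as mentioned above. It deliberately avoids any real driver: `insert_batch` is a hypothetical callable that would wrap whatever insert call your driver provides, and the helper names are made up for illustration.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 200) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def insert_in_batches(docs: Iterable[T], insert_batch: Callable[[List[T]], None], size: int = 200) -> int:
    """Send `docs` through `insert_batch` one chunk at a time; return the number of calls made."""
    calls = 0
    for batch in chunked(docs, size):
        insert_batch(batch)
        calls += 1
    return calls


# Stand-in for a real driver call: it only records the batch sizes it receives.
seen_sizes: List[int] = []
calls = insert_in_batches(({"n": i} for i in range(450)), lambda b: seen_sizes.append(len(b)))
print(calls, seen_sizes)  # 3 [200, 200, 50]
```

In a real application the callable would issue one insert per batch, and the soft durability mode quoted above could be chosen for data where losing a few recent writes is acceptable.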

geddski11y ago· 8 in thread

I've been using RethinkDB for a while now and I really enjoy working with it. It's a great fit for React and Angular 2 apps with their one-way data flow through the application. Hook up a store or a model to an event source (server-sent events) that streams the RethinkDB changes feed and it's just awesome and simple. Realtime shouldn't be this easy, totally feels like cheating. Love it.

I also really like the ability to do joins, where before in Mongo I would have to handle data joins in the app level.

e12e11y ago

How do you deal with user authentication, authorization and data encryption? Do you have a web server/application server or do you just combine static js/html/css resources and RethinkDB?

I'm kind of enamoured with the idea of couchapps -- but I'm still not entirely comfortable with having my db be my web and app server, as well as having it manage passwords etc... as I'm reading up, I'm slowly convincing myself it's possible to both make it work, be easy, support a sane level of TLS, load balance and be secure with proper ACL support... but very few tutorials/books seem to really deal with that to a level that brings me confidence.

jkarneges11y ago

By "an event source [...] that streams the RethinkDB changes feed", the parent is implying a separate web service layer that consumes data from RethinkDB and sends it out to clients. RethinkDB is not meant for direct access by clients. More about RethinkDB access here: http://rethinkdb.com/docs/security/ (TL;DR: plaintext shared key or ssh tunnel)
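To make the "separate web service layer" idea a bit more concrete, here is a small Python sketch of the formatting step such a layer might perform: turning change documents into Server-Sent Events frames. The input is a plain list of dictionaries shaped like changefeed results (with `old_val` and `new_val`); how the changes are obtained from the database and how the frames are written to the HTTP response are left out, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_sse_frames(changes: Iterable[Dict[str, Any]], event: str = "change") -> Iterator[str]:
    """Turn change documents into Server-Sent Events frames."""
    for change in changes:
        payload = json.dumps(change, separators=(",", ":"))
        # An SSE frame is a set of "field: value" lines ended by a blank line.
        yield f"event: {event}\ndata: {payload}\n\n"


# Hypothetical changes, shaped like changefeed results with old_val/new_val.
fake_feed = [
    {"old_val": None, "new_val": {"id": 1, "score": 10}},
    {"old_val": {"id": 1, "score": 10}, "new_val": {"id": 1, "score": 12}},
]
frames = list(to_sse_frames(fake_feed))
print(frames[0], end="")
```

A browser-side `EventSource` listening for the `change` event would then receive one JSON payload per database change, which is the one-way data flow described at the top of this thread.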

nileshtrivedi11y ago

Do you have any project on github that works like that?

geddski11y ago

Not that's open source, but I can do a little write up article and share a sample app that shows how to do it.

kclay11y ago

Would love to see this as well. I'm the creator of the Scala driver and always looking for ways to improve the API. I know you may not be using Scala, but having insight on how other devs would use it always helps. Plus I'm still trying to figure out the best way to do change feeds hehe.

GordyMD11y ago

I'd personally love to see this. Think it would be very valuable to the community.

1 more reply

sterlingross11y ago

That would be awesome, thank you for considering it!

nileshtrivedi11y ago

Please do. :)

Xorlev11y ago· 5 in thread

Congrats on the 2.0! It's been interesting to watch as a project.

Do you expect that as you stabilize you'll officially support more drivers? Or are you going to leave that as a community effort?

coffeemug11y ago

Slava @ Rethink here.

We're planning to take the most well-supported community drivers under the RethinkDB umbrella (assuming the authors agree, of course). It will almost certainly be a collaboration with the community, but we'll be contributing much more to the community drivers, supporting the authors, and offering commercial support for these drivers to our customers.

tshannon11y ago

That's good to hear, because the Go driver has been exhaustively maintained by a single developer, Dan Cannon (https://github.com/dancannon/gorethink), and I'm sure he (as well as the Go community) would love to see some support.

thethimble11y ago

Ditto regarding a java driver. It seems crazy for a database to not provide native java support.

1 more reply

jwr11y ago

Very glad to hear that. I tried using RethinkDB with Clojure recently, but there are two drivers. Both are mentioned on your pages. Figuring out which driver I should use isn't a great start — so even if you don't do a lot of development, just pointing to the drivers you consider "canonical" would help.

mmrmissing11y ago

I had to create a small project for a programming class. I settled on Clojure and revise (bitemyapp's driver). Avoid this one. I hit a driver bug today, but didn't have the time to fix/report it. Just switched to the other one...

1 more reply

notdonspaulding11y ago· 4 in thread

Cool!

I've started to look into RethinkDB in the past, and I'm very interested in the features it claims. However, I only have so much time to investigate new primary storage solutions, and our team has been burned in the past by jumping too quickly on a DB's bandwagon when the reliability, performance, or tooling just wasn't there.

As of late, we've come to rely on Aphyr's wonderful Call Me Maybe series[0] as a guide for which of a DB's claims are to be trusted and which aren't. But even when Aphyr hasn't tested a particular DB himself, some projects choose to use his tool Jepsen to verify their own claims. According to at least 1 RethinkDB issue on Github, RethinkDB still hasn't done that[1].

Not to poo poo on the hard work of the RethinkDB team, but for me, the TL;DR is NJ;DU (No Jepsen, Didn't Use)

[0] https://aphyr.com/tags/jepsen

[1] https://github.com/rethinkdb/rethinkdb/issues/1493

coffeemug11y ago

Slava @ Rethink here.

This is a great point, and we're on it! We have a Raft implementation that unfortunately didn't make it into 2.0 (these things require an enormous amount of patient testing). The implementation is designed explicitly to support robust automatic failover, no interruptions during resharding, and all the edge cases exposed in the Jepsen tests (and many issues that aren't).

This should be out in a few months as we finish testing and polish, and will include the results of the Jepsen tests. (It's kind of unfortunate this didn't make it into 2.0, but distributed systems demand conservative treatment).

sixdimensional11y ago

This conservative/consistent/responsible approach is one of the reasons I have faith in RethinkDB. You always seem to be taking the time to build it right and that is priceless.

wspeirs11y ago

Another good one for testing distributed systems/databases is blockade: http://blockade.readthedocs.org/en/latest/

danielmewes11y ago

Understood. We're planning to test with Jepsen soon. This will happen once we have implemented fully automatic failover (at the moment it still requires manual intervention, even though it's usually straightforward). We have a first working implementation, but are still working on the details. It should become ready in the next ~2 months.

See the issue you mentioned https://github.com/rethinkdb/rethinkdb/issues/1493 for progress on this.

evo_911y ago· 4 in thread

Now if only Meteor would support this all would be good in the world.

GordyMD11y ago

RethinkDB's realtime capabilities would fit perfectly with Meteor.

nileshtrivedi11y ago

How? Meteor's server-side architecture is still oriented around polling the DB, and I believe that's because many apps are still explicit request-response oriented.

GordyMD11y ago

As @imslavko said, when using Meteor with MongoDB (which I believe is the only production ready DB driver) it observes the oplog [1] for changes. You can use the polling observer too though.

You can find out more about the LiveUpdate core project of Meteor on their site [2] - it basically says the implementation of Live Updates for each db driver is independent to what the db is capable of. Specific mention of RethinkDB and Firebase is made as DBs that are built with making realtime data something that you get for relatively little work.

[1] https://github.com/meteor/meteor/blob/devel/packages/mongo/o...

[2] https://www.meteor.com/livequery

imslavko11y ago

No, Meteor's server-side architecture uses MongoDB's replication log that is analyzed to get updates.

xtrumanx11y ago· 4 in thread

Lots of congratulating on this thread and a hell of a lot of points for a software release. I've been on HN consistently for a long while and I didn't realize there was so much love and hype for RethinkDB here.

Have I missed something?

andrewflnr11y ago

I guess you have. There are a lot of us into alternative databases that are hoping for Rethink to fulfill the original promise of MongoDB. That said, I can't blame you for not devoting a bunch of attention to it. :)

jasondc11y ago

Can you be more specific on the original promise MongoDB didn't fulfill?

shockzzz11y ago

It's a nightmare to scale and has performance quirks that are really unexpected. Many, many companies have had to spend enormous amounts of developer time to migrate off of MongoDB to something else.

1 more reply

robertfw11y ago

I've been following RethinkDB on HN for quite a while now and have been eagerly awaiting them to make a production-ready statement. Everything I have read has sounded very promising and I am excited to try it out!

_dancannon11y ago· 4 in thread

Congratulations, been looking forward to this release for a while!

straik11y ago

I think this is a good place to say thank you for your work on the Go Rethink driver. It is a clearly written, easy to follow, and effective piece of code.

_dancannon11y ago

Thank you very much! I hope to have an update to the Go driver which supports RethinkDB v2.0 within a couple of hours.

nulltype11y ago

I would like to thank you as well! I didn't really have any time to work on rethinkgo after I made the first version, thanks for doing such a good job with gorethink.

_dancannon11y ago

Just released the latest update to the Go driver, it has some pretty big changes including the ability to connect to a RethinkDB cluster + automatic host discovery.

For more information check out https://github.com/dancannon/gorethink/releases/tag/v0.7.0.

expando11y ago· 3 in thread

Selling support is a great non-intrusive business model.

ThinkBeat11y ago

Except that it incentivises a company to build a product that requires continuing support.

That can be a good thing or a bad thing.

coffeemug11y ago

> Except that it incentivises a company to build a product that requires continuing support.

People say this a lot, but in our case we really haven't seen this incentive for a couple of reasons.

Large organizations are more than happy to pay for training and development support to accelerate their time to market. It doesn't matter how polished your product is -- databases are complex enough that people are willing to pay for best practices, training, and support.

Similarly, databases are pretty critical pieces of the infrastructure. If anything goes wrong, it can significantly impact the business, so people always want operational/production support.

There are many enterprise services that can be built on top of the product that can be very valuable. You don't have to build a crappy product -- there are plenty of ways to monetize with a great product.

Finally, a bad product will significantly limit growth of the company in the long term. There are lots of options now -- you can't get away with building a crappy product and an artificial monopoly.

If you see a crappy product from a company that offers subscription support, it's probably not because of misaligned incentives. Building databases is really hard, I don't think the business model has much to do with it.

giaour11y ago

Selling support for terrible (but free!) software is usually known as the "MongoDB model," so it's a proven path to riches in the database market.

dorfsmay11y ago· 3 in thread

Does RethinkDB have a concept of transactions? My question is actually about restoring a lost node... If a node is rebooted, is all the data for its shards going to be sent again? Or just the delta?

Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

coffeemug11y ago

> If a node is rebooted, is all the data for its shards going to be sent again? Or just the delta?

Just the delta. We built an efficient, distributed BTree diff algorithm. When a node goes offline and comes back up, the cluster only sends a diff that the node missed.

> Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

You don't have to do that, it happens automatically. You can have full visibility and control into what's happening in the cluster -- check out http://rethinkdb.com/docs/system-tables/ for details on how this works.
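The B-tree diff algorithm mentioned above is not described in this thread. As a much simpler illustration of the general idea of sending only the delta, the toy Python sketch below compares per-key version numbers and returns just the entries a returning node is missing. The data layout and function name are hypothetical and have nothing to do with RethinkDB's internals.

```python
from typing import Dict, Tuple

# key -> (version, value); a higher version means a more recent write
Store = Dict[str, Tuple[int, str]]


def compute_delta(up_to_date: Store, stale: Store) -> Store:
    """Return the entries that `stale` is missing or holds at an older version."""
    delta: Store = {}
    for key, (version, value) in up_to_date.items():
        stale_entry = stale.get(key)
        if stale_entry is None or stale_entry[0] < version:
            delta[key] = (version, value)
    return delta


primary = {"a": (3, "x"), "b": (1, "y"), "c": (2, "z")}
rebooted = {"a": (2, "old"), "b": (1, "y")}  # missed one update and one insert
delta = compute_delta(primary, rebooted)
print(delta)  # {'a': (3, 'x'), 'c': (2, 'z')}
```

This toy version ignores deletions and scans every key, which a tree-based diff would presumably avoid by skipping whole unchanged subtrees.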

dorfsmay11y ago

> You don't have to do that, it happens automatically

Well, in a past life, I used another store that did that automatically, the issue with that is that EITHER it kills the cluster because of read-congestion as it re-builds the "new" node, OR, if you limit the bandwidth for node-building, it takes for ever and a half to rebuild a node which means that you are exposed with one less shard of what was on that node.

What are the chances of a filesystem snapshot to be consistent enough to be used to prime a crashed node? What about restoring backup files from other nodes?
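The trade-off described in this comment is easy to put numbers on. The sketch below is plain arithmetic with made-up figures (500 GB per node and two throttle settings), not a measurement of any database: it shows how capping the copy bandwidth stretches the window during which the data has one fewer copy.

```python
def rebuild_hours(data_gb: float, bandwidth_mb_per_sec: float) -> float:
    """Hours needed to copy `data_gb` gigabytes at a capped transfer rate."""
    if bandwidth_mb_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = (data_gb * 1024) / bandwidth_mb_per_sec
    return seconds / 3600


# Hypothetical numbers: 500 GB on the node, two different throttle settings.
fast = rebuild_hours(500, 200)   # aggressive copy, more load on the cluster
slow = rebuild_hours(500, 20)    # gentle copy, longer exposure window
print(round(fast, 2), round(slow, 2))  # 0.71 7.11
```

A tenfold cut in bandwidth means a tenfold longer rebuild, which is why the choice between congestion and exposure time has no free answer.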

coffeemug11y ago

Congestion vs. time is definitely a hard problem. We've done an enormous amount of tuning to make this work, and the upcoming Raft release does even more. This part has been quite solid for a while, so I think you might have a better experience with RethinkDB than what you're used to.

There is currently no other way to prime the node -- I hope we don't have to add it. This sort of functionality should work out of the box.

dorfsmay11y ago· 3 in thread

Why would I use RethinkDB instead of OrientDB?

coffeemug11y ago

Check out http://rethinkdb.com/faq/ for details on when RethinkDB is a great choice. The short version is that if you're building realtime apps, RethinkDB is an awesome choice because it pushes data to the application (which makes building and scaling realtime apps dramatically easier).

ScottBurson11y ago

Hi Slava, the FAQ has a typo in the second sentence: "architecutre".

coffeemug11y ago

Thanks -- fixed. Will take a little bit to push the site update live.

1 more reply

mberning11y ago· 2 in thread

For the rubyists out there check out http://nobrainer.io/

sandstrom11y ago

Is anyone using nobrainer in production?

We're currently using Mongoid (MongoDB ORM), and an Active Record-like ORM for RethinkDB is the main thing holding us back.

I don't have great insight into nobrainer, but last I checked it seemed like joins weren't implemented (but were on the roadmap).

vdaniuk11y ago

I like rethinkdb and have been successfully using their official ruby and js libraries for some time.

Nobrainer ORM wasn't fun though: too many edge cases that interfere with ActiveRecord and Rails conventions. Going a bit on a tangent, after many experiments I've developed a strong conviction that pg is the best database choice for Rails, especially with the jsonb datatype included in 9.4. It is the best of both worlds: a reliable, proven SQL db that plays really well with Rails and has NoSQL capabilities, including indexing and querying. So good. Ymmv.

wilsonfiifi11y ago· 2 in thread

Well done guys! Have been wanting to use rethinkdb for my project but it didn't have the "production ready" tag, so Mongodb was chosen instead. Now I can confidently switch! It's a pity the Go driver isn't quite there yet though.

_dancannon11y ago

I hope to have a "production ready" version of the driver ready in about a month. I know it's slow, but currently I am the only dev working on maintaining this project and all work is done in my free time.

If you have any further questions I would be more than happy to answer them on https://gitter.im/dancannon/gorethink. Thanks!

wilsonfiifi11y ago

Hey no worries I absolutely understand. Apologies for lamenting on the pace/state of your contribution and thanks for your time and effort.

Fauntleroy11y ago· 2 in thread

Now that 2.0 is production ready, will we be seeing some RethinkDB providers? A simple Heroku integration would be amazing for quickly prototyping apps with a new database technology.

jkarneges11y ago

As Slava mentioned, you can use Compose.io. It requires using an SSH tunnel, though, which is a little tricky in Heroku. Here's a tunnel script I made to simplify this:

https://github.com/fanout/leaderboard/blob/master/tunnel.py

In particular, it reads the entire SSH private key as an environment variable, so you don't need to commit the key to the git repository.
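The approach described here (keeping the private key in an environment variable instead of in the repository) might be sketched roughly as follows. This is not the linked tunnel.py: the variable name `SSH_PRIVATE_KEY`, the host names, and the port numbers are all assumptions made for illustration, and the sketch only builds the `ssh` command without running it.

```python
import os
import stat
import tempfile
from typing import List, Mapping


def write_key_from_env(env: Mapping[str, str], var: str = "SSH_PRIVATE_KEY") -> str:
    """Write the key held in `env[var]` to a private temp file and return its path."""
    key = env[var]
    fd, path = tempfile.mkstemp(prefix="tunnel_key_")
    with os.fdopen(fd, "w") as handle:
        handle.write(key if key.endswith("\n") else key + "\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0600: ssh rejects keys readable by others
    return path


def build_tunnel_command(key_path: str, gateway: str, db_host: str, db_port: int, local_port: int) -> List[str]:
    """Build an `ssh` argv that forwards local_port to db_host:db_port via gateway."""
    return [
        "ssh", "-N",
        "-i", key_path,
        "-L", f"{local_port}:{db_host}:{db_port}",
        gateway,
    ]


# Hypothetical values; a real deployment would read os.environ instead.
key_path = write_key_from_env({"SSH_PRIVATE_KEY": "-----BEGIN FAKE KEY-----"})
command = build_tunnel_command(key_path, "user@gateway.example.com", "10.0.0.5", 28015, 28015)
print(command)
os.remove(key_path)  # clean up the demo key file
```

The resulting argument list could be handed to `subprocess.Popen` to keep the tunnel open alongside the app process, with the driver then connecting to the local port.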

coffeemug11y ago

You can spin up RethinkDB today with https://www.compose.io/ (it's surprisingly easy, and their support is awesome). It should be pretty easy to get a RethinkDB Heroku plugin based on Compose. If the community doesn't get around to it, we can probably do it internally pretty easily.

Ciantic11y ago· 2 in thread

I wish they did official TypeScript definition files. I'm a bit wary to rely on huge DB API with community definitions only.

There are reasons to write TypeScript definitions for documentation generation too, if not for the code as TS.

coffeemug11y ago

There is an official spec here: http://rethinkdb.com/docs/writing-drivers/ Not quite TS, but it's well defined and new releases of the spec are carefully managed.

Ciantic11y ago

For me, TS is a tool to ensure my code is not using deprecated APIs. This is partly why Facebook is also pushing typing to JS with Flow.

Edit: And Guido is pushing it to Python with PEP 484: https://www.python.org/dev/peps/pep-0484/

It's an inherent problem with dynamic languages: you have to read all the new release documents and migrate your code. With typed code I can at least be somewhat sure I'm not using deprecated calls and such just by compiling.

cachvico11y ago· 2 in thread

Any thoughts about multi-doc transactions?

danielmewes11y ago

It's not currently on our road map.

Even though there are some well-researched algorithms for it, actually implementing transactions in a distributed system is pretty hard. It also comes at significant performance costs, which would interfere with our goal of easy and efficient scalability.

cachvico11y ago

Thank you for the comment. I was wondering if something along the lines of http://blog.labix.org/2012/08/22/multi-doc-transactions-for-... would be feasible.

gauravphoenix11y ago· 1 in thread

Any plans to release an officially supported Java driver? For most enterprise-oriented apps, having an officially supported Java driver would be great.

coffeemug11y ago

Yes! No ETA yet, but we're on it.

thoughtpolice11y ago· 1 in thread

I've updated NixOS to include 2.0.0-1: https://github.com/NixOS/nixpkgs/commit/fe6ec3d13a1554458e64... - any way we can get it mentioned on the website?

coffeemug11y ago

Could you suggest a pull request in docs? (https://github.com/rethinkdb/docs)

nickstinemates11y ago· 1 in thread

Big fan of RethinkDB. Use it in all of my projects these days.

vonklaus11y ago

What were you using before? What are the pros and cons of the switch?

babo11y ago· 1 in thread

Looking forward to installing it from Homebrew, but it's not there yet. Good to see that the Python driver on pip is already updated!

coffeemug11y ago

It should be out later today. We're working on it now.

cdnsteve11y ago

I'm going to give this a spin out of pure respect for the team that's dedicated 5 years to a product without cashing out. Hats off. Your CEO has some respectable... anatomy.

dkhenry11y ago

Awesome news. I have used Rethink for a few internal projects, and while I don't think it has that one "killer feature" that other DBs don't, it is such a painless experience in development and deployment that it is just worlds better than trying to set up and scale some of the other solutions.

BZ rethinkdb team.

kolencherry11y ago

Congrats on the 2.0 release! Changefeeds are an incredibly powerful feature. We're looking forward to the next release with automagic failover!

billclerico11y ago

congrats Slava, Mike & team. in an age of thin apps getting shipped in weeks or months, the patience you showed in spending 5 years developing some pretty hard-core technology is amazing. really excited for you guys!

cookiecat11y ago

Congrats guys, RethinkDB has been a joy to use so far, but the 3rd party .net driver needs some help. I filed an issue here: https://github.com/rethinkdb/rethinkdb/issues/3931

DAddYE11y ago

I'm very happy to see this milestone. Even though I haven't used it recently, I remember that 2-3 years ago we tried it (adtech) for some heavy production workloads. Even though we chose another product (Cassandra), I was literally surprised how well it performed! Congrats!

aioprisan11y ago

the commercial services launch is critical and will speed adoption from large players

nviennot11y ago

Lots of hard work has been poured into this release :)

Congrats to the RethinkDB team!

jmtame11y ago

Congrats Slava, Mike and the rest of the folks at RethinkDB!

covi11y ago

Brilliant name (Yojimbo) and great cover photo there...

ataussig11y ago

Congrats to the RethinkDB team on this huge milestone!

jessejhernandez11y ago

Congrats Mike & Team!

jjsalamon11y ago

Congrats guys! I've been looking forward to using Rethink.

Is windows support coming anytime?

jkot11y ago

Congratulations!

weixiyen11y ago

Congratulations guys! Amazing update :D

hemantv11y ago

This is awesome :)

thomcrowe11y ago

Congrats guys!

j / k navigate · click thread line to collapse

152 comments

115 comments · 37 top-level

richardwigley11y ago· 15 in thread

How is RethinkDB licensed?

The RethinkDB server is licensed under the GNU Affero General Public License v3.0. The client drivers are licensed under the Apache License v2.0. http://rethinkdb.com/faq/

coffeemug11y ago

Slava, CEO @ Rethink here. There are two aspects that you should consider.

(Also, there are lots of really interesting companies building products on RethinkDB that we can't talk publicly about yet. It would be silly to sell given that momentum)

jwr11y ago

exelius11y ago

What about Couchbase? Specifically for the niche on the spectrum between Redis and Cassandra.

TkTech11y ago

In the unlikely chance that you can talk about future plans, is the idea to commercialize by offering "enterprise" support on top of Rethink, or an extended-feature closed source version, etc...?

At some point people need bread and butter, so I'm curious where that's going to come from :)

<3 Rethink.

coffeemug11y ago

ifcologne11y ago

FoundationDB was a closed-source database. It was never open-source.

They had only open-sourced the SQL Layer on top of their key/value store, and it's still available on GitHub. The reason: they built it on open-source code.

When someone deletes a public repository on Github, one fork remains as the new master. (Here's FoundationDB's SQL-Layer: https://github.com/louisrli/sql-layer)

So: RethinkDB will stay, even if someone tries to pull the plug. Just fork them on Github. :)

danielmewes11y ago

Daniel @ RethinkDB here. As you mention, RethinkDB is fully open source so RethinkDB is always going to remain freely available.

vezzy-fnord11y ago

Shamanmuni11y ago

richardwigley11y ago

Authoritative source of information, thanks ;-) I will evaluate it on my next project.

Apofis11y ago

Quick question: How would RethinkDB benefit an E-Commerce shop?

coffeemug11y ago

It depends on what you're trying to accomplish. Shoot me an email to slava@rethinkdb.com -- I'd be happy to help!

benatkin11y ago

I don't consider the AGPL to be fully open source. To me, it isn't in the spirit of open source.

(BTW the Open Source Initiative was unable to get a trademark for "open source" so it doesn't matter that they approved it.)

e12e11y ago

eternalban11y ago

What does "fully open source" mean?

mping11y ago· 11 in thread

Does anyone have numbers on performance? I tried RethinkDB 1.x and the performance wasn't quite there yet, especially for bulk imports and aggregations.

coffeemug11y ago

We'll be publishing a performance report soon (we didn't manage to get it out today).

Rough numbers you can expect for 1KB size documents, 25M document database: 40K reads/sec/server, 5K writes/sec/server, roughly linear scalability across nodes.

We should be able to get the report out in a couple of days.
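
To put those rough numbers in perspective, here is a back-of-the-envelope sketch in Python. It is only arithmetic on the figures quoted above, and it leans on assumptions that are not in the comment: 1KB is taken as 1000 bytes, scaling is treated as perfectly linear, and replication overhead is ignored.

```python
# Figures quoted in the comment above.
DOCS = 25_000_000          # documents in the database
WRITES_PER_SEC = 5_000     # per server
READS_PER_SEC = 40_000     # per server

# Assumption: 1KB means 1000 bytes here.
DOC_BYTES = 1_000


def dataset_gb(docs: int = DOCS, doc_bytes: int = DOC_BYTES) -> float:
    """Raw size of the documents in GB, ignoring indexes and storage overhead."""
    return docs * doc_bytes / 1e9


def load_minutes(docs: int = DOCS,
                 writes_per_sec: int = WRITES_PER_SEC,
                 servers: int = 1) -> float:
    """Minutes to write every document once, assuming perfectly linear scaling."""
    return docs / (writes_per_sec * servers) / 60


def cluster_reads_per_sec(servers: int,
                          reads_per_sec: int = READS_PER_SEC) -> int:
    """Aggregate read throughput under the same linear-scaling assumption."""
    return servers * reads_per_sec


if __name__ == "__main__":
    print(f"dataset: {dataset_gb():.0f} GB")
    print(f"initial load, 1 server: {load_minutes():.1f} min")
    print(f"initial load, 4 servers: {load_minutes(servers=4):.1f} min")
    print(f"reads, 4 servers: {cluster_reads_per_sec(4)} per second")
```

So the quoted workload is roughly 25 GB of documents, and writing all of it once at the quoted rate would take well over an hour on a single server. Real numbers will depend on the hardware question raised in the replies.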

lobster_johnson11y ago

Any work done in 2.0 for improving aggregation performance?

danielmewes11y ago

We did a couple of scalability improvements in 2.0, but didn't optimize groups and counts specifically.

e12e11y ago

Rough numbers indeed - you forgot to define what a "server" is -- dedicated hw 16 core xeon with 4xssd in hw raid0 or a Digital Ocean vps with 512MB ram? ;-)

danielmewes11y ago

javajosh11y ago

coffeemug11y ago

cachvico11y ago

Can you give a ballpark on how many nodes Rethink can scale up to, and any future roadmap in that direction?

Thank you for the fantastic product, by the way! :)

coffeemug11y ago

GordyMD11y ago

Looking forward to seeing the report!

jfolkins11y ago

I contributed the benchmarks to Dan's gorethink driver. Dan is great to collaborate with so if you want to hack on Go and contribute to OSS, consider giving his project a look.

One way to improve writes is to batch them, an example is here.

https://github.com/dancannon/gorethink/blob/master/benchmark...

I believe rethinkdb docs state that 200 is the optimum batch size.

Another way is to enable the soft durability mode.

http://rethinkdb.com/api/javascript/insert/

"In soft durability mode RethinkDB will acknowledge the write immediately after receiving and caching it, but before the write has been committed to disk."

https://github.com/dancannon/gorethink/blob/master/benchmark...

Obviously your business requirements come into play. I prefer the Hard writes because my data is important to me but I do insert debug messages using soft writes in one application I have.

*Edit: Heh I forgot to mention, on my Macbook Pro I was getting 20k w/s while batching and using soft writes.

Individual writes for me are hovering around 10k w/s on the 8 cpu 24gb instance i have. But yeah, define your business reqs then write your own benchmarks and see if the need is met.

Many devs write benchmarks in order to be the fastest and not the correctest. Super lame.
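
For readers who want to try the batching idea from this comment, here is a minimal Python sketch. It is not taken from the linked Go benchmarks. The helper names `batches` and `bulk_insert` are made up for illustration, the default of 200 simply follows the batch size mentioned above, and the actual database call is passed in as a function so the sketch runs without a server.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 200) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_insert(docs: Iterable[dict],
                insert_batch: Callable[[List[dict]], object],
                size: int = 200) -> int:
    """Send `docs` to `insert_batch` one chunk at a time.

    `insert_batch` is a hypothetical callable supplied by the caller;
    it stands in for whatever driver call performs the insert.
    Returns the number of round trips made.
    """
    calls = 0
    for chunk in batches(docs, size):
        insert_batch(chunk)
        calls += 1
    return calls


if __name__ == "__main__":
    sent: List[List[dict]] = []
    trips = bulk_insert(({"id": i} for i in range(450)), sent.append)
    print(trips, [len(chunk) for chunk in sent])
```

With the official Python driver, `insert_batch` could be something like `lambda chunk: r.table("logs").insert(chunk, durability="soft").run(conn)`, where the table name and `conn` are placeholders. Soft durability trades safety for speed as described in the quote above, so it fits debug messages better than data you cannot afford to lose.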

geddski11y ago· 8 in thread

I also really like the ability to do joins, where before in Mongo I would have to handle data joins in the app level.

e12e11y ago

How do you deal with user authentication, authorization and data encryption? Do you have a web server/application server or do you just combine static js/html/css resources and RethinkDB?

jkarneges11y ago

nileshtrivedi11y ago

Do you have any project on github that works like that?

geddski11y ago

Not that's open source, but I can do a little write up article and share a sample app that shows how to do it.

kclay11y ago

GordyMD11y ago

I'd personally love to see this. Think it would be very valuable to the community.

sterlingross11y ago

That would be awesome, thank you for considering it!

nileshtrivedi11y ago

Please do. :)

Xorlev11y ago· 5 in thread

Congrats on the 2.0! It's been interesting to watch as a project.

Do you expect that as you stabilize you'll officially support more drivers? Or are you going to leave that as a community effort?

coffeemug11y ago

Slava @ Rethink here.

tshannon11y ago

thethimble11y ago

Ditto regarding a java driver. It seems crazy for a database to not provide native java support.

jwr11y ago

mmrmissing11y ago

notdonspaulding11y ago· 4 in thread

Cool!

Not to poo poo on the hard work of the RethinkDB team, but for me, the TL;DR is NJ;DU (No Jepsen, Didn't Use)

[0] https://aphyr.com/tags/jepsen

[1] https://github.com/rethinkdb/rethinkdb/issues/1493

coffeemug11y ago

Slava @ Rethink here.

sixdimensional11y ago

This conservative/consistent/responsible approach is one of the reasons I have faith in RethinkDB. You always seem to be taking the time to build it right and that is priceless.

wspeirs11y ago

Another good one for testing distributed systems/databases is blockade: http://blockade.readthedocs.org/en/latest/

danielmewes11y ago

See the issue you mentioned https://github.com/rethinkdb/rethinkdb/issues/1493 for progress on this.

evo_911y ago· 4 in thread

Now if only Meteor would support this all would be good in the world.

GordyMD11y ago

RethinkDB's realtime capabilities would fit perfectly with Meteor.

nileshtrivedi11y ago

How? Meteor's server-side architecture is still oriented around polling the DB, and I believe that's because many apps are still explicit request-response oriented.

GordyMD11y ago

As @imslavko said, when using Meteor with MongoDB (which I believe is the only production ready DB driver) it observes the oplog [1] for changes. You can use the polling observer too though.

[1] https://github.com/meteor/meteor/blob/devel/packages/mongo/o...

[2] https://www.meteor.com/livequery

imslavko11y ago

No, Meteor's server-side architecture uses MongoDB's replication log that is analyzed to get updates.

xtrumanx11y ago· 4 in thread

Have I missed something?

andrewflnr11y ago

jasondc11y ago

Can you be more specific on the original promise MongoDB didn't fulfill?

shockzzz11y ago

It's a nightmare to scale and has performance quirks that are really unexpected. Many, many companies have had to spend enormous amounts of developer time to migrate off of MongoDB to something else.

robertfw11y ago

_dancannon11y ago· 4 in thread

Congratulations, been looking forward to this release for a while!

straik11y ago

I think this is a good place to say thank you for your work on the Go Rethink driver. It is a clearly written, easy to follow and effective piece of code.

_dancannon11y ago

Thank you very much! I hope to have an update to the Go driver which supports RethinkDB v2.0 within a couple of hours.

nulltype11y ago

I would like to thank you as well! I didn't really have any time to work on rethinkgo after I made the first version, thanks for doing such a good job with gorethink.

_dancannon11y ago

Just released the latest update to the Go driver, it has some pretty big changes including the ability to connect to a RethinkDB cluster + automatic host discovery.

For more information check out https://github.com/dancannon/gorethink/releases/tag/v0.7.0.

expando11y ago· 3 in thread

Selling support is a great non-intrusive business model.

ThinkBeat11y ago

Except that it incentivises a company to build a product that requires continuing support.

That can be a good thing or a bad thing.

coffeemug11y ago

> Except that it incentivises a company to build a product that requires continuing support.

People say this a lot, but in our case we really haven't seen this incentive for a couple of reasons.

Similarly, databases are pretty critical pieces of the infrastructure. If anything goes wrong, it can significantly impact the business, so people always want operational/production support.

Finally, a bad product will significantly limit growth of the company in the long term. There are lots of options now -- you can't get away with building a crappy product and an artificial monopoly.

giaour11y ago

Selling support for terrible (but free!) software is usually known as the "MongoDB model," so it's a proven path to riches in the database market.

dorfsmay11y ago· 3 in thread

Does RethinkDB have a concept of transactions? My question is actually about restoring a lost node... If a node is rebooted, will all the data for its shards be sent again? Or just the delta?

Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

coffeemug11y ago

> If a node is rebooted, will all the data for its shards be sent again? Or just the delta?

Just the delta. We built an efficient, distributed BTree diff algorithm. When a node goes offline and comes back up, the cluster only sends a diff that the node missed.

> Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

dorfsmay11y ago

> You don't have to do that, it happens automatically

What are the chances that a filesystem snapshot would be consistent enough to prime a crashed node? What about restoring backup files from other nodes?

coffeemug11y ago

There is currently no other way to prime the node -- I hope we don't have to add it. This sort of functionality should work out of the box.

dorfsmay11y ago· 3 in thread

Why would I use RethinkDB instead of OrientDB?

coffeemug11y ago

ScottBurson11y ago

Hi Slava, the FAQ has a typo in the second sentence: "architecutre".

coffeemug11y ago

Thanks -- fixed. Will take a little bit to push the site update live.

mberning11y ago· 2 in thread

For the rubyists out there check out http://nobrainer.io/

sandstrom11y ago

Is anyone using nobrainer in production?

We're currently using Mongoid (MongoDB ORM), and an Active Record-like ORM for RethinkDB is the main thing holding us back.

I don't have great insight into nobrainer, but last I checked it seemed like joins weren't implemented (but on the roadmap).

vdaniuk11y ago

I like rethinkdb and have been successfully using their official ruby and js libraries for some time.

wilsonfiifi11y ago· 2 in thread

_dancannon11y ago

If you have any further questions I would be more than happy to answer them on https://gitter.im/dancannon/gorethink. Thanks!

wilsonfiifi11y ago

Hey no worries I absolutely understand. Apologies for lamenting on the pace/state of your contribution and thanks for your time and effort.

Fauntleroy11y ago· 2 in thread

Now that 2.0 is production ready, will we be seeing some RethinkDB providers? A simple Heroku integration would be amazing for quickly prototyping apps with a new database technology.

jkarneges11y ago

As Slava mentioned, you can use Compose.io. It requires using an SSH tunnel, though, which is a little tricky in Heroku. Here's a tunnel script I made to simplify this:

https://github.com/fanout/leaderboard/blob/master/tunnel.py

In particular, it reads the entire SSH private key as an environment variable, so you don't need to commit the key to the git repository.
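
The key-from-environment idea can be sketched in a few lines of Python. This is not a copy of the linked tunnel.py. The variable name `SSH_PRIVATE_KEY` and the helper `write_key_from_env` are assumptions made for illustration, and the sketch assumes a POSIX system.

```python
import os
import stat
import tempfile


def write_key_from_env(var: str = "SSH_PRIVATE_KEY") -> str:
    """Write the private key held in environment variable `var` to a temp file.

    Returns the path of the file, which is readable and writable only by
    the current user, as ssh requires for identity files.
    """
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"environment variable {var} is not set")

    # mkstemp creates the file with owner-only permissions on POSIX.
    fd, path = tempfile.mkstemp(prefix="tunnel_key_")
    with os.fdopen(fd, "w") as f:
        # Make sure the key ends with a newline.
        f.write(key if key.endswith("\n") else key + "\n")

    # Set the mode explicitly as well, so the intent is visible.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


if __name__ == "__main__":
    key_path = write_key_from_env()
    print(f"key written to {key_path}")
```

The returned path can then be handed to `ssh -i <path> -N -L 28015:localhost:28015 user@host`, where the user and host are placeholders for whatever your provider gives you and 28015 is RethinkDB's default client driver port. Remember to delete the file when the tunnel shuts down.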

coffeemug11y ago

Ciantic11y ago· 2 in thread

I wish they did official TypeScript definition files. I'm a bit wary to rely on huge DB API with community definitions only.

There are reasons to write TypeScript definitions for documentation generation too, if not for the code as TS.

coffeemug11y ago

There is an official spec here: http://rethinkdb.com/docs/writing-drivers/ Not quite TS, but it's well defined and new releases of the spec are carefully managed.

Ciantic11y ago

For me, TS is a tool to ensure my code is not using a deprecated API. This is partly why Facebook is also pushing typing to JS with Flow.

Edit: And Guido is pushing it to Python with PEP 484: https://www.python.org/dev/peps/pep-0484/

cachvico11y ago· 2 in thread

Any thoughts about multi-doc transactions?

danielmewes11y ago

It's not currently on our road map.

cachvico11y ago

Thank you for the comment. I was wondering if something along the lines of http://blog.labix.org/2012/08/22/multi-doc-transactions-for-... would be feasible.

gauravphoenix11y ago· 1 in thread

Any plans to release an officially supported Java driver? For most enterprise-oriented apps, having an officially supported Java driver would be great.

coffeemug11y ago

Yes! No ETA yet, but we're on it.

thoughtpolice11y ago· 1 in thread

I've updated NixOS to include 2.0.0-1: https://github.com/NixOS/nixpkgs/commit/fe6ec3d13a1554458e64... - any way we can get it mentioned on the website?

coffeemug11y ago

Could you suggest a pull request in docs? (https://github.com/rethinkdb/docs)

nickstinemates11y ago· 1 in thread

Big fan of RethinkDB. Use it in all of my projects these days.

vonklaus11y ago

What were you using before? What are the pros and cons of the switch?

babo11y ago· 1 in thread

Looking forward to installing it from Homebrew, but it's not there yet. Good to see that the Python driver on pip is already updated!

coffeemug11y ago

It should be out later today. We're working on it now.

cdnsteve11y ago

I'm going to give this a spin out of pure respect for the team that's dedicated 5 years to a product without cashing out. Hats off. Your CEO has some respectable... anatomy.

dkhenry11y ago

BZ rethinkdb team.
