How is RethinkDB licensed?
The RethinkDB server is licensed under the GNU Affero General Public License v3.0. The client drivers are licensed under the Apache License v2.0. http://rethinkdb.com/faq/
Firstly, as Daniel pointed out, RethinkDB is licensed under AGPL. An acquirer wouldn't have the legal means to close the source code, and with over 700 forks on GitHub they also couldn't do it practically.
But beyond licensing, consider our personal motivations. We've been working on RethinkDB for five years, and had quite a few opportunities to sell the company. We turned them all down because we really believe in the product. The world is clearly moving towards realtime apps, and we feel it's extremely important for open realtime infrastructure to exist. It's easy for people to make promises about the future, but consider this from a game-theoretic point of view. If we wanted to sell, we could have done it long ago. I know it's not a guarantee, but hopefully it's a strong signal to help with your decision.
(Also, there are lots of really interesting companies building products on RethinkDB that we can't talk publicly about yet. It would be silly to sell given that momentum.)
At some point people need bread and butter, so I'm curious where that's going to come from :)
<3 Rethink.
They had only open-sourced the SQL-Layer on top of their key/value store and it's still available on Github. The reason: they built it based on open-source code.
When someone deletes a public repository on Github, one fork remains as the new master. (Here's FoundationDB's SQL-Layer: https://github.com/louisrli/sql-layer)
So: RethinkDB will stay, even if someone tries to pull the plug. Just fork them on Github. :)
The cases in which the community forks a project licensed under a copyleft license (like LibreOffice) have to do with dissatisfaction with the direction in which the company that owns the original trademarks is leading said project. There's no risk of closing the source code.
(BTW the Open Source Initiative was unable to get a trademark for "open source" so it doesn't matter that they approved it.)
Now, one can be in agreement with the idea that the four freedoms are important, especially as we increasingly live in a world where software is not only convenient but necessary in our daily lives, but the idea behind the AGPL, and why it is needed for server software, is pretty clear.
Rough numbers you can expect for 1KB size documents, 25M document database: 40K reads/sec/server, 5K writes/sec/server, roughly linear scalability across nodes.
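For a back-of-the-envelope reading of those figures, here is a small sketch. It assumes perfectly linear scaling (the numbers above only claim "roughly"), takes 1KB as 1,000 bytes, and the function names are made up for illustration.

```python
# Rough figures quoted above; treat them as ballpark values, not guarantees.
READS_PER_SERVER = 40_000   # reads/sec/server
WRITES_PER_SERVER = 5_000   # writes/sec/server
DOC_SIZE_BYTES = 1_000      # "1KB" documents, taken as 1,000 bytes (assumption)
DOC_COUNT = 25_000_000      # 25M documents


def estimated_throughput(servers):
    """Return (reads/sec, writes/sec), assuming ideal linear scaling."""
    return servers * READS_PER_SERVER, servers * WRITES_PER_SERVER


def raw_data_size_gb():
    """Raw document payload in GB, ignoring indexes and storage overhead."""
    return DOC_COUNT * DOC_SIZE_BYTES / 1e9


if __name__ == "__main__":
    print(estimated_throughput(4))  # (160000, 20000)
    print(raw_data_size_gb())       # 25.0
```

So a 4-server cluster would land somewhere around 160K reads/sec and 20K writes/sec under these assumptions, over about 25 GB of raw documents.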
We should be able to get the report out in a couple of days.
The last time I tried with 1.16, I gave up my testing when even the simplest aggregation query (count + group by with what should be a sequential, streaming scan) took literally minutes with RethinkDB, compared to <1s with PostgreSQL. Rethink coredumped before I gave it enough RAM, after which it blew up to around 7GB, whereas Postgres uses virtually no RAM, mostly OS buffers.
Would you mind writing me an email with your query or opening an issue at https://github.com/rethinkdb/rethinkdb/issues (unless you have already?)? I'd like to look into it to see how we can best improve this.
We're planning to implement a faster count algorithm that might help with this (https://github.com/rethinkdb/rethinkdb/issues/3949), but it's not completely trivial and will take us slightly longer to implement.
I realize these numbers alone are still not very meaningful and there are many remaining questions (size and structure of the data set, exact queries performed etc). Rest assured that all of these details will be mentioned in the actual performance report that should be up soon.
Thank you for the fantastic product, by the way! :)
One way to improve writes is to batch them; an example is here:
https://github.com/dancannon/gorethink/blob/master/benchmark...
I believe the RethinkDB docs state that 200 is the optimum batch size.
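A minimal sketch of the batching idea in Python, not taken from the linked benchmark. The `insert` argument is a hypothetical callable you supply (for example a thin wrapper around your driver's table insert), and 200 is just the batch size mentioned above.

```python
from itertools import islice


def batches(docs, size=200):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(insert, docs, size=200):
    """Call `insert(batch)` once per batch; return the number of calls made.

    `insert` is a hypothetical callable supplied by the caller.
    """
    calls = 0
    for batch in batches(docs, size):
        insert(batch)
        calls += 1
    return calls
```

With the official Python driver, the wrapper could be something like `lambda batch: r.table("events").insert(batch).run(conn)`, where the table name and connection are assumptions. One round trip per 200 documents instead of one per document is where the gain comes from.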
Another way is to enable the soft durability mode.
http://rethinkdb.com/api/javascript/insert/
"In soft durability mode RethinkDB will acknowledge the write immediately after receiving and caching it, but before the write has been committed to disk."
https://github.com/dancannon/gorethink/blob/master/benchmark...
Obviously your business requirements come into play. I prefer the Hard writes because my data is important to me but I do insert debug messages using soft writes in one application I have.
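To make that split concrete, here is a rough sketch of choosing the durability mode per write. The `kind` labels and the injected `insert` callable are made up for illustration; only the "soft" and "hard" mode names come from the insert docs quoted above.

```python
def durability_for(kind):
    """Pick a durability mode: soft for throwaway debug messages, hard otherwise."""
    return "soft" if kind == "debug" else "hard"


def write(insert, doc, kind):
    """Insert `doc` through the injected `insert(doc, durability=...)` callable.

    `insert` is a hypothetical wrapper around whatever driver you use.
    Returns the durability mode that was chosen.
    """
    mode = durability_for(kind)
    insert(doc, durability=mode)
    return mode
```

With the Python driver, `insert` could wrap `r.table("logs").insert(doc, durability=mode).run(conn)` (the table name is an assumption). Defaulting to hard means a mislabeled write costs you speed rather than data.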
*Edit: Heh I forgot to mention, on my Macbook Pro I was getting 20k w/s while batching and using soft writes.
Individual writes for me are hovering around 10k w/s on the 8-CPU, 24 GB instance I have. But yeah, define your business reqs, then write your own benchmarks and see if the need is met.
Many devs write benchmarks in order to be the fastest and not the correctest. Super lame.
I also really like the ability to do joins, where before in Mongo I would have to handle data joins in the app level.
I'm kind of enamoured with the idea of couchapps -- but I'm still not entirely comfortable with having my db be my web and app server, as well as having it manage passwords etc... as I'm reading up, I'm slowly convincing myself it's possible to both make it work, be easy, support a sane level of TLS, load balance and be secure with proper ACL support... but very few tutorials/books seem to really deal with that to a level that brings me confidence.
Do you expect that as you stabilize you'll officially support more drivers? Or are you going to leave that as a community effort?
We're planning to take the most well-supported community drivers under the RethinkDB umbrella (assuming the authors agree, of course). It will almost certainly be a collaboration with the community, but we'll be contributing much more to the community drivers, supporting the authors, and offering commercial support for these drivers to our customers.
I've started to look into RethinkDB in the past, and I'm very interested in the features it claims. However, I only have so much time to investigate new primary storage solutions, and our team has been burned in the past by jumping too quickly on a DB's bandwagon when the reliability, performance, or tooling just wasn't there.
As of late, we've come to rely on Aphyr's wonderful Call Me Maybe series[0] as a guide for which of a DB's claims are to be trusted and which aren't. But even when Aphyr hasn't tested a particular DB himself, some projects choose to use his tool Jepsen to verify their own claims. According to at least 1 RethinkDB issue on Github, RethinkDB still hasn't done that[1].
Not to poo poo on the hard work of the RethinkDB team, but for me, the TL;DR is NJ;DU (No Jepsen, Didn't Use)
This is a great point, and we're on it! We have a Raft implementation that unfortunately didn't make it into 2.0 (these things require an enormous amount of patient testing). The implementation is designed explicitly to support robust automatic failover, no interruptions during resharding, and all the edge cases exposed in the Jepsen tests (and many issues that aren't).
This should be out in a few months as we finish testing and polish, and will include the results of the Jepsen tests. (It's kind of unfortunate this didn't make it into 2.0, but distributed systems demand conservative treatment).
See the issue you mentioned https://github.com/rethinkdb/rethinkdb/issues/1493 for progress on this.
RethinkDB's realtime capabilities would fit perfectly with Meteor.
You can find out more about the LiveUpdate core project of Meteor on their site [2] - it basically says the implementation of Live Updates for each db driver is independent of what the db is capable of. RethinkDB and Firebase are specifically mentioned as DBs built to make realtime data something you get for relatively little work.
[1] https://github.com/meteor/meteor/blob/devel/packages/mongo/o...
Have I missed something?
For more information check out https://github.com/dancannon/gorethink/releases/tag/v0.7.0.
That can be a good thing or a bad thing.
People say this a lot, but in our case we really haven't seen this incentive for a couple of reasons.
Large organizations are more than happy to pay for training and development support to accelerate their time to market. It doesn't matter how polished your product is -- databases are complex enough that people are willing to pay for best practices, training, and support.
Similarly, databases are pretty critical pieces of the infrastructure. If anything goes wrong, it can significantly impact the business, so people always want operational/production support.
There are many enterprise services that can be built on top of the product that can be very valuable. You don't have to build a crappy product -- there are plenty of ways to monetize with a great product.
Finally, a bad product will significantly limit growth of the company in the long term. There are lots of options now -- you can't get away with building a crappy product and an artificial monopoly.
If you see a crappy product from a company that offers subscription support, it's probably not because of misaligned incentives. Building databases is really hard, I don't think the business model has much to do with it.
Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?
Just the delta. We built an efficient, distributed BTree diff algorithm. When a node goes offline and comes back up, the cluster only sends a diff that the node missed.
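To illustrate the "only send what the node missed" idea, here is a toy sketch. It is not RethinkDB's B-tree diff: it assumes each replica is a flat dictionary of key -> (version, value), and both function names are hypothetical.

```python
def compute_delta(current, stale):
    """Return (upserts, deletes) needed to bring `stale` up to date.

    Both arguments map key -> (version, value). This flat comparison visits
    every key; a tree-based diff can presumably skip unchanged subtrees.
    """
    upserts = {
        key: entry
        for key, entry in current.items()
        if key not in stale or stale[key][0] < entry[0]
    }
    deletes = [key for key in stale if key not in current]
    return upserts, deletes


def apply_delta(stale, upserts, deletes):
    """Apply a delta in place and return the updated replica."""
    stale.update(upserts)
    for key in deletes:
        del stale[key]
    return stale
```

Only the changed and removed keys cross the wire, so the cost of catching up scales with how much the node missed rather than with the size of the table.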
> Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?
You don't have to do that, it happens automatically. You can have full visibility and control into what's happening in the cluster -- check out http://rethinkdb.com/docs/system-tables/ for details on how this works.
Well, in a past life, I used another store that did that automatically, the issue with that is that EITHER it kills the cluster because of read-congestion as it re-builds the "new" node, OR, if you limit the bandwidth for node-building, it takes for ever and a half to rebuild a node which means that you are exposed with one less shard of what was on that node.
What are the chances of a filesystem snapshot being consistent enough to be used to prime a crashed node? What about restoring backup files from other nodes?
There is currently no other way to prime the node -- I hope we don't have to add it. This sort of functionality should work out of the box.
We're currently using Mongoid (MongoDB ORM), and an Active Record like ORM for RethinkDB is the main thing holding us back.
I don't have great insight into nobrainer, but last I checked it seemed like joins weren't implemented (but on the roadmap).
The Nobrainer ORM wasn't fun though, too many edge cases that interfere with ActiveRecord and Rails conventions. Going a bit on a tangent, after many experiments I've developed a strong conviction that pg is the best database choice for Rails, especially with the jsonb datatype included in 9.4. It is the best of both worlds: a reliable, proven SQL db that plays really well with Rails and has NoSQL capabilities, including indexing and querying. So good. Ymmv.
If you have any further questions I would be more than happy to answer them on https://gitter.im/dancannon/gorethink. Thanks!
https://github.com/fanout/leaderboard/blob/master/tunnel.py
In particular, it reads the entire SSH private key as an environment variable, so you don't need to commit the key to the git repository.
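A rough sketch of that pattern, not a copy of tunnel.py: read the key from an environment variable and write it to a private temp file that ssh can be pointed at. The variable name `SSH_PRIVATE_KEY` and the function name are placeholders.

```python
import os
import tempfile


def write_key_from_env(var_name="SSH_PRIVATE_KEY"):
    """Write the private key held in an environment variable to a temp file.

    Returns the file path. The variable name is a placeholder.
    """
    key = os.environ[var_name]  # raises KeyError if the variable is unset
    fd, path = tempfile.mkstemp(prefix="ssh_key_")
    with os.fdopen(fd, "w") as f:
        f.write(key)
        if not key.endswith("\n"):
            f.write("\n")  # some ssh versions are picky about a missing final newline
    os.chmod(path, 0o600)  # keep the key readable by the owner only
    return path
```

The returned path can then be passed to `ssh -i`. Remember to delete the file once the tunnel is closed.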
There are reasons to write TypeScript definitions for documentation generation too, if not for the code as TS.
Edit: And Guido is pushing it to Python with PEP 484: https://www.python.org/dev/peps/pep-0484/
It's an inherent problem with dynamic languages: you have to read all the new release documents and migrate your code. With typed code I can at least be somewhat sure I'm not using deprecated calls and such just by compiling.
Even though there are some well-researched algorithms for it, actually implementing transactions in a distributed system is pretty hard. It also comes at significant performance costs, which would interfere with our goal of easy and efficient scalability.
BZ rethinkdb team.
Congrats to the RethinkDB team!
Is Windows support coming anytime soon?