
Migrating From MongoDB To Riak At Bump

(devblog.bu.mp)

131 points · timdoug · 14y ago · 131 comments

49 comments · 8 top-level

salsakran · 14y ago · 14 in thread

I was reading along and nodding my head until I got to the 1,000-line Haskell program that handles issues stemming from a lack of consistency.

I'm not exactly a SQL fanboy, but maybe ACID is kinda useful in situations like this, and having to write your own 1,000-liners in application land for stuff that got solved in SQL land decades ago isn't the best use of time?

aphyr · 14y ago

Indeed, these problems were solved years ago, for single servers. Conflict resolution in distributed systems is a significantly more complex problem, and invariably requires tradeoffs specific to the app. Those 1000 lines are likely declarations like "Merge changes to this list via set union", "Merge changes to this set using 2P-Set CRDTs", and so forth.
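To make those two declarations concrete, here is a minimal Python sketch of both merge rules, assuming each sibling's state is available in memory as plain sets or lists. The names `TwoPSet` and `merge_list_by_union` are hypothetical; this is an illustration of the techniques, not Bump's code.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPSet:
    """Two-phase set CRDT: an element can be added and later removed, never re-added."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)  # tombstones

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only elements this replica has seen added can be tombstoned.
        if item in self.added:
            self.removed.add(item)

    def value(self):
        return self.added - self.removed

    def merge(self, other):
        # Union of both halves is commutative, associative and idempotent,
        # so replicas converge no matter the order in which merges happen.
        return TwoPSet(self.added | other.added, self.removed | other.removed)


def merge_list_by_union(a, b):
    """Resolve two sibling versions of a list by set union (element order is not kept)."""
    return sorted(set(a) | set(b))
```

For example, if one replica adds "alice" and "bob" while another adds and then removes "alice" and adds "carol", merging in either order yields `{"bob", "carol"}`.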

cbsmith · 14y ago

The RDBMS space has been addressing how to do this with distributed systems for quite some time as well (a couple of decades at least), and at least the more sophisticated systems tend to support a fairly broad set of approaches to addressing this problem (and means of expressing those choices quite succinctly).

timdoug (OP) · 14y ago

It's a development vs. operations tradeoff; we'd rather write code during the day so that when a database machine falls over at night we don't get paged.

salsakran · 14y ago

I sorta get that, but it's not like Postgres/MySQL/etc. are operational nightmares. It's just that going from one barely proven DB to another barely proven DB in a migration that involves a boatload of application-side code seems heavy-handed from the outside.

coops · 14y ago

From this answer and your blog post, it appears you were not using MongoDB replica sets. Is this true?

lwat · 14y ago

Do you mean the physical server crashing? You should have a hot spare replicated machine for that. If you mean the RDBMS failing randomly at night, that really doesn't happen on mature systems like PostgreSQL or MS SQL Server. I mean it could happen but it's as rare as hen's teeth.

wpietri · 14y ago

MySQL is circa 1 million lines of code.

I like SQL engines for moderate data sets that fit nicely on one machine and well within the normal performance envelope. But even there I will often have to try a few different incantations and cross my fingers that one of them will perform reasonably because that's easier than trying to figure out what that 1 MLOC engine is up to. And I don't know anybody who does very large MySQL setups without a lot more hassle than that.

For some things I'd much rather deal with 1KLOC that I had to write myself than the 1 MLOC that I'm scared to even start digging through.

salsakran · 14y ago

The question isn't 1MLOC vs 1KLOC.

It's a stable, well understood DB vs an immature, not well understood DB AND 1KLOC to deal with not being consistent.

To be clear, I'm not saying any given DB is the OneTrueWay, just that people seem to be a bit cavalier in regards to some of this crap and chasing the newest shiny thing while rediscovering why some of the braindamage in those 1MLOC was put there in the first place.

tveita · 14y ago

The Linux kernel has over 15 million lines of code, and people don't normally hold that against it. Judging a piece of software by its LOC count is a fallacy.

A project with rigorous error handling and testing will have more LOCs than a corresponding project without.

Some problems are just hard, and you'll want as much code as is necessary to make it secure and performant. Some parts of the code you will never run, but inactive code seldom hurts you.

MySQL has its issues, but none of them would be fixed just by having less code.

gbog · 14y ago

"MySQL is to database what PHP is to programming languages". Use PostgreSQL.

bfrog · 14y ago

MySQL is 1 million lines of code and isn't even ACID

See ALTER.

WALoeIII · 14y ago

Eventual Consistency is a feature, not a limitation of Riak (and friends).

It requires you to think about your application differently, but it enables things that you could not do before.

For example, you can now handle databases in multiple datacenters, reducing latency to the client.

salsakran · 14y ago

Uhm..... no.

This is backwards. Multi-DC capability is a feature. Eventual Consistency is an explicit tradeoff in a desired characteristic (Consistency) to allow other features.

gbog · 14y ago

It's entertaining to see these weekly stories about NoSQL disasters. Hopefully one or two people will learn one or two things in the process. Let's try one: don't judge technologies on their sex appeal: the SQL old lady will take better care of your data than these young siliconed dolls.

timhaines · 14y ago · 9 in thread

If you're thinking about using Riak, make sure you benchmark the write (put) throughput for a sustained period before you start coding. I got burnt with this.

I was using the LevelDB backend with Riak 1.1.2, as my keys are too big to fit in RAM.

I ran tests on a 5-node dedicated server cluster (fast CPU, 8 GB RAM, 15k RPM spinning drives), and after 10 hours Riak was only able to write 250 new objects per second.

Here's a graph showing the drop from 400/s to 300/s: http://twitpic.com/9jtjmu/full

The tests were done using Basho's own benchmarking tool, with the partitioned sequential integer key generator, and 250 byte values. I tried adjusting the ring_size (1024 and 128), and tried adjusting the LevelDB cache_size etc and it didn't help.

Be aware of the poor write throughput if you are going to use it.

makmanalp · 14y ago

That's strange; that doesn't look like a normal graph to me. It looks like a cache or queue of some sort is backed up. Did you try to use dtrace / iosnoop / iostat etc. to see what might be the bottleneck?

For average commodity hardware I found something like 400 reqs/s/node was normalish, even sustained. Yours looks like it dies about 2 minutes in. Come to think of it, could your open file descriptors be limited in the OS settings? That looks just like the pattern I'd expect to see from that.

Might be unrelated, but common pitfalls I had were:
- Using the HTTP proto. Protobuf is way faster.
- You can tweak the r and w values to get less read and write consensus when you can afford to, depending on the task and data.
- The ulimit on open file descriptors might be too low.
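The usual reasoning behind tweaking r and w in Dynamo-style stores like Riak is the rule R + W > N: if it holds, every read quorum overlaps every write quorum. A minimal Python sketch, assuming a plain replica count N (for example the common default of 3) and ignoring Riak's sloppy quorums and hinted handoff, which weaken the guarantee in practice; the function names are hypothetical.

```python
from itertools import combinations


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Rule of thumb: R + W > N means every read quorum shares a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def quorums_always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check of the same property by enumerating every pair of quorums."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)  # an empty intersection is falsy
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )
```

With N = 3, r = 2 and w = 2 overlap, while r = 1 and w = 1 do not: lower values buy latency and throughput at the cost of possibly reading a stale replica.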

In any case, if you were to do a short writeup, I'm sure the Basho guys on the mailing list would be interested.

timhaines · 14y ago

Hey, the Basho guys were aware and reproduced it pretty quickly. They saw the same response from the new bloom filter branch they're introducing soon, too.

I was monitoring with iostat and a couple of other tools. It was certainly very heavy on IO, with 80% util, 20% iowait, and that increased as the concurrency went up.

I was using protobuf, and a w value of 1, so I was out of things to optimize.

When I was inserting objects already in Riak's cache, it ran about 3 times faster, but of course that's not possible with new objects.

bfrog · 14y ago

Riak loves random reads/writes; spinny discs do not. Try things out with an SSD sometime and watch things go from a shoddy XXX ops/sec to XXXX(X) ops/sec.

As a simple remark on this, I've gotten 1000+ ops/sec on a single machine operating as 3 nodes (equating to about 3000 ops/sec per node) when using an SSD and a measly 150 ops/sec with a spinny disc in the same setup (equating to about 450 ops/sec per node)

AaronBBrown · 14y ago

Bitcask is specifically designed around not doing random I/O, particularly for writes. A bitcask back end is essentially a gigantic sequential transaction log.
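A toy Python sketch of the Bitcask idea: every put, including an overwrite, is appended to one log file, and an in-memory "keydir" maps each key to the offset of its latest value, so writes are sequential and a read costs at most one seek. The keydir living in RAM is also why keys that don't fit in memory push people toward LevelDB. The class name and record layout are hypothetical, and real Bitcask adds CRCs, timestamps, hint files and merging of stale entries, all omitted here.

```python
import os
import struct
import tempfile


class AppendOnlyStore:
    """Toy Bitcask-style store (hypothetical; not Riak's implementation)."""

    HEADER = struct.Struct(">II")  # key length, value length as big-endian uint32s

    def __init__(self, path: str):
        self.keydir = {}  # key -> (offset of value in the log, value length); lives in RAM
        self.log = open(path, "ab+")

    def put(self, key: bytes, value: bytes) -> None:
        self.log.seek(0, os.SEEK_END)
        start = self.log.tell()
        # Every record goes to the end of the file: purely sequential writes.
        self.log.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        self.keydir[key] = (start + self.HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.log.seek(offset)  # one seek straight to the latest value
        return self.log.read(length)

    def close(self) -> None:
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyStore(os.path.join(tmp, "data.log"))
        store.put(b"user:1", b"alice")
        store.put(b"user:1", b"alice-v2")  # an overwrite is just another append
        print(store.get(b"user:1"))  # b'alice-v2'
        store.close()
```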

moonboots · 14y ago

While SSDs will undoubtedly be faster than spinning disks, LevelDB is designed to address slow random writes by batching and writing sequentially.
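The batching works roughly like a log-structured merge tree. A minimal Python sketch, assuming a hypothetical `TinyLSM` class: writes collect in an in-memory memtable and are flushed as immutable sorted runs, which a real engine would write to disk in one sequential pass. LevelDB's write-ahead log, on-disk format and background compaction are deliberately left out.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge tree: memtable plus immutable sorted runs (no compaction)."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.runs = []  # oldest first; each run is (sorted keys, matching values)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sorting once and writing the whole batch turns many random writes into one sequential write.
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # newer runs shadow older ones
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

The trade-off is on the read side: a lookup may have to check several runs, which is why real engines compact runs and add bloom filters.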

fsckin · 14y ago

Thanks for mentioning Basho Bench. Looks slick. For anyone else interested, it's at: http://wiki.basho.com/Benchmarking.html

timhaines · 14y ago

The benchmarking tool is very slick. Easy to configure for a variety of scenarios, and once you figure out how to install R it produces those pretty graphs.

rb2k_ · 14y ago

I had the same experience with throughput being a bit sub-par. For me it was a test on a single MacBook Pro with a regular 2.5" HDD. Which client did you use to write to Riak, protobuf or HTTP? Also: which language? Did you use threading? Did you enable search?

timhaines · 14y ago

Well, for the benchmark, I was using Basho's benchmarking tool which is erlang, and I was testing with protobuf. I had 5 concurrent clients running for the benchmark, but also tried with more and less, and got about the same results.

Search wasn't in use on the test bucket.

For my app, I'd integrated Riak using ruby.

tlianza · 14y ago · 7 in thread

I find these kinds of stories interesting, but without some feel for the size of the data, they're not very useful/practical.

I've heard of Bump, and used it once or twice, but I don't actually know how big or popular it is. If we're talking about a database for a few million users, only a tiny percentage of which are actively "bumping" at any time, it's really hard for me to imagine this is an interesting scaling problem.

E.g., if I read an article about a "data migration" whose scale is something a traditional DBMS would yawn at, the newsworthiness would have to be re-evaluated.

yahelc · 14y ago

They celebrated 80 million installations as of 2 months ago (up from 50 million 8 months ago). http://blog.bu.mp/introducing-bump-pay-a-new-project-and-app...

That's a growth rate of 5 million installs a month; if they kept up that pace, they're at 90 million installs.
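Spelled out, assuming growth stayed linear over the six months between the two announcements (8 and 2 months ago) and over the two months since:

$$
\frac{80\,\text{M} - 50\,\text{M}}{6\ \text{months}} = 5\,\text{M/month}, \qquad 80\,\text{M} + (2\ \text{months})(5\,\text{M/month}) = 90\,\text{M}
$$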

To put that in perspective, Instagram "only" has 50 million users. http://www.quora.com/Instagram/How-many-users-does-Instagram...

More bump data here: http://bu.mp/static/images/infographic_9-2011_6.pdf

I'm not a user, but it seems like they have serious data.

heretohelp · 14y ago

Even at 90 million users, with anything approaching a reasonable level of activity, we're not talking about serious data.

90 million rows of denormalized data isn't a big deal, and if I had to guess, their ops per second are probably no higher than what a single dedicated server, or maybe a small master-slave Postgres deployment, could handle.

Again, something a DBA would yawn at.

And I say this as someone who scaled up an API for a service that plugged into multiple ad networks concurrently for a total of billions of impressions per month with a high level of reliability. Using NoSQL and an RDBMS combined.

People who want to preach the NoSQL message should probably have some actual experience. Otherwise, it just makes very viable NoSQL solutions look really bad.

timdoug (OP) · 14y ago

I'm not sure what exactly qualifies as respectable scale, but the Mongo master was running out of space and IO capacity with 24 SSDs and 90 million user records, and was replaced by a sixteen-node Riak cluster.

I'll happily share any other statistics you're interested in.

Edit: the Riak cluster actually contains lots of other data (communications, object metadata, etc.); we didn't need sixteen boxes for the user records.

tlianza · 14y ago

90 million users is a great datapoint, yes! In my book that's more than respectable.

The only other stat that I'm curious about is the total size of the DB. Certainly databases with tens of millions of records can be held completely in RAM these days... but that also depends on how big each record is.

jonshea · 14y ago

To be clear, you are comparing a sharded Riak cluster against an unsharded MongoDB installation?

cies · 14y ago

I was sort of hoping you'd tell us why you wanted to move away from Mongo... but it was your Mongo master daemon that was the bottleneck, right?

polynomial · 14y ago

I was actually going to make a joke about "if the number of people I know who actually use Bump is any indication, it's not clear they even need a large data store."

stephen · 14y ago · 6 in thread

> During the migration, there were a number of fields that should have been set in Mongo but were not

Imagine that...this fascination with schema-less datastores just baffles me:

http://draconianoverlord.com/2012/05/08/whats-wrong-with-a-s...

I'm sure schema-less datastores are a huge win for your MVP release when it's all greenfield development, but from my days working for enterprises, it seems like you're just begging for data inconsistencies to sneak into your data.

Although, in the enterprise, data actually lives longer than 6 months--by which time I suppose most start ups are hoping to have been bought out.

(Yeah, I'm being snarky; none of this is targeted at bu.mp, they obviously understand pros/cons of schemas, having used pbuffers and mongo, I'm more just talking about how any datastore that's not relational these days touts the lack of a schema as an obvious win.)

timdoug (OP) · 14y ago

Yeah, that's one of the huge benefits of marking a field as "required" in a protobuf. The ability to enforce a contract prevents a ton of unexpected and incomplete data from making it onto disk (and also, e.g., across the wire to clients). Having strict types represented in the serialization format is also handy; when one pulls an int32 out of a protobuf it's going to be an int32, and not an integer that somehow found its way into being a string.
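A rough plain-Python analogue of that contract, assuming a hypothetical user record with two required fields, one of them an int32. It does not use the protobuf library (where generated code and the parser do this checking for you); it only illustrates the kind of enforcement described: reject missing required fields, and reject an integer that has turned into a string.

```python
from dataclasses import dataclass
from typing import Optional

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


@dataclass(frozen=True)
class UserRecord:
    user_id: int                 # hypothetical "required int32"
    display_name: str            # hypothetical "required string"
    phone: Optional[str] = None  # hypothetical optional field


def parse_user(raw: dict) -> UserRecord:
    """Refuse incomplete or mistyped data before it reaches disk."""
    for name in ("user_id", "display_name"):
        if name not in raw:
            raise ValueError(f"missing required field: {name}")
    user_id = raw["user_id"]
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise TypeError(f"user_id must be an integer, got {type(user_id).__name__}")
    if not INT32_MIN <= user_id <= INT32_MAX:
        raise ValueError("user_id does not fit in an int32")
    if not isinstance(raw["display_name"], str):
        raise TypeError("display_name must be a string")
    return UserRecord(user_id, raw["display_name"], raw.get("phone"))
```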

lucaspiller · 14y ago

Could you elaborate more on how you have used Protobuffs, as I'm not sure I fully understand. I've previously used Riak in Erlang and Ruby projects, so am fairly familiar with how it works. It exposes a HTTP and Protobuffs API which allows you to store objects of arbitrary types (JSON, Erlang binary terms, Images, Word Documents, etc). From the sounds of it you are serialising a Protobuffs packet and sending this as the content of the object. Why did you choose this, over say JSON, which MongoDB uses?

gizzlon · 14y ago

Sure, but it could still be wrong in sooo many other ways.. (not arguing either way, just saying ;)

wpietri · 14y ago

Data inconsistencies will always sneak in unless you're vigilant.

SQL data stores provide a way to limit certain kinds of inconsistency, but a) I rarely see a system that uses all of that power, and b) there are plenty of inconsistencies that you can't prevent with standard SQL features.

Personally, I'm ok with schema-less stores in the same way I'm ok with saving files on disk. I don't expect my filesystem to enforce application-level file format quality. I just expect it to store things and give them back when I ask. That doesn't mean I don't care about data integrity, it just means I solve the problem somewhere else in the system.

wslh · 14y ago

Are NoSQL databases to SQL as dynamic typing is to static typing?

bsg75 · 14y ago

Perhaps in some contexts: http://momjian.us/main/blogs/pgblog/2012.html#January_27_201...

_Lemon_ · 14y ago · 4 in thread

I've decided I want to use Riak as well. I was wondering if anyone had examples of how they used it with their data model?

For example, this article mentions "With appropriate logic (set unions, timestamps, etc) it is easy to resolve these conflicts". However, timestamps are not an adequate way to do this, because events in a distributed system are only partially ordered. The magicd may be serialising all requests to Riak to mitigate this (essentially using magicd's time reference), in which case they're losing out on the distributed nature of Riak (magicd becomes a single point of failure / bottleneck).
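The partial-ordering point is what vector clocks capture, and Riak objects of this era carry one. A simplified Python model, assuming a clock is just a dict from actor name to counter (Riak's actual vclock encoding differs): wall-clock timestamps always pick a winner, whereas a vector clock can report that two writes are concurrent and must be resolved by the application.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of `clock` with `actor`'s counter bumped (one local update)."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def compare(a: dict, b: dict) -> str:
    """Return 'equal', 'before' (a happened-before b), 'after', or 'concurrent'."""
    actors = set(a) | set(b)
    a_le_b = all(a.get(x, 0) <= b.get(x, 0) for x in actors)
    b_le_a = all(b.get(x, 0) <= a.get(x, 0) for x in actors)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: these are siblings to resolve
```

For instance, if two nodes each update the same starting version independently, `compare` returns "concurrent" for the results, however their wall clocks happen to read.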

Insight into how others have approached this would be awesome.

reiddraper · 14y ago

There are several ways to approach this. The simplest is to just take last-write-wins, which is the only option some distributed databases give you. For cases where this isn't ideal, you can resolve write conflicts in a couple of ways.

One way is to write domain-specific logic that knows how to resolve your values. For example, your models might have some states that can only happen after another state, so conflicts of this nature resolve to the 'later' state.
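A minimal Python sketch contrasting the two strategies, assuming a hypothetical order lifecycle (created, then paid, then shipped) where each state can only be reached after the previous one. Last-write-wins trusts the writers' clocks; the domain rule picks the sibling furthest along the lifecycle and uses timestamps only as a tie-breaker.

```python
from dataclasses import dataclass

# Hypothetical lifecycle: each state can only be reached after the one before it.
STATE_RANK = {"created": 0, "paid": 1, "shipped": 2}


@dataclass(frozen=True)
class Sibling:
    state: str
    timestamp: float  # writer's wall clock; clocks on different nodes may disagree


def resolve_last_write_wins(siblings):
    """Keep whichever sibling carries the largest timestamp."""
    return max(siblings, key=lambda s: s.timestamp)


def resolve_by_lifecycle(siblings):
    """Domain-specific rule: the sibling furthest along the lifecycle wins;
    timestamps only break ties between siblings in the same state."""
    return max(siblings, key=lambda s: (STATE_RANK[s.state], s.timestamp))
```

If a "paid" write arrives from a node whose clock runs ahead, last-write-wins would roll a "shipped" order back to "paid"; the lifecycle rule keeps "shipped".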

Another approach is to use data-structures or a library designed for this, like CRDTs. Some resources below:

A comprehensive study of Convergent and Commutative Replicated Data Types http://hal.archives-ouvertes.fr/inria-00555588/

https://github.com/reiddraper/knockbox https://github.com/aphyr/meangirls https://github.com/ericmoritz/crdt https://github.com/mochi/statebox
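As a small taste of what those libraries provide, here is a Python sketch of a grow-only counter (G-Counter), one of the CRDTs described in the study linked above. The class is hypothetical and not taken from any of the listed projects: each replica increments only its own slot, and merging takes the per-replica maximum, so merges can be repeated or reordered without double counting.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by per-slot max."""

    def __init__(self, replica_id: str, counts=None):
        self.replica_id = replica_id
        self.counts = dict(counts or {})

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        keys = set(self.counts) | set(other.counts)
        merged = {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        return GCounter(self.replica_id, merged)
```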

stock_toaster · 14y ago

Are there any connector libs that provide "simple" last-write-wins out of the box?

biot · 14y ago

Unless I'm missing something, I would assume they run magicd on all servers that run the application. Thus Riak's degree of redundancy is independent of magicd's degree of redundancy since each instance of magicd can communicate to the entire Riak pool.

timdoug (OP) · 14y ago

Yep! This is exactly how it works. Each app node runs a magicd which connects to an haproxy instance on localhost (connected to every machine in the database cluster), so when a Riak node goes down we don't miss a beat.

clu3 · 14y ago · 1 in thread

@timdoug, could you please share the specific problems with Mongo that made (or forced) you to switch to Riak? "Operational qualities" are a little vague.

timdoug (OP) · 14y ago

We experienced some significant difficulties with sharding; the automatic methods of doing so only seemed to shard a single-digit percentage of our data. We've also encountered some wildly unexpected issues with master/slave replication and related nomination procedures.

You're right that this post is vague with regard to those details; they would be a good candidate for a future blog post, but the desired takeaway from this one is that we're quite pleased with the performance and scalability that Riak provides.

gizzlon · 14y ago

Would be interesting to see a follow up in 6 months or so..

It doesn't seem fair to compare [old tech] with [new tech] when you've felt all the pitfalls with one but not the other.

supo · 14y ago

Random thought on proto buffers: OP is advocating using the "required" modifier for fields and touting it as an advantage in comparison to JSON. I would move the field value verification logic to the client, because it can cause backwards compatibility problems if you un-require it.
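A minimal Python sketch of that alternative, assuming hypothetical field names and defaults: the client insists only on what it truly cannot work without, fills defaults for fields older records may lack, and passes unknown fields through. Relaxing a rule later is then a client code change rather than a wire-format change; protobuf's own documentation similarly cautions that "required" is effectively forever.

```python
from typing import Any, Mapping

# Hypothetical defaults for fields that older records may not have.
FIELD_DEFAULTS = {"display_name": "", "phone": None}


def read_user(raw: Mapping[str, Any]) -> dict:
    """Client-side validation: only user_id is mandatory; missing fields get
    defaults and unknown fields are passed through untouched."""
    if "user_id" not in raw:
        raise ValueError("record has no user_id")
    return {**FIELD_DEFAULTS, **raw}
```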