Redis as the primary data store

(moot.it)

106 points · courtneycouch0 · 13y ago · 38 comments

34 comments · 16 top-level

macspoofing · 13y ago · 3 in thread

Total overkill, especially given that replicated RAM storage is quite pricey compared to traditional alternatives. Ideally, you just use Redis to store (cough cache cough) the latest forum threads/comments and keep everything else on disk (maybe in an RDBMS?). You get performance and price.

EDIT:

>Another of those tricks has to do with the fact that nearly half of our code is also written in LUA running directly on Redis.

Urgh. Ugly.

courtneycouch0 (OP) · 13y ago

Agreed that it's overkill for most use cases (in fact I mention that it's not ideal for most people and we're likely an edge case).

As far as putting a great deal of code into Lua scripts, we'll be putting together a few blog posts explaining how we make this manageable. We were able to double our API throughput by shifting this direction so for us this is worth the difficulty we faced in finding ways to modularize and reuse code in Redis Lua scripts.

macspoofing · 13y ago

What's the edge case with you guys? It is cool to use Redis and come up with exotic architectures for common use cases (is there anything more common than forum threads and comments?). However, there will come a point where it just won't make sense to pay 10x more (in hardware and maintenance) than you need to, to store old, rarely read posts.

dscrd · 13y ago

Wait, does that mean that 50% of their code has been implemented twice? Why would they need to do that? Why not just keep the Lua version?

roskilli · 13y ago · 3 in thread

I'd be really interested in seeing if you benchmarked fsync at every query vs your fsync every second policy.

My main beef with fsync every second is that you will never, ever get a "this is what the server looked like when it went down". If fsync at every query were only worse by a relatively small factor, and if you used write transactions for the majority of your writes (meaning one fsync per transaction rather than one per query), which I'm guessing you do to protect integrity on writes, I don't see why that wouldn't be more appealing than fsync every second.
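The trade-off between the two policies can be sketched with a toy append-only log. This is not Redis code: `AppendLog`, its `policy` values, and the `fsync_calls` counter are hypothetical names for illustration, and the "everysec" branch syncs inline on the next append once a second has passed, whereas Redis performs that fsync in a background thread.

```python
import os
import time


class AppendLog:
    """Toy append-only log with a Redis-like fsync policy (illustrative only)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec"):
            raise ValueError("policy must be 'always' or 'everysec'")
        self.policy = policy
        self.fsync_calls = 0
        self._last_sync = time.monotonic()
        self._file = open(path, "ab")

    def append(self, record: bytes):
        self._file.write(record + b"\n")
        self._file.flush()  # hand the bytes to the OS page cache
        now = time.monotonic()
        # "always": sync on every write. "everysec": sync at most once a second,
        # so up to about a second of acknowledged writes can be lost on a crash.
        if self.policy == "always" or now - self._last_sync >= 1.0:
            self._sync(now)

    def _sync(self, now):
        os.fsync(self._file.fileno())  # ask the OS to push the data to the device
        self.fsync_calls += 1
        self._last_sync = now

    def close(self):
        self._sync(time.monotonic())
        self._file.close()
```

Timing a loop of `append` calls under each policy on the target disk gives a rough feel for the cost difference, though it says nothing about Redis's own throughput.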

anveo · 13y ago

Both of those settings might be somewhat problematic with stock EBS volumes. We run everysec and very frequently see "Asynchronous AOF fsync is taking too long" because EBS can't keep up. The problem is that when this happens, Redis blocks connections and exceptions pop up from the clients.

A workaround so far is to sync every 60 seconds on the master, and more frequently on the slaves. Another option might be to bump up the IOPS on the volume, but I believe that still isn't available on medium instances (which we are using as well).

devd · 13y ago

Just use the instance storage instead of EBS. Then write a script to move the AOF file from instance storage to EBS/S3

courtneycouch0 (OP) · 13y ago

To be honest, we have not played with fsync every second. Redundant persistence servers across availability zones give me enough comfort that my worry about losing that 60 seconds of data is rather low.

I suspect we will play with this over time and honestly you have me curious now too what the throughput differential would be for individual persistence servers.

sehugg · 13y ago · 2 in thread

We've been using Redis as a primary data store for about two years, and it works great. We have a simple master-slave config and do periodic RDB snapshots -- no EBS. We manually failover from master to slave, but in practice this is rarely necessary (our current master has been up for 9 months).

iampims · 13y ago

Do you mind sharing how much data you are storing in Redis?

sehugg · 13y ago

At our peak we stored up to 7 GB on Redis 2.4 (using up to 20 GB or so RAM due to paging). Redis 2.6 reduced that figure by a couple GB or so.

raverbashing · 13y ago · 2 in thread

Redis as primary data source? Good

If it fits in your RAM, of course (and no, swap space is not RAM, just don't)

But you can organize yourself, putting bigger data in the FS for example, and it should be ok.

The only issue with Redis is that it's much 'lower level' than other DBs so don't expect to do a 'SELECT * where condition' out of the box.

devd · 13y ago

Well, Redis data structures like lists, sets, hashes and sorted sets are great! You don't have to switch context and think in terms of SQL while programming.

raverbashing · 13y ago

Yes, they are great.

But someone may initially think that Redis can find all items that have 'cactus' in the text, which is not true.

One could say Redis is 'almost' a DB (as per the common conception of a db like MySQL, etc) and more like a 'build your own db' kit.
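The "build your own db" point can be made concrete with the 'cactus' example: a text lookup has to be built by hand, for instance as an inverted index with one set per word. The sketch below is an assumption about how one might do it, not something from the article. The `idx:word:` key prefix and the helper names are hypothetical, and `InMemorySets` is a stand-in that mimics only the `sadd` and `sinter` call shapes of a redis-py client so the example runs without a server.

```python
import re
from collections import defaultdict


class InMemorySets:
    """Stand-in for a Redis client: only sadd/sinter, with redis-py call shapes."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, name, *values):
        before = len(self._sets[name])
        self._sets[name].update(values)
        return len(self._sets[name]) - before

    def sinter(self, keys):
        sets = [self._sets.get(k, set()) for k in keys]
        return set.intersection(*sets) if sets else set()


def words(text):
    # Lowercased alphanumeric tokens; real tokenisation would need more care.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def index_post(client, post_id, text):
    # One set per word, holding the ids of the posts that contain it.
    for word in words(text):
        client.sadd(f"idx:word:{word}", post_id)


def search(client, query):
    # Posts containing every word of the query (set intersection).
    keys = [f"idx:word:{w}" for w in words(query)]
    return client.sinter(keys) if keys else set()


client = InMemorySets()
index_post(client, "post:1", "My cactus needs water")
index_post(client, "post:2", "Water the lawn")
print(search(client, "cactus"))
```

With a real redis-py client, members come back as bytes unless `decode_responses=True` is set, and the index has to be updated whenever a post is edited or deleted.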

mjs · 13y ago · 2 in thread

Interesting, this is the first technology blog-like site I've seen that does not have an RSS feed...

JLehtinen · 13y ago

A couple of days back that was just a page with a story, now there's two, so I guess an RSS is now a must. Coming soon...

mjs · 13y ago

Yes please! I'd like to subscribe.

rb2k_ · 13y ago · 1 in thread

> The API servers that we are able to push this load with cost a mere $90/month

That would be about an EC2 m1.medium

It would be interesting to see how much it costs to run the whole cluster. I'd also love to know which part of the redis featureset they really use. Redis is great, but I think a lot of other database backends will give comparable performance when allowed to store all their data in RAM (MongoDB, Postgres, Riak, Cassandra, ...). The advantage of these (especially Riak/Cassandra) would be that for pure key/value semantics, they take care of all of the annoying operations overhead like rolling updates.

courtneycouch0 (OP) · 13y ago

Your post made me notice an error on there. I was using our monthly cost and forgot to add in the reserved instance one time cost. Just added an edit to the post.

We are big fans of Lua (corrected typo) and much of the load is set, zset and bit operations. It's all the multikey operations (and the throughput we can push of those) that make Redis work for us.

Since we are pretty familiar/comfortable with automating infrastructure, dealing with clusters wasn't a big hurdle for us. This really wasn't a major deciding factor on our infrastructure design. Good tools and good Chef cookbooks make managing pools of servers relatively straightforward.

aphyr · 13y ago · 1 in thread

> If you are envisioning network partition scenarios where perhaps the master is isolated from the slaves, this is minimized by replication checks to slaves (set an arbitrary key to an arbitrary value and check if the slaves update). If a master is isolated we block writes: Consistent and Partition tolerant at the cost of Availability.

Could you describe these checks in more detail, please?

courtneycouch0 (OP) · 13y ago

Basically it goes something like this (oversimplifying):

    hset('shard.healthcheck', checkId, token)
    wait 500ms
    on every slave: hget('shard.healthcheck', checkId)

Verify the tokens match.

Slaves are removed from the pool when the tokens don't match, and allowed back in when they do. Writes to the master are disabled if x/2 slaves are not available (where x is the number of slaves).
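A minimal sketch of that check, under stated assumptions: `master` and each entry in `replicas` are any objects exposing redis-py style `hset(name, key, value)` and `hget(name, key)`. The function names, the use of `uuid` for the check id and token, and the exact reading of the "x/2" rule (writes are blocked once half or more of the replicas are unavailable) are illustrative guesses, not the poster's actual code. Cleaning up old check ids is left out.

```python
import time
import uuid

HEALTH_KEY = "shard.healthcheck"


def check_replicas(master, replicas, wait_seconds=0.5):
    """Return the replicas that have seen a fresh token written to the master."""
    check_id = uuid.uuid4().hex
    token = uuid.uuid4().hex
    master.hset(HEALTH_KEY, check_id, token)
    time.sleep(wait_seconds)  # give replication time to propagate
    healthy = []
    for replica in replicas:
        try:
            value = replica.hget(HEALTH_KEY, check_id)
        except Exception:
            value = None  # treat an unreachable replica as unhealthy
        if isinstance(value, bytes):
            value = value.decode()  # redis-py returns bytes by default
        if value == token:
            healthy.append(replica)
    return healthy


def writes_allowed(total_replicas, healthy_count):
    # Assumed reading of the rule: block writes once x/2 or more replicas
    # are unavailable, where x is the total number of replicas.
    unavailable = total_replicas - healthy_count
    return unavailable < total_replicas / 2
```

A scheduler would call `check_replicas` periodically, keep only the returned replicas in the read pool, and consult `writes_allowed(len(replicas), len(healthy))` before accepting writes on the master.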

DoubleCluster · 13y ago · 1 in thread

How much does maintaining this infrastructure cost? I'd guess that spending a bit more on (virtual) hardware would have been a much cheaper solution. Only when the service proves to be popular and you want to save money you should invest in an architecture like this.

courtneycouch0 (OP) · 13y ago

I agree that for most people a setup like this would not be cost effective. Our model only works at scale so it made sense to build for scale from the beginning. This kind of setup is likely cost prohibitive for most without some funding behind them.

The drawback as I mentioned is that the patterns we found necessary are hard to retrofit onto an existing application... therein lies the rub.

devd · 13y ago · 1 in thread

How do you delete data from Redis? E.g., let's say that a customer no longer wants an account, and you want to delete all the keys related to that account. Do you manually write delete-key statements?

pjscott · 13y ago

That depends on how you've laid out your data in Redis.
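One layout that makes this kind of deletion easy, sketched as an assumption rather than as what anyone in the thread actually does: keep a per-account set listing every key that belongs to the account, and delete from that list. The `account:<id>:keys` naming and both helper functions are hypothetical; `client` is any object with redis-py style `sadd`, `smembers`, and `delete`.

```python
def account_index_key(account_id):
    return f"account:{account_id}:keys"


def track_key(client, account_id, key):
    """Record that `key` belongs to the account; call whenever such a key is created."""
    client.sadd(account_index_key(account_id), key)


def delete_account(client, account_id):
    """Delete every tracked key for the account, then the index set itself."""
    index = account_index_key(account_id)
    keys = list(client.smembers(index))
    if keys:
        client.delete(*keys)  # DEL accepts several keys in one call
    client.delete(index)
    return len(keys)
```

The cost is an extra `SADD` on every key creation. The alternative of scanning the keyspace for a pattern such as `user:7:*` avoids the index but touches every key in the database.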

yyqux · 13y ago · 1 in thread

I wonder how this will work long-term as you accumulate an archive of old rarely-accessed data.

raverbashing · 13y ago

What you could do is set the expiration time on the keys, then periodically move the ones with a small time to live to the disk
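A rough sketch of that idea, with several assumptions: the keys hold plain string values, they match a hypothetical `post:*` pattern, and the archive is a JSON-lines file on local disk. `archive_expiring` and its parameters are invented for illustration; `client` is any object with redis-py style `scan_iter`, `ttl`, and `get`.

```python
import json


def _text(raw):
    # redis-py returns bytes unless decode_responses=True is set.
    return raw.decode("utf-8") if isinstance(raw, bytes) else raw


def archive_expiring(client, archive_path, pattern="post:*", threshold_seconds=3600):
    """Append values whose remaining TTL is at or below the threshold to a file."""
    archived = 0
    with open(archive_path, "a", encoding="utf-8") as out:
        for key in client.scan_iter(match=pattern):
            ttl = client.ttl(key)
            # Negative (or None in older clients) means no expiry or a missing key.
            if ttl is None or ttl < 0 or ttl > threshold_seconds:
                continue
            value = client.get(key)
            if value is None:
                continue  # the key expired between the TTL and GET calls
            out.write(json.dumps({"key": _text(key), "value": _text(value)}) + "\n")
            archived += 1
    return archived
```

Run from a periodic job, this copies data out before Redis expires it. Reads for archived posts then need a fallback path to the file or to whatever store replaces it.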

luser001 · 13y ago · 1 in thread

Can anybody explain if SSDs would have worked for this case?

luney · 13y ago

Short Answer: No

Here is the Redis author's blog post about that: http://antirez.com/news/52

My guess is that SSDs would be beneficial in making Append-Only-File writes and RDB snapshots faster.

davecap1 · 13y ago

I've been using Redis Cloud by Garantia Data for the past few months. They have automated infrastructure for (what I assume are) sharded Redis DBs, and they have a super reasonable pay-as-you-grow plan. http://redis-cloud.com/

They claim to support Redis DBs of "unlimited size"... until they run out of ram in the cloud :)

jamescun · 13y ago

If the dataset fits in RAM, doesn't need advanced querying, and there is a certain level of guaranteed persistence, then why not?

dev360 · 13y ago

Interesting. How do you rebalance your shards in the event that a new server is added to the keyring and your keys shift?
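The thread does not say how the shards are actually assigned, so the following only illustrates why the question matters. Assuming a consistent-hash ring (the "keyring"), adding a server moves only the keys that now fall on the new server's points, and those are the keys that would need migrating. `HashRing`, `moved_keys`, the MD5-based hash, and the 100 virtual nodes per server are all illustrative choices.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_keys(keys, before, after):
    """Keys whose owning node differs between two ring configurations."""
    return [k for k in keys if before.node_for(k) != after.node_for(k)]


keys = [f"thread:{i}" for i in range(1000)]
old_ring = HashRing(["redis-a", "redis-b", "redis-c"])
new_ring = HashRing(["redis-a", "redis-b", "redis-c", "redis-d"])
print(len(moved_keys(keys, old_ring, new_ring)), "of", len(keys), "keys would move")
```

Every key reported by `moved_keys` lands on the newly added node, so a migration job only has to copy those keys before clients switch to the new ring.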

rustyrazorblade · 13y ago

I built out a connection pool for the Python redis client for these types of setups where you'd want master / slave failover. It's meant to be paired with something like sentinel or custom failover scripts. https://github.com/StartTheShift/jondis

sebastianavina · 13y ago

talk about abusing a platform
