Having fun with Redis Replication between Amazon and Rackspace

(3scale.github.com)

79 points · solso · 13y ago · 38 comments

This is the story of the latest and biggest issue that we have had with Redis. 3scale has a three-layer fail-over across different data centers (Amazon and Rackspace). It turns out that if your Redis write throughput is above 20 Mbps you'd better watch out, or your master will silently crash after running out of memory. Redis's fault? Nope, a simple network I/O bottleneck that was quite a pain to track down. The solution? Compression. Clean, simple, and it will save you some (hundreds of) bucks on your Amazon bill.
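A rough way to see why a slow link turns into an out-of-memory crash: whatever the master writes faster than the link can carry has to be buffered in the master's memory. The sketch below only does that arithmetic. The function name `seconds_until_oom`, the 15 Mbps link rate and the 4096 MB of free memory are all assumptions made up for illustration; only the 20 Mbps write rate comes from the post.

```python
def seconds_until_oom(write_mbps, link_mbps, free_memory_mb):
    """Seconds until buffered replication data fills the free memory.

    Returns None when the link keeps up with the write rate.
    """
    surplus_mbps = write_mbps - link_mbps      # megabits/s the link cannot carry
    if surplus_mbps <= 0:
        return None
    surplus_mb_per_s = surplus_mbps / 8        # megabits -> megabytes
    return free_memory_mb / surplus_mb_per_s


# Hypothetical numbers: 20 Mbps of writes, a link that sustains 15 Mbps,
# and 4096 MB of free memory on the master.
print(seconds_until_oom(20, 15, 4096))   # a bit under two hours, in seconds
print(seconds_until_oom(20, 25, 4096))   # None: the link keeps up
```

With these made-up numbers the backlog grows by 0.625 MB every second, so a modest shortfall is enough to exhaust memory within hours.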

semiquaver · 13y ago · 6 in thread

It's hard to see running a critical service over an SSH tunnel as a long-term solution. Even with monit and autossh guarding the connection, SSH isn't exactly rock-solid when used this way.

If the redis protocol is so compressible, maybe this behavior should be integrated as a protocol option like [1]?

[1] http://dev.mysql.com/doc/refman/5.5/en/replication-options-s...

antirez · 13y ago

A lot of redundancy can be eliminated, and we have good past experience (and results) with this approach because we do it in the RDB format used for storage.

For the replication link it's easy to imagine a simple way to encode command names as operation IDs, the same for prefixed length stuff. So now you have:

    *3
    $3
    SET
    $3
    foo
    $3
    bar

But this could become something like: <set-opcode><2><3>foo<3>bar, where the opcode is 1 byte and the prefixed lengths are variable depending on the argument size, but most of the time one or two bytes. This of course is much more compact.

However, it's not bad to have the current format be exactly like the AOF and the client protocol itself, so everything is the same currently... and is future/backward compatible without issues. But performance always has some price.
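To put numbers on the size difference, here is a minimal sketch that encodes the same command both ways. The `OPCODES` table, the LEB128-style variable-length integers and the function names are assumptions invented for this example; Redis defines no such binary format, and the comment only outlines the idea.

```python
def encode_resp(*args: bytes) -> bytes:
    """Encode a command in the current text protocol (the format shown above)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


# Hypothetical opcode table; Redis has no such mapping.
OPCODES = {b"SET": 0x01, b"GET": 0x02, b"INCRBY": 0x03}


def encode_varint(n: int) -> bytes:
    """Variable-length integer: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_compact(command: bytes, *args: bytes) -> bytes:
    """<opcode><argument count><length>arg<length>arg..."""
    out = bytearray([OPCODES[command]])
    out += encode_varint(len(args))
    for arg in args:
        out += encode_varint(len(arg))
        out += arg
    return bytes(out)


resp = encode_resp(b"SET", b"foo", b"bar")
compact = encode_compact(b"SET", b"foo", b"bar")
print(len(resp), len(compact))  # 31 10
```

For this tiny command the compact form is about a third of the size. The saving shrinks as keys and values get longer, because the payload bytes themselves are not touched.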

pepve · 13y ago

> It's hard to see running a critical service over an SSH tunnel as a long-term solution.

Since I'm tempted to set this up myself (not for Redis though), what is the alternative?

__alexs · 13y ago

vtund http://vtun.sourceforge.net/

I've only used it for tunnelling some traffic out of my home network to a VPS but it's been rock solid for me over several years of frequent use.

jevinskie · 13y ago

OpenVPN advertises LZO compression but it seems to be poorly documented (just a --comp-lzo argument in the manpage). Is OpenVPN considered "enterprise" enough for this sort of application, even disregarding the compression support?

solso (OP) · 13y ago

that's a good point... as mentioned in the post we use ssh tunnels and have experienced quite a few glitches, but in this case it has been running without a hiccup for 3 weeks (proof of nothing of course).

I suspect (guess) that not having an idle connection helps keep the tunnel from randomly dropping.

semiquaver · 13y ago

Can you talk more about what the glitches were like? We're a few months away from production with our first product that uses redis, so this is very interesting.

Have you simulated a network partition? How does redis handle the reconnect?

peterwwillis · 13y ago · 4 in thread

If you're running out of memory, one would think adding more swap would be the best way to offload the overhead. Hacks aside, if you run into a similar problem again, letting the VM take over will save you time to keep replicating until your load settles down.

Also, if you're using ssh and you need maximum performance you should probably be using an arcfour cipher (`ssh -c arcfour`, alternatives are arcfour128 and arcfour256). I often see a 10x increase in bandwidth using these potentially less-secure algorithms, and slightly lower latency. If you need to save even more bandwidth, a UDP-based SSL VPN may improve things (depending on distance; long links are of course notoriously horrible) or SCTP patched into SSH as a compromise (I haven't played with this yet).

If you're not CPU bound, lzma might gain you bigger savings with marginally higher CPU if you need it down the road.

(I'd also like to take this moment to point out to the Redis devs that 'optimization' of the protocol could have prevented this hack from being necessary until much later. And to the admins of the systems, that monitoring metrics of network bandwidth could have shown the immensity of the traffic for replication. But hindsight is 20/20; let's just remember these examples!)

alexkus · 13y ago

Surely the fastest encryption scheme would be none at all? It's times like this that it would be nice if ssh still supported the equivalent of "-c none" (or was it "-e none" in v1 ssh?).

Of course, this is based on the assumption that the OP was happy to run their redis sync over the Internet without any other form of encryption. Either the underlying protocol provides encryption or they aren't concerned about their traffic being snooped or MITM'd (albeit very unlikely between two large providers) and so adding extra encryption on top is just an unnecessary CPU overhead.

I've run plenty of (production) stuff over ssh tunnels, but I was always nervous about it and happy when the deployment was revamped to avoid the use of ssh tunnels completely.

I'd much rather each underlying application was able to provide tunable encryption and compression itself (using existing trusted libraries such as openssl) as part of its protocol. ssh tunnels are a kludge not a viable production solution (IMHO).

peterwwillis · 13y ago

Yes, for a real production solution you'd want some kind of hardened VPN, though I recommended UDP or SCTP based transport for less overhead. But hacks are hacks, and since it's just replication, ssh won't break anything that isn't already broken when it fails.

I would never recommend anyone tunnel anything unencrypted over the internet, especially a database. Arcfour is incredibly fast, comparable to 'none' encryption. RFC4345 specifies the main known attack on arcfour is password auth, so using keys may keep it relatively safe (and using arcfour256 for the longest key).

It's best to assume developers will not implement encryption correctly no matter what library they link to. Always consider how highly vetted the application has been first, and how long it's been around without major issues.

jevinskie · 13y ago

HPN-SSH patches -c none back into the server and client, among other patches that help if you have a long and fat pipe. [0]

[0]: http://www.psc.edu/index.php/hpn-ssh

sjs · 13y ago

Adding more swap may not help, depending on the workload. If you're using Resque then you need to process jobs faster than they are queued. In that case you also need to ensure jobs are queued on a new instance while the old overloaded one drains (very slowly).

antirez · 13y ago · 3 in thread

With a 2.6 system this problem will be simpler to detect, because at this point we have explicit (configurable) limits for clients of type normal, pubsub, and slaves, so the connection is dropped if the amount of pending data goes over a given limit (and the event is logged).

This added a bit of complexity to a few code paths that now need to estimate the size of objects, data, and so forth, but I think this will pay off in the long run.
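In Redis 2.6 these limits are set with the `client-output-buffer-limit` directive, which takes a client class, a hard limit, a soft limit and a number of seconds. The sketch below models that decision as I understand it: drop at once when the hard limit is reached, or when the soft limit has been exceeded continuously for the given time. The class name, method name and the sizes used are illustrative assumptions, not Redis source code.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class OutputBufferLimit:
    hard_bytes: int        # drop immediately at or above this size (0 disables)
    soft_bytes: int        # drop if at or above this size for soft_seconds (0 disables)
    soft_seconds: float
    _soft_since: Optional[float] = None   # when the soft limit was first exceeded

    def should_disconnect(self, pending_bytes: int, now: float) -> bool:
        if self.hard_bytes and pending_bytes >= self.hard_bytes:
            return True
        if self.soft_bytes and pending_bytes >= self.soft_bytes:
            if self._soft_since is None:
                self._soft_since = now            # start the soft-limit timer
            return now - self._soft_since >= self.soft_seconds
        self._soft_since = None                   # back under the soft limit
        return False


# Illustrative values for a replica link.
limit = OutputBufferLimit(hard_bytes=256 * MB, soft_bytes=64 * MB, soft_seconds=60)
print(limit.should_disconnect(10 * MB, now=0))     # False
print(limit.should_disconnect(80 * MB, now=10))    # False: soft timer starts
print(limit.should_disconnect(80 * MB, now=71))    # True: over soft limit for 61 s
print(limit.should_disconnect(300 * MB, now=72))   # True: hard limit
```

The point for the scenario in the post is that a replica which cannot drain its buffer gets disconnected and logged, instead of the master quietly growing until it runs out of memory.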

This blog post makes me wonder if we should implement compression in the replication link as a feature in the future, or at least binary encoding. In short there is a plan for a binary version of AOF / replication-link that can save a lot of bandwidth.

solso (OP) · 13y ago

woops! @antirez in person :-) amazing job you have done with Redis.

Initially we didn't add compression to save bandwidth but to solve the network bandwidth issues between amazon and rackspace. However, it turns out that you can save on the bill quite a lot, in our case about $700.

Regarding the binary format, it would be very nice to have. The first thought is that compression would work better for us than binary, since our keys are quite long (50/100 bytes) and very regular in their patterns, so compression works very well in our case. This can probably be extrapolated to other cases where the keys are typically heavier than the values.
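A quick way to check how much long, regular keys help is to compress a synthetic command stream. The sketch below builds one and runs it through zlib (the library OpenSSH's compression is based on) and, for comparison with the lzma suggestion elsewhere in the thread, through lzma. The key layout and the `replication_stream` function are invented for this example; they are not 3scale's real keys, and a stream this regular will overstate what real traffic achieves.

```python
import lzma
import zlib


def replication_stream(n_commands: int) -> bytes:
    """Build a synthetic stream of INCRBY commands with long, regular keys."""
    chunks = []
    for i in range(n_commands):
        # Hypothetical 70-byte key layout, made up for this sketch.
        key = b"stats/{service:%06d}/cinstance:%012d/metric:hits/day:20120601" % (
            i % 50,
            i % 400,
        )
        args = (b"INCRBY", key, b"1")
        chunks.append(b"*%d\r\n" % len(args))
        for arg in args:
            chunks.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(chunks)


raw = replication_stream(5000)
for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    packed = compress(raw)
    print(f"{name}: {len(raw)} -> {len(packed)} bytes ({len(raw) / len(packed):.1f}x)")
```

Swapping in a capture of a real replication stream for `raw` would give a more honest ratio than the synthetic data does.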

Smrchy · 13y ago

Wouldn't compression between redis and hiredis make sense as well?

How much load would compression put on a Redis instance that talks to n clients, where each client sends/receives medium-sized JSON docs? And how much would be gained in terms of faster responses from Redis?

stock_toaster · 13y ago

Something like lzf or snappy maybe?

njyx · 13y ago · 3 in thread

Isn't there a way to get guaranteed bandwidth / throughput / speed from both parties in this? Not that compression isn't great but this seems to upper-bound pretty much anything.

stingraycharles · 13y ago

In the pure theoretical definition, packet switched networks (the internet) are not capable of giving you any guarantees on the throughput. Because of this, parties such as Amazon / Rackspace are unlikely to give you these guarantees either.

The closest thing you can get to guaranteed bandwidth today is to rent (dark) fiber between multiple DC's yourself.

jeffbarr · 13y ago

You can use AWS Direct Connect (http://aws.amazon.com/directconnect/) to set up a dedicated 1 Gbit or 10 Gbit network connection from an AWS region to the location of your choice.

njyx · 13y ago

True - although you can get pretty good approximations. Still though, the point is valid: it would be very expensive to guarantee. Performance profile could drop badly pretty much any time.

politician · 13y ago · 2 in thread

I'm trying to understand 3scale's product. On one of their landing pages they state, "Handling 10’s or 100’s of millions of API calls per day? – No problem. We scale 3scale’s management systems so you don’t have to – our infrastructure is highly distributed and scalable on demand for your needs."

Now, their architecture diagrams show that developers directly connect to the API servers. So the origin servers must be scaled out to handle "10's or 100's of millions of API calls per day" anyway.

So 3scale provides some kind of external dashboard that you can point an event stream at and get analytics? That is, something like New Relic for APIs? Is that correct?

njyx · 13y ago

The infrastructure handles rate limits, access keys, analytics, dev portal, billing etc. for the API but does it out of band. So yes, the traffic does go to the API origin and then control agents there enforce the policies, do the tracking etc. (indeed a bit like New Relic).

It's used for some very big APIs but yes, the origin sees the traffic. The system can offload it though: there are agents for Varnish, for example, which can sit in front of the app, or an integration with Akamai, in which case policies are enforced at CDN edge nodes.

If the API is handling high volumes you still have to plan for that - but the point here is that the management elements scale with it. You're not stuck with a black box in front of the API which might blow up at inconvenient moments.

politician · 13y ago

That does sound quite useful; thank you for the explanation.

otterley · 13y ago · 2 in thread

Compressing the replication stream will likely buy you time (at the cost of added complexity resulting from having to maintain and monitor the ssh tunnel), but it won't solve your problem in the long term.

If your replication stream bandwidth continues to grow, the returns on compression will eventually diminish. As the cloud providers can't guarantee network throughput, it might be a good time to start planning to evacuate the cloud (or at least this part of the architecture).

If your business is growing that fast, this is a good problem to have. :)

solso (OP) · 13y ago

Indeed it's a nice problem to have :-) we are not yet there, we double every 3 months so far, so we still have 9 months to go :-)
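Reading the "9 months to go" figure backwards: if traffic doubles every 3 months, 9 months is three doublings, which corresponds to roughly 8x of headroom. That 8x is inferred here from the two numbers in the comment and is not stated anywhere in the thread; the function name is likewise made up for the sketch.

```python
import math


def months_of_headroom(headroom_factor: float, doubling_months: float) -> float:
    """Months until traffic growing by doublings uses up a given headroom factor."""
    return doubling_months * math.log2(headroom_factor)


print(months_of_headroom(8, 3))   # 9.0
print(months_of_headroom(16, 3))  # 12.0: doubling the headroom buys only one more period
```

This is the diminishing return mentioned above: each further doubling of the compression ratio or link capacity buys just one more doubling period of growth.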

However, getting out of the cloud does not help. The problem is high availability. To maintain 99.9% we cannot rely on a single data-center, no matter what the claims on availability zones say. We have even seen network partitions, which are the worst of the worst. Once you go on the "Internet" there are no guarantees (unless you have dedicated lines).

otterley · 13y ago

Of course you'll have to have presence in multiple DCs. I didn't mean to imply that one would suffice.

I'm not sure whether you've analyzed the bandwidth of the replication stream, but my experience is that if you've got a good circuit provider, you should have adequate bandwidth for 30Mbps+ streams, except under extremely unusual conditions, even over the open Internet.

raverbashing · 13y ago · 1 in thread

Caveats:

I've used Redis once over an ssh tunnel (over a domestic internet connection that's anything but fast, it's a long story...). It was slow (as in agonizingly slow), even for a couple of queries, though that's probably not Redis's fault.

One thing you really want to do in this case (described in the article) is to carefully select an encryption cipher that uses the least CPU or has the best data rate.

See: http://blog.famzah.net/2010/06/11/openssh-ciphers-performanc...

antirez · 13y ago

Here we are not talking about a client on the other end, just the replication link over ssh. That's not an issue, because replication in Redis is just an (async) stream, so latency doesn't matter.

mnutt · 13y ago

I don't know what sort of data they're storing in redis, but it seems like you would want any over-the-internet replication of any kind to be encrypted via ssh tunnel or some other means.
