Distributed Locks with Redis (2014)

(redis.io)

84 points · thenewwazoo · 1y ago · 37 comments

zinodaur1y ago· 10 in thread

Martin Kleppmann has some interesting thoughts on Redlock:

> I think the Redlock algorithm is a poor choice because it is “neither fish nor fowl”: it is unnecessarily heavyweight and expensive for efficiency-optimization locks, but it is not sufficiently safe for situations in which correctness depends on the lock.

https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...

cjk1y ago

Salvatore Sanfilippo (author of Redis and Redlock) wrote a response to Martin Kleppmann's analysis that is worth a read (though it is a bit dense and hard to follow at times): http://antirez.com/news/101

I think I agree with Kleppmann's analysis, though.

dang1y ago

Discussed at the time:

Is Redlock Safe? Reply to Redlock Analysis - https://news.ycombinator.com/item?id=11065933 - Feb 2016 (135 comments)

zinodaur1y ago

> The algorithm's goal was to move away people that were using a single Redis instance, or a master-slave setup with failover, in order to implement distributed locks, to something much more reliable and safe, but having a very low complexity and good performance.

I think this is a good perspective. More reliable + more safe + good performance. Fine, it's not perfect, but I bet if you are currently using a single-node Redis lock and keep running into problems when it goes down, these improvements sound nice.

Some of antirez's comments surprise me a bit though

> A distributed lock without an auto release mechanism, where the lock owner will hold it indefinitely, is basically useless.

I have found durable locks very practical and useful

wesselbindt1y ago

I think this is a good read, but not so much a rebuttal. He never really addresses the following scenario:

1. Get the current time.

2. … All the steps needed to acquire the lock …

3. Get the current time, again.

4. Check if we are already out of time, or if we acquired the lock fast enough.

4.5. client pauses for whatever reason (the example Kleppmann gives is a GC pause), long enough for the lock to expire

4.75. another client acquires the lock

5. Two clients simultaneously hold the lock

This is the core of Kleppmann's argument against Redlock's correctness. I think the most Sanfilippo can conclude is that the algorithm is safer than the single-node locking algorithm.
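The scenario above can be replayed with a toy model. This is only a sketch under loud assumptions: `TtlLock` and `simulate_pause` are hypothetical names, the lock is a single in-process object rather than Redlock across several Redis nodes, and time is a plain number advanced by hand instead of a real clock.

```python
class TtlLock:
    """Toy model of a lock whose lease auto-expires after `ttl` seconds."""

    def __init__(self):
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, client, now, ttl):
        # Grant the lock if it is free or the previous lease has expired.
        if self.owner is None or now >= self.expires_at:
            self.owner = client
            self.expires_at = now + ttl
            return True
        return False


def simulate_pause(ttl=10.0, pause=15.0):
    """Return (clients that believe they hold the lock, actual owner)."""
    lock = TtlLock()
    believes_held = set()
    now = 0.0

    # Steps 1-4: client A acquires the lock and passes its validity check.
    if lock.acquire("A", now, ttl):
        believes_held.add("A")

    # Step 4.5: A stalls (for example a GC pause) while simulated time passes.
    now += pause

    # Step 4.75: B asks for the lock; it succeeds only if A's lease expired.
    if lock.acquire("B", now, ttl):
        believes_held.add("B")

    # Step 5: A resumes; nothing has told it that its lease may be gone.
    return believes_held, lock.owner
```

With a pause longer than the TTL, both clients end up believing they hold the lock while the lock itself only records B. With a pause shorter than the TTL, B is refused and only A holds it.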

pram1y ago

I've used Redis locks in my personal projects. The real motivation isn't that it's good or correct, but that Redis is already there in my env.

gabetax1y ago

For basic SETNX locks on a single Redis instance, sure.

But for the Redlock algorithm, I've never encountered anyone that was already running 5 redis masters to use out of convenience.

You're much more likely to have an etcd, Consul, ZooKeeper, etc. cluster that you could use for coarse-grained distributed locking.

pookybear2231y ago

i think this type of ‘just tacking on’ to projects, since it is so low friction to do so, is part of how we got to where we are today

HideousKojima1y ago

"Those who don't understand ACID are doomed to reimplement it... poorly"

wesselbindt1y ago

This is a very nice read, as Kleppmann always is. Thanks for the recommendation!

Axsuul1y ago

What's a good alternative?

alexey-salmin1y ago· 6 in thread

Something I don't enjoy about remote/distributed locks is that unlike distributed transactions they're usually unable to provide any strict guarantees about things they protect.

E.g. if your algorithm is:

1) Hold the distributed lock

2) Do the thing

3) Release the lock

And the node goes dark for a while between steps 1 and 2 (e.g. 100% CPU load), by the time it reaches 2 the lock may have already expired and another node is holding it, resulting in a race. Adding steps like "1.1 double/triple check the lock is still held" obviously doesn't help because the node can go dark right after these and resume operation at 2. The probability of these is not too high, but still: no guarantees. Furthermore at a certain scale you do actually start seeing rogue nodes deemed dead hours ago suddenly coming back to life and doing unpleasant things.

The rule of thumb usually is "keep locks within the same transaction space as the thing they protect", and often you don't even need locks in that case; transactions can be enough by themselves. If you're trying to protect something that is inherently un-transactional then, well, good luck, because these efforts are always probabilistic in nature.

A good use-case for a remote lock would be when it's not actually used to guarantee consistency or avoid races, but merely tries to prevent duplicate calculations for cost/performance considerations. For all other cases I outright recommend avoiding them.

imor801y ago

A lot of what you say is explained in detail in Martin Kleppmann's article[0]. As you said, there's no guarantee about when the lock will expire. The proper solution for this is a fencing token. The idea is similar to how people have used optimistic locking when updating data in a db to avoid two users overwriting each other's work.

[0]: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
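A minimal sketch of the fencing-token idea mentioned above, assuming the lock service hands out a strictly increasing number with every grant and the protected resource checks it on each write. `TokenLockService`, `FencedStore` and `demo` are hypothetical names for illustration, not part of Redis or of any library.

```python
class TokenLockService:
    """Toy lock service: every grant comes with a strictly increasing token."""

    def __init__(self):
        self._counter = 0

    def grant(self):
        self._counter += 1
        return self._counter


class FencedStore:
    """Toy storage service that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # A token older than the newest one seen means the writer's lease
        # was superseded, so the write is refused.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


def demo():
    locks, store = TokenLockService(), FencedStore()
    token_a = locks.grant()   # client A gets the lock, then stalls
    token_b = locks.grant()   # A's lease expires; client B gets the lock
    b_ok = store.write(token_b, "from B")
    a_ok = store.write(token_a, "from A")   # A wakes up with a stale token
    return b_ok, a_ok, store.value
```

The point of the design is that safety no longer depends on the lock expiring at the right moment: the resource itself refuses the late writer, much like a version check in optimistic locking.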

rand_r1y ago

Yes, exactly! We found out the hard way just how unreliable Redis-based locks are, and switched to Postgres locks. It works reliably since our code is already in a Postgres transaction.

Created a “lock” table with a single string key column, so you can “select key for update” on an arbitrary string key (similar UX to redis lock). I looked at advisory locks, but they don’t work when the lock key needs to be dynamically generated.
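A rough sketch of that lock-table approach, assuming a table created as `CREATE TABLE locks (key text PRIMARY KEY)` and any DB-API connection to PostgreSQL that uses `%s` placeholders (psycopg2 does). The table name `locks` and the helper `with_key_lock` are made up for illustration; the comment above does not say how the rows get created, so the upsert step is an assumption too.

```python
def with_key_lock(conn, key, work):
    """Run work(cur) while holding a row lock on `key`, in one transaction.

    `conn` is a DB-API connection to PostgreSQL; `locks` is a hypothetical
    table with a single text primary-key column named `key`.
    """
    cur = conn.cursor()
    try:
        # Make sure a row exists for this key so there is something to lock.
        cur.execute(
            "INSERT INTO locks (key) VALUES (%s) ON CONFLICT (key) DO NOTHING",
            (key,),
        )
        # Blocks until any other transaction holding this row finishes.
        cur.execute("SELECT key FROM locks WHERE key = %s FOR UPDATE", (key,))
        result = work(cur)
        conn.commit()    # the row lock is released together with the work
        return result
    except Exception:
        conn.rollback()  # rolling back releases the lock as well
        raise
    finally:
        cur.close()
```

Because the lock lives in the same transaction as the protected writes, a client that stalls or dies cannot leave half-applied work behind a lock someone else now holds: its transaction either commits as a whole or is rolled back.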

quectophoton1y ago

After reading the current[1] top comment about Redlock, this was literally the next low-effort thing that came to mind, so I'm glad to find someone else's experiences with using a PostgreSQL table as a lock.

I will need a distributed lock soon, but I've never used one before so I'm taking this chance to learn about them.

[1]: https://news.ycombinator.com/item?id=41315621

hinkley1y ago

If it goes dark a microsecond after #3 you might have an ambiguous success. Transaction processed but you didn't get a confirmation.

A lot of robust systems end up implementing their own bespoke WAL semantics on top of the system of record. It's like we should have a formal solution for doing that by now.

qaq1y ago

We do: we have globally distributed ACID DBs like Spanner, CockroachDB, FoundationDB, etc.

alexey-salmin1y ago

True. Even simple scenarios like "save a file in s3 IFF the s3 link is saved in postgres" which are seen in virtually any application are rarely handled well.

awinter-py1y ago· 4 in thread

redis is the easiest-to-host lock server and that's worth the risk in some applications (depending on consequence of errors obv)

inspiring + slightly terrifying that rather than a single server-side implementation, every client is responsible for its own implementation

if postgres provided fast kv cache and a lock primitive it would own

skyde1y ago

Redis is a very bad store for a distributed lock but Postgres is only slightly better.

What you truly need is something like ZooKeeper or etcd, which are designed to achieve distributed consensus using algorithms like Paxos or Raft.

This ensures strong consistency and reliability in a distributed system, making them ideal for tasks like leader election, configuration management, and lease management where consistency across nodes is critical.

skyde1y ago

Paxos and Raft are consensus algorithms that provide certain guarantees and capabilities that a master-slave system with synchronous replication, such as PostgreSQL, cannot offer.

These algorithms ensure that a majority of nodes (a quorum) must agree on any proposed change. This agreement guarantees that once a decision is made (e.g., to commit a transaction), it is final and consistent across all nodes. This strong consistency is critical in distributed systems to avoid split-brain scenarios.

Split-brain is easily caused by:

1. Network partitions

2. Latency issues

3. Async failover (2 nodes think they are the master)

4. Replica lag (some but not all replicas acknowledged the write) while the master sends confirmation to the client
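The reason a majority quorum rules out split-brain can be checked by brute force: any two majorities of the same cluster share at least one node, so two conflicting decisions cannot both be accepted. The sketch below is only an illustration of that counting argument, not an implementation of Paxos or Raft; `quorum_size` and `quorums_always_overlap` are hypothetical helper names.

```python
from itertools import combinations


def quorum_size(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def quorums_always_overlap(n):
    """Check exhaustively that any two majority quorums share a node."""
    nodes = range(n)
    q = quorum_size(n)
    return all(
        set(a) & set(b)  # an empty intersection is falsy
        for a in combinations(nodes, q)
        for b in combinations(nodes, q)
    )
```

For a 5-node cluster the quorum is 3 nodes, and every pair of 3-node subsets intersects, which is what lets a later leader learn about anything an earlier majority accepted.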

silverwind1y ago

Redis can achieve the same with `redis-sentinel` and `min-replicas-to-write`.

awinter-py1y ago

a single redis instance has better consistency + partition tolerance than any raft implementation

AtlasBarfed1y ago· 2 in thread

Where are the Jepsen tests?

Without them this is alphaware at best.

Groxx1y ago

It's an out-of-band locking mechanism. In particular, one which mentions "you should consider fencing tokens" merely in passing instead of baking them in from the start.

The only reasonable correctness-oriented view of that is "lol". It's not worth throwing a Jepsen-like test at it, the fundamentals aren't even slightly sound, merely "usually good enough". Whether that's worth it for [use X] depends on that use - often yes!

hinkley1y ago

It's like a 50's ad slogan. If it doesn't _____, then I won't _____.

eurleif1y ago

I helped! :) (A little.)

http://antirez.com/news/77

> The Hacker News user eurleif noticed how it is possible to reacquire the lock as a strategy if the client notices it is taking too much time in order to complete the operation. This can be done by just extending an existing lock, sending a script that extends the expire of the value stored at the key is the expected one. If there are no new partitions, and we try to extend the lock enough in advance so that the keys will not expire, there is the guarantee that the lock will be extended.
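A hedged sketch of the "extend only if the value is the expected one" idea from the quote, for a single Redis instance. It assumes a redis-py style client object exposing `set(..., nx=True, px=...)` and `eval(script, numkeys, *keys_and_args)`; the helper names `acquire` and `extend` and the script text are illustrative, and in Redlock proper the script would be sent to every instance rather than just one.

```python
import uuid

# Lua script, executed atomically by Redis: extend the TTL only if the key
# still holds the token this client stored when it acquired the lock.
EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""


def acquire(client, key, ttl_ms):
    """Try to take the lock; return the owner token, or None if it is held."""
    token = uuid.uuid4().hex
    # SET key token NX PX ttl_ms: set only if absent, with an expiry.
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None


def extend(client, key, token, ttl_ms):
    """Reset the expiry to ttl_ms from now if we still own the lock."""
    return client.eval(EXTEND_SCRIPT, 1, key, token, ttl_ms) == 1
```

The comparison and the `PEXPIRE` have to happen in one script so that a client whose lease already expired cannot accidentally extend a lock that now belongs to someone else.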

jetru1y ago

Famously broken badly for anything mission critical.

Bogdanp1y ago

To counter some of the hate in this thread: I have used this to great success as an "opportunistic" locking mechanism to, for example, reduce load on a Postgres database. The winner of the race to acquire the lock would run the (expensive) query then cache the result. On lock release, the nodes waiting on the lock would then try to read the cache before trying to acquire the lock again.

anonzzzies1y ago

I have seen (ad hoc) implementations go quite bad many times. I encounter them quite a bit as a distributed replacement for some type of db transaction where the db is something like rds; someone thought to be smart and write 'things at scale' (they read on reddit etc) while a db transaction would've been the correct solution and they didn't need scale anyway (or underestimated the current db capabilities for mysql/postgres).
