> I think the Redlock algorithm is a poor choice because it is “neither fish nor fowl”: it is unnecessarily heavyweight and expensive for efficiency-optimization locks, but it is not sufficiently safe for situations in which correctness depends on the lock.
https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
I think I agree with Kleppmann's analysis, though.
Is Redlock Safe? Reply to Redlock Analysis - https://news.ycombinator.com/item?id=11065933 - Feb 2016 (135 comments)
I think this is a good perspective. More reliable + safer + good performance. Fine, it's not perfect, but I bet if you are currently using a single-node Redis lock and keep running into problems when it goes down, these improvements sound nice.
Some of antirez's comments surprise me a bit though
> A distributed lock without an auto release mechanism, where the lock owner will hold it indefinitely, is basically useless.
I have found durable locks very practical and useful
1. Get the current time.
2. … All the steps needed to acquire the lock …
3. Get the current time, again.
4. Check if we are already out of time, or if we acquired the lock fast enough.
4.5. client pauses for whatever reason (the example Kleppmann gives is a GC pause), long enough for the lock to expire
4.75. another client acquires the lock
5. Two clients simultaneously hold the lock
Which is the core of Kleppmann's argument against Redlock's correctness. I think the conclusion Sanfilippo can arrive at is that the algorithm is safer than the single node locking algorithm.
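The timeline above can be replayed deterministically with a fake clock. This is only a sketch of the failure mode, not Redlock itself: `FakeClock`, `TtlLock`, and `simulate` are hypothetical names, and a single in-memory lock stands in for the whole lock service.

```python
from dataclasses import dataclass


@dataclass
class FakeClock:
    """Manually advanced clock so the timeline is deterministic."""
    now_ms: int = 0

    def advance(self, ms: int) -> None:
        self.now_ms += ms


class TtlLock:
    """One in-memory lock with an expiry (hypothetical stand-in for the lock service)."""

    def __init__(self, clock: FakeClock) -> None:
        self.clock = clock
        self.owner = None
        self.expires_at_ms = 0

    def try_acquire(self, owner: str, ttl_ms: int) -> bool:
        expired = self.clock.now_ms >= self.expires_at_ms
        if self.owner is None or expired:
            self.owner = owner
            self.expires_at_ms = self.clock.now_ms + ttl_ms
            return True
        return False


def acquired_fast_enough(start_ms: int, end_ms: int, ttl_ms: int) -> bool:
    """Step 4: the lock only counts as held if some of the TTL is left."""
    return (end_ms - start_ms) < ttl_ms


def simulate(pause_ms: int, ttl_ms: int = 1000, acquire_cost_ms: int = 50) -> set:
    """Return the set of clients that believe they hold the lock at the end."""
    clock = FakeClock()
    lock = TtlLock(clock)
    believes_held = set()

    start = clock.now_ms                                   # step 1
    ok = lock.try_acquire("A", ttl_ms)                     # step 2
    clock.advance(acquire_cost_ms)
    end = clock.now_ms                                     # step 3
    if ok and acquired_fast_enough(start, end, ttl_ms):    # step 4
        believes_held.add("A")

    clock.advance(pause_ms)                                # step 4.5: A is paused
    if lock.try_acquire("B", ttl_ms):                      # step 4.75
        believes_held.add("B")
    return believes_held                                   # step 5
```

With `simulate(pause_ms=2000)` both clients end up in the returned set, because the check in step 4 passed before the pause began. With `simulate(pause_ms=100)` only A does.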
But for the Redlock algorithm, I've never encountered anyone that was already running 5 redis masters to use out of convenience.
You're much more likely to have an etcd, consul, Zookeeper, etc cluster that you could use for coarse-grained distributed locking.
E.g. if your algorithm is:
1) Hold the distributed lock
2) Do the thing
3) Release the lock
And the node goes dark for a while between steps 1 and 2 (e.g. 100% CPU load), by the time it reaches 2 the lock may have already expired and another node is holding it, resulting in a race. Adding steps like "1.1 double/triple check the lock is still held" obviously doesn't help because the node can go dark right after these and resume operation at 2. The probability of these is not too high, but still: no guarantees. Furthermore at a certain scale you do actually start seeing rogue nodes deemed dead hours ago suddenly coming back to life and doing unpleasant things.
The rule of thumb usually is "keep locks within the same transaction space as the thing they protect", and often you don't even need locks in that case; transactions can be enough by themselves. If you're trying to protect something that is inherently un-transactional then, well, good luck, because these efforts are always probabilistic in nature.
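A minimal sketch of "transactions can be enough by themselves", using SQLite from the Python standard library purely for illustration. The `jobs` table, its columns, and `claim_job` are hypothetical. The claim is a single conditional `UPDATE`, so the check and the state change happen atomically in the same store and there is no separate lock that could expire in between.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one pending job (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, state TEXT NOT NULL, worker TEXT)"
    )
    conn.execute("INSERT INTO jobs (id, state) VALUES ('job-1', 'pending')")
    conn.commit()
    return conn


def claim_job(conn: sqlite3.Connection, job_id: str, worker: str) -> bool:
    """Claim the job only if it is still pending; True if this worker won."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE jobs SET state = 'running', worker = ? "
            "WHERE id = ? AND state = 'pending'",
            (worker, job_id),
        )
        return cur.rowcount == 1


conn = make_db()
first = claim_job(conn, "job-1", "worker-a")   # wins the claim
second = claim_job(conn, "job-1", "worker-b")  # matches no row, so it loses
```

A worker that stalls and wakes up late simply fails the `WHERE state = 'pending'` condition instead of racing the current owner.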
A good use-case for a remote lock would be when it's not actually used to guarantee consistency or avoid races, but merely tries to prevent duplicate calculations for cost/performance considerations. For all other cases I outright recommend avoiding them.
[0]: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
Created a “lock” table with a single string key column, so you can “select key for update” on an arbitrary string key (similar UX to redis lock). I looked at advisory locks, but they don’t work when the lock key needs to be dynamically generated.
I will need a distributed lock soon, but I've never used one before so I'm taking this chance to learn about them.
A lot of robust systems end up implementing their own bespoke WAL semantics on top of the system of record. It's like we should have a formal solution for doing that by now.
inspiring + slightly terrifying that rather than a single server-side implementation, every client is responsible for its own implementation
if postgres provided fast kv cache and a lock primitive it would own
What you truly need is something like ZooKeeper and etcd that are designed to achieve distributed consensus using algorithms like Paxos or Raft.
This ensures strong consistency and reliability in a distributed system, making them ideal for tasks like leader election, configuration management, and lease management where consistency across nodes is critical.
These algorithms ensure that a majority of nodes (a quorum) must agree on any proposed change. This agreement guarantees that once a decision is made (e.g., to commit a transaction), it is final and consistent across all nodes. This strong consistency is critical in distributed systems to avoid split-brain scenarios.
This is easily caused by:
1. Network partition
2. Latency issues
3. Async failover (2 nodes think they are the master)
4. Replica lag (some but not all replicas acknowledged the write) while the master sends confirmation to the client
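The majority rule described above is what rules out split-brain: two disjoint groups of nodes cannot both contain a majority. A small sketch of that arithmetic follows (the function names are made up for illustration, and this is not how any particular consensus library exposes it):

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def is_committed(acks: int, cluster_size: int) -> bool:
    """A change is accepted only once a majority has acknowledged it."""
    return acks >= quorum_size(cluster_size)


def can_both_sides_commit(side_a: int, side_b: int) -> bool:
    """After a partition into two groups, can each side reach quorum alone?"""
    cluster_size = side_a + side_b
    return is_committed(side_a, cluster_size) and is_committed(side_b, cluster_size)
```

For a 5-node cluster split 3/2, only the side with 3 nodes can make progress. `can_both_sides_commit` is false for every possible split, which is exactly the property an asynchronously replicated master/replica setup lacks.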
Without it this is alphaware at best
The only reasonable correctness-oriented view of that is "lol". It's not worth throwing a Jepsen-like test at it, the fundamentals aren't even slightly sound, merely "usually good enough". Whether that's worth it for [use X] depends on that use - often yes!
> The Hacker News user eurleif noticed how it is possible to reacquire the lock as a strategy if the client notices it is taking too much time in order to complete the operation. This can be done by just extending an existing lock, sending a script that extends the expire if the value stored at the key is the expected one. If there are no new partitions, and we try to extend the lock enough in advance so that the keys will not expire, there is the guarantee that the lock will be extended.
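For concreteness, here is a sketch of what such a compare-and-extend step might look like. The quoted post does not give the script, so the Lua text below is an assumption written in the usual `EVAL` style, and `FakeStore` is a hypothetical in-memory stand-in for a single node that mirrors the same logic so the behaviour can be checked without a server.

```python
# Assumed shape of the extend script: reset the TTL only if the key still
# holds this client's token. Shown for reference; it is not executed here.
EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""


class FakeStore:
    """In-memory model of one node with a manually advanced clock."""

    def __init__(self) -> None:
        self.now_ms = 0
        self.data = {}  # key -> (value, expires_at_ms)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None or entry[1] <= self.now_ms:
            self.data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]

    def set_nx_px(self, key, value, ttl_ms) -> bool:
        """Set the key only if it is absent (or expired), with a TTL."""
        if self.get(key) is not None:
            return False
        self.data[key] = (value, self.now_ms + ttl_ms)
        return True

    def extend(self, key, token, ttl_ms) -> bool:
        """Same logic as EXTEND_SCRIPT: compare the token, then reset the TTL."""
        if self.get(key) != token:
            return False
        self.data[key] = (token, self.now_ms + ttl_ms)
        return True
```

Extending only helps while the holder is actually running. If the client is paused past the expiry, the extend call fails (the key is gone or belongs to someone else), which is the same window discussed earlier in the thread.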