The disaggregated write-ahead log (2023)

(blog.schmizz.net)

88 points · carlsverre · 2y ago · 13 comments

12 comments · 5 top-level

bvrmn · 2y ago · 4 in thread

A generic, robust replicated log would be nice to have. For two products I've implemented a leaderless oplog to back application-level replication, syncing not only database changes but also configuration files and other needed data. Works like a charm.

EdwardDiego · 2y ago

Have you evaluated BookKeeper? I haven't used it much, but I'm keen to hear your thoughts on it, especially on its robustness.

esafak · 2y ago

It underlies Apache Pulsar. https://pulsar.apache.org/docs/3.1.x/administration-zk-bk/

HammadB · 2y ago

Curious - what was the specific leaderless replication strategy?

bvrmn · 2y ago

Each operation has a unique id. Each node has its own log with an auto-incremented op counter. Following the topology rules, each node tries to fetch new parts of remote logs (entries with new ids) from other nodes and applies them to its local state. The domain allows most operations to be expressed in an order-independent manner, and for the rest it's possible to use last-op-wins.
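A minimal sketch of the scheme described above, under stated assumptions: every name here (`Op`, `Node`, `pull_from`) is hypothetical, the "topology rules" are reduced to whichever peer the caller chooses to pull from, and last-op-wins is decided by a logical timestamp with the op id as tie-breaker. It is not the commenter's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    op_id: str      # globally unique id (assumed format: "<origin node>:<origin counter>")
    timestamp: int  # logical timestamp used for last-op-wins
    key: str
    value: str


class Node:
    def __init__(self, name):
        self.name = name
        self.log = []      # local log; list position plays the role of the op counter
        self.seen = set()  # ids of ops already applied, so re-fetching is harmless
        self.state = {}    # key -> (timestamp, op_id, value)
        self.cursors = {}  # peer name -> next position to read in that peer's log
        self.clock = 0     # logical clock, advanced by local writes and remote ops

    def write(self, key, value):
        """Create a new local operation and apply it."""
        self.clock += 1
        op = Op(f"{self.name}:{self.clock}", self.clock, key, value)
        self._apply(op)
        return op

    def _apply(self, op):
        """Apply an op once; the newest (timestamp, op_id) pair wins per key."""
        if op.op_id in self.seen:
            return False
        self.seen.add(op.op_id)
        self.log.append(op)  # relayed ops are logged too, so peers can forward them
        self.clock = max(self.clock, op.timestamp)
        current = self.state.get(op.key)
        if current is None or (op.timestamp, op.op_id) > current[:2]:
            self.state[op.key] = (op.timestamp, op.op_id, op.value)
        return True

    def pull_from(self, peer):
        """Fetch the part of the peer's log we have not read yet and apply it."""
        start = self.cursors.get(peer.name, 0)
        new_ops = peer.log[start:]
        for op in new_ops:
            self._apply(op)
        self.cursors[peer.name] = start + len(new_ops)

    def snapshot(self):
        return {key: entry[2] for key, entry in self.state.items()}
```

Because each op is applied at most once and the winner per key depends only on the set of ops seen, nodes that have exchanged the same ops end up with the same state regardless of pull order.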

refset · 2y ago · 2 in thread

Consensus protocols, durability and transactional semantics are (should be) closely coupled. I recall TigerBeetle discussing somewhere how they could achieve better throughput and durability guarantees by combining replication/recovery with the consensus protocol, instead of layering it above. I.e. disaggregating the log can be expensive. There's a reference in [0] that might elaborate.

> TigerBeetle is “fault-aware” and recovers from local storage failures in the context of the global consensus protocol, providing more safety than replicated state machines such as ZooKeeper and LogCabin

[0] https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/DE...

shikhar · 2y ago

I believe TigerBeetle are alluding to their integration of protocol-aware recovery [1], which is a worthy consideration for the log implementation. Yet another engineering concern which can be offloaded.

If the disaggregated log integrates some mechanism to support leadership "above" it [2], it can be functionally identical to a converged log. Efficiency-wise yes there will be some extra network messages – but networks are very high throughput [3] and fast (sub-millisecond within a cloud region) these days!

[1] https://www.usenix.org/conference/fast18/presentation/alagap...

[2] https://maheshba.bitbucket.io/blog/2023/05/06/Leadership.htm..., also on HN yesterday

[3] https://blog.enfabrica.net/the-next-step-in-high-performance...
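One way to picture a log that supports leadership "above" it is a lease plus a fencing token checked on every append. The sketch below is an in-memory toy under that assumption: `FencedLog`, `acquire`, and `append` are hypothetical names, time is passed in explicitly, and it does not model any real service's API.

```python
class FencedError(Exception):
    """Raised when a writer presents a fencing token that is no longer current."""


class FencedLog:
    def __init__(self):
        self.records = []
        self.token = 0           # current fencing token; 0 means no leader yet
        self.lease_expiry = 0.0  # time at which the current lease lapses

    def acquire(self, now, ttl):
        """Grant leadership if no unexpired lease exists; return the new token or None."""
        if self.token and now < self.lease_expiry:
            return None
        self.token += 1
        self.lease_expiry = now + ttl
        return self.token

    def append(self, token, record):
        """Append only if the writer holds the current token; stale leaders are fenced."""
        if token != self.token:
            raise FencedError(f"stale token {token}, current is {self.token}")
        self.records.append(record)
        return len(self.records) - 1
```

The lease only decides who may become leader; safety comes from the token check, which rejects a deposed leader even if it still believes its lease is valid.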

refset · 2y ago

Thanks, yes protocol-aware recovery was the context. Pretty sure I first heard it described in Joran's QCon London 2023 talk here: https://youtu.be/_jfOk4L7CiY?t=1460

> If you want your distributed database to maximise availability, how your local storage engine recovers from storage faults in the write-ahead log needs to be properly integrated with the global consensus protocol.

epaulson · 2y ago · 1 in thread

This sentence is doing a lot of work: "Hypothetical S2 does a bit more to simplify the layers above – it makes leadership above the log convenient with leases and fenced writes."

It'd be awesome to have a bit more transactional help from S3. You could go a long way with 'only update this object if the ETags on these other objects are still the same'. I know AWS doesn't want to turn S3 into a full database but some updates you just can't do without having a whole 2nd service running alongside to keep track of the states of your updates.
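To make the wished-for primitive concrete, here is a toy in-memory store with a multi-object precondition. Everything in it is an assumption for illustration: `ObjectStore`, `put_if_unchanged`, and the SHA-256-based ETag are hypothetical and do not correspond to any S3 call.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when an object's current ETag differs from the expected one."""


class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> bytes

    @staticmethod
    def _etag(data):
        # Content hash standing in for an ETag (an assumption of this sketch).
        return hashlib.sha256(data).hexdigest()

    def get(self, key):
        data = self._objects[key]
        return data, self._etag(data)

    def put(self, key, data):
        self._objects[key] = data
        return self._etag(data)

    def put_if_unchanged(self, key, data, expected):
        """Write `key` only if every object in `expected` still has the given ETag.

        `expected` maps key -> ETag; an ETag of None means the object must not exist.
        """
        for other_key, etag in expected.items():
            current = self._objects.get(other_key)
            actual = None if current is None else self._etag(current)
            if actual != etag:
                raise PreconditionFailed(other_key)
        return self.put(key, data)
```

The check and the write are trivially atomic here because the store is single-threaded; a real service would have to evaluate all preconditions and perform the write as one atomic step, which is the hard part of the request.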

shikhar · 2y ago

Agreed, both Google Cloud Storage and Azure Blob Storage support preconditions. Azure even has leases. S3 is for better or worse the common denominator for systems layering on top of object storage.

pilgrim0 · 2y ago

Note that I’m commenting in the spirit of “being ok to dream”, as the article promoted.

There’s only so much creativity available for software engineers to work around the fundamental constraints imposed by the lower levels. That software engineering is such a highly demanded skill is, I strongly believe, a reflection of how inefficient the interlacing of standard architectures and processes is. It reminds me of the Curse of Lisp, which, roughly, states that the language’s lack of popularity and absence of large developer communities stem from it being too powerful a language compared to, say, Java.

OSes are monstrosities struggling to keep pace with demand. The very concept and handling of the File is limiting and obsolete; it’s not an ideal building block for databases. Another limitation is that we have generalized compute but not storage. Memory and Storage are synonyms, so I imagine the ideal scenario is for both to be a single entity. I mean, the memory hierarchy should be flattened in the future. If this happens concurrently with an increase in capacity sufficient to emulate an infinite tape, then a number of today’s cutting-edge software architectures will become relics, memories of a time when computers were not as cooperative and malleable as we needed them to be.

The information revolution is just beginning, after all, and I absolutely love the fact that the path forward will be paved, primarily, by the collective effort of a multitude of creative minds, as always. Great article.

ditsuke · 2y ago

Remember reading this one just last month!
