Every System is a Log: Avoiding coordination in distributed applications (opens in new tab)

(restate.dev)

229 pointssewen1y ago155 comments

155 comments

117 comments · 29 top-level

trollbridge1y ago· 15 in thread

I’ve been doing a similar thing, although I called it “append only transaction ledgers”. Same idea as a log. A few principles:

- The order of log entries does not matter.

- Users of the log are peers. No client / server distinction.

- When appending a log entry, you can send a copy of the append to all your peers.

- You can ask your peers to refresh the latest log entries.

- When creating a new entry, it is a very good idea to have a nonce field. (I use nano IDs for this purpose along with a timestamp, which is probabilistically unique.)

- If you want to do database style queries of the data, load all the log entries into an in memory database and query away.

- You can append a log entry containing a summary of all log entries you have so far. For example: you’ve been given 10 new customer entries. You can create a log entry of “We have 10 customers as of this date.”

- When creating new entries, prepare the entry or list of entries in memory, allow the user to edit/revise them as a draft, then when they click “Save”, they are in the permanent record.

- To fix a mistake in an entry, create a new entry that “negates” that entry.

A lot of parallelism / concurrency problems just go away with this design.

XorNot1y ago

How do you know summary entries are valid if order doesn't matter?

I.e. "we have 10 customers as of this date" can become immediately invalid if a new entry is appended afterwards with a date before that summary entry (i.e. because it was on a peer which hadn't yet sent it)

withinboredom1y ago

Realistically, you never store summaries in the log. Instead, you store what it took to calculate them. So you won't store "we have 10 customers on this date with this range" but instead store "we found these 10 customers on this date with this range". This assumes you can store infinite sized lists in your log, but realistically, this is never a concern if you can keep your time windows small enough. Then, you periodically do a reconciliation and log corrections (look for entries not summarized -- easily done via a bloom filter which can tell you what entries are definitely NOT in your set) over a longer period.

For example, we had a 28-day reconciliation period at one company I worked at (and handled over 120 million events per day). If you appended an event earlier than 28 days prior, it was simply ignored. This very rarely happened, but allowed us to fix bugs with events for up to 28 days.

clayg1y ago

IME you have to be willing to recalculate the summaries up to some kind of consistency window.

Yes you may be changing history and you may have a business reason not to address that revision immediately (you've already billed them?) - but the system can still learn it made a mistake and fix it (add activity from Jan 30 evening that comes in late to the Feb bill?)

Kinrany1y ago

> The order of log entries does not matter.

This is surprising, Kafka-like logs are all strictly ordered.

trollbridge1y ago

The reason I made ordering not matter is so that multiple peers don’t have to worry about keeping the exact same order when appending to each other’s logs.

The log entries do have timestamps on them. You can sort by timestamp, but a peer has the right to append an new entry that’s older than the latest timestamp.

cduzz1y ago

* within a partition

grahamj1y ago

The lack of ordering is surprising. Without that you can’t stream without a buffer.

trollbridge1y ago

The idea is that all peers eventually have the same collection of log entries, with an agreed-upon sort method. Currently that sort method is time stamps.

When it’s time to have identical replicas, a log entry must be appended that says the log is closed. Once all peers have this, the log is now immutable and can be treated as such. An example of this (for a payroll system) is a collection of entries for time clock in and outs, payroll taxes withheld, direct deposits to employees, etc when it’s time to do a payroll. The log is closed and that data can never be altered, but now the data can be relied on by someone else.

Conceptually this is surprisingly similar to the type of internal logs a typical database like PostgreSQL keeps.

1 more reply

log4shell1y ago

Calling a WAL a ledger, why? Ledger sounds fancier but why would it be a ledger in this case?

trollbridge1y ago

We called it a ledger since we stored financial data and basically used the “ledger” format from plain text accounting, initially.

1 more reply

hcarvalhoalves1y ago

I believe "ledger" implies commutative property (order does not matter).

1 more reply

glitchc1y ago

How do you manage log size for high-transaction systems?

trollbridge1y ago

It’s not terribly high transaction, yet. If it becomes that way, I would partition the one by entry type so all the high transaction stuff gets stuffed in a particular log, and then ensure it can be summarised regularly.

random31y ago

if order does not matter how do you implement deletes?

trollbridge1y ago

You don’t do deletes. The log is append only.

If you want to get rid of an entry, you create a new entry that negates the existing entry.

1 more reply

Animats1y ago· 10 in thread

This is a basic concept in accounting. The general ledger is an immutable log of transactions. Other accounting documents are constructed from the general ledger, and can, if necessary, be rebuilt from it. This is the accepted way to do money-related things.

Synchronization is called "reconcilation" in accounting terminology.

The computer concept is that we have a current state, and changes to it come in. The database with the current state is authoritative. This is not suitable for handling money.

The real question is, do you really care what happened last month? Last year? If yes, a log-based approach is appropriate.

inopinatus1y ago

I’ve always concurred with the Helland/Kleppman observation mentioned viz. that the transaction log of a typical RDBMS is the canonical form and all the rows & tables merely projections.

It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

But look underneath too. The journaled filesystem on which the database resides also has a log representation, and under that, a modern SSD is using an adaptive log structure to balance block writes.

It’s been a long time since we wrote an application event stream linearly straight to media, and although I appreciate the separate concerns that each of these layers addresses, I’d probably struggle to justify them all from first principles to even a slightly more Socratic version of myself.

formerly_proven1y ago

This is similar to the observation that memory semantics in most any powerful machine since the late 60s are implemented using messaging, and then applications go ahead and build messaging out of memory semantics. Or the more general observation that every layer of information exchange tends towards implementing packet switching if there's sufficient budget (power/performance/cost) to support doing so.

globular-toast1y ago

> It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

The database only supports CRUD. So while the CDC stream is the truth, it's very low level. We build higher-level event types (as in event sourcing) for the same reason we build any higher-level abstraction: it gives us a language in which to talk about business rules. Kleppmann makes this point in his book and it was something of an aha moment for me.

1 more reply

namtab001y ago

we're a very short step away from a "hardware" database

ianburrell1y ago

The table data in database is the canonical form. You can delete the transaction logs, and temporarily lose some reliability. It is very common to delete the transaction logs when not needed. When databases are backed up, they either dump the logical data or take snapshot of the data. Then can take stream of transaction logs for syncing or backup until the next checkpoint.

I'm pretty sure journalled filesystem recycle the journal. There are log-structured filesystem but they aren't used much beyond low-level flash.

1 more reply

LeanOnSheena1y ago

You're correct on all points. Some additional refining points regarding accounting concepts:

- General legers are formed by way of transactions recorded as journal entries. Journal entries are where two or more accounts from the general ledger are debited & credited such that total debits equals total credits. For example, a sale will involve a journal entry which debits cash or accounts receivable, and credits revenue.

- The concept of the debits always needing to equal credits is the most important and fundamental control in accounting. It's is the core idea around which all of double entry bookkeeping is built.

- temporally ordered Journal entries are what form a log from which a general ledger can be derived. That log of journal entries is append-only and immutable. If you make an mistake with a journal entry, you typically don't delete it, you just make another adjusting (i.e. correcting) entry.

Having a traditional background in accounting as a CPA, as a programmer I have written systems that are built around a log of temporally ordered transactions that can be used to construct state across time. To my colleagues that didn't have that background they found it interesting but very strange as an idea (led to a lot of really interesting discussions!). It was totally strange to me that they found it odd because it was the most comfortable & natural way for me to think about many problems.

funcDropShadow1y ago

Could you recommend some resource to understand this view of accounting better?

fellowniusmonk1y ago

This is why EG-Walker is so important, diamond types adoption and a solid TS port can't come soon enough for distributed systems.

calvinmorrison1y ago

You over estimate ERP and accounting systems.

baq1y ago

That was basic accounting when computers were people with pencils and paper.

1 more reply

shikhar1y ago· 9 in thread

This post makes a great case for how universal logs are in data systems. It was strange to me that there was no log-as-service with the qualities that make it suitable for building higher-level systems like durable execution: conditional appends (as called out by the post!), support very large numbers of logs, allow pushing high throughputs with strict ordering, and just generally provide a simple serverless experience like object storage. This led to https://s2.dev/ which is now available in preview.

It was interesting to learn how Restate links events for a key, with key-level logical logs multiplexed over partitioned physical logs. I imagine this is implemented with a leader per physical log, so you can consistently maintain an index. A log service supporting conditional appends allows such a leader to act like the log is local to it, despite offering replicated durability.

Leadership can be an important optimization for most systems, but shared logs also allow for multi-writer systems pretty easily. We blogged about this pattern https://s2.dev/blog/kv-store

xuancanh1y ago

> It was strange to me that there was no log-as-service with the qualities that make it suitable for building higher-level systems like durable execution

There are several services like that, but they are mostly kept behind the scene as a competitive advantage when building distributed systems. AWS uses it behind the scene for many services, as mentioned here by Marc Brooker https://brooker.co.za/blog/2024/04/25/memorydb.html. Facebook has similar systems like LogDevice https://logdevice.io/, and recently Delos https://research.facebook.com/publications/log-structured-pr...

shikhar1y ago

Indeed. We are trying to democratize that secret sauce. Since it is backed by object storage, the latencies are not what AWS enjoys with its internal Journal service, but we intend to get there with a NVMe-based tier later. In the meantime there is an existing large market for event streaming where a "truly serverless" (https://erikbern.com/2021/04/19/software-infrastructure-2.0-...) API has been missing.

logsr1y ago

> log as a service

very exciting. this is the future. i am working on a very similar concept. every database is a log at its core, so the log, which is the highest performance part of the system, is buried behind many layers of much lower performing cruft. edge persistence with log-per-user application patterns opens up so many possibilities.

hinkley1y ago

I just want a recognized standard format for write ahead logs. Start with replicating data between OLTP and OLAP databases with minimal glue code, and start moving other systems to a similar structure, like Kafka, then new things we haven’t thought of yet.

ianburrell1y ago

The structure for the write head logs needs to different between systems. For Postgres, the WAL is a record of writes with new blocks. It can't be used without knowing the Postgres disk format. I don't think it can be used to construct logical changes.

Using a standard format, converting things into logical data, would be significantly slower. It is important that WAL be fast because it is the bottleneck in transactions. It would make more sense to have a separate change streaming service.

1 more reply

thesz1y ago

  > It was strange to me that there was no log-as-service...

Actually, there are plenty of them. Most heard of is Ethereum 2.0 - it is a distributed log of distributed logs.

Any blockchain that is built upon PBFT derivative is such a system.

gavindean901y ago

What about journalctl?

shikhar1y ago

This is why we didn't actually call it logs as a service, but streams :P I meant to refer to the log abstraction this post talks about, see links therein. Observability events are but one kind of data you may want as a stream of durable records.

p_l1y ago

Surprisingly and disgustingly so it did not actually use a proper log structure on disk. So much pointer walking...

pjc501y ago· 7 in thread

> Having a single place (the one log) that forces a linear history of events as the ground truth and owns the decision of who can add to that ground truth, means we don’t have to coordinate much any more.

Well, yes, but then you've backed into CAP again because you only have one log.

sewenOP1y ago

Yes, we are assuming a log that picks linearizability at the cost of availability under partitions. Like most logs do, including Kafka, Pulsar, RedPanda, etc.

The application state is defined by the log here, and the log drives retries/recovery, so it doesn't much matter if the process that executes the app code splits off. The log would hydrate another one.

Also the one log is at the granularity of a single key or handler execution. More of a logical log, than a physical log or even partition.

In Restate, we implement a logical log-per-key, backed by a partitioned physical log.

mrkeen1y ago

If I've understood it, it's like using Kafka with 1 topic and 1 partition. But it shouldn't rule out multiple brokers with a >1 replication factor, giving you CP.

clayg1y ago

But can't any log be implemented as a CRDT? Was that not implied in the post? I didn't read it that close...

ismailmaj1y ago

CRDT is really only useful when inconsistencies could be acceptable in some situations and so it depends on the application.

For something that is trying to solve the general problem of consistent single ground-truth log, you can't really do much better than Spanner.

logsr1y ago

the "conflict-free" in crdt is like miller high life being the "champagne of beers." it really means conflicts ignored and conflicting items discarded by established rules, which works well for some use cases, but for many does not.

UltraSane1y ago

couldn't the log be synchronously replicated to multiple servers to increase A?

ismailmaj1y ago

In case of mutation either the replicas will be out of sync (so no C), or you'll need to synchronously mutate the replicas (so bad A due to latency).

1 more reply

EGreg1y ago· 5 in thread

Since we’re on the subject of logs and embarassingly parallel distributed systems, I know someone who’s also in NYC who’s been building a project exactly along these lines. It’s called gossiplog and it uses Prolly trees to make some interesting results.

https://www.npmjs.com/package/@canvas-js/gossiplog

Joel Gustafson started this stuff at MIT and used to work at Protocol Labs. It’s very straightforward. By any chance sewen do you know him?

I first became aware of this guy’s work when he posted “merklizing the key value store for fun and profit” or something like that. Afterwards I looked at log protocols, including SLEEP protocol for Dat/Hypercore/ pear and time-travel DBs that track diffs, including including Dolt and even Quadrable.

https://news.ycombinator.com/item?id=36265429

Gossiplog’s README says exactly what this article says— everything is a log underneath and if you can sync that (using prolly tree techniques) people can just focus on business logic and get sync for free!

sewenOP1y ago

Never encountered it before, but it looks cool.

I think they are trying to solve a related problem. "We can consolidate the work by making a generic log that has networking and syncing built-in. This can be used by developers to make automatically-decentralized apps without writing a single line of networking code."

At a first glance, I would say that Gossiplog is a bit more low level, targeting developers of databases and queues, to save them from re-building a log every time. But then there are elements of sharing the log between components. Worth a deeper look, but seems a bit lower level abstraction.

EGreg1y ago

It’s part of his higher-level framework called Canvas.

Check this out: https://joelgustafson.com/posts/2024-09-30/introduction-to-c...

And this: https://github.com/canvasxyz/canvas

hem7771y ago

There’s also OrbitDB https://github.com/orbitdb/orbitdb which to my understanding has been a pioneer for p2p logs, databases and CRDTs.

vdm1y ago

Thank you @EGreg for sharing this.

EGreg1y ago

Def. I geek out on this stuff, as I am building my own distributed systems. I have had discussions with a lot of people in the space, like Leslie Lamport, Petar Maymounkov etc.

You might like this interview: https://www.youtube.com/watch?v=JWrRqUkJpMQ

This is what I’m working on now: https://intercoin.org/intercloud.pdf

TuringTest1y ago· 5 in thread

Excuse me for sounding rough, but - isn't this reinventing comp-sci, one step at a time?

I learned about distributed incrementally -monotonic logs back at the late 90s, with many other ways to do guaranteed transactional database actions. And I'm quite certain these must have been invented in the 50s or 60s, as these are the problems that early business computer users had: banking software. These are the techniques that were buried in legacy COBOL routines, and needed to be slowly replaced by robust Java core services.

I'm sure the Restate designers will have learned terribly useful insights in how to translate these basic principles into a working system with the complexities of today's hardware/software ecosystem.

Yet it makes me wonder if young programmers are only being taught the "build fast-break things" mentality and there are no longer SW engineers able to insert these guarantees into their systems from the beginning, by standing on the shoulders of the ancients that invented our discipline, so that their lore is actually used in practice? Or am I just missing something new in the article that describes some novel twist?

hinkley1y ago

When I was in school I had an optional requirement. You had to take one out of 2 or 3 classes to graduate. That was compiler design, which was getting terrible reviews from my peers who were taking it the semester before me, or distributed computing. Might have been a third but if so it was unmemorable.

So I took distributed computing. Which ended up being one of the four classes that satisfied the 80/20 rule for my college education.

Quite recently I started asking coworkers if they took such a class and was shocked to learn how many not only didn’t take it, but could not even recall it being an option at their school. What?

I can understand it being rare in the 90’s but the 00’s and on were paying attention to horizontal scaling, and the 2020’s are rotten with it distributed computing concerns. How… why… I don’t understand how we got here.

withinboredom1y ago

So many people I work with don't "get" distributed systems and how they interplay and cause problems. Most people don't even know that the ORDER you take potentially competing (distributed) locks even matters -- which is super important if you have different teams taking the same locks in different services!

The article is well written, but they still have a lot of problems to solve.

1 more reply

mrkeen1y ago

I think your points are pretty spot on - most things have already been invented, and there's too much of a move-fast-and-break-things mentality.

Here's a follow-up thought: to what extent did the grey-beards let us juniors down by steering us down a different path? A few instances:

DB creators knew about replicated logs, but we got given DBs, not replicated log products.

The Java creators knew about immutability: "I would use an immutable whenever I can." [James Gosling, 1] but it was years later when someone else provided us with pcollections/javaslang/vavr. And they're still far from widespread, and nowhere near the standard library.

Brendan Eich supposedly wanted to put Scheme into browsers, but his superiors had him make JS instead.

What other stuff have we been missing out on?

[1] https://www.artima.com/articles/james-gosling-on-java-may-20...

BrendanEich1y ago

James (my source was an insider in the Java team at Sun, pre-Marimba) wrote java.util.Date, which I had my one assistant (Ken Smith of Netscape) translate from Java to C for JS's first Date object, regrets all around but it "made it look like Java".

I wish James had been in favor of immutability in designing java.util.Date!

sewenOP1y ago

This is certainly building on principles and ideas from a long history of computer science research.

And yes, there are moment where you go "oh, we implicitly gave up xyz (i.e., causal order across steps) when we started adopting architecture pqr (microservices). But here is a thought on how to bring that back without breaking the benefits of pqr".

If you want, you can think of this as one of these cases. I would argue that there is tremendous practical value in that (I found that to be the case throughout my career).

And technology advances in zig zag lines. You add capability x but lose y on the way and later someone finds a way to have x and y together. That's progress.

bruce3434341y ago· 5 in thread

> If everything’s in one log, there’s nothing to coordinate #

On the contrary. Everything becomes coordinated.

The entire "log" becomes a giant ass mutex lock. Good luck scaling it.

sewenOP1y ago

There is nothing to coordinate for the application, because, yes, the log coordinates everything. But not globally, on the level of a single event handler execution, or a single key.

That has been proven to scale well - the way we implement that in Restate is classical shared nothing physical partitioning, with indexing on a key granularity.

So nothing like a shared mutex unless you want to access the same key, which otherwise your database synchronizes, if you want any reasonable level of consistency.

mrkeen1y ago

I think the author is motte-and-baileying between:

Literally one log - which does indeed reduce your coordination headache, but is susceptible to your "giant ass mutex" comment, and

One log per ... - which brings the coordination problems right back into existence.

sewenOP1y ago

I can see where some of that could be written more clearly. To elaborate:

- We mean using one log across different concerns like state a, communication with b, lock c. Often that is in the scope of a single entity (payment, user, session, etc.) and thus the scope for the one log is still small, and it reduces coordination headache for coordinating between the systems. You would have a lot of independent logs still, for separate payments.

- It does _not_ mean that one should share the same log (and partition) for all the entities in your app, like necessarily funneling all users, payments, etc. through the same log. What would be needed if you try and do some multi-key-distributed transaction processing. That goes actually beyond the proposal here, and has some benefits of its own, but have a hard time scaling.

1 more reply

neuroelectron1y ago

Exactly what I was thinking. Now what's the best mutex system we've built? An SQL database.

kikimora1y ago

You can use something like DynamoDb with partition per interaction. That would scale great.

jaseemabid1y ago· 4 in thread

A notable example of a large-scale app built with a very similar architecture is ATproto/Bluesky[1].

"ATProto for Distributed Systems Engineers" describes how updates from the users end up in their own small databases (called PDS) and then a replicated log. What we traditionally think of as an API server (called a view server in ATProto) is simply one among the many materializations of this log.

I personally find this model of thinking about dataflow in large-scale apps pretty neat and easy to understand. The parallels are unsurprising since both the Restate blog and ATProto docs link to the same blog post by Martin Kleppmann.

This arch seems to be working really well for Bluesky, as they clearly aced through multiple 10x events very recently.

[1]: https://atproto.com/articles/atproto-for-distsys-engineers

sewenOP1y ago

That blog post is a great read as well. Truely, the log abstraction [1] and "Turning the DB inside out" [2] have been hugely influential.

In a way this article here suggests to extend that

(1) from a log that represents data (upserts, cdc, etc.) to a log of coordination commands (update this, acquire that log, journal that steo)

(2) have a way to link the events related to a broader operation (handler execution) together

(3) make the log aware of handler execution (better yet, put it in charge), so you can automatically fence outdated executions

[1] https://engineering.linkedin.com/distributed-systems/log-wha...

sewenOP1y ago

[2] https://martin.kleppmann.com/2015/11/05/database-inside-out-...

grahamj1y ago

Table/log duality goes back further than Kleppmann though. An earlier article that really influenced me was

https://engineering.linkedin.com/distributed-systems/log-wha...

zellyn1y ago

Martin Kleppmann was also directly involved with Bluesky as a consultant.

mrkeen1y ago· 3 in thread

Using one-log-only for an entire system does have its upsides, but it will kill performance. It would be like building a CRUD system with a single mutex for everyone to share.

hcarvalhoalves1y ago

I believe you want "one log" in the logical sense. In theory, you could have "one log" per user or group of users, or whatever sharding technique makes sense for multi tenancy model. It can also be "one log" per bounded context - e.g. the entire payment pipeline in one log.

mrkeen1y ago

That's what the rest of us are doing in event-sourcing land, but TFA is arguing for something much stronger:

  If everything’s in one log, there’s nothing to coordinate

The rest of us have to coordinate the logical log of user creation/deletion and the logical log of user payments, etc.

Separate logical logs with no need for coordination can only work if they are truly independent systems - no causality between them.

sewenOP1y ago

Yes, exactly right. One log per logical entity, here "payment ID".

The way our open source project implements that is with a partitioned log and indexes at key-granularity, so it is like virtually a log per key.

1 more reply

zellyn1y ago· 3 in thread

sewen (et al)

This is lovely and I'm itching to try it. One question:

We have a use case where a location gets cut off completely from the internet at large. In that case, it makes sense for the local hardware (typically Android and/or iOS tablets or equivalent) to take over as a log owner: even though you're cut off, if you're willing to swallow the risk (and hence cost) of offline payments, you should be able to create orders, fulfill them, pay for them, close them out, send tickets to the kitchen to cook the food or to the warehouse to fetch the tractor, etc.

Does restate include something that covers that use-case? In the noodling/daydreaming a colleague and I have done, we ended up with something very close to restate (I imagined just using Kafka), except that additionally many operations would have a CRDT nature: eg. you should _always_ be allowed to add a payment to an order, because presumably a real-life payment happened.

I've also noodled with the idea of logs whose canonical ownership can be transferred. That covers cases where you start offline and then reconnect, but doesn't work so well for transactions that start out connected (and thus owned in the datacenter) and need to continue offline.

One could also imagine ensuring that > n/2 consensus members are always located inside the restaurant/hardware store/etc., so if you go offline, you can still proceed. It might even be possible to recognize disconnection and then take one full vote to further subdivide that pool of consensus members so if one dies it doesn't halt progress. This feels like it would be getting very tricksy…

withinboredom1y ago

I'm actually working on a database implementation for this exact use-case... It's a distributed edge database and still quite a long ways to go -- https://github.com/bottledcode/atlas-db if you want to give it a star.

It's mostly based on wpaxos (wide-area consensus), spaxos, fpaxos and pretty neat. The repo above is a productionization of several proof of concepts to get there.

> One could also imagine ensuring that > n/2 consensus members are always located inside the restaurant/hardware store/etc., so if you go offline, you can still proceed.

This is what annoys me to no end about RAFT. It's a great protocol, don't get me wrong, but its too simple for these types of problems. RAFT fails when it doesn't have consensus and because the consensus is non-deterministic, it must have an odd number of nodes. PAXOS, while far more complex than RAFT in terms of "grok", is deterministic so you don't need an odd number of nodes.

If you throw in some flexible quorums, you can do some really neat stuff, like how Atlas handles a "region" (ie, areas connected via the internet instead of the same network) becoming disconnected; but I'm not ready yet. There's still a long way to go!

WilcoKruijer1y ago

Interesting use case, I've also thought about this in the past.

I wonder why you want to designate the local device as "owner". The devices could simply keep the (closed) payments in a local outbox, and send them to the server when it comes back online. If there is no way for syncing the local changes to the server to fail, you've essentially created a CRDT (at least the "conflict-free" part).

Of course that means you cannot have any server side validation at all (maybe this is what you mean by "owner"?). Because that would mean closed out payments would be rejected by the server and you'd end up in an inconsistent state. A consequence of this is that there could be no balance checking (as an example). Payments in a ledger are inherently a CRDT!

The consensus part is interesting. Would you really need it? Wouldn't one device be sufficient to validate a transaction? And would consensus really even mean something within a small building? If the restaurant catches on fire the data would be lost anyway, so if you don't achieve "high-availability", why bother?

mrkeen1y ago

> One could also imagine ensuring that > n/2 consensus members are always located inside the restaurant/hardware store/etc.

If the simple-majority is inside the restaurant, then the restaurant is online and the rest of the world is offline.

sewenOP1y ago· 2 in thread

A short summary:

Complex distributed coordination and orchestration is at the root of what makes many apps brittle and prone to inconsistencies.

But we can mitigate much of complexity with a neat trick, building on the fact that every system (database, queue, state machine) is effectively a log underneath the hood. By implementing interaction with those systems as (conditional) events on a shared log, we can build amazingly robust apps.

If you have come across “Turning the Database Inside Out” (https://martin.kleppmann.com/2015/11/05/database-inside-out-...), you can think of this a bit like “Turning the Microservice Inside Out”

The post also looks at how this can be used in practice, given that our DBs and queues aren't built like this, and how to strike a sweet-spot balance between this model with its great consistency, and maintaining healthy decoupling and separation of concerns.

teddyh1y ago

Is this summary AI generated?

sewenOP1y ago

Haha, no, but maybe all the AI-generated contents out there is starting to train me to write in a similar style...

1 more reply

sewenOP1y ago· 2 in thread

Some clarification on what "one log" means here:

- It means using one log across different concerns like state a, communication with b, lock c. Often that is in the scope of a single entity (payment, user, session, etc.) and thus the scope for the one log is still small. You would have a lot of independent logs still, for separate payments.

- It does _not_ mean that one should share the same log (and partition) for all the entities in your app, like necessarily funneling all users, payments, etc. through the same log. That goes actually beyond the proposal here - has some benefits of its own, but have a hard time scaling.

magicalhippo1y ago

Interesting read, not my area but I think I got the gist of it.

In your Restate example of the "processPayment" function, how do you handle errors of the "accountService" call? Like, what if it times out or returns a server error?

Do you store the error result and the caller of "processPayment" has to re-trigger the payment, in order to generate a new log?

stsffap1y ago

By default, failing ctx.run() calls (like the accountService call) will be retried indefinitely until they succeed unless you have configured a retry policy for them. In the case of a configured retry policy where you have exhausted the number of retry attempts, Restate will mark this call as terminally failed and record it in its log as such and return it to the caller.

1 more reply

daxfohl1y ago· 2 in thread

Haven't formed thoughts on the content yet, but happy to see a company launching something non-AI for a change.

gjtorikian1y ago

My startup, Yetto (http://www.yetto.app) is building a better way for support professionals to do their job. (Shameless plug but we always gotta hustle.)

We, too, are weighed down by how much space AI-focused companies are taking.

hansonkd1y ago

TBH looking at helpdesk software in 2025, I would expect new ones to be built AI first. It would be hard for me to consider one without at least some sort of LLMs helping with triage or at classifications of tickets, etc.

2 more replies

davexunit1y ago· 2 in thread

My takeaway from this article is that the proposed solution for distributed app coordination is a shared, centralized log. What did I miss?

azmy1y ago

IMHO the article is not mainly about the implementation of the Log, but rather leveraging on the idea of the log to build reliable and fault tolerant applications. The implementation of the log itself can be either centralised or decentralised.

sewenOP1y ago

That gist is correct - I would add that the log needs a few specific properties and conceptually be the shared log for state, communication, execution scheduling.

The next step is the, how do you make this usable in practice...

jamamp1y ago· 2 in thread

I wonder how this compares, conceptually, to Temporal? While Temporal doesn't talk about a single centralized log, I feel the output is the same: your event handlers become durable and can be retried without re-executing certain actions with outside systems. Both Restate and Temporal feel, as a developer coding these event handlers, like a framework where they handle a lot of the "has this action been performed yet?" and such for you.

Though to be fair I've only read Temporal docs, and this Restate blog post, without much experience in either. Temporal may not have as much on the distributed locking (or concept of) side of things that Restate does, in this post.

sewenOP1y ago

Temporal is related, but I would say it is a subset of this.

If you only consider appending results of steps of a handler, then you have something like Temporal.

This here uses the log also for RPC between services, for state that outlives an individual handler execution (state that outlives a workflow, in Temporal's terms).

jamamp1y ago

That makes a lot of sense, thank you! Extending out to other operations and not just event handlers/workflows would be neat.

hiAndrewQuinn1y ago· 2 in thread

I am a huge fan of append-only logs as a fundamental architectural principle. The Log [1] should be required reading for any CS undergraduate.

[1]: https://engineering.linkedin.com/distributed-systems/log-wha...

kasey_junk1y ago

I love them so much that I’ve noodled with building a programming language optimized for using them.

Things like types that encode what events are legal in a log, first class support for data versions, fast file read and writes, etc

daxfohl1y ago

How do you do GDPR takedowns?

2 more replies

yftsui1y ago· 2 in thread

The diagrams are remarkably neat, has a feeling of both excalidraw and draw.io, anyone know what tool was used create those?

wlonkly1y ago

Not sure what made the diagrams, but I think you've correctly identified the Excalidraw handwriting font. There's two versions, Virgil[1] and Excalifont[2], both under the MIT license.

I think the monospaced font is Comic Mono.[3]

[1] https://plus.excalidraw.com/virgil

[2] https://plus.excalidraw.com/excalifont

[3] https://dtinth.github.io/comic-mono-font/

gvdongen1y ago

We've used purely excalidraw. Nice to hear you like them!

whoiskatrin1y ago· 2 in thread

whats your take on handling log compaction to prevent unbounded growth, especially in systems with high write throughput?

sewenOP1y ago

Nice question! Restate is not a log that retains the raw events for a long time - conceptually just until they where processed by the handlers, DB, locking, etc.

When you build stateful handlers, the state per key is in the internal DB, and that get's you a similar effect to log compaction, i.e., retain one value per key.

trollbridge1y ago

I have a “summarise” log entry: the current log’s contents that will be relevant to the future are summarised. For example, if it’s FY2023’s financial transactions, we compute the final balances at the end of the year. We then close the log, and write an entry to it of “no more log entries after this are valid”.

We then copy the summary transactions to a new log, and compress and archive the old log.

You can identify high throughput and low throughput types of log entries and segregate them into different log streams. For example, the “new customer/change customer info” stream probably gets way less traffic than the “customer has logged in” stream. The former is also harder to summarise. Put the hard to summarise but low volume stuff in its own log.

erikerikson1y ago· 2 in thread

It's there a hosted offering? (Or plans to offer one?)

sewenOP1y ago

Yes, there is one, have a look at https://restate.dev/cloud/

erikerikson1y ago

Thank you and sorry I missed it.

ris1y ago· 1 in thread

> Restate is open source and you can download it at...

https://github.com/restatedev/restate/blob/main/LICENSE#L1

> Business Source License 1.1

https://spdx.org/licenses/BUSL-1.1.html

> The Business Source License (this document, or the “License”) is not an Open Source license.

Suggest exploring e.g. https://github.com/dbos-inc/dbos-transact-py

drewbug011y ago

I definitely don't consider the BSL to be a true "open source" license. However, I think that if the Additional Use Grant is written clearly and unambiguously, it may well be a reasonable compromise.

That said - the Additional Use Grant for Restate is fairly ambiguous:

> Additional Use Grant: You may make use of the Licensed Work, provided that you may not use the Licensed Work for an Application Platform Service.

> An “Application Platform Service” is a commercial offering that allows third parties (other than your employees and contractors) to access the functionality of the Licensed Work by registering new services (directly or indirectly) with the Licensed Work for the purpose of invoking such services through the Licensed Work.

What does it mean to "access the functionality"? What does it mean to "register [a] new service" ? How do you determine the "purpose" of invoking such services?

I understand that the intent is to prevent someone from launching a competing "Restate-as-a-Service" business; but broadly written clauses like this scare me off. If I want to build a totally different kind of business that heavily _relies_ on Restate, I would worry that these ambiguous, broad definitions could easily be used against me.

I really wish that companies adopting the BSL would put much more clarity into the license text. I know there's likely very good business reasons not to - you may constrain yourself in the future by doing so - but in my mind it would be the "right" thing to do. You'd get people actually using and contributing back to your software, much like they would under a more permissive license.

xnorswap1y ago· 1 in thread

It sounds like they have just re-discovered Distributed Transactions with a Distributed Transaction Coordinator.

But DTs have a huge problem: What happens if the owner of the lock netsplits?

Either the DTC waits (potentially forever?) for the owner of the lock to get back in touch and release the lock, or a timeout is applied and now the owner of the lock (who may be unaware of the netsplit) will be out of sync with the system.

logsr1y ago

they point out that they are adding conditional append to a log, which gives the universality of a log data structure and a mechanism for implementing lock-free/wait-free concurrency algorithms. it is a complete toolkit for building distributed systems because you can build anything else on top of those primitives.

amirjak1y ago· 1 in thread

How would you compare this to the actor model or to temporal?

sewenOP1y ago

Great question:

The Virtual Objects in Restate are much like actors. They are somewhat inspired by Orleans [1], and you could call them virtual stateful actors. They blend with the durable execution for processing messages with multiple durable steps.

Regarding temporal, check also this question: https://news.ycombinator.com/item?id=42815318

dboreham1y ago· 1 in thread

This is basically CSP no?

sewenOP1y ago

I assume CSP is communicating sequential processes?

Interesting analogy - in a way it is doing something CSP-like in a distributed app/service architecture with the all the different processes and components that are there. The shared log (or a partition of that) being a way to establish a sequential order.

bfair1y ago

Isn't this just trading consistency for availability? From what I understand the single log is single node. What happens when throughput is not enough at scale? "We will distribute the log." you say. Well, then we are back to square one.

paulsutter1y ago

This is very compelling, nice work. I'm going to spend some quality time on this.

Thaxll1y ago

This is exactly this example from Temporal: https://github.com/temporal-sa/temporal-order-fulfill-demo

qudat1y ago

Great post! At pico we've been spending a lot of time thinking about logs and a distributed system that can read and respond to events from logs. This is being driven in part by building out global services and a need for centralized logs for monitoring.

The end result is https://pipe.pico.sh which is an authenticated, networked *nix pipes over SSH. Since it relies on stdin/stdout via SSH it's one of the easiest pubsub systems we've used and we keep finding its ergonomics powerful. We have a centralized log-drain, metric-drain, and cache-clearing-drain all using `pipe`.

random31y ago

looks like a new generation is ready to discover Paxos, Zab, Raft

baq1y ago

Or, every database is bitemporal, some just don’t know it yet

j / k navigate · click thread line to collapse

155 comments

117 comments · 29 top-level

trollbridge1y ago· 15 in thread

I’ve been doing a similar thing, although I called it “append only transaction ledgers”. Same idea as a log. A few principles:

- The order of log entries does not matter.

- Users of the log are peers. No client / server distinction.

- When appending a log entry, you can send a copy of the append to all your peers.

- You can ask your peers to refresh the latest log entries.

- When creating a new entry, it is a very good idea to have a nonce field. (I use nano IDs for this purpose along with a timestamp, which is probabilistically unique.)

- If you want to do database style queries of the data, load all the log entries into an in memory database and query away.

- When creating new entries, prepare the entry or list of entries in memory, allow the user to edit/revise them as a draft, then when they click “Save”, they are in the permanent record.

- To fix a mistake in an entry, create a new entry that “negates” that entry.

A lot of parallelism / concurrency problems just go away with this design.

XorNot1y ago

How do you know summary entries are valid if order doesn't matter?

withinboredom1y ago

clayg1y ago

IME you have to be willing to recalculate the summaries up to some kind of consistency window.

Kinrany1y ago

> The order of log entries does not matter.

This is surprising, Kafka-like logs are all strictly ordered.

trollbridge1y ago

The reason I made ordering not matter is so that multiple peers don’t have to worry about keeping the exact same order when appending to each other’s logs.

The log entries do have timestamps on them. You can sort by timestamp, but a peer has the right to append an new entry that’s older than the latest timestamp.

cduzz1y ago

* within a partition

grahamj1y ago

The lack of ordering is surprising. Without that you can’t stream without a buffer.

trollbridge1y ago

The idea is that all peers eventually have the same collection of log entries, with an agreed-upon sort method. Currently that sort method is time stamps.

Conceptually this is surprisingly similar to the type of internal logs a typical database like PostgreSQL keeps.

1 more reply

log4shell1y ago

Calling a WAL a ledger, why? Ledger sounds fancier but why would it be a ledger in this case?

trollbridge1y ago

We called it a ledger since we stored financial data and basically used the “ledger” format from plain text accounting, initially.

1 more reply

hcarvalhoalves1y ago

I believe "ledger" implies commutative property (order does not matter).

1 more reply

glitchc1y ago

How do you manage log size for high-transaction systems?

trollbridge1y ago

random31y ago

if order does not matter how do you implement deletes?

trollbridge1y ago

You don’t do deletes. The log is append only.

If you want to get rid of an entry, you create a new entry that negates the existing entry.

1 more reply

Animats1y ago· 10 in thread

Synchronization is called "reconcilation" in accounting terminology.

The computer concept is that we have a current state, and changes to it come in. The database with the current state is authoritative. This is not suitable for handling money.

The real question is, do you really care what happened last month? Last year? If yes, a log-based approach is appropriate.

inopinatus1y ago

I’ve always concurred with the Helland/Kleppman observation mentioned viz. that the transaction log of a typical RDBMS is the canonical form and all the rows & tables merely projections.

It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

But look underneath too. The journaled filesystem on which the database resides also has a log representation, and under that, a modern SSD is using an adaptive log structure to balance block writes.

formerly_proven1y ago

globular-toast1y ago

> It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

1 more reply

namtab001y ago

we're a very short step away from a "hardware" database

ianburrell1y ago

I'm pretty sure journalled filesystem recycle the journal. There are log-structured filesystem but they aren't used much beyond low-level flash.

1 more reply

LeanOnSheena1y ago

You're correct on all points. Some additional refining points regarding accounting concepts:

- The concept of the debits always needing to equal credits is the most important and fundamental control in accounting. It's is the core idea around which all of double entry bookkeeping is built.

funcDropShadow1y ago

Could you recommend some resource to understand this view of accounting better?

fellowniusmonk1y ago

This is why EG-Walker is so important, diamond types adoption and a solid TS port can't come soon enough for distributed systems.

calvinmorrison1y ago

You over estimate ERP and accounting systems.

baq1y ago

That was basic accounting when computers were people with pencils and paper.

1 more reply

shikhar1y ago· 9 in thread

Leadership can be an important optimization for most systems, but shared logs also allow for multi-writer systems pretty easily. We blogged about this pattern https://s2.dev/blog/kv-store

xuancanh1y ago

> It was strange to me that there was no log-as-service with the qualities that make it suitable for building higher-level systems like durable execution

shikhar1y ago

logsr1y ago

> log as a service

hinkley1y ago

ianburrell1y ago

1 more reply

thesz1y ago

  > It was strange to me that there was no log-as-service...

Actually, there are plenty of them. Most heard of is Ethereum 2.0 - it is a distributed log of distributed logs.

Any blockchain that is built upon PBFT derivative is such a system.

gavindean901y ago

What about journalctl?

shikhar1y ago

p_l1y ago

Surprisingly and disgustingly so it did not actually use a proper log structure on disk. So much pointer walking...

pjc501y ago· 7 in thread

Well, yes, but then you've backed into CAP again because you only have one log.

sewenOP1y ago

Yes, we are assuming a log that picks linearizability at the cost of availability under partitions. Like most logs do, including Kafka, Pulsar, RedPanda, etc.

Also the one log is at the granularity of a single key or handler execution. More of a logical log, than a physical log or even partition.

In Restate, we implement a logical log-per-key, backed by a partitioned physical log.

mrkeen1y ago

If I've understood it, it's like using Kafka with 1 topic and 1 partition. But it shouldn't rule out multiple brokers with a >1 replication factor, giving you CP.

clayg1y ago

But can't any log be implemented as a CRDT? Was that not implied in the post? I didn't read it that close...

ismailmaj1y ago

CRDT is really only useful when inconsistencies could be acceptable in some situations and so it depends on the application.

For something that is trying to solve the general problem of consistent single ground-truth log, you can't really do much better than Spanner.

logsr1y ago

UltraSane1y ago

couldn't the log be synchronously replicated to multiple servers to increase A?

ismailmaj1y ago

In case of mutation either the replicas will be out of sync (so no C), or you'll need to synchronously mutate the replicas (so bad A due to latency).

1 more reply

EGreg1y ago· 5 in thread

https://www.npmjs.com/package/@canvas-js/gossiplog

Joel Gustafson started this stuff at MIT and used to work at Protocol Labs. It’s very straightforward. By any chance sewen do you know him?

https://news.ycombinator.com/item?id=36265429

sewenOP1y ago

Never encountered it before, but it looks cool.

EGreg1y ago

It’s part of his higher-level framework called Canvas.

Check this out: https://joelgustafson.com/posts/2024-09-30/introduction-to-c...

And this: https://github.com/canvasxyz/canvas

hem7771y ago

There’s also OrbitDB https://github.com/orbitdb/orbitdb which to my understanding has been a pioneer for p2p logs, databases and CRDTs.

vdm1y ago

Thank you @EGreg for sharing this.

EGreg1y ago

Def. I geek out on this stuff, as I am building my own distributed systems. I have had discussions with a lot of people in the space, like Leslie Lamport, Petar Maymounkov etc.

You might like this interview: https://www.youtube.com/watch?v=JWrRqUkJpMQ

This is what I’m working on now: https://intercoin.org/intercloud.pdf

TuringTest1y ago· 5 in thread

Excuse me for sounding rough, but - isn't this reinventing comp-sci, one step at a time?

hinkley1y ago

So I took distributed computing. Which ended up being one of the four classes that satisfied the 80/20 rule for my college education.

Quite recently I started asking coworkers if they took such a class and was shocked to learn how many not only didn’t take it, but could not even recall it being an option at their school. What?

withinboredom1y ago

The article is well written, but they still have a lot of problems to solve.

1 more reply

mrkeen1y ago

I think your points are pretty spot on - most things have already been invented, and there's too much of a move-fast-and-break-things mentality.

Here's a follow-up thought: to what extent did the grey-beards let us juniors down by steering us down a different path? A few instances:

DB creators knew about replicated logs, but we got given DBs, not replicated log products.

Brendan Eich supposedly wanted to put Scheme into browsers, but his superiors had him make JS instead.

What other stuff have we been missing out on?

[1] https://www.artima.com/articles/james-gosling-on-java-may-20...

BrendanEich1y ago

I wish James had been in favor of immutability in designing java.util.Date!

sewenOP1y ago

This is certainly building on principles and ideas from a long history of computer science research.

If you want, you can think of this as one of these cases. I would argue that there is tremendous practical value in that (I found that to be the case throughout my career).

And technology advances in zig zag lines. You add capability x but lose y on the way and later someone finds a way to have x and y together. That's progress.

bruce3434341y ago· 5 in thread

> If everything’s in one log, there’s nothing to coordinate #

On the contrary. Everything becomes coordinated.

The entire "log" becomes a giant ass mutex lock. Good luck scaling it.

sewenOP1y ago

There is nothing to coordinate for the application, because, yes, the log coordinates everything. But not globally, on the level of a single event handler execution, or a single key.

That has been proven to scale well - the way we implement that in Restate is classical shared nothing physical partitioning, with indexing on a key granularity.

So nothing like a shared mutex unless you want to access the same key, which otherwise your database synchronizes, if you want any reasonable level of consistency.

mrkeen1y ago

I think the author is motte-and-baileying between:

Literally one log - which does indeed reduce your coordination headache, but is susceptible to your "giant ass mutex" comment, and

One log per ... - which brings the coordination problems right back into existence.

sewenOP1y ago

I can see where some of that could be written more clearly. To elaborate:

1 more reply

neuroelectron1y ago

Exactly what I was thinking. Now what's the best mutex system we've built? An SQL database.

kikimora1y ago

You can use something like DynamoDb with partition per interaction. That would scale great.

jaseemabid1y ago· 4 in thread

A notable example of a large-scale app built with a very similar architecture is ATproto/Bluesky[1].

This arch seems to be working really well for Bluesky, as they clearly aced through multiple 10x events very recently.

[1]: https://atproto.com/articles/atproto-for-distsys-engineers

sewenOP1y ago

That blog post is a great read as well. Truely, the log abstraction [1] and "Turning the DB inside out" [2] have been hugely influential.

In a way this article here suggests to extend that

(1) from a log that represents data (upserts, cdc, etc.) to a log of coordination commands (update this, acquire that log, journal that steo)

(2) have a way to link the events related to a broader operation (handler execution) together

(3) make the log aware of handler execution (better yet, put it in charge), so you can automatically fence outdated executions

[1] https://engineering.linkedin.com/distributed-systems/log-wha...

sewenOP1y ago

[2] https://martin.kleppmann.com/2015/11/05/database-inside-out-...

grahamj1y ago

Table/log duality goes back further than Kleppmann though. An earlier article that really influenced me was

https://engineering.linkedin.com/distributed-systems/log-wha...

zellyn1y ago

Martin Kleppmann was also directly involved with Bluesky as a consultant.

mrkeen1y ago· 3 in thread

Using one-log-only for an entire system does have its upsides, but it will kill performance. It would be like building a CRUD system with a single mutex for everyone to share.

hcarvalhoalves1y ago

mrkeen1y ago

That's what the rest of us are doing in event-sourcing land, but TFA is arguing for something much stronger:

  If everything’s in one log, there’s nothing to coordinate

The rest of us have to coordinate the logical log of user creation/deletion and the logical log of user payments, etc.

Separate logical logs with no need for coordination can only work if they are truly independent systems - no causality between them.

sewenOP1y ago

Yes, exactly right. One log per logical entity, here "payment ID".

The way our open source project implements that is with a partitioned log and indexes at key-granularity, so it is like virtually a log per key.

1 more reply

zellyn1y ago· 3 in thread

sewen (et al)

This is lovely and I'm itching to try it. One question:

withinboredom1y ago

It's mostly based on wpaxos (wide-area consensus), spaxos, fpaxos and pretty neat. The repo above is a productionization of several proof of concepts to get there.

> One could also imagine ensuring that > n/2 consensus members are always located inside the restaurant/hardware store/etc., so if you go offline, you can still proceed.

WilcoKruijer1y ago

Interesting use case, I've also thought about this in the past.

mrkeen1y ago

> One could also imagine ensuring that > n/2 consensus members are always located inside the restaurant/hardware store/etc.

If the simple-majority is inside the restaurant, then the restaurant is online and the rest of the world is offline.

sewenOP1y ago· 2 in thread

A short summary:

Complex distributed coordination and orchestration is at the root of what makes many apps brittle and prone to inconsistencies.

teddyh1y ago

Is this summary AI generated?

sewenOP1y ago

Haha, no, but maybe all the AI-generated contents out there is starting to train me to write in a similar style...

1 more reply

sewenOP1y ago· 2 in thread

Some clarification on what "one log" means here:

magicalhippo1y ago

Interesting read, not my area but I think I got the gist of it.

In your Restate example of the "processPayment" function, how do you handle errors of the "accountService" call? Like, what if it times out or returns a server error?

Do you store the error result and the caller of "processPayment" has to re-trigger the payment, in order to generate a new log?

stsffap1y ago

1 more reply

daxfohl1y ago· 2 in thread

Haven't formed thoughts on the content yet, but happy to see a company launching something non-AI for a change.

gjtorikian1y ago

My startup, Yetto (http://www.yetto.app) is building a better way for support professionals to do their job. (Shameless plug but we always gotta hustle.)

We, too, are weighed down by how much space AI-focused companies are taking.

hansonkd1y ago

2 more replies

davexunit1y ago· 2 in thread

My takeaway from this article is that the proposed solution for distributed app coordination is a shared, centralized log. What did I miss?

azmy1y ago

sewenOP1y ago

That gist is correct - I would add that the log needs a few specific properties and conceptually be the shared log for state, communication, execution scheduling.

The next step is the, how do you make this usable in practice...

jamamp1y ago· 2 in thread

sewenOP1y ago

Temporal is related, but I would say it is a subset of this.

If you only consider appending results of steps of a handler, then you have something like Temporal.

This here uses the log also for RPC between services, for state that outlives an individual handler execution (state that outlives a workflow, in Temporal's terms).

jamamp1y ago

That makes a lot of sense, thank you! Extending out to other operations and not just event handlers/workflows would be neat.

hiAndrewQuinn1y ago· 2 in thread

I am a huge fan of append-only logs as a fundamental architectural principle. The Log [1] should be required reading for any CS undergraduate.

[1]: https://engineering.linkedin.com/distributed-systems/log-wha...

kasey_junk1y ago

I love them so much that I’ve noodled with building a programming language optimized for using them.

Things like types that encode what events are legal in a log, first class support for data versions, fast file read and writes, etc

daxfohl1y ago

How do you do GDPR takedowns?

2 more replies

yftsui1y ago· 2 in thread

The diagrams are remarkably neat, has a feeling of both excalidraw and draw.io, anyone know what tool was used create those?

wlonkly1y ago

Not sure what made the diagrams, but I think you've correctly identified the Excalidraw handwriting font. There's two versions, Virgil[1] and Excalifont[2], both under the MIT license.

I think the monospaced font is Comic Mono.[3]

[1] https://plus.excalidraw.com/virgil

[2] https://plus.excalidraw.com/excalifont

[3] https://dtinth.github.io/comic-mono-font/

gvdongen1y ago

We've used purely excalidraw. Nice to hear you like them!

whoiskatrin1y ago· 2 in thread

whats your take on handling log compaction to prevent unbounded growth, especially in systems with high write throughput?

sewenOP1y ago

Nice question! Restate is not a log that retains the raw events for a long time - conceptually just until they where processed by the handlers, DB, locking, etc.

When you build stateful handlers, the state per key is in the internal DB, and that get's you a similar effect to log compaction, i.e., retain one value per key.

trollbridge1y ago

We then copy the summary transactions to a new log, and compress and archive the old log.

erikerikson1y ago· 2 in thread

It's there a hosted offering? (Or plans to offer one?)

sewenOP1y ago

Yes, there is one, have a look at https://restate.dev/cloud/

erikerikson1y ago

Thank you and sorry I missed it.

ris1y ago· 1 in thread

> Restate is open source and you can download it at...

https://github.com/restatedev/restate/blob/main/LICENSE#L1

> Business Source License 1.1

https://spdx.org/licenses/BUSL-1.1.html

> The Business Source License (this document, or the “License”) is not an Open Source license.

Suggest exploring e.g. https://github.com/dbos-inc/dbos-transact-py

drewbug011y ago

I definitely don't consider the BSL to be a true "open source" license. However, I think that if the Additional Use Grant is written clearly and unambiguously, it may well be a reasonable compromise.

That said - the Additional Use Grant for Restate is fairly ambiguous:

> Additional Use Grant: You may make use of the Licensed Work, provided that you may not use the Licensed Work for an Application Platform Service.

What does it mean to "access the functionality"? What does it mean to "register [a] new service" ? How do you determine the "purpose" of invoking such services?

xnorswap1y ago· 1 in thread

It sounds like they have just re-discovered Distributed Transactions with a Distributed Transaction Coordinator.

But DTs have a huge problem: What happens if the owner of the lock netsplits?

logsr1y ago

amirjak1y ago· 1 in thread

How would you compare this to the actor model or to temporal?

sewenOP1y ago

Great question:

Regarding temporal, check also this question: https://news.ycombinator.com/item?id=42815318

dboreham1y ago· 1 in thread

This is basically CSP no?

sewenOP1y ago

I assume CSP is communicating sequential processes?

bfair1y ago

paulsutter1y ago

This is very compelling, nice work. I'm going to spend some quality time on this.

Thaxll1y ago

This is exactly this example from Temporal: https://github.com/temporal-sa/temporal-order-fulfill-demo

qudat1y ago

random31y ago

looks like a new generation is ready to discover Paxos, Zab, Raft

baq1y ago

Or, every database is bitemporal, some just don’t know it yet

j / k navigate · click thread line to collapse