0 points · LgWoodenBadger · 4y ago

Their solution seems like a "produce to Kafka first" but with extra steps.

Regarding:

When we produce first and the database update fails (because of incorrect state) it means in the worst case we enter a loop of continuously sending out duplicate messages until the issue is resolved

I don't understand where either 1) the incorrect state or 2) the need to continuously send duplicate messages comes from.

Regarding:

The Job might still fail during execution, in which case it’s retried with exponential backoff, but at least no updates are lost. While the issue persists, further state change messages will be queued up also as Jobs (with same group value). Once the (transient) issue resolves, and we can again produce messages to Kafka, the updates would go out in logical order for the rest of the system and eventually everyone would be in sync.

This is the part that is equivalent to Kafka-first, except with all the extra steps of a job scheduling, grouping, tracking, and execution framework on top of it.
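To make the quoted retry behaviour concrete, here is a rough sketch of how one group's queued jobs might be drained in order, retrying the head job with exponential backoff. It is only an illustration under my own assumptions: `run_group`, `produce`, `sleep`, `base_delay`, and `max_attempts` are made-up names, not anything from the article, and the real framework surely differs.

```python
from collections import deque


def run_group(jobs, produce, sleep, base_delay=1.0, max_attempts=5):
    """Send one group's queued jobs in order, retrying the head job with backoff.

    `produce` stands in for "publish to Kafka" and `sleep` for a real delay;
    both are injected so the sketch can run without a broker (hypothetical API).
    """
    queue = deque(jobs)
    sent = []
    while queue:
        job = queue[0]  # never skip ahead: later jobs wait behind the head job
        for attempt in range(max_attempts):
            try:
                produce(job)
                break
            except ConnectionError:
                # Delay doubles on every failed attempt: 1x, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
        else:
            # Out of attempts for now; the remaining jobs stay queued in order.
            return sent
        sent.append(queue.popleft())
    return sent
```

Because the head job blocks everything behind it in the same group, updates still go out in logical order once the transient issue clears.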


anentropic · 4y ago

the article does not explain things very clearly, but I think this is describing the problem rather than their solution

Our high level idea was:

- Insert “work” into a table that acts like a queue

- “Executor” takes “work” from DB and runs it

...

A Job is an abstraction for a scheduled DB backed async activity

...

How did we solve the #2 state problem?

By recording Jobs in the service database we can do the state update within the same transaction as inserting a new Job. Combining this with a Job that produces the actual Kafka message, allows us to make the whole operation transactional. If either of the parts fails, updating the data or scheduling the job, both get rolled back and neither happens.

I think this is describing basically a Transactional Outbox

i.e. "jobs" are recorded in the postgres db as part of the same db transaction as the business logic actions

the difference from Kafka-first is that if the app decides to roll back the business logic then the message hasn't already been sent
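a minimal sketch of that idea, using sqlite3 in place of postgres so it runs standalone. the `orders` and `jobs` tables, the `group_key` column, and the `update_status` helper are all hypothetical names of mine, not the article's schema:

```python
import sqlite3


def make_db():
    # In-memory database standing in for the service's Postgres (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "group_key TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
    conn.commit()
    return conn


def update_status(conn, order_id, status, fail=False):
    """Update state and enqueue the 'produce to Kafka' job in one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE orders SET status = ? WHERE id = ?", (status, order_id)
            )
            conn.execute(
                "INSERT INTO jobs (group_key, payload) VALUES (?, ?)",
                (f"order-{order_id}", status),
            )
            if fail:
                # Simulates the business logic deciding to abort.
                raise RuntimeError("business logic rejected the change")
        return True
    except RuntimeError:
        return False
```

if the block raises, neither the status change nor the job row survives, so there is nothing for an executor to publish later. with Kafka-first the message would already be out by the time the rollback happens.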