What they don’t tell you about event sourcing (opens in new tab)

(medium.com)

234 pointsjackthebadcat7y ago77 comments

77 comments

The last section on Operational Flexibility and the inability to change the event history raises a very good point.

Like most of the issues, the solution requires experience to know when you are at the Goldilocks point (Just Right). This specific issue has a lot in common with managing database migrations in django or any other migration system.

The ideal situation is to create migrations that can always be rolled back, but sometimes this is not possible to do operationally. For example, a schema change that restricts a field from NVARCHAR to INTEGER can only be generically rolled back if all of the unconvertable data is persisted. This can be mitigated by structuring the database to avoid these dead-ends, and that really is only gained by hard-won experience.

The problem with undoing operations via new events is the same thing -- unless you have foreknowledge of this kind of problem, it is very easy to accidentally create events that perform un-undoable actions. A very simple example of a problematic event would be something that modifies a foreign-key relationship -- let's say it's a digital asset in a game and you want the ability to transfer ownership of the item from one player to another.

The simple solution is a event like SetAssetOwner(asset_pk, player_pk). This would set the item's player foreign key field to the player's primary key. Easy. However, you have lost knowledge of where the asset came from and cannot undo this operation. A better solution would be to make an event SwapAssetOwner(asset_pk, owner_pk, recipient_pk). Yes, the owner_pk is technically redundant, but it provides a check against someone trying to steal an item with a maliciously crafted SetAssetOwner event that performs no checking. Even better, this operation can be reverted by sending the same event with the owner and recipient arguments reversed. Since these properties are part of the event message, they will be persisted in the event history and all of the information to undo the event is self-contained.

spitfire7y ago

Basically implement double entry accounting.

I think ideally you want something like Rich Hickey speaks about when he speaks of Datomic. An append only database. You can see what the previous values for that row were, along with schema changes.

fusiongyro7y ago

I worry about GDPR with respect to Kafka and other append-only databases.

1 more reply

Yokohiii7y ago

Idk why people are saying that ES data is always immutable. They can be by default sure, but if a facility is useful to change the history, why not?

hesdeadjim7y ago

If you need to change history you can just create new events that accomplish your mutation -- and even mark them as a type 'change history' or some such obvious identifier so when you inspect the event stream you know exactly what you are looking at.

3 more replies

wst_7y ago

In many cases they are legal proof that something happened. If you can mess with history, proof can't be used in court.

joefreeman7y ago

Good article. I've spent the last year migrating to an event sourced system, so thought I'd share some thoughts.

On the eventual consistency point, I've found you can get quite far with having the read model managing the race condition. This probably doesn't work everywhere, but in our system, multiple users can accept an invitation, so we have something like `InvitationAccepted{invitation_id, user_id}`. It's possible that multiple users might accept the invitation at roughly the same time, but the command-side doesn't really have to be concerned with this - it can happily allow multiple users to accept an invitation. It's up to the read model to ask, 'has this invitation already been accepted?' - if not, the acceptance is successful and will be indicated when queried, otherwise the acceptance is unsuccessful (and as a bonus, we can separately record who unsuccessfully tried to accept it). From the user's point of view, when they accept the invitation, they see a spinner until we confirm with the read model one way or the other (this could be done by polling the read model, but in our case we have an event sent back to the client).

Coming up with the event schema and versioning/granularity are hard. We have version numbers on all our events to make this a bit more manageable/explicit (`InvitationAccepted1`, for example). Storing events in a relational database does make it a bit easier to go back and edit/upgrade/delete them (sort-of cheating, but also relevant for GDPR). Also, I think we're going to end up suffering a bit from the 'whole system fallacy', but at the moment namespacing all the events (keeping in mind their expected volume) makes it a lot easier to manage.

sbov7y ago

> It's up to the read model to ask, 'has this invitation already been accepted?'

I feel like you've skipped over the interesting part of your strategy here. If it's an eventually consistent system, what keeps the read model from having the wrong answer to this question?

hvidgaard7y ago

Not OP, but I am working with a CQRS system.

In CQRS eventual consistency does not mean that we have multiple servers such that we have 2 servers with 2 different answers. It means from a command is issued to an event is propagated to all read models, there is a delay.

You need to handle the race condition at some point or another. From a CQRS point of view, a user accepting an invitation is just an event like any other event. What happens based on that event must account for the possibility that multiple users have accepted, and it's a rather straight forward thing to solve with an ES. The "accept" event with the lowest sequence number is the first.

Having the read side handle it would probably mean that when you have a read model for accepted invitations, you ignore all but the first accepted of an "invitationId".

tonyhb7y ago

I've been planning out an event sourcing like system for our healthtech startup, and you're definitely right re namespacing and versioning.

I explicitly keep a version in the event - which is a date. It's similar in how Stripe versions their API. We're also planning to handle the events in the same way as stripe's API: each event has side effects; the side effects may change depending on the version and each version has its own application logic (cascading, so you can have 2018-08-01 run all of 2018-07-30 plus its own changes).

This lets us replay events as they happened, run an event using two different versions and perform only the diffs etc.

Our system is probably not a typical CQRS/event sourcing setup.

The event system itself idempotent: you take an event with all input data necessary to run the event (form data, necessary current state), so the system can run independently. This means that every event is typed such that the input data dictates what is necessary.

The event handler validates the event, returning errors if necessary.

Then, the event handler runs all side effects and returns operations to perform: update model X attributes to Y, insert new record Z.

In effect, we go:

User -> API -> (generate event) -> Event Handler -> (error|response) -> save event and side effects in a transaction.

This means our DB is a cache of the event and all previous data, so we're not really event sourcing — we're audit trailing.

The main benefits are:

1. Medical records are complex and we always need audit trails.

2. If a doctor submits a prescription, we can show all side effects that happened for visibility (ie. this triggered a lab task, push notification, sent this message). We can verify this in the UI and see what happened for each patient without relying on assumptions.

3. Engineers know that the API produces events and can look up exactly the side effects that happen when an event occurs (we're using Rails for the API logic right now and this isn't always obvious).

4. We can ensure that we validate when an event happens based on input and current state without complex code, catching edge cases.

5. We can choose to save the event and side effects or not. This lets us "preview" actions or "replay" actions without actually changing any world state (you toggle a "test" flag in the event which also means the event handlers know not to trigger outside side effects).

6. The "side effects" response from the event handler can be sent to a websocket observable and consumed by frontends, ensuring that the doctor UI always has an up to date version of patient data.

Random thoughts:

- It's really just a framework for the logic of an application controller that's typed and ensures everything is consistent. Plus, similar to Stripe, it allows us to version events and write migrations/upgrade paths etc.

- What about conflicts? We have a plan to use hashes of the previous data to ensure consistency with medical records: if you're modifying fields A, B, C, you send over a hash of the previous data for A, B, C alongside the request. If the event handler can't verify the hash the data must've changed in the meantime.

----

We're producing events now but the handlers aren't yet in place, so this is currently still being planned. Essentially, we're using the API as authentication, authorization, routing/HTTP management, transaction/database management while the event/controller logic is being placed into a structured framework to ingest form data, current state and produce output.

hesdeadjim7y ago

This is exactly what we did in our system and it worked wonderfully. It also has the side effect of avoiding locks or contention when doing such mutations.

7sigma7y ago

Regarding eventual consistency, a CQRS\ES system can also be synchronous, or partially. You could have listeners for events that need to supply a strongly consistent model and others events that feed parts of the system that don't need strong consistency.

"However the events in a event store are immutable and can’t be deleted, to undo an action means sending the command with the opposite action"

Well they don't have to be immutable. I don't see why you can't update/migrate events.

superqwert7y ago

The author has explained the problem with eventual consistency and the CAP theorem, but he is trying to blame the problem at least partly on CQRS/ES - the problem will exist independently of what pattern you use if your system is distributed.

CQRS/ES is a pattern and doesn't need to be shoe-horned into every solution.

I work at Transport for London and we have a CQRS/ES system for managing the data driven design data. It is synchronous and is incredibly useful for ensuring it has both strong business logic and a fast readable side, while providing auditing for free.

We still have problems with EC and CAP, and they are not due to CQRS/ES. Make the system distributed and you will have to handle all the new scenarios. Those are the trade-offs.

vorpalhex7y ago

> can also be synchronous

Then you're giving up several of the benefits of CQRS, and might as well just not bother with the additional complexity.

> Well they don't have to be immutable. I don't see why you can't update/migrate events.

The events are your source of truth. When you "migrate" your source of truth, you're in somewhat dangerous waters (and if you're using CQRS for scale, a 1% failure to migrate data might be millions of records). You also now have the issue that your source of truth is being migrated, and probably can't accept writes (or you need to emit writes in both new format and old format for whenever your change over happens)

victorNicollet7y ago

>> can also be synchronous > Then you're giving up several of the benefits of CQRS, and might as well just not bother with the additional complexity.

Even without asynchronous read/write, there are still benefits worth the (arguably, small) additional complexity. For instance, the ability to add new functionality without having to migrate existing data is amazing.

1 more reply

zihotki7y ago

I also share same opinion based on my experience. Events can be modified and deleted but that must be an exceptional situation (GDPR and other compliances, etc.). But even if it's exceptional you have to provide a clear and easy way to do so and that increases complexity of the solution by a lot.

Another thing is strongly consistent models, there may be valid requirements in some problem areas to have a strongly consistent and normalized model and use it for command validation. This helps especially well in the case when all requirements are not known upfront and/or business domain changes very frequently. A small change in business may require to completely redo the aggregate roots and logic if you follow standard approach, this is very expensive. A better decision could be to use a normalized SQL database instead of an aggregate root. Such approach may be more flexible in certain cases and have it's own benefits as well as cost and drawbacks.

jwhitlark7y ago

Encryption + lose the key policies seem to satisfy the GDPR. So you don't have to actually delete an event, you can just give up your ability to read its payload.

victorNicollet7y ago

If the decision to abandon strong consistency involved careful analysis of the performance/maintenance trade-offs, then by definition the lack of consistency is less expensive than keeping a consistent but low-performance model, and you're just paying the price of having to solve a Hard Problem.

But if strong consistency was abandoned because someone wrote general statements in favor of eventual consistency...

dmitriid7y ago

Is it just me or the article never presented a solution to the posited problem: ”Since each entity is represented by a stream of events there is no way to reliable query the data. I have yet to meet an application that doesn’t require some sort of querying.”

The solution is said to be CQRS, and nothing in the article shows how you solve that.

If you unwind the unnecessarily winded florid poetic waxing, it all comes down to “dump it to a database, and query that”.

voiceofunreason7y ago

"dump it to a database, and query that"

Yes.

In some cases, "dump" is a fold/reduce, and your database is just an in memory data structure, and depending on how much latency is permitted by your service level objectives you might cache the data structure as opposed to regenerating it every time.

There's no magic.

The pattern is analogous to what you would do if your book of record were an RDBMS, and you had to run graph queries. "Dumping" the data into a graph database and running the query there would be a tempting solution, no?

oftenwrong7y ago

In practice it is typical to maintain a snapshot of the current state in a more-query-able form. In some systems, event writes and their corresponding snapshot updates occur within a transaction.

shady-lady7y ago

Hopefully bi-temporal tables get added to Postgres soon.

I think it would cover a few of the use cases that people are turning to event sourcing for(excluding scale).

theptip7y ago

I don't see much discussion of event-sourcing simply using a SQL database (i.e. skipping the CQRS part). This would allow you to keep your CP (strongly-consistent) semantics.

While this clearly wouldn't work in high-volume cases (i.e. where you _actually_ need CQRS), it seems like this would be the simplest option for many systems. I see a lot of articles advocating for immediately jumping into CQRS, which seems like a big increase in architectural complexity.

Does anyone have opinions/experience on this approach?

marxama7y ago

I went with this approach on a recent project, for two reasons: * Tracing/history/auditing * Structuring the service around these events, it became trivial to add new "event types", rather than expanding some big hairy PATCH endpoint or similar. In other words, when I needed a user entity to be able to belong to a group, I just added a new event type, "user/join-group". Having a single endpoint for all events had two other nice benefits: batching became trivial, as well as doing several events transactionally (all events in one requests are processed in one transaction).

It's been running in production for a couple of months now, and it's been working great! The main drawback I can think of was that it obviously didn't work well with Swagger out-of-the-box, had to do some custom handling for showing all the available event types.

tomerbd7y ago

Don't you still need a queue in your example? do you lock the whole table when you insert a new record to the table? can you elaborate how this solves consistency?

theptip7y ago

I don't think you need a separate queue; if you have an "Events" table then you can just write everything there.

It solves the consistency problem because you can create your event inside a transaction, which will rollback if another event touching the same source is created simultaneously.

E.g. if you have these incompatible events in a ledger:

CreditAccount(account_id=123, amount=100)

DebitAccount(account_id=123, amount=60)

You'd want one of the debit transactions to fail, assuming you want to preserve the invariant that the account's balance is always positive. You could put the `account_id` UUID as an `Event.source` field, which would allow you to lock the table for rows matching that UUID.

1 more reply

aaronmu7y ago

I’ve implemented this.

TLDR: orm’s + large transactions resulted in a lot of unexpected complexity.

At some point you get really big transactions because one write triggers five process managers which all trigger more writes and so on and so on. Performance was not a problem but I was surprised by the complexity of these big transactions in combination with an orm.

I dont have a concrete example but over the course of two years we have encountered multiple bugs that took days to solve. Theres one of these bugs that we fixed without identifying the root cause until this day.

theptip7y ago

Thanks for the perspective -- given what you know now, would you have built it differently if you could do it over again? Or was it the right call for the stage of your system, even if there were pain points?

dustingetz7y ago

Strong consistency is possible with this model, consider for example git.

One real database that works like this is Datomic, which is competitive with SQL for most kinds of read-heavy data modeling loads that SQL and CQRS is used for.

subsubsub7y ago

All the issues the author talks about are valid, however, the idea that no one warns you about them is wrong.

If you do even the most basic research into Event Sourcing you will find people warning of these trade offs (foremost amongst them Greg Young, the person who originally set out the idea)

likeclockwork7y ago

I've never quite been able to get my head all the way around "Eventual Consistency". I don't understand how actions that could conflict or resources that could be contended are supposed to work? At some point, something has to say, given two actions A and B that are in conflict, the later action B must fail.

So, where does that happen? When getting written into the read model? Then what, it emits an event saying clearly that the previous action failed? All while there's a client waiting for results?

I'd love to see a worked example based on discrete resources. Such as two people, a box, and a ball; where the ball can be held by either person or in the box, and a person can take the ball from the box but not from another person.

I find the concept intriguing but I don't really get it and I haven't been able to identify in any of the writing where "the buck stops".

paragraft7y ago

Conflict resolution is a separate problem that isn't addressed by Eventual Consistency. If I make a write in an EC system, that write may eventually be accepted, it may be rejected, or it could be resolved through something smarter like a CRDT. EC just says that I won't know that immediately; that different parts of the system can have different views of the truth at the same time. And for most business systems that's usually okay, if you're talking about consistency that resolves on a good-enough timescale.

likeclockwork7y ago

Thanks for answering. Is Eventual Consistency a necessary property of a CQRS/Event Sourcing architecture?

1 more reply

ninjakeyboard7y ago

I've worked with event sourced systems quite a bit in production and at scale, and have written a book on Akka and another one on related topics. While I have been an advocate of the approach, My experiences are guiding me away from implementing Event Sourcing in many use cases (especially where the entities are long lived). While CQRS is more complex, I'm more likely to implement CQRS without event sourcing, where an entity is responsible for its own persistence using whatever mechanism we deem suitable, and emits events that can be used for the view (or the data can be viewed as read only.) Ultimately a bounded context is there, and you send it commands and get out events. You have to evaluate the recovery and persistence mechanism based on your needs. Exactly once delivery is a pipedream so it doesn't matter how you're persisting the event and delivering it, evaluating line by line, crashing somewhere it's going to be possible to emit an event twice somewhere (eg if the outbound projection emits the event but doesn't persist its offset.) You need to deduplicate somewhere to get exactly once processing semantics. There _are_ simpler approaches to persistence that work just fine with the same delivery guarantees that work fine in the place of Event Sourcing. Nobody talks about any alternatives though - everyone defaults to event sourcing. My intuition guides me to look at other options having been burned one too many times, watching organizations collapse around 100% event sourced applications, etc.

I'm still working on event/message driven systems (today with Elixir mostly) but I've started to make architectural compromises to move away from especially event sourcing. Event Sourcing + CQRS may be prescriptive but it's very hard for new developers to pick up and understand the layers of abstraction underneath eg Akka + the Persistence Journal. And I'm not sure I can trust many of the open source libraries outside of Akka to be honest. I've had to dig into the depths of postgres journal implementations and apply windowing to journal queries for example because they weren't burned in at the scale I was working with (partially because I inherited an application with a single entity in a context which had many million line long journals - this highlights a design error though but hopefully you can see my point.)

You don't _need_ to use these patterns but you can still apply DDD and event/message based abstractions, and publish events. An entity can write its state to a record and then apply the state in memory as well without using a journal given you handle exactly once processing semantics correctly. This means there are knobs and dials. The problem with event sourcing in the greater picture is that it's descriptive of an approach, and there aren't many clear alternatives that people are talking about that work in similar system designs. If you have very long lived entities, or only a few of them, it gets especially difficult to keep the system alive over time, but for those use cases it doesn't mean you should stop receiving commands and emitting events.

You always here about the idea of the approach, never the reality of maintaining these systems, or the inappropriate use. In one implementation, there is one entity that receives thousands of events a day, and lives forever. How do you maintain the journal while changing the code, keep it alive over time? I've watched event sourcing and CQRS sink projects and teams. I've watched well paid contractors unable to figure out how to cluster and scale these systems. The barrier to entry for people to become effective can be high and you should understand the long view in terms of people required and cost over time and validate the approach for your use case very carefully. Again, the fact that everyone talks about event sourcing and no closely related alternatives makes it seem like the gold standard or the only option but there are other (simpler) ways to deal with your persistence in an overall similar architectural approach.

raarts7y ago

I've re-read your post a couple of times, but still find it confusing. It reads like you're an early-adopter, and got burnt multiple times by design flaws, and people not being familiar with it.

Suppose ES/CQRS is designed well, and a well-understood mechanism, would you still move away from it? For which scenarios? I can imagine exactly-once delivery not being a problem in all scenarios.

And what is the problem with long-lived entities? Solely the fact it takes longer to reconstruct current state?

ninjakeyboard7y ago

I'm only saying it isn't the only approach, even if you're keeping the source of truth in memory. It's the most prescriptive approach for sure so is the one people tend to go for first but there be dragons in managing especially long lived journals. Migrating the journal over time, figuring out how to truncate it (especially if there is an initial creation command/event that describes the rest of the life, or periodic updates to the structure - you have to keep all of that data forever potentially to know that it's going to wind up being correct.)

And yes long recovery times are an issue unless you want to keep every entity in memory forever and ever, or you can tolerate extremely slow turnaround times. There are knobs and dials here that are subtleties that people won't think about until after they launch.

If you have a hammer everything looks like a nail kind of thing. There are other ways to handle problems that are only marginally different in how they persist and recover state, yet are far more usable for certain use cases. I'm extremely skeptical when I see someone gung-ho for event sourcing if they've never used it though. I tend to look at the problem very hard to see what else we could do for persistence while still maintaining the "source of truth" in memory.

marcosdumay7y ago

The GP sounds exactly like the "you don't need microservices for everything, you can have well built monoliths too", "not everything must be a single page app", or "not every datacenter should be replaced with a cloud one".

ES specifically has a large list of drawbacks, so not everybody should use it. It would be great if people could tell about how there are many changes of gray between "CRUD-only we forget everything that is not current" and ES, but in a hush to make a point people act like those are the only known possibilities.

mrdoops7y ago

Elixir is fantastic, especially in these use cases because it allows for stateful in-memory representations of an entity with less development burdens than elsewhere.

I find it useful to have a GenServer for complex entities like state machines modeling business processes. I'm fine with the simple entities using the database schema as the state definition. However with the complex entities I still find I run into the classic ORM problem where the database structure doesn't perfectly fit the domain's pure business logic representation.

ninjakeyboard7y ago

Yeah for sure. That's kind of the idea with event sourcing in general, but I think with elixir/otp, it's easy to see how event-sourcing is only a persistence mechanism. Usually how I think about it is Event Sourcing (or gen server using a db as a recovery mechanism) "moves" the source of truth from a db into the process that owns the data.

kilburn7y ago

I admittedly don't have your level experience, especially with such huge scale. However, I've worked with smaller systems and have found a "sweet-spot" that worked well in my cases:

1. Keep a traditional RDBM system. Place in here everything that needs consistency (e.g., the bank's accounts, balances and transactions) or has to be stored indefinitely.

2. That part of the system generates events. Those events are used to maintain Queryable stores that are better structured for your required queries. For instance, you could store per-client transaction lists in here. Or you could fill in an OLAP database, or whatever.

3. For each query-able store, implement a process that can populate it from the RDBMS data. This serves several purposes:

a) You can rebuild any such stores at any time. This removes the durability requirement from these stores, so they may be simpler and more efficient.

b) You can compare a rebuilt store against the current one. If there are any differences, there's either a bug in your event tracking code or a bug in the populator.

c) This consistency-check procedure is really useful during development and testing. When you make a change, it is really hard to get both the event-sourced store and the populator wrong but consistent with each other. This only happened to me due to mis-specified or mis-understood specifications, completely fencing out pure programming mistakes.

Of course, this system only works so long as your write volume can be ingested by your RDBMS. Luckily, this is the case on all of my current projects (and I would argue most projects out there). Notice that scaling reads should be much easier!

Finally, this architecture just accepts that read-errors may happen (e.g., a client doesn't see a transaction in their transactions list because some event got lost). This hasn't been a huge problem for us, since the mistake will be repaired on the next "reconciliation" process (along with a warning to get devs to investigate what happened). These reconciliations can be run as frequently or as sparsely as desired, or even on some user-initiated action (e.g.: the support guy has a button to "force reconciliation now" and instructions to press it whenever a customer complains about missing transactions).

ninjakeyboard7y ago

Yeah I'm using kafka right now with small queues between processes (generally implemented as elixir/OTP gen_server). With a blend of RDBMS (postgres which is now a swiss army knife), Redis (also a swiss army knife), and zookeeper (mostly because I know it well and it's there with kafka. It's very useful for co-ordination across processes.)

Because the events coalesce into kafka, you can use a mechanism like logstash to spew this log data into eg elastic search and then use that for your queries. Or write to the db here (in process) or there (cqrs style.) There are different architectural approach used, and they all yield good results, of very high reliability in processing, if slightly "eventual."

It works well. We do use rdbms too, but a lot of the time we avoid reading from it after initialization (or after encountering an exception causing an actor to restart). Depends on the data though. Some places I read on every command because I know the risk front is small and it's easier for a jr to get in there and understand what's happening. A good rule is that a single process should own that data so you don't have to worry much about consistency related concerns. In microservices good practices, each service should really own its own data completely but I believe that it's fair enough to say a bounded context owns its own data (even if in a shared rdbms) if building at a smaller scale.

In terms of code, I'm usually building with some variation of onion architecture these days, where all context is build on the outside (eg receiving a request, getting info from state or db), and pure domain logic 100% effect free exists on the inside that receives all of the data from the outside and turns out some commands or events in response. This makes the important logic very testable, and easy to reason about. Core domain logic has no effects - not even logging - but only generates events/commands in response to commands and events. You never mock that stuff - it's the star of the show. Nor do you ever have to. The generated commands and events are later "applied" by the outer layer, logging, shipping them off to kafka, and/or writing some stuff to a database.

That's where I'm at today. I am working in realtime systems so it's a good fit for the approaches. I'm finding the software is turning out very well. We're working fairly quickly, the software is very encapsulated to a bounded context, the domain logic is clear in the core, easy to test and read, and change. Or even rewrite if needed.

gazarsgo7y ago

It's really important to manage versioned schema for events and defining rules around evolution of schemas, hopefully with tooling support like via Avro's Schema Registry or Protobuf's forward and backwards compatibility guarantees.

sigi457y ago

Still like the idea but it is more complex to do it right and keep it right.

Would only do this with a good team and a real problem where this would be useful.

Rdbm goes very far

chrisweekly7y ago

CQRS: "Command Query Responsibility Segregation"

Fowler's post is predictably excellent: https://www.martinfowler.com/bliki/CQRS.html

pc867y ago

This is already cited and linked in the article (along with Fowler's article on event sourcing).

mamcx7y ago

CQRS & Event Sourcing are good ideas, but putting yourself ON PURPOSE with eventual consistency is nuts.

Until your are in the facebook-big-data scaling challenging, you don't need the madness of lose ACID. You can get terabytes of data easily on a single server, and even partition by company, customer or similar that still allow to keep domain-level consistence.

Now, I put the events after my normal CRUD operations (ie: I emit CRUD + Save to event in a single transaction). Is super easy of operate and keep coding familiar and predictable

----

I think the ideal EE database engine must be like:

Have a log Table, and a index/subtables for each validation (ie: to check uniques, aggregates, counts, etc.)

So, if a have a customer related events, I have:

- Index on: code, name

- SubTable: code, name, isactive

all the other fields are not need for validation so are recovered from the log.

In ACID:

- POST Command

- Validate data with the index,

- Save to log

- Emit blocking events (events that need to be at the same time after the save)

- Commit

Eventual:

- Emit lazy events (events that not need ACID, like to fill external sources)

marknadal7y ago

I'm glad somebody else said it. It seems lately a new meme is coming around (by many people independently) all saying:

Wait, event sourcing / immutable data doesn't scale.

I was doing event sourcing in 2010, and loved it. It was incredible. But by the time I started hearing people call it "event sourcing" I had already moved on to:

State-based, graph CRDTs.

They combine the best of event sourcing with the best of distributed state replication, and are super scalable!

Now even the Internet Archive[1] is running it (in 2014 I implemented it into a library that they are now using - https://github.com/amark/gun )

[1] https://news.ycombinator.com/item?id=17685682

ulcica7y ago

They didn't even tell me what event sourcing is.

LandR7y ago

As I understant it:

Consider you have a database where you store the account balance.

If you want to update the account balance you might update the row for that customer, e.g.

tblAccounts ------------ | AccountHolderId | AccountBalance |

update tblAccounts set AccountBalance = @NewAccountBalance;

In an EventSource database instead you wouldn't update the AccountBalance column. You would store something like:

Then ifyou wnat o get the current balance you can just take the opening balance and add 100, add 150 and subtract 80.

Periodically you need to collapse these as querying for the balance could end up requiring going through a large log of events. So you snapshot at some point in time. So assuming an opening balance of zero, we could snapshot the above to 170.00.

It feels like you get auditing / logging of all changes out of the box, also if you are working in functional programming or a system where you constrain side / mutability as much as possible you sort of eliminate your db as a giant mutable object. But you also get the downsides this article talked about.

jessaustin7y ago

One would expect all DB-based live accounting systems to store balances as a series of transactions? Even if you have a view that sums over that or a regular process to collapse old transactions, there's really no other sensible way to do it. Concurrent updates to the same row will always cause problems.

...Having written the above, I just examined the ERD for the COTS customer system in the office I'm in now. It stores not just balances but aged balances directly in the main customer table. Good grief.

beiller7y ago

Account balance is a bad example, because it cannot be solved using event sourcing. A key to event sourcing is eventual consistency. The account balance cannot be eventually consistent, otherwise it will allow double-spending. It has to be immediately consistent.

1 more reply

yakshaving_jgt7y ago

Only store the events, and derive the current state by chronologically reducing the events.

That's all Event Sourcing is.

troels7y ago

It's in the second paragraph. But I agree that it's a terrible name.

toolslive7y ago

Earlier names for the same include prevalence and intent-logging.

BerislavLopac7y ago

In a nutshell, instead of storing a current state of the data in a database, you store the "event", i.e. the information about the data being changed. You can also additionally save the current state, either at the time the event is saved or independently, but that simply acts like a cache when reading data; otherwise you would have to replay the whole events history to get to the latest state.

sundvor7y ago

Just think of it as double-entry bookkeeping. It's been around for a while. :) The events that we record become our source of truth.

shady-lady7y ago

> think of it as double-entry bookkeeping

just think of it as bookkeeping, period. a ledger of all events. absolutely nothing to do with double entry (which for accounting; from wiki: "The double entry has two equal and corresponding sides known as debit and credit.")

trhway7y ago

Sounds like OLTP transactions and aggregated state only with hip names.

jackthebadcatOP7y ago

Guys, just need to say that this article is not mine!

ryanmarsh7y ago

If we choose to build a business critical functionality around this eventual consistency can have dire ramifications. There are use cases that availability is the needed property of a system but there are also use cases where consistency also is, where is better to not make a decision rather than making it based on stale information.

I see this issue raised quiet often. If consistency is paramount you can make commands on certain aggregates be synchronous all the way to updating the read model. There's nothing that says you MUST have a queue in-between the event log for an aggregate and the logic that updates the read model. Use common sense.

Typically shines the most when pinpointing parts of the system that benefit from it, identifying a specific bounded context in DDD terms, but never on a whole system.

I feel like Greg Young has taken great pains to make this clear. This should be taken for granted when attempting CQRS.

Also your events will be based on a SomethingCreated or SomethingUpdated which has no business value at all. If the events are being designing like this then it is clear you’re not using DDD at all and you’re better of without event sourcing. Finally, depending on the requirements on how the synchronous the UI and the flow of the task is the eventual consistency can, and most of the times will, have a klinky feel to it and deliver a poor user experience.

If the read and write model are being updated asynchronously from the UI you're gonna have to adopt an optimistic caching scheme on the client. This is why GraphQL subscriptions are pretty much boilerplate for any client I build against a CQRS service. The Apollo client seems to handle this rather well.

Converting data between two different schemas while continuing the operation of the system is a challenge when that system is expected to be always available. Due to the very nature of software development new requirements are bound to appear that will affect the schema of your events that is inevitable.

I hereby give you permission to use the Strategy Pattern. Problem solved.

The events can’t be too small, neither too large they have to be just right. Having the instinct to get it right requires an extensive knowledge of the system, business and consumer applications, it’s very easy to choose the wrong design.

Greg Young and others have talked quite a bit about how to bound aggregates.

However the events in a event store are immutable and can’t be deleted, to undo an action means sending the command with the opposite action.

This is why bookkeeping systems have the idea of "journal entries". I haven't implemented one for an event sourced system but I can see how this might work.

Overall great post. Really enjoyed that the author took the time to walk us through all of these issues. Most are non-trivial.

j / k navigate · click thread line to collapse

77 comments

lscharen7y ago

The last section on Operational Flexibility and the inability to change the event history raises a very good point.

spitfire7y ago

Basically implement double entry accounting.

I think ideally you want something like Rich Hickey speaks about when he speaks of Datomic. An append only database. You can see what the previous values for that row were, along with schema changes.

fusiongyro7y ago

I worry about GDPR with respect to Kafka and other append-only databases.

1 more reply

Yokohiii7y ago

Idk why people are saying that ES data is always immutable. They can be by default sure, but if a facility is useful to change the history, why not?

hesdeadjim7y ago

3 more replies

wst_7y ago

In many cases they are legal proof that something happened. If you can mess with history, proof can't be used in court.

joefreeman7y ago

Good article. I've spent the last year migrating to an event sourced system, so thought I'd share some thoughts.

sbov7y ago

> It's up to the read model to ask, 'has this invitation already been accepted?'

I feel like you've skipped over the interesting part of your strategy here. If it's an eventually consistent system, what keeps the read model from having the wrong answer to this question?

hvidgaard7y ago

Not OP, but I am working with a CQRS system.

Having the read side handle it would probably mean that when you have a read model for accepted invitations, you ignore all but the first accepted of an "invitationId".

tonyhb7y ago

I've been planning out an event sourcing like system for our healthtech startup, and you're definitely right re namespacing and versioning.

This lets us replay events as they happened, run an event using two different versions and perform only the diffs etc.

Our system is probably not a typical CQRS/event sourcing setup.

The event handler validates the event, returning errors if necessary.

Then, the event handler runs all side effects and returns operations to perform: update model X attributes to Y, insert new record Z.

In effect, we go:

User -> API -> (generate event) -> Event Handler -> (error|response) -> save event and side effects in a transaction.

This means our DB is a cache of the event and all previous data, so we're not really event sourcing — we're audit trailing.

The main benefits are:

1. Medical records are complex and we always need audit trails.

3. Engineers know that the API produces events and can look up exactly the side effects that happen when an event occurs (we're using Rails for the API logic right now and this isn't always obvious).

4. We can ensure that we validate when an event happens based on input and current state without complex code, catching edge cases.

6. The "side effects" response from the event handler can be sent to a websocket observable and consumed by frontends, ensuring that the doctor UI always has an up to date version of patient data.

Random thoughts:

----

hesdeadjim7y ago

This is exactly what we did in our system and it worked wonderfully. It also has the side effect of avoiding locks or contention when doing such mutations.

7sigma7y ago

"However the events in a event store are immutable and can’t be deleted, to undo an action means sending the command with the opposite action"

Well they don't have to be immutable. I don't see why you can't update/migrate events.

superqwert7y ago

CQRS/ES is a pattern and doesn't need to be shoe-horned into every solution.

We still have problems with EC and CAP, and they are not due to CQRS/ES. Make the system distributed and you will have to handle all the new scenarios. Those are the trade-offs.

vorpalhex7y ago

> can also be synchronous

Then you're giving up several of the benefits of CQRS, and might as well just not bother with the additional complexity.

> Well they don't have to be immutable. I don't see why you can't update/migrate events.

victorNicollet7y ago

>> can also be synchronous > Then you're giving up several of the benefits of CQRS, and might as well just not bother with the additional complexity.

1 more reply

zihotki7y ago

jwhitlark7y ago

Encryption + lose the key policies seem to satisfy the GDPR. So you don't have to actually delete an event, you can just give up your ability to read its payload.

victorNicollet7y ago

But if strong consistency was abandoned because someone wrote general statements in favor of eventual consistency...

dmitriid7y ago

The solution is said to be CQRS, and nothing in the article shows how you solve that.

If you unwind the unnecessarily winded florid poetic waxing, it all comes down to “dump it to a database, and query that”.

voiceofunreason7y ago

"dump it to a database, and query that"

Yes.

There's no magic.

oftenwrong7y ago

In practice it is typical to maintain a snapshot of the current state in a more-query-able form. In some systems, event writes and their corresponding snapshot updates occur within a transaction.

shady-lady7y ago

Hopefully bi-temporal tables get added to Postgres soon.

I think it would cover a few of the use cases that people are turning to event sourcing for(excluding scale).

theptip7y ago

I don't see much discussion of event-sourcing simply using a SQL database (i.e. skipping the CQRS part). This would allow you to keep your CP (strongly-consistent) semantics.

Does anyone have opinions/experience on this approach?

marxama7y ago

tomerbd7y ago

Don't you still need a queue in your example? do you lock the whole table when you insert a new record to the table? can you elaborate how this solves consistency?

theptip7y ago

I don't think you need a separate queue; if you have an "Events" table then you can just write everything there.

It solves the consistency problem because you can create your event inside a transaction, which will rollback if another event touching the same source is created simultaneously.

E.g. if you have these incompatible events in a ledger:

CreditAccount(account_id=123, amount=100)

DebitAccount(account_id=123, amount=60)

1 more reply

aaronmu7y ago

I’ve implemented this.

TLDR: orm’s + large transactions resulted in a lot of unexpected complexity.

theptip7y ago

dustingetz7y ago

Strong consistency is possible with this model, consider for example git.

One real database that works like this is Datomic, which is competitive with SQL for most kinds of read-heavy data modeling loads that SQL and CQRS is used for.

subsubsub7y ago

All the issues the author talks about are valid, however, the idea that no one warns you about them is wrong.

If you do even the most basic research into Event Sourcing you will find people warning of these trade offs (foremost amongst them Greg Young, the person who originally set out the idea)

likeclockwork7y ago

So, where does that happen? When getting written into the read model? Then what, it emits an event saying clearly that the previous action failed? All while there's a client waiting for results?

I find the concept intriguing but I don't really get it and I haven't been able to identify in any of the writing where "the buck stops".

paragraft7y ago

likeclockwork7y ago

Thanks for answering. Is Eventual Consistency a necessary property of a CQRS/Event Sourcing architecture?

1 more reply

ninjakeyboard7y ago

raarts7y ago

I've re-read your post a couple of times, but still find it confusing. It reads like you're an early-adopter, and got burnt multiple times by design flaws, and people not being familiar with it.

Suppose ES/CQRS is designed well, and a well-understood mechanism, would you still move away from it? For which scenarios? I can imagine exactly-once delivery not being a problem in all scenarios.

And what is the problem with long-lived entities? Solely the fact it takes longer to reconstruct current state?

ninjakeyboard7y ago

marcosdumay7y ago

mrdoops7y ago

Elixir is fantastic, especially in these use cases because it allows for stateful in-memory representations of an entity with less development burdens than elsewhere.

ninjakeyboard7y ago

kilburn7y ago

I admittedly don't have your level experience, especially with such huge scale. However, I've worked with smaller systems and have found a "sweet-spot" that worked well in my cases:

1. Keep a traditional RDBM system. Place in here everything that needs consistency (e.g., the bank's accounts, balances and transactions) or has to be stored indefinitely.

3. For each query-able store, implement a process that can populate it from the RDBMS data. This serves several purposes:

a) You can rebuild any such stores at any time. This removes the durability requirement from these stores, so they may be simpler and more efficient.

b) You can compare a rebuilt store against the current one. If there are any differences, there's either a bug in your event tracking code or a bug in the populator.

ninjakeyboard7y ago

gazarsgo7y ago

sigi457y ago

Still like the idea but it is more complex to do it right and keep it right.

Would only do this with a good team and a real problem where this would be useful.

Rdbm goes very far

chrisweekly7y ago

CQRS: "Command Query Responsibility Segregation"

Fowler's post is predictably excellent: https://www.martinfowler.com/bliki/CQRS.html

pc867y ago

This is already cited and linked in the article (along with Fowler's article on event sourcing).

mamcx7y ago

CQRS & Event Sourcing are good ideas, but putting yourself ON PURPOSE with eventual consistency is nuts.

Now, I put the events after my normal CRUD operations (ie: I emit CRUD + Save to event in a single transaction). Is super easy of operate and keep coding familiar and predictable

----

I think the ideal EE database engine must be like:

Have a log Table, and a index/subtables for each validation (ie: to check uniques, aggregates, counts, etc.)

So, if a have a customer related events, I have:

- Index on: code, name

- SubTable: code, name, isactive

all the other fields are not need for validation so are recovered from the log.

In ACID:

- POST Command

- Validate data with the index,

- Save to log

- Emit blocking events (events that need to be at the same time after the save)

- Commit

Eventual:

- Emit lazy events (events that not need ACID, like to fill external sources)

marknadal7y ago

I'm glad somebody else said it. It seems lately a new meme is coming around (by many people independently) all saying:

Wait, event sourcing / immutable data doesn't scale.

I was doing event sourcing in 2010, and loved it. It was incredible. But by the time I started hearing people call it "event sourcing" I had already moved on to:

State-based, graph CRDTs.

They combine the best of event sourcing with the best of distributed state replication, and are super scalable!

Now even the Internet Archive[1] is running it (in 2014 I implemented it into a library that they are now using - https://github.com/amark/gun )

[1] https://news.ycombinator.com/item?id=17685682

ulcica7y ago

They didn't even tell me what event sourcing is.

LandR7y ago

As I understant it:

Consider you have a database where you store the account balance.

If you want to update the account balance you might update the row for that customer, e.g.

tblAccounts ------------ | AccountHolderId | AccountBalance |

update tblAccounts set AccountBalance = @NewAccountBalance;

In an EventSource database instead you wouldn't update the AccountBalance column. You would store something like:

Then ifyou wnat o get the current balance you can just take the opening balance and add 100, add 150 and subtract 80.

jessaustin7y ago

beiller7y ago

1 more reply

yakshaving_jgt7y ago

Only store the events, and derive the current state by chronologically reducing the events.

That's all Event Sourcing is.

troels7y ago

It's in the second paragraph. But I agree that it's a terrible name.

toolslive7y ago

Earlier names for the same include prevalence and intent-logging.

BerislavLopac7y ago

sundvor7y ago

Just think of it as double-entry bookkeeping. It's been around for a while. :) The events that we record become our source of truth.

shady-lady7y ago

> think of it as double-entry bookkeeping

trhway7y ago

Sounds like OLTP transactions and aggregated state only with hip names.

jackthebadcatOP7y ago

Guys, just need to say that this article is not mine!

ryanmarsh7y ago

Typically shines the most when pinpointing parts of the system that benefit from it, identifying a specific bounded context in DDD terms, but never on a whole system.

I feel like Greg Young has taken great pains to make this clear. This should be taken for granted when attempting CQRS.

I hereby give you permission to use the Strategy Pattern. Problem solved.

Greg Young and others have talked quite a bit about how to bound aggregates.

However the events in a event store are immutable and can’t be deleted, to undo an action means sending the command with the opposite action.

This is why bookkeeping systems have the idea of "journal entries". I haven't implemented one for an event sourced system but I can see how this might work.

Overall great post. Really enjoyed that the author took the time to walk us through all of these issues. Most are non-trivial.

j / k navigate · click thread line to collapse