The one downside is that shifting your business logic to read time means you need very efficient ways of accessing and memoizing derived data. For some applications, this can be as simple as having the correct database indices over your WhateverUpdates tables, then fetching all updates into memory and merging them on each request. For others, you'll need a real-time stream processing pipeline to preemptively get your derived data into the right shape in a cache. Those are more moving parts than your typical monolith app, but the trade-off can be worth it.
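The simple end of that spectrum, merging all of an entity's update records at read time, can be sketched in a few lines. Everything here is illustrative: `merge_updates` and the customer fields are made up, and `updates` stands in for rows fetched from a WhateverUpdates table, assumed already sorted by an indexed key.

```python
# Hypothetical sketch: derive a customer's current profile at read time
# by fetching all of its partial update records and merging them in order.

def merge_updates(updates):
    """Fold a list of partial update dicts into one current-state dict."""
    state = {}
    for update in updates:
        state.update(update)  # later updates win on conflicting keys
    return state

updates = [
    {"name": "Ada", "email": "ada@example.com"},
    {"email": "ada@newmail.example"},   # later partial update
    {"plan": "pro"},
]
profile = merge_updates(updates)
```

With the right index, this stays cheap until the update list per entity grows large, which is roughly the point where the stream-processing-plus-cache approach starts to pay for its moving parts.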
One benefit of actually using event sourcing with a stream processing system is that, in many cases, it can be the most effective way to scale both traffic capacity and organizational bandwidth, much in the same way that individually scalable microservices can (and it's fully compatible with that approach!). Martin Kleppmann at Confluent (a LinkedIn spinoff creating and consulting on stream processing systems) writes some great and highly approachable articles about this. Highly recommended reading.
http://www.confluent.io/blog/making-sense-of-stream-processi...
http://www.confluent.io/blog/turning-the-database-inside-out...
I'm worried that event sourcing is going to become this year's over-applied design pattern with libraries in every language for every database with blog posts that recommend it be used on every project.
It's a good idea, very useful - in the right hands on the right projects. But it makes sense that junior devs normally use CRUD because that's normally the right solution. At least until better tools come along.
If by "works well", you mean it works until someone asks for historical data, at which point the IT guy has to say with a straight face "we lost it". That's unacceptable considering the value of data and the strategic leverage it can have today.
Considering that immutable fact tables are the most stable data model, that companies often have to re-invent them (poorly) on top of relational databases at some point, that storage is often not a problem, and that having clean historical data is crucial for data science, there are increasingly few excuses not to adopt a sane data model from day one.
I agree partially w.r.t. tooling - few implementations aid adopting this pattern - but I believe the value of historical data, over time, outweighs not being able to slap some quick Rails CRUD together and then being stuck in a local minimum.
That isn't to say it won't happen, but I think it's more likely that teams would miss an opportunity to leverage it than leverage it inappropriately.
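The "immutable fact table" idea mentioned above doesn't require any special tooling; a minimal sketch with SQLite shows the shape. Table and column names here are illustrative, not from any particular system: rows are only ever inserted, and the "current" view is derived with a query while history stays queryable.

```python
import sqlite3

# Append-only facts table: a price change is a new fact, never an UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_facts (
        product_id  TEXT    NOT NULL,
        price_cents INTEGER NOT NULL,
        recorded_at INTEGER NOT NULL  -- monotonically increasing
    )
""")
facts = [
    ("widget", 500, 1),
    ("widget", 450, 2),  # price change recorded as a new fact
    ("gadget", 900, 1),
]
conn.executemany("INSERT INTO price_facts VALUES (?, ?, ?)", facts)

# Current price = latest fact per product.
current = dict(conn.execute("""
    SELECT product_id, price_cents FROM price_facts AS p
    WHERE recorded_at = (SELECT MAX(recorded_at) FROM price_facts
                         WHERE product_id = p.product_id)
""").fetchall())
```

This is the model companies end up re-inventing on top of mutable tables; starting with it costs little more than an extra timestamp column and a discipline of never updating in place.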
Did you spot all those command-to-query-to-event-to-log-to-storage data type conversions in those pretty diagrams? That's a whole bunch of needless reshaping of data as it flows through the system.
For each one of those data transformations to be successful, there has to be accurate communication between people and bug-free code for the data conversion and the routing of messages through the system. All those moving parts make changing the system extremely painful, with lots of ripple effects, and every time you have to make a change to your events, you have a data migration project for any running event streams.
Naming things is hard too, and there's a lot more naming of entities needed in a CQRS-ES system.
I like all the promised benefits of CQRS and ES, but I can't imagine a case where I'd take the risk of attempting it on anything but a toy project. Perhaps if I was on the version 5 rewrite project for an insanely profitable system where the requirements and design are completely understood up front. I would need to grok some canonical example of a large, well-architected, well-implemented representative system before I would ever attempt to implement one.
Are there any non-toy examples of successful CQRS-ES with open source available to read? Did those projects go over-budget, and by how much? Would the authors of those examples still recommend the architecture now that they've gone through the experience?
The vast majority of the things you will ever program are pretty much guaranteed to execute from one statement to the next. Hard boundaries, where things can fail, are often decently understood and actually quite visible in the code.
Moving everything to be an event completely throws this out the window. You can take a naive view, where you pretend that getting from one event to the next is guaranteed to happen. However, starting to build up the system to cope when this is not the case means building a complicated system, in areas that are decidedly not related to your business domain. (Well, for most of us.)
Maybe some day there will be a system that helps with this. Until then, my main advice is to make sure you have solved your system with a naive solution before you move on.
I became disillusioned with doing the naive solution, for two reasons:
First, I found it to be impossible (from a time/project-management perspective) to ever replace it with the "non-naive" one, so it turns into the usual mess, because CRUD doesn't work well as you load more functionality onto it over time.
Secondly, it's thinking machines. From a business perspective, does it still make sense to hand-code glorified rolodexes without behaviour? Maybe Excel does the trick. I see it as a red flag if someone asks me to build dumb data-entry forms in 2016.
Therefore, I always start event-sourced these days. YMMV.
* If you are in-memory and in-process, why bother with the events in the first place? (Put more simply, why not go with the simpler process-based solutions?)
* If you are not testing distributed, how do you know you will be able to distribute?
In particular, there is a very large chance that you will have the same difficulty in replacing an in-memory/process solution that you would have had with a naive one.
And don't underestimate the amount of manpower you can get with success, nor the number of features that will not help you get it.
This is either an antipattern or unrelated to event sourcing, depending on how you read it. "Event sourced" means that state within a transaction boundary is built ONLY through events, which are regarded as persistent and immutable; "evented" is the term I'll use for state which is built when events happen, where those events may or may not be recorded, may happen before or after state is changed, and are independent of the state change.
If you meant "evented" then I would say that there are lots of message-based systems that aren't failing hard, and a lot that are, which says to me that there are patterns some of those systems are using to manage the nature of async, evented development that others aren't.
If you meant "event sourcing", then the application of event-sourced data is neither an application architecture nor appropriate for all areas of your application[1,2]. If you were trying to apply event sourcing in this way, it's not surprising you ran into problems with it.
> Until then, my main advice is to make sure you have solved your system with a naive solution before you move on.
It is important to really know and understand the problem domain you are applying ES to. Having a strategy to upgrade streams to new versions of your domain model is a good idea if you're applying ES to a business without a well-understood domain. However, it is very, very difficult to back into an ES implementation from a "naive" solution, which I'm reading as "CRUD".
[1] https://www.youtube.com/watch?v=LDW0QWie21s [2] https://www.infoq.com/news/2016/04/event-sourcing-anti-patte...
Regardless, I meant the idea of moving everything to a communication of events between subsystems. To be fair, the sibling read me correctly and stated that if you just ignore the distributed nature of events, then things aren't that hard.
However, it is easy to follow the lure of "I'll go ahead and make this work for the distributed case" from the beginning. This is for two reasons. First, why not? :) Second, it is seen as something that would be very hard to add in later.
So, I can't disagree that it is an anti-pattern or unrelated. However... I would be surprised if the next version of this article didn't go over the distributed nature. Indeed, it already covers how this more correctly mirrors the distributed nature of the organization.
We are currently using ES end to end for a distributed application including a wearable device, an iOS app and a Scala backend, and up to now, the things that broke hard in the system are the naive/non-ES parts.
For reference, on the server side, we are currently experimenting with GetEventStore (https://geteventstore.com/), which seems to be working well for us.
My main thoughts are anywhere you are trying to hide that distributed nature, things will go awry. Add to that, anywhere you have introduced the potential for things to be distributed.
The sibling post about keeping it simple as you build up your eventing system is pretty accurate. Remember you are trying to solve an actual customer problem. Keep pointed on that and do not get distracted by any neat engineering problems that come along the way. (This is not to say you will not have to solve some... but if you are solving a neat problem that was not needed for the customer's problem, you are going to have trouble.)
Events for writes and current data for most reads.
However, if going with eventual consistency, it's essential that the write side is consistent. The read models, not so much. This is because every aggregate (if Domain Driven Design terminology is your thing) represents a transactional boundary within which state needs to be consistent in order for the business rules to work correctly. Read models are more forgiving.
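The "consistent write side" point can be made concrete with a toy aggregate. All names here (`Account`, the event tuples) are invented for the example: the invariant is checked inside the aggregate boundary, against state rebuilt from its own event history, before any new event is recorded.

```python
# Illustrative aggregate: invariants are enforced on the write side,
# inside the transactional boundary, before an event is appended.

class Account:
    def __init__(self, events=()):
        self.balance = 0
        self.pending = []          # events produced but not yet stored
        for event in events:       # rebuild state from history
            self._apply(event)

    def _apply(self, event):
        kind, amount = event
        if kind == "deposited":
            self.balance += amount
        elif kind == "withdrawn":
            self.balance -= amount

    def _record(self, event):
        self._apply(event)
        self.pending.append(event)

    def deposit(self, amount):
        self._record(("deposited", amount))

    def withdraw(self, amount):
        # Business rule checked against consistent, up-to-date state.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self._record(("withdrawn", amount))

acct = Account()
acct.deposit(100)
acct.withdraw(30)
```

A read model fed from these events can lag behind without breaking anything; only this boundary has to be strictly consistent.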
In the first case, you keep however you were loading/saving your entities, and modify them to emit events as they mutate. Then you can play with sending those events to external systems, or using them to drive read-models for queries, etc.
In the second case, you go further, and "dogfood" those events so that they are the authoritative record of what the entity is at a given point in time.
For building up a picture of the world, it's pretty good. It's very nice to be able to replay a log of events and recreate a view of the way things are expected to be; if there's a bug in your code, you can fix it and repeat the replay to get back into a good state (with caveats: sometimes later actions that create events may depend on an invalid intermediate state). Mutating updates, by contrast, erase history, perhaps with some ad-hoc logging on the side that is more often than not worthless for machine consumption.
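The fix-and-replay benefit follows from read state being nothing but a fold over the log. A hypothetical sketch (event names and projection functions are made up): the buggy projection is replaced and the unchanged log is replayed to repair the view.

```python
# The event log is the source of truth; a read model is a fold over it.
events = [
    ("item_added", 2),
    ("item_added", 3),
    ("item_removed", 1),
]

def project_buggy(events):
    count = 0
    for kind, n in events:
        if kind == "item_added":
            count += n
        # bug: "item_removed" is silently ignored
    return count

def project_fixed(events):
    count = 0
    for kind, n in events:
        if kind == "item_added":
            count += n
        elif kind == "item_removed":
            count -= n
    return count

bad = project_buggy(events)    # wrong view produced by the buggy fold
good = project_fixed(events)   # fix the fold, replay the same log
```

Nothing in the log changed between the two runs; only the interpretation did, which is exactly what a mutating-update design cannot offer after the fact.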
For decoupled related action, it's not too bad. If you have some subsystem that needs to twiddle some bits or trigger an action when it sees an event go by, it just needs to plug into the event stream, appropriately filtered.
For coordinated action OTOH, e.g. a high-level application business-logic algorithm, you need to start thinking in terms of explicit state machines and, in the worst case, COMEFROM-oriented programming[1]. Depending on how the events are represented, published and subscribed to, navigating control flow involves repeated whole-repo text searching.
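What "thinking in terms of explicit state machines" looks like can be sketched with a hypothetical checkout process manager (all names invented): the high-level algorithm is no longer straight-line code but a machine that reacts to whatever event arrives next.

```python
# Hypothetical process manager: coordinated business logic written as
# an explicit state machine reacting to events from other subsystems.

class CheckoutProcess:
    def __init__(self):
        self.state = "awaiting_payment"
        self.commands = []  # commands this process wants dispatched

    def handle(self, event):
        if self.state == "awaiting_payment" and event == "payment_received":
            self.state = "awaiting_shipment"
            self.commands.append("ship_order")
        elif self.state == "awaiting_shipment" and event == "order_shipped":
            self.state = "done"
        # events that don't fit the current state are ignored here;
        # a real system would log or dead-letter them

p = CheckoutProcess()
p.handle("payment_received")
p.handle("order_shipped")
```

Reading the control flow now means finding every publisher of `payment_received` and `order_shipped`, which is where the whole-repo text searching comes in.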
It's best if your application logic is not very complicated and inherently suitable to loose coupling, IMO.
https://engineering.linkedin.com/distributed-systems/log-wha...
http://martinfowler.com/eaaDev/EventSourcing.html
Note that Martin's blog is what inspired the event bus in https://home-assistant.io, an open source home automation project I occasionally contribute to.
Locking across streams is an anti-pattern / smell. It can be done (as can anything) but it usually points to a modelling problem. Example: cancelling an amazon order is a _request_ that is in a race with the fulfillment system (boundary); it may or may not be successful.
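The cancellation example can be sketched as follows (event names are illustrative): instead of locking the order's stream, the cancel is a request whose outcome the fulfillment boundary decides by consulting its own history, and the request may simply lose the race.

```python
# Cancellation modelled as a request racing the fulfillment boundary,
# rather than a cross-stream lock.

def handle_cancel_request(order_events):
    """Return the event the fulfillment system would append."""
    if "order_shipped" in order_events:
        return "cancellation_rejected"   # too late, it's on a truck
    return "order_cancelled"

in_time = handle_cancel_request(["order_placed"])
too_late = handle_cancel_request(["order_placed", "order_shipped"])
```

Either outcome is a valid fact; the modelling problem the lock was papering over has become an explicit business decision.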
So, you get to pick, "at most once" or "at least once." And then you need to build your system to act accordingly.
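Choosing "at least once" typically means every consumer must tolerate redelivery. A common sketch (names invented): track processed event IDs so a duplicate delivery is a no-op.

```python
# Idempotent consumer for at-least-once delivery: duplicates are
# detected by event ID and ignored instead of double-counted.

class IdempotentCounter:
    def __init__(self):
        self.seen = set()
        self.total = 0

    def handle(self, event_id, amount):
        if event_id in self.seen:    # duplicate delivery: ignore
            return
        self.seen.add(event_id)
        self.total += amount

c = IdempotentCounter()
c.handle("evt-1", 10)
c.handle("evt-2", 5)
c.handle("evt-1", 10)  # redelivered; must not double-count
```

"At most once" pushes the complexity the other way: handling is trivial, but you need a story for the events that never arrive.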
There's also a great presentation by the developer, Allard Buijze, at https://www.youtube.com/watch?v=s2zH7BsqtAk.
Imagine tooling that allowed an event stream to be used to create state for testing modules, CRUD-like helpers to allow CRUD-familiar developers to think that way at first, and workflows based on snapshots, rewind, etc.
I think a model that used events that correlated to graph deltas rather than CRUD deltas would be the cat's ass, and many queries about the near-current state could be handled efficiently using ephemeral subgraphs as indexes located at the network's edges.
If anyone wants to discuss and possibly build some of this stuff, let me know :)
I know where you're going with this, and I honestly believe it's a terrible idea (not to be discouraging or rude, just experienced).
If your event streams contain mostly (or possibly any) CRUD events, then you're most likely applying it incorrectly. It's not just a version history of your data. The event type itself is data, which provides context and semantics over and above the notion of writes and deletes. If you're falling back to CRUD events, all you're doing is creating a lot more work for yourself and deriving almost no benefit from the use of ES; in that case, you should just use CRUD and the ORM of your choice.
Right. A good way to think about this is that as with rows in an RDBMS, events in an ES system are facts, and just as tables in an RDBMS define a category of facts with a particular shape, event-types in ES do the same thing. The difference is that whereas in an RDBMS the facts represented by rows can be general (and are often, in many designs, facts about the current state of the world), events are facts about a specific occurrence in the world rather than the state of the world (and the "state of the world" is an aggregate function of the collection of events.)
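The contrast between a CRUD-shaped event and a domain event can be shown side by side (both payloads are hypothetical): the first records that a row changed, the second records what happened, so the type alone carries business meaning a consumer can act on.

```python
# A CRUD event forces consumers to diff fields to recover intent;
# a domain event states the occurrence directly.

crud_event = {"type": "row_updated", "table": "orders",
              "id": 42, "changes": {"status": "cancelled"}}

domain_event = {"type": "order_cancelled", "order_id": 42,
                "reason": "customer_request"}

def wants_refund(event):
    # The domain event's type alone carries the business meaning.
    return event["type"] == "order_cancelled"
```

A refund subsystem subscribing to `order_cancelled` needs no knowledge of the orders table's schema; with the CRUD event it would have to reverse-engineer the intent from the changed columns.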
Thanks for that. I'd made that mistake: I have a system which now needs to become distributed (a copy of it goes offline for a couple of weeks, and has to merge back into the main datastore) and keep a history of changes. It's currently CRUD backed by MySQL, and I'd latched onto event sourcing as what I'd need.
> The event type itself is data, which provides context and semantics over and above the notion of writes and deletes.
OK, going to have to get my head around that :)
We ended up going with microservices that pub/sub events into Kafka, but maintain their own databases. There's another microservice that lets you query past events for statistics.
This article was extremely helpful to me for understanding some solutions in this space.
http://www.confluent.io/blog/turning-the-database-inside-out...
Scaling that up by adding asynchronicity and more ambitious plumbing when needed seems reasonably straightforward. For something more out-of-the-box, see https://geteventstore.com/ . It has clients in a variety of languages. Comes with a nice HTTP API too.
I wouldn't normally read the entire event stream; usually, only the state of a particular object (aggregate, in Domain Driven Design speak) is of interest, e.g. the customer with ID 12345. Events contain the aggregate ID, so the query to whatever event store you use would be "give me all events with aggregate ID 12345".
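That read path, filter by aggregate ID and fold only that aggregate's events, looks roughly like this (the list stands in for an event store query, and all field and event names are made up):

```python
# Load one aggregate's current state from its events only.
store = [
    {"aggregate_id": "12345", "type": "customer_registered", "name": "Ada"},
    {"aggregate_id": "99999", "type": "customer_registered", "name": "Bob"},
    {"aggregate_id": "12345", "type": "email_changed",
     "email": "ada@example.com"},
]

def load_customer(store, aggregate_id):
    state = {}
    for event in store:
        if event["aggregate_id"] != aggregate_id:
            continue   # other aggregates' events are never touched
        if event["type"] == "customer_registered":
            state["name"] = event["name"]
        elif event["type"] == "email_changed":
            state["email"] = event["email"]
    return state

customer = load_customer(store, "12345")
```

A real event store would do the filtering server-side and return the stream already ordered, so the client only performs the fold.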
GetEventStore documentation has some examples of how you can create projections (https://geteventstore.com/blog/20130212/projections-1-theory...), which you can use as inspiration to build your own projections.
[1] https://groups.google.com/forum/#!forum/dddcqrs
[2] https://www.amazon.com/Enterprise-Integration-Patterns-Desig...
[3] https://www.amazon.com/Implementing-Domain-Driven-Design-Vau...
[4] https://www.amazon.com/Domain-Driven-Design-Tackling-Complex...
[0]: https://github.com/reactjs/redux/issues/891#issuecomment-158...
In practice, that can be challenging, but it doesn't seem fundamentally more challenging than any other legacy data conversion effort.
So the first step is to disentangle all the data and encapsulate it, trying to prevent others from using it, so you have full control over it. This includes tracking down any other system using this data, and ensuring they too go through the database. And you have to do this for one subsystem at a time, often in several iterations.
You won't have any history though.