https://github.com/restatedev/ https://restate.dev/
It is free and open, SDKs are MIT-licensed, runtime permissive BSL (basically just the minimal Amazon defense). We worked on that for a bit over a year. A few points I think are worth mentioning:
- Restate's runtime is a single binary, self-contained, no dependencies aside from a durable disk. It contains basically a lightweight integrated version of a durable log, workflow state machine, state storage, etc. That makes it very compact and easy to run both on a laptop and a server.
- Restate implements durable execution not only for workflows, but the core building block is durable RPC handlers (or event handler). It adds a few concepts on top of durable execution, like virtual objects (turn RPC handlers into virtual actors), durable communication, and durable promises. Here are more details: https://restate.dev/programming-model
- Core design goal for APIs was to keep a familiar style. An app developer should look at Restate examples and say "hey, that looks quite familiar". You can let us know if that worked out.
- Basically every operation (handler invocation, step, ...) goes through a consensus layer, for a high degree of resilience and consistency.
- The lightweight log-centric architecture gives Restate still good latencies: For example around 50ms roundtrip (invoke to result) for a 3-step durable workflow handler (Restate on EBS with fsync for every step).
We'd love to hear what you think of it!
https://restate.dev/blog/solving-durable-executions-immutabi...
https://restate.dev/blog/code-that-sleeps-for-a-month/
The key takeaways:
1. Immutable code platforms (like Lambda) make things much more tractable - old code being executable for 'as long as your handlers run' is the property you need. This can also be achieved in Kubernetes with some clever controllers
2. The ability to make delayed RPCs and span time that way allows you to make your handlers very short running, but take action over very long periods. This is much superior to just sleeping over and over in a loop - instead, you do delayed tail calls.
My job is admittedly very old-school, but is that actually doable? I dont think my stakeholders would accept a version of "well we can't fix this bug for our current customers, but the new ones wont have it". That just seems like a chaos nobody wants to deal with.
I'll have to think through how much that solves, but it's a new insight for me - thanks!
I like that you're working on this. seems tricky, but figuring out how to clearly write workflows using this pattern could tame a lot of complexity.
If we deploy a new version of the workflow, we just keep around the existing deployed version until all of its in-flight runs are completed. Usually this can be done within a few minutes but sometimes we need to wait days.
We don't actually tie service releases 1:1 with the workflow versions just in case we need a hotfix for a given workflow version, but the general pattern has worked very well for our use cases.
The only caveat being that we generally recommend that you keep it to just a few minutes, and use delayed calls and our state primitives to have effects that span longer than that. Eg, to poll repeatedly a handler can delayed-call itself over and over, and to wait for a human, we have awakeables (https://docs.restate.dev/develop/ts/awakeables/)
More discussion: https://restate.dev/blog/code-that-sleeps-for-a-month/
You still have to ensure that all versions of handler code that may potentially be activated are fully compatible with all persisted state they may be expected to access, but that's not much different from handling rolling deployments in a large system.
The way we do that is by writing down what your code is doing, while its doing it, to a store. Then, on any failure, we re-execute your code, fill in any previously stored results, so that it can 'zoom' back to the point where it failed, and continue. It's like a much more efficient and intelligent retry, where the code doesn't have to be idempotent.
Is that true? I don't think that makes any theoretical sense, since I'm pretty sure the whole thing relies on transparent retries for external calls.
If I complete some action that can't be retried and then die before writing it to the log (completing an action unatomically) there would seem to be no way for this to recover without idempotency.
What if the first call is to get a resource that expires and then the last call fails?
Now it will retry but with an expired resource (first call is saved).
Would a good example be something like, automated highway toll collecting? i.e. I drive past a scanner on the highway, my license plate is scanned and several state bound collection events need to be triggered until the toll is ultimately collected?
Question for OP: I'd bet Flink's Statefuns comes in Restate's story. Could you please comment on this? Maybe Statefuns we're sort of a plugin, and you guys wanted to rebase to the core of a distributed function?
Yes, Flink Stateful Functions were a first experiment to build a system for the use cases we have here. Specifically in Virtual Objects you can see that legacy.
With Stateful Functions, we quickly realized that we needed something built for transactions, while Flink is built for analytics. That manifests in many ways, maybe most obviously in the latency: Transactional durability takes seconds in Flink (checkpoint interval) and milliseconds in Restate.
Also, we could give Restate a very different dev ex, more compatible with modern app development. Flink comes from a data engineering side, very different set of integrations, tools, etc.
> Stateful Functions (in Apache Flink): Our thoughts started a while back, and our early experiments created StateFun. These thoughts and ideas then grew to be much much more now, resulting in Restate. Of course, you can still recognize some of the StateFun roots in Restate.
The full post is at: https://restate.dev/blog/why-we-built-restate/
It's a terrible language for concurrency and transitive dependencies can cause panics which you often can't recover from.
Which means the entire ecosystem is like sitting on old dynamite waiting to explode.
JVM really has proven itself to be by far the best choice for high-concurrency, back-end applications.
You can absolutely do something similar with a RDBMS.
I tend to think of building services in state machines: every important step is tracked somewhere safe, and causes a state transition through the state machine. If doing this by hand, you would reach out to a DBMS and explicitly checkpoint your state whenever something important happens.
To achieve idempotency, you'd end up peppering your code with prepare-commit type steps where you first read the stored state and decide, at each logical step, whether you're resuming a prior partial execution or starting fresh. This gets old very quickly and so most code ends up relying on maybe a single idempotency check at the start, and caller retries. You would also need an external task queue or a sweeper of some sort to pick up and redrive partially-completed executions.
The beauty of a complete purpose-built system like Restate is that it gives you a durable journal service that's designed for the task of tracking executions, and also provides you with an SDK that makes it very easy to achieve the "chain of idempotent blocks" effect without hand-rolling a giant state machine yourself.
You don't have to use Restate to persist data, though you can - and you get the benefit of having the state changes automatically commit with the same isolation properties as part of the journaling process. But you could easily orchestrate writes into external stores such as RDBMS, K-V, queues with the same guaranteed-progress semantics as the rest of your Restate service. Its execution semantics make this easier and more pleasant as you get retries out of the box.
Finally, it's worth mentioning that we expose a PostgreSQL protocol-compatible SQL query endpoint. This allows you to query any state you do choose to store in Restate alongside service metadata, i.e. reflect on active invocations.
(1) it is really helpful in getting good latencies.
(2) it makes it self-contained, so easy to start and run anywhere
(3) There is a simplicity in the deeply integrated architecture, where consensus of the log, fencing of the state machine leaders, etc. goes hand in hand. It removes the need to coordinate between different components with different paradigms (pub-sub-logs, SQL databases, etc) that each have their own consistency/transactions. And coordination avoidance is probably the best one can do in distributed systems. This ultimately leads also to an easier to understand behavior when running/operating the system.
(4) The storage is actually pluggable, because the internal architecture uses virtual consensus. So if the biggest ask from users would be "let me use Kafka or SQS FIFO" then that's doable.
We'd love to go about this the following way: We aim to provide an experience than is users would end up preferring to maintaining multiple clusters of storage systems (like Cassandra + ElasticSearch + X server and Y queues) though this integrated design. If that turns out to not be what anyone wants, we can still relatively easily work with other systems.
Let me elaborate it: first of all, what would be the killer feature that justifies creating a whole new PL for durable execution? From what I can tell, the thing that IMO can really make a difference would be the ability to completely hide durable execution from the user, by being able to take snapshots of the execution at any point in time and then record those in the engine transparently. Now let's say such language exists, and it can also take those snapshots reasonably fast, it is still quite a problem to establish where it's logically safe to take a snapshot, and when the execution cannot continue because you need to wait acknowledgment for stored results. Say for example you have the following code:
val resultA = callA() val resultB = callB(resultA)
Both A and B do some non-deterministic operation, e.g. they perform HTTP calls to some other systems. Now let's say that when callB() completed, but before you got the HTTP response, your code for whatever reason crashes. If you didn't took any snapshot between callA() and callB(), you will completely lose forever the fact that B was invoked with resultA, and the next time you re-execute A, it might generate a result that is different from the one that was generated the first time. Due to this problem, you would still need to somehow manually define some "safepoints" where it's safe to take those snapshots. Meaning that we can't really hide the durable execution from the user, as you would still need some statement like "snapshot_here" to tell the engine where it's safe to snapshot or not.
In our SDKs we effectively implement that, by taking the safe approach of always waiting for storage acknowledgement when you execute two consecutive ctx.run().
But happy to be proven wrong!
In my mind, this moved restate from “huh, that’s cool” to “during tomorrow’s standup, I’m going to ask one of my engineers to build a poc.”
Evaluating a selection of these durable workflow SDKs for Go, I'm not keen on being tightly coupled to a vendor and the implementation shouldn't be that crazy to fit into existing Go interfaces.
The wording in the additional grant labels software like this as an Application Platform Service -- which is fair, and perhaps intended, but we're still not a big cloud service provider.
1. Max execution duration of a workflow
2. Max input/output payload size in bytes for a service invocation
3. Max timeout for a service invocation
4. Max number of allowed state transitions in a workflow
5. Max Journal history retention time
2. Restate currently does not impose a strict size limit for input/output messages by default (it has the option to limit it though to protect the system). Nevertheless, it is recommended to not go overboard with the input/output sizes because Restate needs to send the input messages to the service endpoint in order to invoke it. Thus, the larger the input/output sizes, the longer it takes to invoke a service handler and sending the result back to the user (increasing latency). Right now we do issue a soft warning whenever a message becomes larger than 10 MB.
3. If the user does not specify a timeout for its call to Restate, then the system won't time it out. Of course, for long-running invocations it can happen that the external client fails or its connection gets interrupted. In this case, Restate allows to re-attach to an ongoing invocation or to retrieve its result if it completed in the meantime.
4. There is no limit on the max number of state transitions of a workflow in Restate.
5. Restate keeps the journal history around for as long as the invocation/workflow is ongoing. Once the workflow completes, we will drop the journal but keep the completed result for 24 hours.
You can store a lot of data in Restate (workflow events, steps). Logged events move quickly to an embedded RocksDB, which is very scalable per node. The architecture is partitioned, and while we have not finished all the multi-node features yet, everything internally is build in a partitioned scalable manner.
So it is less a question of what the system can do, maybe more what you want:
- if you keep tens of thousands of journal entries, replays might take a bit of time. (Side note, you also don't need that, Restate's support for explicit state gives you an intuitive alternative to the "forever running infinite journal" workflow pattern some other systems promote.)
- Execution duration for a workflow is not limited by default. More of a question of how long do you want to keep instances older versions of the business logic around?
- History retention (we do this only for tasks of the "workflow" type right now) as much as you are willing to invest into for storage. RocksDB is decent at letting old data flow down the LSM tree and not get in the way.
Coming up with the best possible defaults would be something we'd appreciate some feedback on, so would love to chat more on Discord: https://discord.gg/skW3AZ6uGd
The only one where I think we need (and have) a hard limit is the message size, because this can adversely affect system stability, if you have many handlers with very large messages active. This would eventually need a feature like out-of-band transport for large messages (e.g., through S3).
One big hangup for me is that there's only a single node orchestrator as a CDK construct. Having a HA setup would be a must for business critical flows.
I stumbled on Restate a few months ago and left the following message on their discord.
> I was considering writing a framework that would let you author AWS Step Functions workflows as code in a typesafe way when I stumbled on Restate. This looks really interesting and the blog posts show that the team really understands the problem space.
> My own background in this domain was as an early user of AWS SWF internally at AWS many, many years ago. We were incredibly frustrated by the AWS Flow framework built on top of SWF, so I ended up creating a meta Java framework that let you express workflows as code with true type-safety, arrow function based step delegations, and leveraging Either/Maybe/Promise and other monads for expressiveness. The DX was leaps and bounds better than anything else out at the time. This was back around 2015, I think.
> Fast-forward to today, I'm now running a startup that uses AWS Step Functions. It has some benefits, the most notable being that it's fully serverless. However, the lack of type-safety is incredibly frustrating. An innocent looking change can easily result in States.Runtime errors that cannot be caught and ignore all your catch-error logic. Then, of course, is how ridiculous it feels to write logic in JSON or a JSON-builder using CDK. As if that wasn't bad enough, the pricing is also quite steep. $25 for every million state transitions feels like a lot when you need to create so many extra state transitions for common patterns like sagas, choice branches, etc.
> I'm looking forward to seeing how Restate matures!
Out of curiosity, have you explored the possibility of a serverless orchestration layer? That's one of the most appealing parts of Step Functions. We have many large workflows that run just a couple times a day and take several hours alongside a few short workflows that run under a minute and are executed more frequently during peak hours. Step Functions ends up being really cost effective even through many state transitions because most of the time, the orchestrator is idle.
Coming from an existing setup where everything is serverless, the fixed cost to add serverfull stuff feels like a lot. For a HA setup, it'd be 3 EC2 instances and 3 NAT gateways spread across 3 AZs. Then multiply that for each environment and dev account, and it ends up being pretty steep. You can cut costs a bit by going single AZ for non-prod envs, but still...
I couldn't find a pricing model for Restate Cloud, but I'm including "managed services" under the definition of serverless for my purposes. Maybe that offering can fill the gap, but then it does raise security concerns if the orchestration is not happening on our own infra.
Also something about this area always makes me excited. I guess it must be the thought of having all these tasks just working in the background without having to explicitly manage them.
One question I have is does anyone have experience for building data pipelines in this type of architecture?
Does it make sense to fan out on lots of small tasks? Or is it better to batch things into bigger tasks to reduce the overhead.
Regarding whether to parallelize or to batch, I think this strongly depends on what the actual operation involves. If it involves some CPU-intensive work like model inference, for example, then running more parallel tasks will probably speed things up.
(1) Restate has latencies that to the best of my knowledge are not achievable with Temporal. Restate's latencies are low because of (a) its event-log architecture and (b) the fact that Restate doesn't need to spawn tasks for activities, but calls RPC handlers.
(2) Restate works really well with FaaS. FaaS needs essentially a "push event" model, which is exactly what Restate does (push event, call handler). IIRC, Temporal has a worker model that pulls tasks, and a pull model is not great for FaaS. Restate + AWS Lambda is actually an amazing task queue that you can submit to super fast and that scales out its workers virtually infinitely automatically (Lambda).
(3) Restate is a self-contained single binary that you download and start and you are done. I think that is a vastly different experience from most systems out there, not just Temporal. Why do app developers love Redis so much, despite its debatable durability? I think it is the insanely lightweight manner they love, and this is what we want to replicate (with proper durability, though).
(4) Maybe most importantly, Restate does much more than workflows. You can use it for just workflows, but you can also implement services that communicate durably (exactly-one RPC), maintain state in an actor-style manner (via virtual objects), or ingest events from Kafka.
This is maybe not the first thing you build, but it shows you how far you can take this if you want: It is a full app with many services, workflows, digital twins, some connect to Kafka. https://github.com/restatedev/examples/tree/main/end-to-end-...
All execution and communication is async, durable, reliable. I think that kind of app would be very hard to build with Temporal, and if you build it, you'd probably be using some really weird quirks around signals, for example when building the state maintenance of the digital twin that don't make this something any other app developer would find really intuitive.
Once http2 stuff is removed, there's nothing particularly odd that our library does that shouldn't work in all platforms, but I'm sure there will be some papercuts until we are actively testing against these targets
The restate API is extremely similar to ours, and because of the similarities both Restate and Inngest should work on Bun, Deno, or any runtime/cloud. We most definitely do, and have users in production on all TS runtimes in every cloud (GCP, Azure, AWS, Vercel, Netlify, Fly, Render, Railway, Cloudflare, etc).
Question tho, when will you guys have python support? I’m a ml researcher here and can you tell that most of my work is now pipelines between different services, e.g. Chaining multiple LLM services. Big bottleneck is if one service returns an error and crashes the full chain.
Big fan of this work nevertheless. Just think you have alpha on the table
I'm particularly interested in the scaling characteristics, and how your approach to durable storage (seems no external database is required?) differs
And yes, Restate does not have any external dependencies. It comes as a single self-contained binary that you can easily deploy and operate wherever you are used to run your code.
In a multi-replica / horizontally scaled setup:
- Does each replica get its own independent storage volume?
- How much state is replicated between each volume if so?
- What does a typical workloads journal look like in terms of storage size? How often does this get compacted / is there archival to cold storage?
- How do you manage upgrades to restate in the case of a change to the on-disk format? (Or is this designed to be static between releases)
The fact that it's without external dependencies also makes me wonder if it would be encouraged to have multiple independent deployments of restate managing different independent services/workflows to avoid noisy neighbors and single points of failure - seems like it might be lightweight enough that this is practical?
I couldn’t find an equivalent of the codec server in temporal that basically encrypts all data in the event log. Is there something similar?
We plan to publish a more detailed follow-up blog post where we explain why we developed a new stateful system, how we implemented it, and what the benefits are. Stay tuned!
One difference is that Airflow seems geared towards heavier operations, like in data pipelines. In contrast, would be that Restate is not by default spawning any tasks, but it acts more of a proxy/broker for RPC- or event handlers and adds durable retries, journaling, ability to make durable RPCs, etc.
That makes it quite lightweight: If the handlers is fast in a running container, the whole thing results in super fast turnaround times (milliseconds).
You can also deploy the handlers on FaaS and basically get the equivalent of spawning a (serverless task) per step.
The other difference would be the way that the logic is defined, can maintain state, can make exactly-once calls to other handlers.
- Blog post with an overview of Restate 1.0: https://restate.dev/blog/announcing-restate-1.0-restate-clou...
- Restate docs: https://docs.restate.dev/
- Discord, for anyone who wants to chat interactively: https://discord.com/invite/skW3AZ6uGd