Building Reliable Distributed Systems in Node.js

(temporal.io)

45 points · mfateev · 3y ago · 17 comments

17 comments

andrewingram 3y ago

Temporal is an implementation of a paradigm I got interested in back in 2019. I wasn’t at one of those companies that had heard about Cadence, so when I was searching around to see if anyone had actually already built this idea I’d come up with, I stumbled upon Zenaton. It’s no longer around, didn’t get PMF, so I was happy when Temporal came out of stealth mode a few months later - was nice to have my intuition in this area validated.

We’ve been using Temporal quite successfully in Go (and more recently Python) for a little while now. It could do with being a bit easier to get up and running with, but day-to-day usage is very nice. I don’t think I could go back to plain old message queues; this paradigm is a real time saver.

The biggest challenge is deciding how many things are nails for the hammer that is Temporal. You tend to start out using it to replace an existing mess of task orchestration, but then you realise it’s actually a pretty good fit for any write operation that can’t neatly work in a single database transaction (because it’s hitting multiple services, technologies, third parties, etc.).

You have to be careful to keep your workflows deterministic, but once you get used to the paradigm, it’s enjoyable.

lorendsr 3y ago

This post talks about durable execution systems, which include Azure Durable Functions, Amazon SWF, Uber Cadence, Infinitic, and Temporal.

Durable execution systems run our code in a way that persists each step the code takes. If the process or container running the code dies, the code automatically continues running in another process with all state intact, including call stack and local variables.

Durable execution makes it trivial or unnecessary to implement distributed systems patterns like event-driven architecture, task queues, sagas, circuit breakers, and transactional outboxes. It’s programming on a higher level of abstraction, where you don’t have to be concerned about transient failures like server crashes or network issues.
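
To make the "persists each step" idea concrete, here is a toy sketch of replay. It is not how Temporal or any of the systems above are implemented; `DurableContext`, `step`, `eventLog`, and the crash counter are all hypothetical names, and an in-memory array stands in for a durable store. The point it illustrates: results of completed steps are recorded, so a re-run after a crash reuses them instead of doing the work again.

```typescript
// Toy model of durable execution (all names are hypothetical).
// Each step's result is appended to a log; a re-run replays logged results.
class DurableContext {
  private cursor = 0;
  private log: unknown[];
  private freshStepsLeft: number;

  constructor(log: unknown[], freshStepsLeft: number = Infinity) {
    this.log = log;
    this.freshStepsLeft = freshStepsLeft; // used only to simulate a crash
  }

  step<T>(fn: () => T): T {
    if (this.cursor < this.log.length) {
      // Already done in an earlier run: return the recorded result.
      return this.log[this.cursor++] as T;
    }
    if (this.freshStepsLeft-- <= 0) {
      throw new Error("simulated process crash");
    }
    const result = fn();
    this.log.push(result); // "persist" before moving on
    this.cursor++;
    return result;
  }
}

let chargeCalls = 0;

function orderWorkflow(ctx: DurableContext): string {
  const paymentId = ctx.step(() => {
    chargeCalls++; // side effect we do not want repeated
    return "pay_1";
  });
  return ctx.step(() => "track_" + paymentId);
}

const eventLog: unknown[] = []; // stands in for durable storage

// First run: the process "dies" after one step has been persisted.
try {
  orderWorkflow(new DurableContext(eventLog, 1));
} catch {
  // A real system would reschedule the workflow on another worker.
}

// Second run: step one is replayed from the log, step two runs fresh.
const result = orderWorkflow(new DurableContext(eventLog));
console.log(result, chargeCalls, eventLog.length);
```

The local variable `paymentId` has the same value in both runs because it is rebuilt from the log, which is roughly what "all state intact" means in this model.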

MuffinFlavored 3y ago

you have to code your entire architecture around this premise though, no?

aka start from scratch and write things a certain way

lorendsr 3y ago

No, the sample app is 100% Temporal backend, but you can incrementally adopt—writing durable functions for specific processes. Usually companies start out with things that are either long running or for which reliability is particularly important, like financial transactions. Then they learn that it can be more generally useful, and expand use cases gradually.

hot_gril 3y ago

Sounds like the whole point is you don't have to take anything special into account. Edit: Except determinism?

lorendsr 3y ago

The point is that you can write code instead of the JSON/YAML used by traditional microservice orchestrators such as AWS Step Functions. And it’s not a limited DSL: you have the full language at your disposal, with the one requirement that deterministic code (workflows, i.e. the durable code) lives in separate functions from non-deterministic code (like making a network request), which are called “Activities”.
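
A rough sketch of why that separation matters, using a toy replayer rather than the Temporal SDK (`ReplayContext`, `activity`, and both workflow functions are hypothetical names). Anything non-deterministic goes through `activity`, which records its result; on replay the recorded result is reused, and the runner checks that the workflow asks for the same activities in the same order.

```typescript
// Toy replayer (hypothetical names, not the Temporal API).
type Recorded = { name: string; result: unknown };

class ReplayContext {
  private cursor = 0;
  private log: Recorded[];

  constructor(log: Recorded[]) {
    this.log = log;
  }

  activity<T>(name: string, fn: () => T): T {
    const past = this.log[this.cursor];
    if (past !== undefined) {
      // Replaying: the code must request the same activity as last time.
      if (past.name !== name) {
        throw new Error(
          `non-deterministic: history has "${past.name}", code asked for "${name}"`
        );
      }
      this.cursor++;
      return past.result as T;
    }
    const result = fn();
    this.log.push({ name, result });
    this.cursor++;
    return result;
  }
}

// Fine: the random value is obtained via an activity, so replay sees
// the recorded value and takes the same branch.
function goodWorkflow(ctx: ReplayContext): string {
  const roll = ctx.activity("rollDice", () => Math.random());
  return roll < 0.5
    ? ctx.activity("sendEmail", () => "email sent")
    : ctx.activity("sendSms", () => "sms sent");
}

// Broken: branches on state that differs between runs.
let runCount = 0;
function badWorkflow(ctx: ReplayContext): string {
  const evenRun = runCount++ % 2 === 0;
  return evenRun
    ? ctx.activity("sendEmail", () => "email sent")
    : ctx.activity("sendSms", () => "sms sent");
}

const goodLog: Recorded[] = [];
const firstGood = goodWorkflow(new ReplayContext(goodLog));
const replayedGood = goodWorkflow(new ReplayContext(goodLog));

const badLog: Recorded[] = [];
badWorkflow(new ReplayContext(badLog));
let replayError = "";
try {
  badWorkflow(new ReplayContext(badLog)); // takes the other branch
} catch (err) {
  replayError = (err as Error).message;
}
console.log(firstGood === replayedGood, replayError);
```

In this model the workflow function is ordinary code with loops and branches, and the only constraint is that its decisions depend solely on its inputs and on recorded activity results.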

MuffinFlavored 3y ago

https://github.com/temporalio/hello-world-project-template-j...

You have to write your code using Temporal SDK.

At a quick glance:

main() calls WorkflowServiceStubs/WorkflowClient

I also see something called an "Activity"

Also see something called a Worker.

Not trying to argue. Genuinely curious if you think this is within the realm of "not take anything special into account" (be forced to use a specific SDK and lay your logic out in the exact way it supports) or if you didn't know this was referring to Temporal?

barbarbar 3y ago

Is this similar to Apache Camel or Spring Integration?

hot_gril 3y ago

Never heard of durable execution until now, but I've wondered about it. When I write backend code, I have to keep asking myself "what happens if the server goes down during this line of code?" This is often an issue in the middle of a customer order, like the example here. I end up relying on the database for very many tiny little things, like recording the fact that the user initiated an order before I start to process it.

But how fast is this? IIRC each little insert in my DB was taking like 5ms, which would add up quickly if I were to spam it everywhere; I assume durable execution layers are better optimized for that. Do they really only snapshot before and after async JS calls, treating all other lines as hermetic and thus able to be rerun?
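
For comparison, the hand-rolled version of this pattern might look roughly like the sketch below. Everything here is an assumption for illustration: a `Map` stands in for the database, and `processOrder`, `OrderStatus`, and the `crashBeforeShipping` flag are made-up names. Each stage writes its status before the next one starts, so a retry after a crash can skip what is already done.

```typescript
// Hand-rolled checkpointing (hypothetical names; Map stands in for a DB table).
type OrderStatus = "initiated" | "charged" | "shipped";

const db = new Map<string, OrderStatus>();
const sideEffects: string[] = []; // records external calls for inspection

function processOrder(orderId: string, crashBeforeShipping = false): void {
  // Record that the order exists before doing any work on it.
  if (!db.has(orderId)) {
    db.set(orderId, "initiated");
  }
  if (db.get(orderId) === "initiated") {
    sideEffects.push("charge:" + orderId);
    db.set(orderId, "charged"); // checkpoint
  }
  if (crashBeforeShipping) {
    throw new Error("simulated server crash");
  }
  if (db.get(orderId) === "charged") {
    sideEffects.push("ship:" + orderId);
    db.set(orderId, "shipped"); // checkpoint
  }
}

try {
  processOrder("order-1", true); // dies between charging and shipping
} catch {
  // the server went down; nothing in memory survives
}
const statusAfterCrash = db.get("order-1");

processOrder("order-1"); // retry resumes from the last checkpoint
console.log(statusAfterCrash, db.get("order-1"), sideEffects);
```

Note the remaining gap: a crash between the external call and the status write would repeat the call on retry, so the calls themselves still need to be idempotent.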

lorendsr 3y ago

Yeah, I’ve also written this write-to-db-after-each-meaningful-line-of-code style code, and this is a great improvement. See the first 20m of this talk for an example: https://youtu.be/EFIF8gk9zy8

Starting a workflow is currently ~40ms, and I think we’ll be able to get down to 10ms this year. How long it takes to complete depends on how many persisted steps it takes (and whether it has to wait on an external event). The only steps that are persisted are workflow api calls like sleep(), startChildWorkflow(), or calling code that might fail (ie “Activity”, like a network request).

hot_gril 3y ago

> The only steps that are persisted are workflow api calls like sleep(), startChildWorkflow(), or calling code that might fail (ie “Activity”, like a network request).

Ok, that's what I was wondering. Makes a lot more sense this way.

lakomen 3y ago

Or you could not use a scripting language and save 50 times the cost

lorendsr 3y ago

We have Go and Java SDKs that have better performance characteristics if that’s what you’re optimizing for. I think for many businesses, optimizing for development speed is a higher priority (e.g. if the devs already know JS, use that). The Node runtime with V8 isolates is also better able to protect developers from writing non-deterministic code (durable code must be deterministic). More info on that: https://temporal.io/blog/intro-to-isolated-vm

hot_gril 3y ago

That doesn't solve the problem of long-running processes. CPU time isn't the limiting factor here, and devs cost more than compute resources.
