That said, a lot of current agent workloads are I/O-bound around external APIs. If 95% of the time is waiting on OpenAI or Anthropic, the scheduling model matters less than people think. The BEAM’s preemption and per-process GC shine when you have real contention or CPU-heavy work in the same runtime. Many teams quietly push embeddings, parsing, or model hosting to separate services anyway.
Hot code swapping is genuinely interesting in this context. Updating agent logic without dropping in-flight sessions is non-trivial on most mainstream stacks. In practice, though, many startups are comfortable with draining connections behind a load balancer and calling it a day.
So my take is: if you actually need millions of concurrent, stateful, soft real-time sessions with strong fault isolation, the BEAM is a very sane default. If you are mostly gluing API calls together for a few thousand users, the runtime differences are less decisive than the surrounding tooling and hiring pool.
When you are just “gluing together API calls,” the surrounding tooling doesn’t matter as much. I don’t care that Elixir doesn’t have as large a community as Python; I’m just gluing together API calls, I don’t have dependencies.
> TypeScript/Node.js: Better concurrency story thanks to the event loop, but still fundamentally single-threaded. Worker threads exist but they're heavyweight OS threads, not 2KB processes. There's no preemptive scheduling: one CPU-bound operation blocks everything.
This can’t be a real objection: nearly 100% of the time spent in agent frameworks is spent waiting for the model to respond, or waiting for a tool call to execute. Almost no time is spent in the logic of the framework itself.
Even if you use heavyweight OS threads, I just don't believe this matters.
Now, the other points about hot code swapping ... so true, painfully obvious to those of us who have used Elixir or Erlang.
For instance, OpenClaw: how much easier would "in-place updating" be if the language runtime had been designed with that ability in mind in the first place?
But that’s exactly where multi-threaded Elixir is better! You want a single thread like Node for CPU-bound work, and you want extreme multi-threading for I/O-bound work like AI agents. In Elixir you can do both: heavy CPU work without worrying about stopping the world, and heavy concurrency across millions of processes where work is I/O-bound and you want to saturate your network connection. In Node you can’t do either of those things easily; it’s just a single thread.
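A small sketch of the preemption half of that claim (the iteration counts are illustrative, nothing more): the BEAM preempts every process after a budget of reductions, so a CPU-hogging process cannot starve the rest of the runtime, with no explicit yielding required.

```elixir
# Illustrative: start a CPU-hog process, then check that a lightweight
# message round-trip still completes promptly. The BEAM's scheduler
# preempts the hog every few thousand reductions, so this works even
# on a single scheduler thread.
hog = spawn(fn -> Enum.reduce(1..200_000_000, 0, &+/2) end)

parent = self()
spawn(fn -> send(parent, :pong) end)

responsive? =
  receive do
    :pong -> true
  after
    # On a cooperative, single-threaded runtime this timeout is what
    # you'd hit while the hog loop runs to completion.
    1_000 -> false
  end

IO.puts("responsive while #{inspect(hog)} burns CPU: #{responsive?}")
```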
It matters a lot. How many OS threads can you run on one machine? With Elixir you can easily run thousands of concurrent processes without breaking a sweat. But even if you need only a few agents on one machine, OS thread management is a headache if you have any shared state whatsoever (locks, mutexes, etc.). On Unix you can't even reliably kill dependent processes[1]. All those problems just disappear with Elixir.
[1] https://matklad.github.io/2023/10/11/unix-structured-concurr...
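To make the scale contrast concrete, here is a hedged sketch (the count and sleep duration are illustrative, not from the thread): spawning a hundred thousand BEAM processes that each simulate an I/O wait, something you would not attempt with OS threads.

```elixir
# Illustrative only: spawn 100_000 lightweight BEAM processes, each
# simulating a 10 ms I/O wait. Each process costs a few KB of heap,
# so this is routine for the BEAM, while the equivalent number of OS
# threads would exhaust memory or scheduler capacity on most machines.
tasks =
  for i <- 1..100_000 do
    Task.async(fn ->
      Process.sleep(10) # stand-in for an I/O-bound call (HTTP, model API)
      i
    end)
  end

results = Task.await_many(tasks, 60_000)
IO.puts(length(results))
```

Note there is no shared state anywhere: each process owns its own data and communicates by message passing, which is why the lock/mutex headaches from the parent comment never come up.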
Spending too much time optimizing for the 1% of extra overhead seems suboptimal.
Any modern Linux machine should be able to spawn thousands of simultaneous threads without breaking a sweat.
It is not forbidden by their EULA/ToS, I suppose.
Claude Code already works as an agent that calls tools when necessary, so it’s not clear how an abstraction helps here.
I have been really confused by LangChain and related tech because they seem so bloated without offering me any advantages.
I genuinely would like to know what I’m missing.
You could package Claude Code into the product (via the agents-sdk or `claude -p`) and have it use the API key (with metered billing), but in my case I didn’t find it ergonomic enough for my needs, so I ended up using my own agent framework, Langroid, for this.
https://github.com/langroid/langroid
(No, it’s not based on that similarly named other framework; it’s a clean, minimal, extensible framework with good DX.)
Erlang didn't introduce the actor model, any more than Java introduced garbage collection. That model was developed by Hewitt et al. in the 70s, and the Scheme language was developed to investigate it (core insights: actors and lambdas boil down to essentially the same thing, you really don't need much language to support some really abstract concepts).
Erlang was a fantastic implementation of the actor model for an industrial application, and probably proved out the model's utility for large-scale "real" work more than anything else. That and it being fairly semantically close to Scheme are why I like it.
The article touches very briefly on Phoenix LiveView and Websockets. I wrote about why chatbots hate page refresh[1], and it's not solved by just swapping to Websockets. By far the best mechanism is pub/sub, especially when you can get multi-user/multi-device, conversation hand-off, re-connection, history resumes, and token compaction basically for free from the transport.
1: https://zknill.io/posts/chatbots-worst-enemy-is-page-refresh...
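A minimal sketch of that pub/sub shape, assuming a `Phoenix.PubSub` instance named `MyApp.PubSub` is already running in the supervision tree (the module and topic names here are hypothetical):

```elixir
defmodule Chat do
  # Assumes {Phoenix.PubSub, name: MyApp.PubSub} is in your supervision tree.
  @pubsub MyApp.PubSub

  # Every device/tab for a conversation subscribes to the same topic;
  # a reconnecting client just re-subscribes and replays history,
  # instead of owning a fragile one-off socket.
  def subscribe(conversation_id) do
    Phoenix.PubSub.subscribe(@pubsub, "conversation:#{conversation_id}")
  end

  # Streaming tokens are broadcast once and fan out to all current
  # subscribers, which is what makes multi-device and hand-off
  # come along "for free" from the transport.
  def broadcast_token(conversation_id, token) do
    Phoenix.PubSub.broadcast(@pubsub, "conversation:#{conversation_id}", {:token, token})
  end
end
```

The point of the design is that the conversation, not the connection, is the addressable unit; a page refresh only costs you a cheap re-subscribe.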
Do I want this? If my request fails because the tool doesn't have a DB connection, I want the model to receive information about that error. If the LLM API returns an error because the conversation is too long, I want to run compacting or other context engineering strategies, I don't want to restart the process just to run into the same thing again. Am I misunderstanding Elixir's advantage here?
The benefit comes mainly from what happens when you encounter unknown errors, errors you can't handle, or errors that would put you into an invalid state. It's normal in BEAM languages to handle the errors you want to (or can) handle, and let the runtime deal with the transient or unknown ones by restarting the process into a known-good state.
The big point really is preventing state corruption, so the types of patterns the BEAM encourages will go a long way toward preventing you from accidentally ending up in some kind of unknown zombie state with your model: for example, your model and control plane each thinking they are connected to the other when they actually aren't.
Happy to clarify more if this sounds strange.
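A hedged sketch of that division of labor (module, tool names, and the error atom are all made up for illustration): handle the errors you expect in-band, and let anything unexpected crash so the supervisor restarts the process from a known-good `init/1` state.

```elixir
defmodule AgentSession do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Known-good state, rebuilt on every (re)start. In a real system the
    # conversation history would be reloaded from durable storage here,
    # so a restart does not mean a blank conversation.
    {:ok, %{session_id: opts[:session_id], history: []}}
  end

  @impl true
  def handle_call({:tool_call, request}, _from, state) do
    case run_tool(request) do
      # Errors you expect (e.g. no DB connection) flow back to the
      # model as information, exactly as the parent comment wants.
      {:error, reason} -> {:reply, {:error, reason}, state}
      {:ok, result} -> {:reply, {:ok, result}, state}
      # Any other return value raises CaseClauseError, crashing this
      # process only; the supervisor restarts it into init/1's state
      # instead of leaving a zombie behind.
    end
  end

  # Stand-in for a real tool dispatcher.
  defp run_tool(%{tool: :broken_db}), do: {:error, :db_unavailable}
  defp run_tool(_request), do: {:ok, :done}
end
```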
At the same time, I can't remember the last time I had a random exception I hadn't thought about in prod, but I guess that's the whole point of the BEAM: just don't think about it at all.
I might take a stab at Elixir, the concepts seem interesting and the syntax looks to be up my alley.
If an LLM returns garbage, restarting the process (agent) with the same prompt and temperature 0 yields the same garbage. An Erlang supervisor restarts a process in a clean state, and for an agent, "clean state" means lost conversation context.
We don't just need supervision trees, we need semantic supervision trees that can change strategy on restart. The BEAM doesn't give this out of the box; you still have to code it manually.
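One way to sketch that "change strategy on restart" by hand (everything here, from the ETS table name to the strategies, is invented for illustration): persist an attempt counter outside the process, and have `init/1` pick a different strategy each time the supervisor restarts it.

```elixir
defmodule SemanticAgent do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    ensure_table()

    # Count restarts per session *outside* the process, so the knowledge
    # survives the crash. (In production the ETS table should be owned
    # by the supervisor, or replaced with durable storage.)
    attempts =
      :ets.update_counter(:agent_retries, opts[:session_id], 1, {opts[:session_id], 0})

    # Same prompt + temperature 0 would reproduce the same garbage, so
    # each restart varies the sampling first, then the prompt itself.
    strategy =
      case attempts do
        1 -> %{temperature: 0.0, prompt: opts[:prompt]}
        2 -> %{temperature: 0.7, prompt: opts[:prompt]}
        _ -> %{temperature: 0.7, prompt: "Try a different approach: " <> opts[:prompt]}
      end

    {:ok, Map.put(strategy, :session_id, opts[:session_id])}
  end

  defp ensure_table do
    if :ets.whereis(:agent_retries) == :undefined do
      :ets.new(:agent_retries, [:named_table, :public])
    end
  end
end
```

This is exactly the "you still code it manually" part: the supervisor supplies the restart machinery, and the counter plus the `case` supply the semantics.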
The good thing about those, IMO, is that they’re leveraging everything that’s already in BEAM/OTP, so there’s no need to reinvent the harder parts. They “only” add some extra features (like persistence of processes/GenServers between restarts) and higher-level abstraction APIs.
What's that about years of experience? That's obsolete thinking!
> Your Agent Framework Is Just a Bad Clone of Elixir: Concurrency Lessons from Telecom to AI
Node is great, but Elixir's process scaling is even more so.
> A note on terminology: Throughout this post I refer to "the BEAM." BEAM is the virtual machine that runs both Erlang and Elixir code, similar to how the JVM runs both Java and Kotlin. Erlang (1986) created the VM and the concurrency model. Elixir (2012) is a modern language built on top of it with better ergonomics. When I say "BEAM," I mean the runtime and its properties. When I say "Elixir," I mean the language we write.

Elixir just feels… like it’s a load of pre-compiled macros. There’s not even a debugger.
Are you guys okay? WTF is going on with HN?
There’s one interesting detail about this blog, though: you can see how the LLM-generated spam improves over the years as models get better.