Having said that, I will add that I think it is good to have Elixir.
I use long-lived processes and had to come up with some magic to work around supervisor behavior with high child counts, etc.
Roughly: I randomly assign workers to a node in the cluster if they have not yet been assigned (there is some logic tracking the total nodes in the cluster and a max/target that influences the decision). I verify whether the remote (or local) worker is alive or migrating by checking a fragmented process Registry and a unique identifier via :rpc (because I do some recovery logic if it's offline, and let the caller specify whether messages should force a spawn or can be ignored if the process is offline). I then pad the call with some meta information for rerouting, so the receiver can confirm it is the same process the message was initially sent to (processes cycle so frequently that the initial process may have died and a new one spawned in the time it took to forward the message).
If the process has changed mid-transit, or the worker has been flagged for migration, the message gets rerouted to the new destination and process. If a process does not yet exist, a hash based on the worker type + id and the available worker pools is used to select which of the auto-generated but named and registered worker supervisors (e.g. WorkType.PoolSupervisor_123) spawns the child.
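The commenter's system is in Elixir, but the hash-based pool selection described above can be sketched in language-neutral terms. Here is a rough Go illustration under assumed names (`pickPool` and `poolName` are made up for this example, not from the post): hash the worker type + id, take it modulo the pool count, and derive the registered supervisor name, so every node computes the same assignment without coordination.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickPool deterministically maps a worker (type + id) onto one of n
// pool supervisors. Because the hash is pure, any node in the cluster
// arrives at the same choice for the same worker.
func pickPool(workerType, id string, pools int) int {
	h := fnv.New32a()
	h.Write([]byte(workerType + ":" + id))
	return int(h.Sum32() % uint32(pools))
}

// poolName reconstructs the registered supervisor name, mirroring the
// "WorkType.PoolSupervisor_123" naming scheme from the post.
func poolName(workerType string, idx int) string {
	return fmt.Sprintf("%s.PoolSupervisor_%d", workerType, idx)
}

func main() {
	idx := pickPool("WorkType", "device-42", 8)
	fmt.Println(poolName("WorkType", idx)) // same supervisor on every node
}
```

Note that a plain modulo reshuffles most workers when the pool count changes; a real system that resizes pools might prefer consistent hashing.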
It's a trip, and it needs to be heavily documented. Starting from scratch I'd probably change some things, and it will probably need some refinement later this year before the next batch of 250-500k devices gets added to the network, but the cost per reporting device is fantastic, with plenty of low-hanging fruit for improving the COGS further, so I'm happy.
Concretely, is it the case that, for an application where Elixir/Erlang/BEAM are a great choice but another language would also be fine, the equivalent Elixir application results in less downtime/fewer pages than the alternative? Anything from the perfect app to something with a ton of races/leaks.
Is this a fair question (maybe I'm presuming too much of the BEAM/supervisor pattern; I have zero experience with it)?
I don't have any hard data to compare, but I have been involved in debugging running Erlang systems. It's very nice having the ability to restart separate supervisors while the rest of the processes handle requests. Being able to do hot code loading to, say, fix bugs or add extra logging. And my all-time favorite -- live tracing after connecting to a VM's remote shell. You can just pick any function, args, and process and say "trace these for a few seconds if a specific condition happens". None of those individually is earth-shattering, but taken together they are just so pleasant to use. I wouldn't enjoy going back to anything that didn't have those capabilities.
And yes, that restarting of sub-systems (supervision trees) happens automatically as well. There were a number of cases where it turned a potential "wake up at 4am and fix this now, because everything crashed" into a "meh, it's fine until I get to it next week" kind of problem.
I have zero on-call incidents for Go. I had very few for Elixir, and those bugs were in logic code. Same with Ruby.
But it's a disaster with Node. We used TypeScript, so it caught a lot of type issues. However, the Node runtime is weird. We ran into DNS issues (you have to bump the libuv thread pool, cache DNS lookups), JSON parsing that blocks the event loop, max-memory problems, etc.
For instance:
* Are the teams that use certain languages comprised of more experienced people?
* How mature is the company and project? E.g., a faster-moving startup cutting more corners, where time was deemed to be of the essence (rightly or wrongly), will likely produce more on-call incidents than a slower, more established company that can take its time
The general idea is combining queries from different HTTP requests into a single database query/transaction, amortising the (significant) per-query cost over those requests. For simple use-cases it doesn't add a whole lot of complexity, can reduce both load and latency significantly, and doesn't lose transactional guarantees.
Not 100k/sec writes on my laptop, mind you :-).
E.g., please give me guidance on how to better structure my database model so that it doesn't effectively end up as a huge spaghetti heap of global variables. My personal horror: updating a single database field spurs 20 additional SQL queries, creating several new rows in seemingly unrelated tables. Digging in, I found this was due to an after_save hook in the database model which caused an avalanche of other after_save/after_validation hooks to fire. The worst of it: asking how this came to be, I found out that each step of the way was an elegant solution to some code duplication in the controller, some forgotten edge case in the UI, some bug in the business logic. Basically, ending up with extremely complex control flows is the default.
So of course, if your code has next to no isolation, batching up queries produces incalculable risks.
/rant, sorry.
https://github.com/facebook/dataloader/blob/master/README.md
Does anyone know how 100k connections compares with other servers?
The order of magnitude(s) differentiator for server performance really comes down to whether or not the architecture is blocking or non-blocking.
http://highscalability.com/blog/2013/5/13/the-secret-to-10-m...
https://mrotaru.wordpress.com/2015/05/20/how-migratorydata-s...
Also, assuming it scales up linearly is a bit risky, although I agree that at that kind of conn/s rate it will likely be sufficient.
That being said, this article was pretty informative. The bit about the proposed SO_REUSEPORT socket option was really interesting. It's really fun to read about performance bottleneck detection and improvement.
Edit: wow, downvoting for making a simple joke about liking Elixir. Cool.
Maybe we should add something about this to https://news.ycombinator.com/newsfaq.html.
I think that the inclination towards "meaningless" humor makes it too much like Reddit. These folks want SUBSTANCE! (Well, that's why _I_ come here, at least!)
Also, any update on your previous article? https://news.ycombinator.com/item?id=19094233
> Efficiency in the BEAM is mainly in service of its primary goal of fault-tolerance. If one process crashes unexpectedly, the others should continue. By the same logic, if one process is CPU-intensive or IO-blocked, the others should keep making progress smoothly. And if processes are good for isolating errors and performance issues, they should be cheap enough that we can run a lot of them at once. Those assumptions are baked into how the BEAM manages processes.
If raw speed is your only goal, the BEAM probably isn't the best choice. If consistent speed and stability matter, it may be.
More on this at https://dockyard.com/blog/2018/07/18/all-for-reliability-ref...
- Go wants to be performant at high concurrency scale
- Erlang/Elixir wants to keep running at high concurrency scales, whatever the issues are in your application code. Performance comes second.
There's no clear-cut answer to your question. I guess if you trust yourself to write servers that will hold a large number of connections while doing a lot of processing, then Go has an advantage; otherwise you should probably trust the man-centuries behind the BEAM VM and follow the various blog posts/presentations explaining how you can fine-tune your machine to get to super large scales.
I want to point out that "performance" is too generalized here.
The BEAM VM also has a goal of low latency, which can be considered a kind of performance. I'm not entirely sure whether Go is aiming for that or not. I would never do any numerical work on the BEAM, though; it's very slow at that.
This article is a bit dated but is an interesting comparison between Go and Erlang:
https://www.theerlangelist.com/article/reducing_maximum_late...
If it's pure benchmarks, then Go is usually going to come in a little bit ahead.
When you get into comparing language design, underlying architectural decisions, problems solved/created/avoided by those decisions it gets more complex.
I did a big write-up for Codeship a couple of years ago. It had a solid discussion on HN, and the comparison remains fairly accurate.
Both of them give you less flexibility than is necessary to make highly efficient use of all threads on a multiprocessor system. For that, you'll need something like a pool of event loops using async/await. This is the approach most common in high-performance networking in C++, C, and Rust.
Erlang and Go both sacrifice efficiency to improve maintainability and safety by offering a model that lets you approach concurrency with a more synchronous mindset. Erlang in particular goes beyond Go in that the actor model makes it considerably easier to avoid deadlocks and other concurrency bugs, at the expense of a much more opinionated system. Erlang is also less focused on reducing average latency than on keeping latency predictable at scale.
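The "more synchronous mindset" can be demonstrated even in Go by confining all state to one goroutine and touching it only through a mailbox channel, loosely mimicking how an Erlang process serializes its messages. This is just an illustrative sketch; the `msg` and `startCounter` names are invented for the example.

```go
package main

import "fmt"

// msg is one mailbox message: a request plus a channel for the reply,
// playing the role of an Erlang message with a sender to reply to.
type msg struct {
	delta int
	reply chan int
}

// startCounter launches an actor-style counter: its state (total) lives
// inside a single goroutine and is never shared, so no locks are needed
// and messages are handled strictly one at a time.
func startCounter() chan<- msg {
	mailbox := make(chan msg)
	go func() {
		total := 0 // private state, only this goroutine touches it
		for m := range mailbox {
			total += m.delta
			m.reply <- total
		}
	}()
	return mailbox
}

func main() {
	c := startCounter()
	r := make(chan int, 1)
	c <- msg{delta: 5, reply: r}
	fmt.Println(<-r) // prints 5
	c <- msg{delta: -2, reply: r}
	fmt.Println(<-r) // prints 3
}
```

The body of the actor reads like straight-line sequential code even though any number of goroutines could be sending to it concurrently, which is the point being made about both languages.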
Long story short, Erlang, Go, and the rest are not apples-to-apples comparisons, and it takes investment in each language to understand its trade-offs and use cases. You should also view them holistically: what language can my team support, will the wins from Erlang's message queues outweigh the smaller community, or will Go's mid-tier performance be enough to avoid writing on top of low-level libevent and building a custom thread pool, or fine-tuning Go's scheduler?
The question is whether you can, or want to, write better concurrent code by rolling it yourself.
EDIT: I believe this is partially due to Go being a lot more CPU-efficient overall than Erlang (see below). So for simple servers, Go and Erlang will match performance, but for slightly more complex web servers that need to crunch some data, Go (and Rust) will outperform the Erlang VM. https://stressgrid.com/blog/benchmarking_go_vs_node_vs_elixi...
Can someone educate me on what they might be talking about here? CPU is ~45% in their final graph. I don't know what network latency means in this context, though. Round-trip for a TCP handshake? That seems unlikely.
14 core machine comparing .net core with other top webservers: https://www.ageofascent.com/2019/02/04/asp-net-core-saturati...
But since they are benchmarking Elixir, there is some amount of overhead involved in that framework's management of connections and requests. If I knew Erlang/Elixir, that would be a fascinating thing to explore.
Edit: I'm assuming the saturated CPU comes from Elixir and not the OS. It would be strange for 100k/sec to saturate the TCP stack with 36 cores.
To run this test, we used Stressgrid with twenty c5.xlarge generators.