You would not want to use this framework if your requests are long-running (milliseconds or more), for example a web API that needs to connect to a MongoDB database, authenticate, and respond. For that, just use the Go net/http package instead.
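For that long-running case, here is a minimal sketch (not from the project, and the route and response body are invented for illustration) of the net/http style being recommended: each request runs in its own goroutine, so blocking on a database or auth call is fine.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// response builds the body for a request path; split out so the
// logic is trivial to exercise on its own.
func response(path string) string {
	return "hello, " + path + "\n"
}

// handler is where the long-running work (DB query, auth) would go;
// net/http runs each request in its own goroutine, so blocking here
// does not stall other requests.
func handler(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, response(r.URL.Path))
}

func main() {
	http.HandleFunc("/", handler)
	// In a real service this call would block, serving requests:
	//   http.ListenAndServe(":8080", nil)
	fmt.Print(response("/"))
}
```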
There are many popular event-loop-based applications in the wild, such as Nginx, HAProxy, Redis, and Memcached. All of these are single-threaded, written in C, and very fast.
The reason I wrote this framework is so I can build certain network services that perform like the C apps above, but I also want to continue to work in Go.
First of all, thank you for publishing this project. It's very interesting to me, since I had never thought about the benefits of an event loop. Would you mind explaining briefly why an event loop is a better fit for these applications? Is it due to performance and efficiency?
What this does instead is give a Go program direct access to the event loop. The benefit is that it bypasses everything the Go runtime wraps around its internal event loop: the machinery that presents a thread-like interface, integrates with channels and the other concurrency primitives, preserves your position in the call stack between events, and so on. The penalty is the exact same thing: you lose all the nice stuff the Go runtime offers to implement that thread-like interface, and you are back to a lower-level interface that provides fewer services.
The performance of the Go runtime is "pretty good", especially by scripting-language standards, but if you have sufficiently high performance requirements, you will not want to pay the overhead. The pathological case for all of these nice high-level abstractions is a server that handles a ton of network traffic and needs to do only a little something to every request, maybe a couple dozen cycles' worth, at which point paying what could be a few hundred cycles for runtime services you aren't using becomes a significant drain. Most people are not servicing a network request in a few dozen cycles, though, and the longer a single request takes, the more sense it makes to have a nice runtime layer providing useful services, because the runtime shrinks as a percentage of your program's CPU time. If you are so much as hitting a database over a network connection in your request, even a local one, you have already greatly exceeded the time you are paying to the runtime.
It does seem to me that a lot of people are a bit bedazzled by the top-level stuff that various languages offer, and forget that under the hood, everyone's using the event-based interfaces. What differs between Node and Twisted and all of the dozens or hundreds of other viable wrappers over these calls is the services automatically provided, not whether or not they are "event loops". Go is an event loop at the kernel level. Node is an event loop at the kernel level. Erlang is an event loop at the kernel level. They aren't all the same, but "event-based" vs. "not event-based" is not the distinction; it's a question of what they lay on top of the underlying event loop, not whether they use it. Even pure OS threads are, ultimately, event loops under the hood, just in the kernel rather than the user space.
That is, I can write simple blocking code, and my server still scales.
Using event loop programming in Go would take away one of my favorite things about the language, so I won't be using this. However I do appreciate the work, as it makes an excellent bug report against the Go runtime. It gives us a standard to hold the standard library's net package to.
If Go has or can emulate 'generators' à la Python/Node.js, then you can write synchronous-looking, blocking-style code with event loops as well.
The end result is much easier than Python/Node.js because there is no explicit async/await or deferred-style programming. You simply write linear code, and any blocking operation (at the syscall level) is handled transparently.
There were historical implementations of pthreads, such as NGPT, that used precisely the same model as Go, and they were abandoned because the advantages over 1:1 were not sufficient to justify the complexity.
Creating and destroying kernel threads is significantly more expensive.
A kernel thread has a fixed-size stack, and if you go beyond it, you crash. That means you have to create kernel threads with worst-case-scenario stack sizes (and pray that you got it right).
A goroutine has an expandable stack and starts with a very small one (which is partly why it's faster to create; setting up kernel page mappings for a large contiguous stack is not free).
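The practical consequence of those small, growable stacks can be sketched like this (the 8 MB figure below is an assumed worst-case kernel-thread stack, for comparison only): launching 100,000 goroutines is routine.

```go
package main

import (
	"fmt"
	"sync"
)

// spawn launches n goroutines and sums their indices, just to show
// that creating tens of thousands of goroutines is routine: each one
// starts on a small (few-KB) stack that grows on demand.
func spawn(n int) int64 {
	var (
		sum int64
		mu  sync.Mutex
		wg  sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mu.Lock()
			sum += int64(i)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return sum
}

func main() {
	// 100,000 kernel threads at a worst-case 8 MB stack each would
	// reserve ~800 GB of address space; 100,000 goroutines are cheap.
	fmt.Println(spawn(100000)) // 0+1+...+99999 = 4999950000
}
```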
Finally, goroutine scheduling is different than kernel thread scheduling: a blocked goroutine consumes no CPU cycles.
On a 4-core CPU there is no point in running more than 4 busy kernel threads, but the kernel scheduler has to give each thread a chance to run. The more threads you have, the more time the kernel spends on the pointless work of ping-ponging between them. That hurts throughput, especially when we're talking about high-load servers (serving thousands or even millions of concurrent connections).
The Go runtime only runs as many threads executing Go code as there are CPUs and avoids this waste.
That's why high-performance servers (like nginx) don't just use a kernel thread per connection, and instead go through the considerable complexity of writing event-driven code.
Go gives you the straightforward programming model of a thread per connection, with scalability and performance much closer to the event-driven model.
You work on Rust and are well informed about this topic so I'm sure you know all of that.
Which is why it amazes me how far you go to denigrate Go in that respect and to minimize what is a great and unique programming model among mainstream languages.
You can achieve the same on Linux or Solaris using kernel threads, but you have to work at it. With Go you don't have to work at it, and it works on macOS and Windows and a few other OSs too.
This is all comparisons between O(1) things, but the constant factor matters.
I agree with OP that golang seems to provide the best of both worlds in the “event” vs “thread” debate. We can get the performance benefits of an eventing model with a much simpler programming model of thread per request.
It’s all “semantically” similar, but it’s the details that matter. And I think golang chose the correct trade-offs here (and with their sub-ms GC as well). The JVM, as an opposing example, made all the wrong choices for the general use case: slow GCs and 1:1 threading.
I always understood the overhead of kernel threads compared to user threads to be significant at large scale. It’s not just stacks either. It can be a lot cheaper to swap between user threads, depending on implementation, compared to the scheduler having to preempt and trap into kernel code and provide a general purpose context switch.
I would speculate the performance win comes from having no stack switching and fewer channel operations.
I've done lots of event loops in the past (e.g. hellepoll in C++) and think that the cost falls on the programmer: keeping track of callbacks and state machines, and avoiding using the stack for state, is all hard work and easy to mess up.
I am reminded of this post I saw on HN a while ago https://www.mappingthejourney.com/single-post/2017/08/31/epi... Ryan Dahl, creator of node.js, would just use Go today ;)
I very much agree. In the past I had quite some fun developing a few streaming parsers using Node.js, which also uses an event loop. And while these parsers worked reasonably well and efficiently, debugging them was not an easy task. In addition, understanding the code is a tough challenge, especially for people other than the original authors.
When I started using Go more and more, I really enjoyed the different I/O model of goroutines and blocking function calls. It also has a few drawbacks, but the mental model is a lot easier to reason about.
This is improving, even in C++. Here is what the core loop of a line-based echo server could look like with C++ coroutines (and something very similar compiles today on my machine):
void echo_loop(tcp::socket socket) {
    io::streambuf buffer;
    std::string line;
    std::error_code ec;
    do {
        ec = co_await async_read_line(socket, buffer, line);
        if (ec)
            break;
        ec = co_await async_write(socket, line);
    } while (!ec);
}

It's a testament to what is possible through the "syscall" and "golang/x/sys" facilities. As well as your confidence in playing with Linux internals ;)
Stripping those mechanisms to pretend the event handling is faster only works if you never intend to perform any real computation. That's never true in practice... Or am I missing something?
So you're not missing anything.
The single-threaded nature of applications like Redis and HAProxy is a significant impediment to their vertical scalability. CPUs aren't getting faster; we're just going to get more cores, so anything that assumes there's only a single core seems like a dead end.
HAProxy literally just added multithreading support in 1.8.
HAProxy added threading support in 1.8 as you pointed out and Redis has started the same (for a certain subset of processing) in 4.0 as well. They're getting there but concurrency is tough.
To suggest that his product is a "dead end" due to not supporting threading seems a bit premature, as Redis and HAProxy are extremely well-regarded in their niche and they made it there without threading, and we've been at maximal clock speed for nearly a decade.
I suppose my experience might be unusual, but I frequently log in to c3.8xlarge redis machines that have a single core pegged at 100% and the rest doing nothing. Yes multiple processes help, but that requires updating clients and makes it harder to share memory.
> To suggest that his product is a "dead end" due to not supporting threading seems a bit premature, as Redis and HAProxy are extremely well-regarded in their niche and they made it there without threading.
Well yeah, CPUs hitting their GHz limit and the dramatic increase in the number of cores per machine are a relatively recent phenomenon.
I just think it's weird to start a brand-new project making those same assumptions, especially when the underlying programming language was explicitly designed with concurrency in mind.
It'd be like building a new networking library in Rust which ditches memory safety.
Yes, the event loop is single-threaded. The other cores can be used for other stuff, but not the event loop.
It's completely possible with this library to process operations in a background thread and wake up the loop when it's time to write a response. If that's what the developer desires.
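A library-agnostic sketch of that pattern (the types and function names below are invented for illustration, not this library's API): background goroutines do the slow work, and a channel delivers results back to the single loop goroutine, which alone touches connection state.

```go
package main

import "fmt"

// result is a hypothetical unit of finished background work,
// tagged with the connection it belongs to.
type result struct {
	connID int
	reply  string
}

// slowWork stands in for a DB lookup or other long operation; it
// runs off the loop goroutine and signals completion on the channel.
func slowWork(id int, req string, done chan<- result) {
	done <- result{connID: id, reply: "processed: " + req}
}

// loop plays the event loop's role: it blocks until woken by a
// finished result, then produces the writes for that connection.
func loop(done <-chan result, n int) []string {
	var replies []string
	for i := 0; i < n; i++ {
		r := <-done // the "wake-up": loop resumes when work finishes
		replies = append(replies, fmt.Sprintf("conn %d -> %s", r.connID, r.reply))
	}
	return replies
}

func main() {
	done := make(chan result)
	go slowWork(1, "ping", done)
	fmt.Println(loop(done, 1))
}
```

The key design point is ownership: only the loop goroutine reads or writes connection state, so no locks are needed around it.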
> anything that assumes there's only a single core seems like a dead end.
If my documentation somehow implies that systems running this library do not have multiple cores then I'm sorry for the confusion. This library makes no assumption about the host server, and it does not limit the application to a single core. It just runs the event loop in one thread.