You would not want to use this framework if your requests are long-running (milliseconds or more), for example a web API that needs to connect to a MongoDB database, authenticate, and respond. For that, just use the Go net/http package instead.
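For that long-running case, here is a minimal sketch (not from the project, and the route and response body are invented for illustration) of the net/http style being recommended: each request runs in its own goroutine, so blocking on a database or auth call is fine.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// response builds the body for a request path; split out so the
// logic is trivial to exercise on its own.
func response(path string) string {
	return "hello, " + path + "\n"
}

// handler is where the long-running work (DB query, auth) would go;
// net/http runs each request in its own goroutine, so blocking here
// does not stall other requests.
func handler(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, response(r.URL.Path))
}

func main() {
	http.HandleFunc("/", handler)
	// In a real service this call would block, serving requests:
	//   http.ListenAndServe(":8080", nil)
	fmt.Print(response("/"))
}
```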
There are many popular event-loop-based applications in the wild, such as Nginx, HAProxy, Redis, and Memcached. All of these are single-threaded, written in C, and very fast.
The reason I wrote this framework is so I can build certain network services that perform like the C apps above, but I also want to continue to work in Go.
First of all, thank you for publishing this project. It's very interesting to me, since I had never thought about the benefits of an event loop. Would you mind explaining briefly why an event loop is a better fit for these applications? Is it due to performance and efficiency?
What this does instead is give a Go program direct access to the event loop. The benefit is that it bypasses everything the Go runtime wraps around its internal event loop: the machinery that presents a thread-like interface, integrates with channels and the other concurrency primitives, preserves your position in the call stack between events, and so on. The penalty is the exact same thing: you lose all the nice stuff the Go runtime offers to implement that thread-like interface, and you are back to a lower-level interface that provides fewer services.
The performance of the Go runtime is "pretty good", especially by scripting-language standards, but if you have sufficiently high performance requirements, you will not want to pay the overhead. The pathological case for all of these nice high-level abstractions is a server that handles a ton of network traffic and needs to do only a little something to every request, maybe a couple dozen cycles' worth, at which point paying what could be a few hundred cycles for runtime services you aren't using becomes a significant drain. Most people are not servicing a network request in a few dozen cycles, though, and the longer a single request takes, the more sense it makes to have a nice runtime layer providing useful services, because the runtime shrinks as a percentage of your program's CPU time. If you are so much as hitting a database over a network connection in your request, even a local one, you have already greatly exceeded the time you are paying to the runtime.
It does seem to me that a lot of people are a bit bedazzled by the top-level stuff that various languages offer, and forget that under the hood, everyone's using the event-based interfaces. What differs between Node and Twisted and all of the dozens or hundreds of other viable wrappers over these calls is the services automatically provided, not whether or not they are "event loops". Go is an event loop at the kernel level. Node is an event loop at the kernel level. Erlang is an event loop at the kernel level. They aren't all the same, but "event-based" vs. "not event-based" is not the distinction; it's a question of what they lay on top of the underlying event loop, not whether they use it. Even pure OS threads are, ultimately, event loops under the hood, just in the kernel rather than the user space.
That is, I can write simple blocking code, and my server still scales.
Using event loop programming in Go would take away one of my favorite things about the language, so I won't be using this. However I do appreciate the work, as it makes an excellent bug report against the Go runtime. It gives us a standard to hold the standard library's net package to.
If Go has or can emulate 'generators' à la Python/Node.js, then you can write synchronous-looking, blocking-style code with event loops as well.
The end result is much easier than Python/Node.js because there is no explicit async/await or deferred-style programming. You simply write linear code, and any blocking operation (at the syscall level) is handled transparently.
There were historical implementations of pthreads, such as NGPT, that used precisely the same model as Go, and they were abandoned because the advantages over 1:1 were not sufficient to justify the complexity.
Creating and destroying kernel threads is significantly more expensive.
A kernel thread has a fixed-size stack, and if you go beyond it, you crash. That means you have to create kernel threads with worst-case-scenario stack sizes (and pray that you got it right).
A goroutine has an expandable stack and starts with a very small one (which is partly why it's faster to create; setting up kernel page mappings for a large contiguous stack is not free).
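The practical consequence of those small, growable stacks can be sketched like this (the 8 MB figure below is an assumed worst-case kernel-thread stack, for comparison only): launching 100,000 goroutines is routine.

```go
package main

import (
	"fmt"
	"sync"
)

// spawn launches n goroutines and sums their indices, just to show
// that creating tens of thousands of goroutines is routine: each one
// starts on a small (few-KB) stack that grows on demand.
func spawn(n int) int64 {
	var (
		sum int64
		mu  sync.Mutex
		wg  sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mu.Lock()
			sum += int64(i)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return sum
}

func main() {
	// 100,000 kernel threads at a worst-case 8 MB stack each would
	// reserve ~800 GB of address space; 100,000 goroutines are cheap.
	fmt.Println(spawn(100000)) // 0+1+...+99999 = 4999950000
}
```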
Finally, goroutine scheduling is different than kernel thread scheduling: a blocked goroutine consumes no CPU cycles.
On a 4-core CPU there is no point in running more than 4 busy kernel threads, but the kernel scheduler has to give each thread a chance to run. The more threads you have, the more time the kernel spends on the pointless work of ping-ponging between them. That hurts throughput, especially when we're talking about high-load servers (serving thousands or even millions of concurrent connections).
The Go runtime only runs as many threads executing Go code as there are CPUs and avoids this waste.
That's why high-performance servers (like nginx) don't just use a kernel thread per connection, and instead go through the considerable complexity of writing event-driven code.
Go gives you the straightforward programming model of a thread per connection, with scalability and performance much closer to the event-driven model.
You work on Rust and are well informed about this topic so I'm sure you know all of that.
Which is why it amazes me how far you go to denigrate Go in that respect and to minimize what is a great and unique programming model among mainstream languages.
You can achieve the same on Linux or Solaris using kernel threads, but you have to work at it. With Go you don't have to work at it, and it works on macOS and Windows and a few other OSs too.
This is all comparisons between O(1) things, but the constant factor matters.
I agree with OP that golang seems to provide the best of both worlds in the “event” vs “thread” debate. We can get the performance benefits of an eventing model with a much simpler programming model of thread per request.
It’s all “semantically” similar, but it’s the details that matter. And I think golang chose the correct trade-offs here (and with their sub-ms GC as well). The JVM, as an opposing example, made all the wrong choices for the general use case: slow GCs and 1:1 threading.
I always understood the overhead of kernel threads compared to user threads to be significant at large scale. It’s not just stacks either. It can be a lot cheaper to swap between user threads, depending on implementation, compared to the scheduler having to preempt and trap into kernel code and provide a general purpose context switch.
I would speculate the performance win comes from having no stack switching and fewer channel operations.
I've done lots of event loops in the past (e.g. hellepoll in C++) and think that the cost falls on the programmer: keeping track of callbacks and state machines, and avoiding using the stack for state, is all hard work and easy to mess up.
I am reminded of this post I saw on HN a while ago https://www.mappingthejourney.com/single-post/2017/08/31/epi... Ryan Dahl, creator of node.js, would just use Go today ;)
I very much agree. In the past I had quite some fun developing a few streaming parsers using Node.js, which also uses an event loop. And while these parsers worked reasonably well and efficiently, debugging them was not an easy task. In addition, understanding the code is a tough challenge, especially for people other than the original authors.
When I started using Go more and more, I really enjoyed the different I/O model of goroutines and blocking function calls. It also has a few drawbacks, but the mental model is a lot easier to reason about.
This is improving, even in C++. Here is what the core loop of a line-based echo server could look like with C++ coroutines (and something very similar compiles today on my machine):
void echo_loop(tcp::socket socket) {
    io::streambuf buffer;
    std::string line;
    std::error_code ec;
    do {
        ec = co_await async_read_line(socket, buffer, line);
        if (ec)
            break;
        ec = co_await async_write(socket, line);
    } while (!ec);
}

It's a testament to what is possible through the "syscall" and "golang/x/sys" facilities. As well as your confidence in playing with Linux internals ;)
Stripping those mechanisms to pretend the event handling is faster only works if you never intend to perform any real computation. That's never true in practice... Or am I missing something?
So you're not missing anything.
The single-threaded nature of applications like Redis and HAProxy is a significant impediment to their vertical scalability. CPUs aren't getting faster; we're just going to get more cores, so anything that assumes there's only a single core seems like a dead end.
HAProxy literally just added multithreading support in 1.8.
HAProxy added threading support in 1.8 as you pointed out and Redis has started the same (for a certain subset of processing) in 4.0 as well. They're getting there but concurrency is tough.
To suggest that his product is a "dead end" due to not supporting threading seems a bit premature, as Redis and HAProxy are extremely well-regarded in their niche and they made it there without threading, and we've been at maximal clock speed for nearly a decade.
I suppose my experience might be unusual, but I frequently log in to c3.8xlarge redis machines that have a single core pegged at 100% and the rest doing nothing. Yes multiple processes help, but that requires updating clients and makes it harder to share memory.
> To suggest that his product is a "dead end" due to not supporting threading seems a bit premature, as Redis and HAProxy are extremely well-regarded in their niche and they made it there without threading.
Well yeah, CPUs hitting their GHz limit and the dramatic increase in the number of cores per machine are a relatively recent phenomenon.
I just think it's weird to start a brand-new project making those same assumptions, especially when the underlying programming language was explicitly designed with concurrency in mind.
It'd be like building a new networking library in Rust which ditches memory safety.
Yes, the event loop is single-threaded. The other cores can be used for other stuff, but not the event loop.
It's completely possible with this library to process operations in a background thread and wake up the loop when it's time to write a response. If that's what the developer desires.
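A library-agnostic sketch of that pattern (the types and function names below are invented for illustration, not this library's API): background goroutines do the slow work, and a channel delivers results back to the single loop goroutine, which alone touches connection state.

```go
package main

import "fmt"

// result is a hypothetical unit of finished background work,
// tagged with the connection it belongs to.
type result struct {
	connID int
	reply  string
}

// slowWork stands in for a DB lookup or other long operation; it
// runs off the loop goroutine and signals completion on the channel.
func slowWork(id int, req string, done chan<- result) {
	done <- result{connID: id, reply: "processed: " + req}
}

// loop plays the event loop's role: it blocks until woken by a
// finished result, then produces the writes for that connection.
func loop(done <-chan result, n int) []string {
	var replies []string
	for i := 0; i < n; i++ {
		r := <-done // the "wake-up": loop resumes when work finishes
		replies = append(replies, fmt.Sprintf("conn %d -> %s", r.connID, r.reply))
	}
	return replies
}

func main() {
	done := make(chan result)
	go slowWork(1, "ping", done)
	fmt.Println(loop(done, 1))
}
```

The key design point is ownership: only the loop goroutine reads or writes connection state, so no locks are needed around it.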
> anything that assumes there's only a single core seems like a dead end.
If my documentation somehow implies that systems running this library do not have multiple cores then I'm sorry for the confusion. This library makes no assumption about the host server, and it does not limit the application to a single core. It just runs the event loop in one thread.