Epoll is fundamentally broken (2017)

(idea.popcount.org)

81 points · justin_ · 3y ago · 41 comments

22 comments · 4 top-level

rwaksmunski · 3y ago · 7 in thread

Linux was the last to implement "events"; they had IO completion ports and kqueue to learn from. They decided to implement something broken and document it as such; it was right in the man page. I find that astonishing. That reminds me, dnotify and inotify are probably still there. I'm bitter because, as a younger programmer, I had to port a clean kqueue event loop to Linux, and it was a dreadful experience.

ajross · 3y ago

Was your experience the same thing as the linked article? I mean, reading this as someone who knows epoll and not kqueues quite as much, it sounds very much like "I tried to use this API like something I already know and it didn't work, so I hate it and it's broken".

The linked article basically goes through a bunch of scenarios examining clumsy ways to chase a slightly-obscure requirement ("wake up exactly one thread per event using a single descriptor") just to land in the very last paragraph on the clearly correct solution ("just use ONESHOT, that's what it's for!").

I mean, there's literally a feature right there in the man page[1] that does exactly what the author wants. They just didn't want to learn about it and view their ignorance as a bug in the software.

Meh. This gets tiresome. As the article linked yesterday points out, epoll() is the "API that powers the internet", and has effectively solved the C10k problem for everyone that has it.

But yeah, you need to read the man page.

[1] No, seriously, it's right there in the man page, discussing exactly this scenario and how to avoid it.
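For readers who have not used the flag mentioned above, here is a minimal sketch of one-shot behaviour using Python's `select.epoll`, which exposes the kernel flag as `select.EPOLLONESHOT`. The function name `oneshot_demo` and the loopback setup are assumptions made for illustration; this is not code from the article or the man page. It shows that a ready listening socket is reported once, stays silent until it is re-armed with `modify()`, and is then reported again.

```python
import select
import socket


def oneshot_demo():
    """Return how many events three successive polls report (hypothetical helper)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)
    addr = listener.getsockname()

    ep = select.epoll()
    mask = select.EPOLLIN | select.EPOLLONESHOT
    ep.register(listener.fileno(), mask)

    client1 = socket.create_connection(addr)
    first = ep.poll(1.0)   # the pending connection is reported
    second = ep.poll(0)    # the fd is now disarmed, so nothing is reported

    conn1, _ = listener.accept()
    # Re-arm the descriptor once this worker has finished handling the event.
    ep.modify(listener.fileno(), mask)

    client2 = socket.create_connection(addr)
    third = ep.poll(1.0)   # reported again after re-arming

    for s in (client1, client2, conn1, listener):
        s.close()
    ep.close()
    return len(first), len(second), len(third)


if __name__ == "__main__":
    print(oneshot_demo())
```

In a multithreaded server the `modify()` call would be made by whichever thread handled the event, which is what keeps a second thread from being woken for the same descriptor in the meantime.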

naasking · 3y ago

> No, seriously, it's right there in the man page, discussing exactly this scenario and how to avoid it.

The point is, do all of the other configurations for epoll have legitimate use cases justifying the complexity and need for those parameters? The kqueue design scales from single-threaded to multithreaded scenarios without issue and without all of these pitfalls, so why not just adopt that design? Why does this specific issue need a workaround described in the man page at all?

> As the article linked yesterday points out, epoll() is the "API that powers the internet", and has effectively solved the C10k problem for everyone that has it.

Being able to solve a problem and doing it well are not the same. The latter arguably deserves criticism.

rwaksmunski · 3y ago

I've had to deal with forks and threads, epoll and inotify. It was beyond my skills at the time.

nextaccountic · 3y ago

Is inotify broken? What should be used instead of it to watch for filesystem events?

mastax · 3y ago

http://wingolog.org/archives/2018/05/21/correct-or-inotify-p...

rwaksmunski · 3y ago

I don't know man, I've given up and handed it over to someone more senior. Their port was half the performance and the project was killed. We kept on using FreeBSD for the product.

jgerrish · 3y ago

Jesus, is it sad that when I hear "senior" now, I don't dream of leading the design of big systems?

I think of stakeholder meetings and fighting business requirements.

I had an inotify project in process for creating a better developer experience. Seeing this hit Hacker News tells me it's a fucking political landmine.

Sigh.

Don't worry, I'll have the strength to argue the proper technical solution that still satisfies business needs. That's the important thing!

adontz · 3y ago · 7 in thread

My experience of epoll is that it's mostly for single-process, single-threaded network I/O. If you want a multithreaded network server, use SO_REUSEPORT and one epoll per thread. Everything else is either broken, overcomplicated, or slow.
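A rough sketch of the layout described above, assuming Linux and Python's `select` and `socket` modules: each worker thread owns its own listening socket bound with `SO_REUSEPORT` and its own epoll instance, so no descriptor is shared between threads. The names `make_listener`, `worker`, and `run_demo`, the greeting payload, and the worker and client counts are all hypothetical choices for illustration.

```python
import select
import socket
import threading


def make_listener(port):
    """Create a non-blocking listener that may share its port via SO_REUSEPORT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    s.setblocking(False)
    return s


def worker(listener, stop, served, lock):
    """Event loop for one thread: private listener, private epoll instance."""
    ep = select.epoll()
    ep.register(listener.fileno(), select.EPOLLIN)
    while not stop.is_set():
        for _fd, _events in ep.poll(0.05):
            try:
                conn, _ = listener.accept()
            except BlockingIOError:
                continue
            conn.sendall(b"hello\n")
            conn.close()
            with lock:
                served.append(threading.get_ident())
    ep.close()
    listener.close()


def run_demo(num_workers=2, num_clients=8):
    first = make_listener(0)  # port 0: let the kernel pick a free port
    port = first.getsockname()[1]
    listeners = [first] + [make_listener(port) for _ in range(num_workers - 1)]
    stop, lock, served = threading.Event(), threading.Lock(), []
    threads = [
        threading.Thread(target=worker, args=(lst, stop, served, lock))
        for lst in listeners
    ]
    for t in threads:
        t.start()
    replies = []
    for _ in range(num_clients):
        with socket.create_connection(("127.0.0.1", port)) as c:
            replies.append(c.recv(16))
    stop.set()
    for t in threads:
        t.join()
    return replies, served


if __name__ == "__main__":
    replies, served = run_demo()
    print(len(replies), "replies from", len(set(served)), "worker thread(s)")
```

The kernel decides which listener receives each connection, so the split between workers is not controlled by the application in this sketch.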

mritun · 3y ago

This is the correct answer (on Linux) and the article dismisses it after lightly touching on its correctness.

In a multi-threaded environment someone needs to pay the cost of synchronization if the entire event queue is load-balanced. If you don’t want the events to be trivially load-balanced and want the events from one fd to be delivered to a single worker, then it’s way better to use SO_REUSEPORT and get it right from the get-go.

Expecting kernel to solve a problem of user-space’s making is asking for trouble - can be done, but the edge cases will sink your project!

naasking · 3y ago

> Expecting kernel to solve a problem of user-space’s making is asking for trouble

These are problems the kernel has introduced. I'm not sure you've read the article carefully enough.

wbl · 3y ago

Or use IOCP on Windows or kqueue on FreeBSD. IOCP is particularly interesting as it will interact with the scheduler and release more threads if some get blocked, trying to right-size CPU usage.

throwaway858 · 3y ago

The Go runtime (and GHC runtime) use epoll from multiple threads and don't use SO_REUSEPORT and are highly efficient. I do believe that they use one epoll per thread.

scaramanga · 3y ago

Another correct solution which avoids the issue, which is entirely of userspace's creation :)

jbluepolarbear · 3y ago

This is what I was thinking reading this. Why are they reading the same buffer in 2 threads? This problem will always happen if reading the same buffer in 2 threads.

naasking · 3y ago

They are not. This happens for any fd, like a socket waiting for a connection.
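To make the listening-socket case concrete, here is a small single-threaded sketch, assuming Linux and Python's `select.epoll`. Two epoll instances stand in for two workers watching the same listening socket in the default level-triggered mode. The name `herd_demo` is hypothetical, and this is a simplified stand-in for the multi-worker scenarios in the article, not a reproduction of them. Both instances report the single pending connection, but only the first `accept()` succeeds.

```python
import select
import socket


def herd_demo():
    """One pending connection, two watchers: both are notified, one wins."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)

    ep_a, ep_b = select.epoll(), select.epoll()
    for ep in (ep_a, ep_b):
        ep.register(listener.fileno(), select.EPOLLIN)  # level-triggered

    client = socket.create_connection(listener.getsockname())
    ready_a = ep_a.poll(1.0)
    ready_b = ep_b.poll(1.0)

    conn, _ = listener.accept()  # the first "worker" takes the connection
    try:
        listener.accept()        # the second "worker" finds nothing left
        second_accept = "accepted"
    except BlockingIOError:
        second_accept = "would block"

    for s in (client, conn, listener):
        s.close()
    ep_a.close()
    ep_b.close()
    return len(ready_a), len(ready_b), second_accept


if __name__ == "__main__":
    print(herd_demo())
```

No buffer is read here at all: the wasted wake-up comes purely from two waiters being told about the same ready descriptor.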

dxuh · 3y ago · 4 in thread

I have used select and poll before and sort of skipped to io_uring for a recent project, which also doesn't do too well with multiple threads (you have to use multiple rings and do everything else yourself). It's a shame that there is no obvious, relatively easy-to-use async IO mechanism on Linux that you can use from multiple threads without getting yourself in trouble. Reading more about kqueue, it looks a lot like something that would solve it. Why was it not ported to Linux? Would it have been too hard to integrate? The linked article is great btw. Very informative and concise!

LaLaLand122 · 3y ago

> you have to use multiple rings and do everything else yourself

Since you are supposed to use liburing, not the kernel interface directly, I guess somebody could add multithreading "support" to it.

Or at least add documentation/examples of the most common/performant options: https://github.com/shuveb/loti/issues/4

AFAIR Windows IOCP handles multithreading by:

- Handling locking at kernel level, the syscall is thread safe

- Making it LIFO, to keep things in the same threads, to have a decent cache behaviour.

It's as simple as it gets.

ilyt · 3y ago

> I have used select and poll before and sort of skipped to io_uring for a recent project, which also doesn't do too well with multiple threads (you have to use multiple rings and do everything else yourself)

Wouldn't that be on purpose? Coordination requires more CPU cycles and so cuts into max performance.

> Why was it not ported to Linux?

We've been asking since epoll got introduced.

But here is some context: https://lwn.net/Articles/431297/

kasabali · 3y ago

> Why was it not ported to Linux?

NIH

thatcherc · 3y ago

National Institutes of Health?

samsquire · 3y ago

I wrote an epoll echo server that multiplexes multiple network connections over threads (multiple users per thread)

https://github.com/samsquire/epoll-server

I also have a 1:M:N (1 scheduler thread, M kernel threads and N lightweight green threads) multithreaded userspace scheduler which multiplexes lightweight threads onto kernel threads and can preempt hot loops with minimal overhead. I rely on the fact that you can change the looping variable from another thread if you use a structure. Preemptive interruption is very useful for the illusion of multitasking. That's why I call it a userspace scheduler.
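The trick of preempting a hot loop by changing its bound from another thread can be sketched as follows. This is an illustration in Python under stated assumptions, not code from the linked repository: `LoopState`, `hot_loop`, and `preempt_demo` are hypothetical names, and the sleep interval is arbitrary. Python's interpreter makes the write visible to the looping thread; in languages such as C or Java the shared field would typically need to be atomic or volatile.

```python
import threading
import time


class LoopState:
    """Shared structure holding the loop bound so another thread can change it."""

    def __init__(self, limit):
        self.limit = limit


def hot_loop(state, result):
    # The bound is re-read from the shared structure on every iteration,
    # which is what lets a scheduler thread cut the loop short.
    i = 0
    while i < state.limit:
        i += 1
    result["iterations"] = i


def preempt_demo():
    original_limit = 10**12  # far more work than fits in the demo's time slice
    state = LoopState(original_limit)
    result = {}
    t = threading.Thread(target=hot_loop, args=(state, result))
    t.start()
    time.sleep(0.05)   # let the loop run for a short time slice
    state.limit = 0    # "preempt": the loop condition fails on its next check
    t.join()
    return result["iterations"], original_limit


if __name__ == "__main__":
    done, limit = preempt_demo()
    print(f"stopped after {done} of {limit} iterations")
```

A real scheduler would also save the loop counter so that the task can be resumed later; this sketch only shows the interruption itself.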

https://GitHub.com/samsquire/preemptible-thread

I think the epoll-server, which is kind of similar to what libuv does, and the userspace scheduler could be combined into an application server.

I also wrote a multithreaded actor implementation in Java. Threads can communicate with each other at between 60 million and 100 million messages a second. The epoll-server uses a multiconsumer multiproducer lockless RingBuffer.

https://GitHub.com/samsquire/multicersion-concurrency-contro...

I think the core fundamentals of building a performant application server should be done once and reused for each application.

I want to also split the threading used by recv and send of a socket so that we have a 1:RecvKernelThread SendKernelThread with 1 RK+SK assigned to Socket:N scheduling (1 scheduler thread, 1 assigned Recv thread, 1 assigned send thread per socket). So you can send while you receive and receive while you send. True multiplexing!

We can decouple CPU and IO completely with threading.
