Why not benefit from Lewis's work on cppcoro? He has obviously thought through the most important issues one would otherwise stumble across. Unfortunately, cppcoro doesn't look like it is actively maintained, which is why I was looking for other implementations. I'm excited to see how your library will progress in the future!
Does threadpool::thread_loop() not have to check if the popped coroutine is suspended before attempting to resume it?
Are they really more efficient than normal callbacks when doing async?
Take, for instance, this code, which relies on libuv for its event loop and on co_await to retain its state during execution: https://gist.github.com/Qix-/09532acd0f6c9a57c09bd9ce31b3023...
Let's say you want to batch a bunch of database operations into one transaction. You could queue them up over the course of a few milliseconds, run the transaction, and then, for each context that relied on a different db operation, simply return to its previous suspension point instead of having to call a handler. Granted, the handler is now inside the `await_transform` needed to work with `co_await`, but think of the possibilities: no weirdly separate callback function, no real need for a class that encapsulates all the operations for, say, a user's POST request, and to top it all off, you can do this on a single thread. It's a tool for cleaner code, but I'll be damned if it's easy to understand.
It's just so much stupid boilerplate, and a strange way of putting it together.
The only way to enqueue a coroutine is to call schedule() within a co_await expression. In this process the coroutine is suspended. Therefore there should never be a coroutine in the queue that we cannot immediately resume.
I'm afraid I don't have any numbers available to compare coroutines with other approaches. Nevertheless, in my opinion coroutines are beneficial because they keep their state (the stack frame, local variables) alive. With callbacks you would have to handle all of this yourself. Think about a generator for a sequence of numbers: you would have to store at least the counter variable manually, whereas with a coroutine this happens automatically.
The C++ implementation seems closer to Lisp's call-with-current-continuation, though as far as I can tell all the implementations achieve more or less the same thing (though thread safety might vary among the options).
Actually, continuation-passing style (callbacks) is another way of doing the same thing, though it has the disadvantage of requiring large structural changes to the code. It wouldn't surprise me if callback hell can therefore also occur in all versions, though some might make it easier than others (Python's implementation in particular makes it somewhat less likely by encouraging information to flow one way).
The Twisted library encouraged heavy use of this before Python implemented async/await.
https://twistedmatrix.com/documents/current/core/howto/defer...
I moved to C++ coroutines from a composable-futures (CF) library that, if memory serves, had a few thread-pool implementations (and before CF everything was written in callback hell). Out of the box, CF had extra CPU overhead because its internals were not efficient enough for my use: too many templates and too much copying when switching tasks. Also, spawned tasks had to reference shared pointers in user space (my app code), and the frequent shared-pointer copying added unneeded overhead.
I later rewrote the CF implementation completely, so before coroutines my app still used the CF API extensively, just with the internals reimplemented; the shared-pointer copying, however, remained far from perfect.
On top of the CF API I also had some abstractions (async/await/spawn/wait_all and the like), so transforming the application code was not painful. I had to rewrite the synchronization primitives to use the mutexes that came with cppcoro, and change my own internal scheduler to use some of the other new primitives.
I was afraid that storing local variables in coroutine frames (instead of stack frames) would affect performance, but for some reason it did not.
I also expected compilation time to increase, but for the most part it did not. Probably template expansion takes all the time anyway, so the coroutine code transformation fades in comparison.
Since then I have stopped using C++ coroutines. I dropped them for the following reasons:
1) Unable to debug. The debugger has no access to local variables (or I couldn't figure out how to enable it); reference time point: around 9 months ago. Stack traces are also missing, and of course there's no help from the tools: you have a core file, go figure.
2) g++ support was missing in the early days when I adopted coroutines (clang 9 had just been released), and even the clang 10 compiler produced wrong code when using suspended lambda functions. I use lambdas a lot, and since suspending functions spread through the code base, lambdas inevitably get pulled in too. The result was occasional SIGSEGVs or wrong values. There was a workaround, moving 100% of the lambda body into a separate function and calling that from the lambda, but it destroys all the lambda's beauty.
I moved to the Chinese libgo library (it can be found on GitHub). I don't use the syscall interceptors it offers; I just use the cooperative scheduler it provides, along with its synchronization primitives. It's stackful cooperative multitasking, which keeps all the yummy things. And yes, it seemingly performs slightly better in my case. And yes, I had to patch it slightly.
TL;DR: dropped C++ stackless coroutines in favor of stackful coroutines (cooperative stack switching). What a relief!
Regarding your debugging issues: I'd be surprised if this doesn't improve over the next year or two. Clang, AFAIK, isn't even fully compatible with the final version of coroutines yet. Microsoft has done a lot of work on the compiler itself, so I'd assume Visual Studio will ship improvements once they release VS2022(?). Of course, these are only guesses on my side.
Summing it up, it sounds to me like you suffered from the curse of the early adopter. It would be interesting to see whether you'd have fewer issues once tooling and compiler support have improved enough.
> once compiler support has improved enough
I give it five years minimum. It's already been a few years since coroutines landed in clang, and I don't believe this will be fixed soon in gdb/lldb. You need to introduce many non-generic things: at the very least, new stack-chaining debug information for proper call stacks, which is (and will remain!) thread-pool-implementation specific, because otherwise it would have to be part of the standard and part of the compiler implementation, which is even worse. Local variables are slightly easier, though.
I believe you mean this one:
libgo -- a coroutine library and a parallel Programming Library
https://github.com/yyzybb537/libgo
(no information about the main contributor unfortunately)
So more an issue of tooling than anything else.
I love to pick on Python for such examples, because it is considered the new BASIC, yet when I pick up the standard language reference plus the standard library documentation, the page count outgrows that of ISO C++.
Then there is the list of breaking changes that have happened even across minor versions since Python 1.0.
C++ tends to be the second-best language for everything, and this is no exception. Go beats it in Go's own niche: it has great compilation times, and it forces you down a sane asynchronous-programming path.
C++ fails on both of those criteria. However, once you fall off the happy path in Go, you're probably completely screwed, whereas with C++ you're already using the second-best language for whatever your new problem is.
For the last few years I've been doing Hack (Facebook's PHP fork) professionally and async-await as cooperative multitasking is pervasive. IMHO it's a really nice model. Generally speaking, I've come around to believing that if it ever comes down to you spawning your own thread, you're going to have a Bad Time.
Go's channels are another variant of this.
The central idea in both cases is that expressing dependencies this way is often sufficient and way easier to write than true multithreaded code.
C++20 coroutines don't seem to solve this problem as best as I can tell.
It actually seems like C++20 coroutines are closer to Python generators. Is that the case? Or is this a classic "a camel is a horse designed by committee" situation, where the C++ standards committee tried to create primitives to handle these and possibly other use cases? I honestly don't know.
You may have looked at them at too low a level. Check out something like cppcoro to see what you can do. I don't use it myself, but I've stolen a few things, like task<>, which is a pretty core thing that the stdlib does not provide.
Goroutines are not cooperative multitasking, by the way; they're non-OS "green" threads. That is, until you do something silly like run CPU-bound code that never hits a yield point, at which point you have to insert the yields yourself (at least that was the case the last time I used Go; it's been a while).
1. Properly written code will perform well, whether async/await or Go style.
2. Making async easy makes one use it in more places. In addition, letting the caller decide whether to run something sync or async also makes it far more useful. In the async/await model that only works if all methods are declared async, which is very costly in complexity.
I guess you meant the "go" statement? That is more of a coroutine spawn thing, and this would be a separate function in C++ too.