Why not benefit from Lewis's work on cppcoro? He has obviously thought through the most important issues one would otherwise stumble across. Unfortunately, cppcoro doesn't look like it is actively maintained, which is why I was looking for other implementations. I'm excited to see how your library will progress in the future!
Does threadpool::thread_loop() not have to check if the popped coroutine is suspended before attempting to resume it?
Are they really more efficient than normal callbacks when doing async?
Take, for instance, this code, which relies on libuv for its event loop and on co_await to retain its state during execution: https://gist.github.com/Qix-/09532acd0f6c9a57c09bd9ce31b3023...
Let's say you want to batch a bunch of database operations into one transaction. You could queue them up over the course of a few milliseconds, run the transaction, and then, for each context that relied on a different db operation, simply return to its previous suspension point instead of having to call a handler. Granted, the handler is now inside the `await_transform` needed to work with `co_await`, but think of the possibilities: no weirdly separate callback function, no real need for a class that encapsulates all the operations for, say, a user's POST request, and to top it all off, you can do this on a single thread. It's a tool for cleaner code, but I'll be damned if it's easy to understand.
It's just so much stupid boilerplate, and a strange way of putting it together.
The only way to enqueue a coroutine is to call schedule() within a co_await expression. In this process the coroutine is suspended. Therefore there should never be a coroutine in the queue that we cannot immediately resume.
I'm afraid I don't have any numbers available to compare coroutines with other approaches. Nevertheless, in my opinion coroutines are beneficial because they keep their state (the stack frame, local variables) alive. With callbacks you would have to handle all of this yourself. Think about a generator for a sequence of numbers: you would have to store at least the counter variable manually, whereas with a coroutine this happens automatically.
The C++ implementation seems closer to Lisp's call-with-current-continuation, though as far as I can tell all the implementations achieve more or less the same thing (though thread safety might vary among the options).
Actually, continuation-passing style (callbacks) is another way of doing the same thing, though it has the disadvantage of requiring large structural changes to the code. It wouldn't surprise me if callback hell can therefore also occur in all versions, though some might make it easier than others (Python's implementation in particular makes it somewhat less likely by encouraging information to flow one way).
The Twisted library encouraged heavy use of this before Python implemented async/await.
https://twistedmatrix.com/documents/current/core/howto/defer...
I moved to C++ coroutines from a composable-futures (CF) library that, if memory serves, had a few thread-pool implementations (and before CF everything was written in callback hell). Out of the box, CF had extra CPU overhead because its internals were not efficient enough for my use: too many templates and too much copying when switching tasks. Also, spawned tasks had to reference shared pointers in user space (my app code), and the frequent shared-pointer copying added unneeded overhead.
I later rewrote the CF implementation completely, so before coroutines my app still used the CF API extensively, just with the internals reimplemented; the shared-pointer copying, however, remained far from perfect.
On top of the CF API I also had some abstractions (async/await/spawn/wait_all and the like), so transforming the application code was not painful. I had to rewrite the synchronization primitives to use the mutexes that came with cppcoro, and change my own internal scheduler to use some of the other new primitives.
I was afraid that storing local variables in coroutine frames (instead of stack frames) would affect performance, but for some reason it did not.
I also expected compilation time to increase, but for the most part it did not. Probably template expansion takes all the time anyway, so the coroutine code transformation fades in comparison.
Since then I have stopped using C++ coroutines. I dropped them for the following reasons:
1) Unable to debug. The debugger has no access to local variables (or I couldn't figure out how to enable it); reference time point: around 9 months ago. Stack traces are also missing, and of course there's no help from the tools: you have a core file, go figure.
2) g++ support was missing in the early days when I adopted coroutines (clang 9 had just been released), and even the clang 10 compiler produced wrong code when using suspended lambda functions. I use lambdas a lot, and since suspending functions spread through the code base, lambdas inevitably get pulled in too. The result was occasional SIGSEGVs or wrong values. There was a workaround, moving 100% of the lambda body into a separate function and calling that from the lambda, but it destroys all the lambda's beauty.
I moved to the Chinese libgo library (it can be found on GitHub). I don't use the syscall interceptors it offers; I just use the cooperative scheduler it provides, along with its synchronization primitives. It's stackful cooperative multitasking, which keeps all the yummy things. And yes, it seemingly performs slightly better in my case. And yes, I had to patch it slightly.
TL;DR: dropped C++ stackless coroutines in favor of stackful coroutines (cooperative stack switching). What a relief!
Regarding your debugging issues: I'd be surprised if this doesn't improve over the next year or two. Clang, AFAIK, isn't even fully compatible with the final version of coroutines yet. Microsoft has done a lot of work on the compiler itself, so I'd assume Visual Studio will ship improvements once they release VS2022(?). Of course, these are only guesses on my side.
Summing it up, it sounds to me like you suffered from the curse of the early adopter. It would be interesting to see whether you'd have fewer issues once tooling and compiler support have improved enough.
> once compiler support has improved enough
I give it five years minimum. It's already been a few years since coroutines landed in clang, and I don't believe this will be fixed soon in gdb/lldb. You need to introduce many non-generic things: at the very least, new stack-chaining debug information for proper call stacks, which is (and will remain!) thread-pool-implementation specific, because otherwise it would have to be part of the standard and part of the compiler implementation, which is even worse. Local variables are slightly easier, though.
I believe you mean this one:
libgo -- a coroutine library and a parallel Programming Library
https://github.com/yyzybb537/libgo
(no information about the main contributor unfortunately)
So more an issue of tooling than anything else.
I love to pick on Python for such examples, because it is considered the new BASIC, yet when I pick up the standard language reference plus the standard library documentation, the page count outgrows that of ISO C++.
Then there is the list of breaking changes that have happened even across minor versions since Python 1.0.
C++ tends to be the second-best language for everything, and this is no exception. Go beats it in Go's own niche: it has great compilation times, and it forces you down a sane asynchronous-programming path.
C++ fails on both of those criteria. However, once you fall off the happy path in Go, you're probably completely screwed, whereas with C++ you're already using the second-best language for whatever your new problem is.
For the last few years I've been doing Hack (Facebook's PHP fork) professionally and async-await as cooperative multitasking is pervasive. IMHO it's a really nice model. Generally speaking, I've come around to believing that if it ever comes down to you spawning your own thread, you're going to have a Bad Time.
Go's channels are another variant of this.
The central idea in both cases is that expressing dependencies this way is often sufficient and way easier to write than true multithreaded code.
C++20 coroutines don't seem to solve this problem as best as I can tell.
It actually seems like C++20 coroutines are closer to Python generators. Is that the case? Or is this a classic "a camel is a horse designed by committee" situation, where the C++ standards committee tried to create primitives to handle these and possibly other use cases? I honestly don't know.
You may have looked at them at too low a level. Check out something like cppcoro to see what you can do. I don't use it myself, but I've stolen a few things, like task<>, which is a pretty core thing that the stdlib does not provide.
Goroutines are not cooperative multitasking, by the way; they're non-OS "green" threads. That is, until you do something silly like run CPU-bound code that never hits a yield point, at which point you have to insert the yields yourself (at least that was the case the last time I used Go; it's been a while).
1. Properly written code will perform well, whether async/await or Go style.
2. Making async easy makes one use it in more places. In addition, letting the caller decide whether to run something sync or async also makes it far more useful. In the async/await model that only works if all methods are declared async, which is very costly in complexity.
I guess you meant the "go" statement? That is more of a coroutine spawn thing, and this would be a separate function in C++ too.