> Yes, every connection has a (small) thread stack associated with it. But in a "truly asynchronous" network program, every connection still has memory associated with it; it's just that the memory doesn't take the form of a procedure stack.
That's also true with 1:1 threading; the only difference is that the kernel handles the context switching.
Semantically, there's no difference between what Go does and what NPTL does. The difference is in implementation: Go does a lot of the work itself in userspace, while NPTL does the work in the kernel. (I name NPTL specifically, not to be pedantic, but because there were pthreads implementations for Linux that used Go-like M:N schedulers. They were abandoned in favor of NPTL because the extra complexity was judged not to be worth the small, if any, performance gains.)
You're right, of course, that you need per-connection state in any model. But a state machine can be much more compact than a call stack. Modern compilers (I doubt this includes Go's 6g/8g, but I haven't checked) do stack coloring to reduce stack usage, but the overhead is still significant, because compilers almost always favor the runtime performance of straight-line code over stack compactness wherever the two trade off. State machine compilers, like C#'s async/await compiler, make the opposite choice, and as a result they can use less memory. Moreover, with state machines you can go the extra mile and pack your state into a tiny fixed-size object allocated from a segregated-fit arena. That's pretty much unbeatable for performance.
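To make the packed-state point concrete, here is a minimal sketch in Python (chosen only for brevity; this is not what C#'s compiler generates). It assumes a hypothetical protocol where each message is a one-byte length followed by that many payload bytes, and the server replies with the sum of the payload. The names `SlabPool`, `feed`, and the record layout are all made up for illustration. Each connection's entire state is a 6-byte record in one shared buffer, and slots are recycled through a free list, which amounts to a single size class of a segregated-fit allocator.

```python
import struct

# Hypothetical per-connection record: phase (1 byte), bytes remaining (1 byte),
# running checksum (4 bytes). "<" disables padding, so each record is 6 bytes.
RECORD = struct.Struct("<BBI")
READ_HEADER, READ_BODY = 0, 1


class SlabPool:
    """Fixed-size slots in one contiguous buffer, recycled through a free list.

    Every slot has the same size, so alloc and release are O(1) and there is
    no fragmentation: one size class of a segregated-fit allocator.
    """

    def __init__(self, capacity):
        self.buf = bytearray(RECORD.size * capacity)
        # Reversed so that pop() hands out slot 0 first.
        self.free = list(range(capacity - 1, -1, -1))

    def alloc(self):
        slot = self.free.pop()  # raises IndexError when the pool is exhausted
        RECORD.pack_into(self.buf, slot * RECORD.size, READ_HEADER, 0, 0)
        return slot

    def release(self, slot):
        self.free.append(slot)


def feed(pool, slot, data):
    """Advance one connection's state machine over `data`.

    Returns the checksums of any messages completed by this chunk. All state
    that must survive between chunks lives in the 6-byte record, not on a stack.
    """
    off = slot * RECORD.size
    phase, remaining, acc = RECORD.unpack_from(pool.buf, off)
    done = []
    for b in data:
        if phase == READ_HEADER:
            remaining, acc = b, 0
            if remaining == 0:
                done.append(0)  # an empty message completes immediately
            else:
                phase = READ_BODY
        else:
            acc = (acc + b) & 0xFFFFFFFF
            remaining -= 1
            if remaining == 0:
                done.append(acc)
                phase = READ_HEADER
    RECORD.pack_into(pool.buf, off, phase, remaining, acc)
    return done


pool = SlabPool(capacity=1024)
a, b = pool.alloc(), pool.alloc()
print(feed(pool, a, bytes([3, 1, 2])))      # [] (message on `a` still needs a byte)
print(feed(pool, b, bytes([2, 10, 20, 0]))) # [30, 0]
print(feed(pool, a, bytes([3])))            # [6]
```

Chunks for different connections can arrive interleaved in any order, because nothing survives between calls except the record. Here that is 6 bytes per connection, versus a goroutine or thread stack measured in kilobytes. In C or Rust the same layout would be a packed struct carved out of a slab; Python's interpreter overhead obviously dwarfs the savings, so treat this only as an illustration of the layout.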