Goroutines (and stackful coroutines in general) require a runtime. Go has to sprinkle suspension points throughout the generated machine code (to prevent starvation), and when it needs to grow a goroutine's stack, it copies that stack to a new location and has to know how to find and adjust every pointer that refers into the old one.
This is not something that a systems programming language would want to do. I agree with your point that structured concurrency is good and preferable for high-level languages, but it's not strictly the better choice once you start thinking about the details, as you have to in a systems programming language.
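To make the stack-moving point concrete, here is a rough sketch of how you can watch it happen. It assumes the standard gc toolchain, where stacks grow by copying. The names `deepen` and `addrsAroundGrowth` are made up for this example, and the frame size and recursion depth are arbitrary. The address is deliberately captured as a `uintptr`, which the runtime treats as a plain integer and does not adjust, while the real pointer `p` gets rewritten behind the program's back.

```go
package main

import (
	"fmt"
	"unsafe"
)

// deepen recurses n times with a padded frame so that the goroutine's
// stack has to grow well past its small initial size.
//
//go:noinline
func deepen(n int) byte {
	var pad [1024]byte
	pad[n%len(pad)] = byte(n)
	if n == 0 {
		return pad[0]
	}
	return deepen(n-1) + pad[n%len(pad)]
}

// addrsAroundGrowth (hypothetical helper) runs on a fresh goroutine and
// reports the address of one of its stack locals before and after forcing
// stack growth, plus whether the real pointer still works afterwards.
func addrsAroundGrowth() (before, after uintptr, stillValid bool) {
	type result struct {
		before, after uintptr
		stillValid    bool
	}
	done := make(chan result)
	go func() {
		var local int // stays on the stack: its address is only exposed as an integer
		p := &local
		b := uintptr(unsafe.Pointer(p)) // integer snapshot, not adjusted by the runtime
		deepen(4096)                    // a few MiB of frames, forcing the stack to be copied
		*p = 42                         // p was adjusted by the runtime, so this hits the moved local
		a := uintptr(unsafe.Pointer(p))
		done <- result{b, a, local == 42}
	}()
	r := <-done
	return r.before, r.after, r.stillValid
}

func main() {
	before, after, ok := addrsAroundGrowth()
	fmt.Printf("before=%#x after=%#x moved=%v pointer still valid=%v\n",
		before, after, before != after, ok)
}
```

The only reason `p` keeps working is that the runtime knows exactly where every pointer into that stack lives. A language without that bookkeeping (or one that hands raw stack addresses to C code) can't relocate stacks like this.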