> Slow compiles are an issue with C++ templates.
As far as I know, Rust has the same problem, although to a lesser extent. Monomorphization works well when used judiciously. The C++ STL is not written that way: it depends on countless layers of inlining to perform well. Rust libraries aren't much better in this regard.
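To make "judicious use" concrete, here is a minimal sketch of one common way to limit monomorphization cost in Rust: keep the generic function a thin shim and push the real work into a non-generic inner function. The function name `extension_len` is made up for illustration; only the `std::path` calls are real.

```rust
use std::path::Path;

/// Generic entry point (hypothetical example function).
/// One copy of this shim is emitted per distinct caller type `P`,
/// but the shim only converts the argument and forwards it.
pub fn extension_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic worker: compiled once, regardless of how many
    // different `P` types the outer function is instantiated with.
    fn inner(path: &Path) -> usize {
        path.extension().map_or(0, |ext| ext.len())
    }
    inner(path.as_ref())
}
```

Calling this with `&str`, `String`, and `PathBuf` still creates three instantiations of the outer function, but each one is only a conversion plus a call, so the duplicated code stays small.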
LTO removes some code bloat, but LTO itself takes more time. At least up to the ThinLTO summary pass (or the equivalent pass in GCC's WHOPR mode), the middle-end and early IR optimizations still have to run, and Go wants to avoid that. I think that's a fine design choice. Go's designers decided that virtual calls aren't a cost they care about anyway: pre-1.18 Go relied heavily on interfaces, and that's not going to change.
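As a rough illustration of the trade-off, here is a sketch in Rust, using `dyn Trait` as a stand-in for Go-style interface dispatch. This is an analogy, not a description of how Go compiles interfaces or generics. The `Shape` trait and its implementors are made up for the example.

```rust
// Hypothetical trait and types, used only to contrast the two dispatch styles.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Static dispatch: the compiler emits a separate copy of this function
// for every concrete `S` it is called with (monomorphization).
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Dynamic dispatch: a single compiled copy; each `area` call goes
// through a vtable, and the slice may mix different shape types.
fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

The static version gives the optimizer more to inline at the price of more generated code; the dynamic version compiles once and pays an indirect call per element.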
> writing the code out customized for each given type should have identical performance to the template-generated version of the code
In theory yeah, but templates tend to generate more instantiations than you'd strictly write by hand.
Also, obscure corner cases exist, but none big enough to matter much, thanks to the numerous man-years spent on GCC and LLVM. https://travisdowns.github.io/blog/2020/01/20/zero.html