The point is that they shouldn't be slower than a manually copied implementation for that concrete type. They should also be faster than vtable-based dynamic dispatch in the vast majority of cases. (I also fail to see a compelling reason why they couldn't have been implemented by passing the fat pointer directly, which would make the codegen the same as passing an interface, instead of that business with the extra layer of indirection.)
If there are specialization opportunities when hand-implementing the function for a given concrete type, I would indeed expect that to be faster than a monomorphized generic function.
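To make the three variants concrete, here is a minimal sketch, assuming the language under discussion is Go (the comment doesn't name it outright). `Shape`, `Square`, and the three `TotalArea*` functions are hypothetical names made up for illustration. It only shows what is being compared; it says nothing about how any particular compiler version actually lowers each one.

```go
package main

import "fmt"

// Shape is a hypothetical interface used only for this illustration.
type Shape interface {
	Area() float64
}

// Square is a hypothetical concrete type implementing Shape.
type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// TotalAreaDynamic takes interface values, so every Area call
// is dispatched dynamically through the interface.
func TotalAreaDynamic(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

// TotalAreaGeneric is the generic version. How the compiler
// implements the Area call here is the point under debate.
func TotalAreaGeneric[T Shape](shapes []T) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

// TotalAreaSquares is the manually copied version for one concrete
// type. Because the type is known, it can skip the method call
// entirely, which is the kind of specialization opportunity a
// hand-written implementation may have.
func TotalAreaSquares(squares []Square) float64 {
	total := 0.0
	for _, s := range squares {
		total += s.Side * s.Side
	}
	return total
}

func main() {
	squares := []Square{{Side: 1}, {Side: 2}, {Side: 3}}

	// A []Square is not a []Shape, so the interface slice is built explicitly.
	shapes := make([]Shape, 0, len(squares))
	for _, s := range squares {
		shapes = append(shapes, s)
	}

	fmt.Println(TotalAreaDynamic(shapes))
	fmt.Println(TotalAreaGeneric(squares))
	fmt.Println(TotalAreaSquares(squares))
}
```

All three return the same result; the only question is what each costs per call, which is something to measure with a benchmark rather than assume.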