I work in DB kernels, everything gets hand tuned as there is economic reason for hand tuning it. The expectation in many of these systems is that there are no wasted cycles. You can codegen a decent kernel, but then someone will find a better approach - do you want the slower version of the product, or the faster one?
You can see this in action with matrix math libraries, folks have been hand tuning those for decades at this point.