A growing amount of effort is going into automatically generating high-performance linear algebra kernels.
In my experience, though, it would be a major challenge to build a tool that can exploit the subtle differences between the assembly languages of different architectures well enough to match, let alone exceed, an expert-crafted assembly kernel.
Still, it is certainly a promising and active research direction.