This is a "broadword matrix multiplication", as described in Knuth's The Art of Computer Programming, Volume 4A (Section 7.1.3, exercise 55).
Here is a lecture where Knuth explains it:
https://youtu.be/o22BAuQj3ds?t=1h20s
The technique is efficient with 64-bit registers, and wider registers could be used in the same way.
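
To make the idea concrete, here is a minimal sketch of one common way to multiply two 8×8 bit matrices packed into 64-bit words. It is an illustration under my own assumptions, not necessarily the exact formulation in the exercise or the lecture: the names `mxor` and `LOW_BITS` are hypothetical, row i is assumed to live in byte i with column j at bit j of that byte, and addition is taken to be XOR (arithmetic over GF(2)).

```python
LOW_BITS = 0x0101010101010101  # lowest bit of each of the 8 bytes


def mxor(a: int, b: int) -> int:
    """Multiply two 8x8 bit matrices packed in 64-bit integers, over GF(2).

    Assumed layout: row i is byte i of the word, column j is bit j of that byte.
    """
    c = 0
    for k in range(8):
        # Column k of a: bit k of every row, moved to the low bit of each byte.
        col = (a >> k) & LOW_BITS
        # Spread each low bit over its whole byte: 0xFF where a[i][k] is 1.
        col_mask = col * 0xFF
        # Row k of b, copied into all 8 bytes.
        row_rep = ((b >> (8 * k)) & 0xFF) * LOW_BITS
        # Row i of the product picks up row k of b wherever a[i][k] is 1.
        c ^= col_mask & row_rep
    return c


IDENTITY = 0x8040201008040201  # byte i has only bit i set
```

Each of the eight steps processes all eight rows at once with a few word-wide operations, which is where the efficiency comes from. Replacing `^=` with `|=` would give the OR-based Boolean product instead.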