The nvfuser code doesn't even call it sm_100 vs. sm_120: NVIDIA's internal nomenclature seems to be 2CTA/1CTA, it's a bin. So there are less MMA tilings in the released ISA as of 13.1 / r85 44.
The mnemonic tcgen05.mma doesn't mean anything, it's lowered onto real SASS. FWIW the people I know doing their own drivers say the whole ISA is there, but it doesn't matter.
The family of mnemonics that hits the "Jensen Keynote" path is roughly here: https://docs.nvidia.com/cuda/parallel-thread-execution/#warp....
10x path is hot today on Thor, Spark, 5090, 6000, and data center.
Getting it to trigger reliably on real tilings?
Well that's the game just now. :)
Edit: https://customer-1qh1li9jygphkssl.cloudflarestream.com/1795a...