> As I understood it, and as far as I can remember, the Mill AOT compiler has an easier job than that. The generic image already contains the parallelized instructions; the AOT compiler just has to split those that are too wide for the given CPU.
In my opinion, this just moves the problem to a meta level. With Itanium's EPIC instructions, one could encode multiple (parallel) instructions into one VLIW instruction. It was a huge problem to parallelize existing code, say C or C++, so that this capability could actually be used. The fact that such a "smart compiler" turned out to be so hard to write was one of the things that broke Itanium's neck.
I frankly have no idea by what magic a "sufficiently smart compiler" that can create such a "generic image [that] already contains the parallelized instructions" suddenly appears. How can compilers suddenly parallelize programs when that turned out to be nigh impossible for Itanium?!
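For what it's worth, the step the quote calls easy, splitting bundles that are too wide for a given CPU, really is mechanical once the bundles exist. Below is a minimal sketch, assuming a made-up register-style op format: `Op`, `check_independent` and `split_bundles` are hypothetical names, not the Mill's actual encoding or toolchain. It also assumes that ops within one bundle are independent (none reads a register written by another op in the same bundle, and no two write the same register), which is exactly what makes a naive split safe.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address op: dest = opcode(srcs...)."""
    opcode: str
    dest: str
    srcs: Tuple[str, ...]


Bundle = List[Op]


def check_independent(bundle: Bundle) -> None:
    """Assumed invariant: ops in one bundle can run in any order.

    An op may read its own destination (e.g. r1 = r1 + 1), but it must not
    read a register written by another op in the same bundle.
    """
    dests = [op.dest for op in bundle]
    if len(dests) != len(set(dests)):
        raise ValueError("two ops in one bundle write the same register")
    written = set(dests)
    for op in bundle:
        others = written - {op.dest}
        if others.intersection(op.srcs):
            raise ValueError(f"{op.opcode} reads a register written in the same bundle")


def split_bundles(program: List[Bundle], width: int) -> List[Bundle]:
    """Split every bundle wider than `width` into consecutive narrower bundles."""
    if width < 1:
        raise ValueError("width must be at least 1")
    out: List[Bundle] = []
    for bundle in program:
        check_independent(bundle)
        if not bundle:
            out.append([])  # keep empty (no-op) bundles as they are
            continue
        for i in range(0, len(bundle), width):
            out.append(bundle[i:i + width])
    return out


if __name__ == "__main__":
    program = [
        [Op("add", "r1", ("a", "b")), Op("mul", "r2", ("c", "d")), Op("sub", "r3", ("e", "f"))],
        [Op("add", "r4", ("r1", "r2"))],
    ]
    narrow = split_bundles(program, width=2)
    print([[op.dest for op in b] for b in narrow])  # [['r1', 'r2'], ['r3'], ['r4']]
```

The split never has to discover any parallelism; it only consumes what is already encoded in the wide bundles. Producing those wide, independent bundles from ordinary C or C++ code is the part the sketch takes for granted.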