> The LLVM architecture is actually already set up to do optimizations at runtime on LLVM "binaries".
Pragmatically, exactly how fast is that run-time optimization? Could you realistically JIT it, or should the more-optimal, chip-specific asm be cached between loads? Or is this so slow you'd only ever want to do it once?
Right, I forgot to say that: you'd cache the results (like Rosetta's translation of PowerPC binaries, or DEC's VAX-to-Alpha binary recompilation), so you'd only take the translation hit once.
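To make the "translation hit once" idea concrete, here's a rough sketch of what such a cache might look like. Everything in it is made up for illustration: `cache_key`, `load_native`, the on-disk layout, and the `optimize` callback (which stands in for whatever the real chip-specific optimizer would be) are assumptions, not anything LLVM actually ships.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(bitcode: bytes, target_cpu: str) -> str:
    """Key the cache on both the input bytes and the CPU it was tuned for."""
    h = hashlib.sha256()
    h.update(target_cpu.encode("utf-8"))
    h.update(b"\0")  # separator so the CPU name can't run into the bitcode
    h.update(bitcode)
    return h.hexdigest()


def load_native(
    bitcode: bytes,
    target_cpu: str,
    cache_dir: Path,
    optimize: Callable[[bytes, str], bytes],  # hypothetical optimizer hook
) -> bytes:
    """Return chip-specific code, running the slow optimizer only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / cache_key(bitcode, target_cpu)
    if entry.exists():
        return entry.read_bytes()  # cache hit: no translation cost

    native = optimize(bitcode, target_cpu)  # the one-time translation hit

    # Write to a temp file and rename, so a crash never leaves a partial entry.
    tmp = entry.with_suffix(".tmp")
    tmp.write_bytes(native)
    tmp.replace(entry)
    return native
```

Keying on the target CPU as well as the bitcode means moving the same binary to a different chip just produces a second cache entry rather than reusing code tuned for the wrong one.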