...but those CPUs are still speculative, out-of-order superscalars, aren't they?
We're talking about removing those features, which our entire computing ecosystem is built on, and expecting the compiler to schedule work onto every execution unit individually.
dex2oat is where that work could be done, yes, but as a field we just don't seem to know how to fill processor pipelines like that. We don't have the knowledge, and nobody seems able to figure it out despite several attempts.