I think Bryan has been doing a follow-up to Copperhead, so it's probably easiest to just ask him :)
I don't know what you mean by predictable performance. Flattening is a direct transformation and seems simple to reason about on SIMD architectures, though the recent dynamic scheduling (work-stealing) approach for multicore/distributed targets has the usual caveats. (I tend to avoid it for HPC.) Given the 10+ year history of the researchers involved, it seems like a slow-but-steady project.
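
To make the "direct transformation" point concrete, here is a rough sketch of what flattening does to a nested map followed by a per-sublist sum. It is plain Python for illustration only, not code from Copperhead or any particular flattening compiler, and the names `flatten` and `segmented_sum` are made up for this example.

```python
from itertools import accumulate

def flatten(nested):
    """Split a nested list into one flat array plus per-segment lengths."""
    data = [x for seg in nested for x in seg]
    lengths = [len(seg) for seg in nested]
    return data, lengths

def segmented_sum(data, lengths):
    """Sum each segment of the flat array (one result per original sublist)."""
    # Running totals of the lengths give each segment's start and end offsets.
    offsets = [0] + list(accumulate(lengths))
    return [sum(data[offsets[i]:offsets[i + 1]]) for i in range(len(lengths))]

nested = [[1, 2, 3], [], [4, 5]]
data, lengths = flatten(nested)

# The inner map (x * 2) becomes a single elementwise pass over the flat
# array, regardless of how irregular the sublists are.
doubled = [x * 2 for x in data]

# The inner reduction becomes a segmented reduction over the same array.
sums = segmented_sum(doubled, lengths)
print(sums)  # [12, 0, 18]
```

The work on the flat array is regular even when the sublists are not, which is why the cost is fairly easy to reason about on SIMD hardware.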