The Haskell embedding is very likely to head in that direction.
(See e.g. work in this style: http://www.fftw.org/faq/section4.html#whyfast or this style: http://www.cse.unsw.edu.au/~chak/papers/polymer.pdf. The idea: code generation + DSL + constraint solver for instruction-level timings.)
At least, that's what I'd do.