Ideally, none. We're leaning toward FPGAs and CGRAs to accelerate tight inner loops, and that shift will have a huge effect on compilers. They will have to translate a control-flow behavioral description, like C, into a dataflow description that can be mapped onto the array. Honestly, this compilation problem is not solved. That is why we still write Verilog instead of just compiling C to circuitry. I've taken a crack at it (in the form of QDI, quasi-delay-insensitive, circuit synthesis from CHP, Communicating Hardware Processes), and every sub-problem I hit was either NP-hard or NP-complete.
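To make the control-flow vs. dataflow distinction concrete, here is a minimal sketch of one standard step, if-conversion: a branch becomes a select (mux) node, and both sides of the branch are computed unconditionally. The graph format and the names `abs_diff_cf`, `ABS_DIFF_DFG`, and `eval_dfg` are hypothetical, chosen just for illustration; real CGRA/FPGA flows use their own IRs. This only covers an acyclic, branch-only region. Loops, memory accesses, and scheduling/placement onto a finite array are where the hard problems show up.

```python
import operator


# Control-flow version: the branch decides which subtraction runs.
def abs_diff_cf(a, b):
    if a > b:
        return a - b
    return b - a


# Dataflow version after if-conversion (hypothetical format): both
# subtractions are computed unconditionally, and a select node picks the
# result using the comparison as its predicate.
# Each node is (name, operation, names of its inputs).
ABS_DIFF_DFG = [
    ("gt", operator.gt, ("a", "b")),
    ("d1", operator.sub, ("a", "b")),
    ("d2", operator.sub, ("b", "a")),
    ("out", lambda p, x, y: x if p else y, ("gt", "d1", "d2")),
]


def eval_dfg(graph, inputs):
    """Evaluate nodes in the listed (topological) order; return all values."""
    values = dict(inputs)
    for name, op, args in graph:
        values[name] = op(*(values[arg] for arg in args))
    return values
```

For example, `eval_dfg(ABS_DIFF_DFG, {"a": 3, "b": 7})["out"]` gives `4`, the same as `abs_diff_cf(3, 7)`. On a spatial array, each node would become its own operator wired to its inputs, which is why the branch has to disappear first.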
Though all of this assumes we solve the memory bottleneck, which might come from upcoming work on 3D integration and memristors. Who knows.