That said, we're at a "go big or go home" point: we either need to ramp up significantly from the current 1.5 grad students + 2 undergrads per year, or wrap things up. Getting to the point where the system can be used in general-purpose projects requires a lot of work, none of which results in papers.
If I had to guess, the most probable impact is what you would expect from PL research: lessons learned get integrated into other systems down the road.
- CilkPlus has a nice first pass at a work-stealing algorithm, but we showed how to do it without static tuning by the programmer, and with lower overheads to boot.
- Vectorization that takes advantage of wider vector hardware requires transforming data structures (e.g., from array-of-structs to struct-of-arrays). We showed how to do that transformation automatically and how to reason about the resulting changes in program performance.
- We have boatloads of papers - at both workshops and conferences - on what has to be done to the compiler and runtime to run efficiently on NUMA multicore systems. Right now, most fp systems do not run into these problems because they cannot scale past their own parallel overheads. Once past that bottleneck, the next one will be memory traffic, at least in our experience.
I don't say any of that to fling mud at other systems; we started our project after them all, at the start of the multicore era (2006), with the goal of investigating these specific issues without carrying along the baggage of a pre-existing sequential implementation.
There's also still a lot more to learn. I personally don't buy that determinism and total chaos are the only two points in the design space of program reasoning. There have to be some interesting midpoints (e.g., histories that are linearizable) that are worth investigating.