TIL
Cool to see that people are still working on it, but I think that the main barrier to practical use of these techniques still remains unsolved. The problem is that it's just so easy for the supercomplier to go into some crazy exponential blowup of function unfolding, making the compilation step take impractically long.
Even if you avoid a literal exponential blowup you can easily end up generating tons of specializations that bloat your code cache but don't reveal any useful optimization opportunities/are infrequently used in practice. Similar performance problems also plague related techniques like trace-based JITs, even though the trace JIT happens at runtime and thus has access to strictly more information about the frequency with which a trace might be used.
You can try to use annotations like the @extract proposed in the article to control these problems, but it can be hard to predict in advance when this is going to occur.
One interesting research direction might be to use deep reinforcement learning to try to guide the generalization/termination decisions, where the reward is based on A) whether the unfolding leads to a tie-back later on, and B) to what extent the unfolding allowed deforestation/beta reduction to take place.
This topic was brought up many times in the past. Most of the time it was suggested to use speculative approaches to detect blowups, but I haven't actually seen any attempts to predict them before supercompilation. For example, while I was writing Mazeppa, I've seen many times that blowups occur because some function call gives insufficient information that is subsequently pattern-matched by another function, and since it cannot be properly pattern-matched at compile-time, a lot of code duplication takes place.
I'm leaning towards a kind of "abstract interpretation pass" before supercompilation to approximate "good" and "bad" interactions between functions. After all, given that offline analyses exist even for harder problems such as detecting non-termination, why cannot we understand how supercompilation gives us desirable results?
The other idea I had when I was working on this stuff was to do JIT supercompliation. Clasically, supercompilers expand the whole graph at compile time, but it would be straightforward to delay the supercompilation process until the first time that the code is actually executed, which helps tame the problem of generating lots of specializations that may never actually be executed in practice.
(Choosing a generalization heuristic becomes more important in the JIT setting because you always have access to the full info about all parameters to the function you are specializing, and naively you would end up e.g. specializing sigmoid(x)=1/(1+exp(x)) for each value of x observed at runtime.)
> In Mazeppa, we have call-by-value functions and call-by-name (call-by-need) constructors, which 1) makes deforestation possible and 2) preserves the original semantics of code.
Are there any other notable differences between this approach and call-by-value constructors? (I guess one might have to carry around the original context in closures for a little while, but presumably even that disappears in residual programs?)
[Stepan Razin lost his black hat during his ride, but the supercompiler has already removed Ivan Mazepa's?]
1. The approach involves magic numbers to implement heuristics. They worked for the samples provided by the authors, but will they work for large programs?
2. I think that there's a smarter approach to detect blowups. Specifically, there can be "good" and "bad" interactions in the form `f(g(...), ...)`, where `f` and `g` are functions; an interaction is good if `g` produces information that is known at compile-time (to be subsequently deconstructed by `f`), and bad if it doesn't. If the interaction is bad, `f` and `g` should be transformed separately.
In general, I see manual annotations as a low-level mechanism that can be used for predictable supercompilation. We haven't yet implemented heuristics that would guarantee predictability, but this is on our priority list.
> Are there any other notable differences between this approach and call-by-value constructors?
Yes, lazy constructors must be implemented differently than eager ones. The standard approach is to enclose constructor arguments under an environment of values, just as we implement closure functions. I don't think that supercompilation can remove all laziness from the constructors.
[1] Jonsson, Peter & Nordlander, Johan. (2011). Taming code explosion in supercompilation. PERM'11 - Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. 33-42. 10.1145/1929501.1929507.