Mazeppa: A modern supercompiler for call-by-value functional languages (opens in new tab)

(github.com)

189 pointsJhsto1y ago18 comments

18 comments

16 comments · 6 top-level

keiferski1y ago· 4 in thread

The likely origin of the name, if you’re curious:

https://en.wikipedia.org/wiki/Mazeppa_(poem)

Hirrolot1y ago

The name was inspired by Transcendental Étude No. 4 [1], as correctly suggested by the other comment.

[1] https://www.youtube.com/watch?v=MYSFTtYc4P4

keiferski1y ago

It looks like those compositions were ultimately inspired by the poem. So we’re both right ;)

https://en.wikipedia.org/wiki/Cultural_legacy_of_Mazeppa

alekseiprokopev1y ago

The first thing I though was this guy lol https://en.wikipedia.org/wiki/Ivan_Mazepa I support Ukraine btw.

rottc0dd1y ago

Oh! I thought it was this:

https://youtu.be/CXGeOHdiHrE

TIL

batterseapower1y ago· 3 in thread

My PhD was on supercompilation for Haskell (thanks for the cite in the repo :-))

Cool to see that people are still working on it, but I think that the main barrier to practical use of these techniques still remains unsolved. The problem is that it's just so easy for the supercomplier to go into some crazy exponential blowup of function unfolding, making the compilation step take impractically long.

Even if you avoid a literal exponential blowup you can easily end up generating tons of specializations that bloat your code cache but don't reveal any useful optimization opportunities/are infrequently used in practice. Similar performance problems also plague related techniques like trace-based JITs, even though the trace JIT happens at runtime and thus has access to strictly more information about the frequency with which a trace might be used.

You can try to use annotations like the @extract proposed in the article to control these problems, but it can be hard to predict in advance when this is going to occur.

One interesting research direction might be to use deep reinforcement learning to try to guide the generalization/termination decisions, where the reward is based on A) whether the unfolding leads to a tie-back later on, and B) to what extent the unfolding allowed deforestation/beta reduction to take place.

Hirrolot1y ago

Thanks for commenting! Yes, automatically predicting whether it's beneficial to specialize a function is not implemented. It's under the section "Further research" to find suitable heuristics to automatically emit annotations like those `@extract` that we already have.

This topic was brought up many times in the past. Most of the time it was suggested to use speculative approaches to detect blowups, but I haven't actually seen any attempts to predict them before supercompilation. For example, while I was writing Mazeppa, I've seen many times that blowups occur because some function call gives insufficient information that is subsequently pattern-matched by another function, and since it cannot be properly pattern-matched at compile-time, a lot of code duplication takes place.

I'm leaning towards a kind of "abstract interpretation pass" before supercompilation to approximate "good" and "bad" interactions between functions. After all, given that offline analyses exist even for harder problems such as detecting non-termination, why cannot we understand how supercompilation gives us desirable results?

c0balt1y ago

Maybe this is naive but couldn't you use the same approach with profiles collected at runtime as PGO like LLVM/GCC? In my experience it can help quite a lot with guiding inlining both by hand and via a compiler.

batterseapower1y ago

I don't think that's naive - it could definitely help ameliorate the issues. To my knowledge noone has tried this approach.

The other idea I had when I was working on this stuff was to do JIT supercompliation. Clasically, supercompilers expand the whole graph at compile time, but it would be straightforward to delay the supercompilation process until the first time that the code is actually executed, which helps tame the problem of generating lots of specializations that may never actually be executed in practice.

(Choosing a generalization heuristic becomes more important in the JIT setting because you always have access to the full info about all parameters to the function you are specializing, and naively you would end up e.g. specializing sigmoid(x)=1/(1+exp(x)) for each value of x observed at runtime.)

1 more reply

parhamn1y ago· 1 in thread

That was such a good Readme. Wonderful explanations for those unfamiliar with the space, full examples, a bit of history/context. Thanks!

Hirrolot1y ago

Thanks, I'm glad the explanations were helpful.

0823498723498721y ago· 1 in thread

Is the manual annotation is because deciding whether or not a graph will blow up upon supercompilation by trying, but checking for blowup during the process, doesn't work?

> In Mazeppa, we have call-by-value functions and call-by-name (call-by-need) constructors, which 1) makes deforestation possible and 2) preserves the original semantics of code.

Are there any other notable differences between this approach and call-by-value constructors? (I guess one might have to carry around the original context in closures for a little while, but presumably even that disappears in residual programs?)

[Stepan Razin lost his black hat during his ride, but the supercompiler has already removed Ivan Mazepa's?]

Hirrolot1y ago

Checking for blowup sort of works; for example, this paper [1] suggests exactly this approach. However, I'm not leaning to it due to the following reasons:

1. The approach involves magic numbers to implement heuristics. They worked for the samples provided by the authors, but will they work for large programs?

2. I think that there's a smarter approach to detect blowups. Specifically, there can be "good" and "bad" interactions in the form `f(g(...), ...)`, where `f` and `g` are functions; an interaction is good if `g` produces information that is known at compile-time (to be subsequently deconstructed by `f`), and bad if it doesn't. If the interaction is bad, `f` and `g` should be transformed separately.

In general, I see manual annotations as a low-level mechanism that can be used for predictable supercompilation. We haven't yet implemented heuristics that would guarantee predictability, but this is on our priority list.

> Are there any other notable differences between this approach and call-by-value constructors?

Yes, lazy constructors must be implemented differently than eager ones. The standard approach is to enclose constructor arguments under an environment of values, just as we implement closure functions. I don't think that supercompilation can remove all laziness from the constructors.

[1] Jonsson, Peter & Nordlander, Johan. (2011). Taming code explosion in supercompilation. PERM'11 - Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. 33-42. 10.1145/1929501.1929507.

voldacar1y ago· 1 in thread

Cool. How necessary is the type system? Could you make something similar for lispy languages?

batterseapower1y ago

Supercompilation doesn't depend on having a typed language - you can more or less derive a supercompiler mechanically from any operational semantics, typed or untyped.

faizshah1y ago

I wonder how the debugging experience is, it must be pretty difficult to generate correct source maps.

j / k navigate · click thread line to collapse

18 comments

16 comments · 6 top-level

keiferski1y ago· 4 in thread

The likely origin of the name, if you’re curious:

https://en.wikipedia.org/wiki/Mazeppa_(poem)

Hirrolot1y ago

The name was inspired by Transcendental Étude No. 4 [1], as correctly suggested by the other comment.

[1] https://www.youtube.com/watch?v=MYSFTtYc4P4

keiferski1y ago

It looks like those compositions were ultimately inspired by the poem. So we’re both right ;)

https://en.wikipedia.org/wiki/Cultural_legacy_of_Mazeppa

alekseiprokopev1y ago

The first thing I though was this guy lol https://en.wikipedia.org/wiki/Ivan_Mazepa I support Ukraine btw.

rottc0dd1y ago

Oh! I thought it was this:

https://youtu.be/CXGeOHdiHrE

TIL

batterseapower1y ago· 3 in thread

My PhD was on supercompilation for Haskell (thanks for the cite in the repo :-))

You can try to use annotations like the @extract proposed in the article to control these problems, but it can be hard to predict in advance when this is going to occur.

Hirrolot1y ago

c0balt1y ago

batterseapower1y ago

I don't think that's naive - it could definitely help ameliorate the issues. To my knowledge noone has tried this approach.

1 more reply

parhamn1y ago· 1 in thread

That was such a good Readme. Wonderful explanations for those unfamiliar with the space, full examples, a bit of history/context. Thanks!

Hirrolot1y ago

Thanks, I'm glad the explanations were helpful.

0823498723498721y ago· 1 in thread

Is the manual annotation is because deciding whether or not a graph will blow up upon supercompilation by trying, but checking for blowup during the process, doesn't work?

> In Mazeppa, we have call-by-value functions and call-by-name (call-by-need) constructors, which 1) makes deforestation possible and 2) preserves the original semantics of code.

[Stepan Razin lost his black hat during his ride, but the supercompiler has already removed Ivan Mazepa's?]

Hirrolot1y ago

Checking for blowup sort of works; for example, this paper [1] suggests exactly this approach. However, I'm not leaning to it due to the following reasons:

1. The approach involves magic numbers to implement heuristics. They worked for the samples provided by the authors, but will they work for large programs?

> Are there any other notable differences between this approach and call-by-value constructors?

voldacar1y ago· 1 in thread

Cool. How necessary is the type system? Could you make something similar for lispy languages?

batterseapower1y ago

Supercompilation doesn't depend on having a typed language - you can more or less derive a supercompiler mechanically from any operational semantics, typed or untyped.

faizshah1y ago

I wonder how the debugging experience is, it must be pretty difficult to generate correct source maps.

j / k navigate · click thread line to collapse