I've been thinking along somewhat similar lines, but from the perspective of introduction and elimination rules, as found in type theory. If your compiler can find a transformation that brings an intro form and an elim form together, you can apply the computation rule to remove both from the program. My motivation is to remove allocations that aren't really essential. (I do have a functional language compiler that I work on, but it doesn't implement any optimizations yet.)
Another related idea comes from a Philip Wadler talk[1] I saw, which discusses how normalization can be used to achieve something like your "hypothetical inverses", which should always be eliminated from the final result.
> These transformations might only be useful in obvious optimization scenarios (cases that no programmer would intentionally create).
I would be surprised if that turned out to be the case.
[1] "Everything Old is New Again: Quoted Domain Specific Languages" https://www.youtube.com/watch?v=DlBwJ4rvz5c
Always curious to follow people doing compiler work; loop me in, yeah?
And, just watched that Wadler talk; it's good stuff. Of course, it sounds like his scenario is like: "there is a way to do the transformation, you just have to find it," and in my case, it's more like: "there might or might not be an algebra in which f acts as a homomorphism, and, even if there is, you may not have the transformations you need to determine it!"
If f is a homomorphism with respect to r, then reduce(r, map(f, l)) = f(reduce(r, l)).
The rule can be proven without reference to an inverse for f, so isn't it enough to just add such a rewrite rule to your optimizer? (Now I'm not sure what the inversion trick accomplishes.)
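As a minimal sketch of that rewrite rule (using the incr/max pairing as the example homomorphism, since max(incr(x), incr(y)) == incr(max(x, y))), both sides of the equation can be checked directly:

```python
from functools import reduce

# Hypothetical example: incr is a homomorphism with respect to max,
# i.e. max(incr(x), incr(y)) == incr(max(x, y)).
def incr(x):
    return x + 1

xs = [3, 1, 4, 1, 5]

# Original form: map first, then reduce.
lhs = reduce(max, map(incr, xs))

# Rewritten form: reduce first, then apply f once.
rhs = incr(reduce(max, xs))

assert lhs == rhs  # both are 6
```

The rewritten form applies incr once instead of once per element, which is the payoff an optimizer would be after.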
> Always curious to follow people doing compiler work; loop me in, yeah?
norstrulde.org and twitter.com/ericbb
> "there might or might not be an algebra in which f acts as a homomorphism, and, even if there is, you may not have the transformations you need to determine it!"
To my way of interpreting it, the algebra is determined by r, which is given. You might not know whether f is a homomorphism with respect to r, and it would be tricky to figure that out by inspecting the definitions of f and r. But if you have a table built in advance that characterizes functions in relation to each other (incr is a homomorphism with respect to max, etc.), then you can drive your rewrite engine using that table... So I wouldn't say that you want to use transformations to determine whether f is a homomorphism, but rather that you want to use the fact that f is a homomorphism to decide which transformations are applicable. Does that make sense?
Edit: I guess, more generally, you could have a relation with the form homomorphism(f:a->b, q:a->a->a, r:b->b->b) which justifies the transformation reduce(r, map(f, l)) -> f(reduce(q, l)).
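A table-driven version of that relation could be sketched like this (the table contents and the `optimize_reduce_map` helper are hypothetical, just to show the shape of the idea):

```python
from functools import reduce

# Table built in advance: an entry (f, q, r) asserts that f is a
# homomorphism from the algebra (a, q) to (b, r), i.e.
#   f(q(x, y)) == r(f(x), f(y)),
# which justifies the rewrite reduce(r, map(f, l)) -> f(reduce(q, l)).
def double(x): return x * 2
def add(x, y): return x + y

HOMOMORPHISMS = [
    (double, add, add),  # double(x + y) == double(x) + double(y)
]

def optimize_reduce_map(r, f, l):
    """Apply the rewrite when the table licenses it."""
    for (tf, tq, tr) in HOMOMORPHISMS:
        if tf is f and tr is r:
            return tf(reduce(tq, l))    # rewritten form: one call to f
    return reduce(r, map(f, l))         # fall back to the literal form

xs = [1, 2, 3]
assert optimize_reduce_map(add, double, xs) == reduce(add, map(double, xs))
```

Here the table plays exactly the role described above: it doesn't discover that f is a homomorphism, it records that fact so the engine knows the transformation is applicable.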
If one calculates some reduction over a data structure, and the reduction is a group operator (so it has an inverse, like addition), and the data structure is then updated, it may be cheaper to update the result through the inverse than to recalculate the whole reduction.
For example, if we are calculating a sum over a list and we take an element out, it's easier to just subtract that element from the sum than to calculate the sum again.
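Concretely, the inverse turns a full re-reduction into a constant-time update:

```python
# The reduction operator here forms a group (addition, with
# subtraction as its inverse), so removing an element can update the
# cached result in O(1) instead of re-reducing the whole list.
xs = [10, 20, 30, 40]
total = sum(xs)            # 100, computed once

removed = xs.pop(1)        # take 20 out of the list
total -= removed           # update via the inverse: 80

assert total == sum(xs)    # same as recomputing from scratch
```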
Anyway, all in all, I think compiler knowledge of function inverses (and perhaps also of isomorphisms between types) will become very handy in the future and will enable a huge number of optimizations.
CouchDB had an interesting approach for this that did not require inverses: they persist intermediate reduce results, so that you can reuse nodes in the tree that haven't changed. So clever. http://horicky.blogspot.com/2008/10/couchdb-implementation.h...
Let me define only "what" to compute, instead of spending all my time worrying about "how" and "when". Isn't that the point of declarative languages like SQL?
You could probably generalize that condition to situations where f can be lifted over the output of r so you don't have to require that r's output type is the same as its input.
I am curious about the idea that the inputs and outputs of r may not need to be of the same type, and I want to investigate further (help is appreciated!). Since r is given to reduce(), at least one of its inputs must match its output, depending on how you define reduce(). Haskell's foldl/foldr permit the non-accumulator side to differ in type from the output, but the output must match the accumulator side. I think a similar transformation may apply to folds, but I haven't thought about it enough.
h(fold(g, w, l)) = fold(f, h(w), l)
if
h(g(x, y)) = f(x, h(y))
This is exciting because (1) it suggests the equivalent transformation for sided reduces (folds), and (2) fold is a more general operator than the others. My somewhat silly use of "hypothetical inverses" effectively just creates a goalpost: (1) constructively find the function f, given g (or vice versa), and (2) prove the necessary condition.
Very exciting.
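That fold law can be checked on a concrete instance. Taking h = double and g = add, we have h(g(x, y)) = 2*(x + y) = f(x, h(y)) with f(x, z) = 2*x + z, so the condition holds (the `foldr` helper below is a hand-rolled right fold, since Python's `functools.reduce` is a left fold):

```python
from functools import reduce

def foldr(g, w, l):
    # Right fold: g(x1, g(x2, ... g(xn, w))).
    return reduce(lambda acc, x: g(x, acc), reversed(l), w)

# With h(g(x, y)) == f(x, h(y)), the law gives
#   h(foldr(g, w, l)) == foldr(f, h(w), l)
def h(y): return 2 * y
def g(x, y): return x + y
def f(x, z): return 2 * x + z

xs = [1, 2, 3]
assert h(foldr(g, 0, xs)) == foldr(f, h(0), xs)  # both are 12
```

The left side doubles the finished sum; the right side folds with a doubling step from the start, and the two agree because the condition pushes h through each application of g.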
That example is pretty easy: you apply a pattern match to the tree which looks for the pattern (sum (reverse ?var)) and rewrites it to (sum ?var).
In Common Lisp, we would use define-compiler-macro on the sum function which looks for the argument form being (reverse whatever), in which case the macro returns the form (sum whatever).
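A toy version of that tree rewrite (using nested tuples as a stand-in for the s-expression tree, rather than an actual compiler macro) might look like:

```python
# Toy s-expression rewriter for the pattern
#   (sum (reverse ?var)) -> (sum ?var)
# Expressions are nested tuples like ('sum', ('reverse', 'x')).
def rewrite(expr):
    if isinstance(expr, tuple) and expr:
        expr = tuple(rewrite(e) for e in expr)  # rewrite subtrees first
        if (len(expr) == 2 and expr[0] == 'sum'
                and isinstance(expr[1], tuple)
                and len(expr[1]) == 2 and expr[1][0] == 'reverse'):
            return ('sum', expr[1][1])          # drop the reverse call
    return expr

assert rewrite(('sum', ('reverse', 'x'))) == ('sum', 'x')
assert rewrite(('max', ('reverse', 'x'))) == ('max', ('reverse', 'x'))
```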
Not so fast, though; the order of summing matters in floating point! And over integers too, in the presence of overflow (in junk languages without bignums). That is to say, if we add together three or more integers, the arithmetic result can be in range of the type, but some orders might overflow whereas others do not.
I don't think this is true. Modular addition is commutative and associative. As long as the behavior of overflow is modular (as it is in two's complement), order shouldn't matter. And if the behavior of overflow is to trap, then it should trap regardless of the order in which you add the numbers. I guess if you care about getting the same stack trace even under optimizations, then you could argue this isn't a safe optimization. And if the behavior of overflow is undefined, it's undefined whether you reverse the list or not.
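The modular case is easy to demonstrate by simulating 32-bit wraparound (the choice of operands below is just illustrative; a + b overflows a signed 32-bit range while a + c does not):

```python
# Simulated 32-bit modular addition: with wraparound semantics, the
# sum is order-independent even when intermediates overflow.
M = 2 ** 32
def add32(x, y):
    return (x + y) % M

a, b, c = 2**31 - 1, 1, -1       # a + b overflows signed 32-bit
left  = add32(add32(a, b), c % M)
right = add32(add32(a, c % M), b)  # a + c stays in range
assert left == right == (a + b + c) % M
```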
Maybe you're thinking of Javascript, which not only doesn't have bignums, but uses floating point even for integers?
How come? Suppose A + B doesn't trap, and if T0 = A + B, then T0 + C doesn't trap. Why should (A + B) + C trap?
Bigger picture: I want to rewrite "sum(reverse(x))" without explicitly saying so. How do we optimize that expression using the definitions of sum and reverse? Note that in my second example I am inconsistent here: I expand the definition of reverse() but treat sum() as a primitive. That said, I'm pretty sure the example still works if you substitute the appropriate definition of sum as "reduce(add, x)"; the derivation is just longer.
You can endow functions with attributes (standard functions in the language library, and possibly user-defined ones). The reverse function can carry the attribute that it only changes the order of the elements of its argument. This is encoded as a boolean predicate: (order-identity-p x) is true if x is reverse, and also if x is certain other things, like the identity function, the sort function, and so on.
Then our optimizer looks for the pattern/rewrite (sum (f? arg?)) -> (sum arg?), subject to a semantic constraint: the pattern matches only if the predicate (order-identity-p f?) holds. This is effectively part of the match; if the predicate fails, there is no match.
Now the sum function could also be "patterned out". There could be a predicate which asserts that certain arguments of a function are sequences whose order doesn't matter to the outcome. This predicate could be list-valued: if a function has no such arguments, it returns nil/false; otherwise, it returns a list of the argument positions.
So then we're looking for the pattern (f? (g? arg?)) -> (f? arg?), which is very general and can be met in lots of ways. For instance, one of the rules that applies is that g? satisfies (order-identity-p g?), and (order-irrelevant-args f?) returns (1), where we note that (g? ...) is argument 1 of f?. For any argument expression which is an order-identity-p function call, if that argument's position is in the list returned by order-irrelevant-args on the enclosing function, we can remove the order-altering function from the argument.
We code all that up into the tree walking engine, and then just suitably declare attributes on functions.
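A sketch of that engine, with the attribute tables declared as plain data (the table contents here are hypothetical declarations, standing in for per-function attribute annotations):

```python
# Attribute-driven rewriting: the generic pattern
#   (f? (g? arg?)) -> (f? arg?)
# fires only when the declared semantic constraints hold.
ORDER_IDENTITY = {'reverse', 'identity', 'sort'}   # only permute/keep elements
ORDER_IRRELEVANT_ARGS = {'sum': [1], 'max': [1]}   # arg positions whose
                                                   # element order is irrelevant

def rewrite(expr):
    if isinstance(expr, tuple) and expr:
        expr = tuple(rewrite(e) for e in expr)     # rewrite subtrees first
        f = expr[0]
        for pos in ORDER_IRRELEVANT_ARGS.get(f, []):
            arg = expr[pos]
            if isinstance(arg, tuple) and arg and arg[0] in ORDER_IDENTITY:
                # order-altering call in an order-irrelevant position:
                # strip it, keeping its argument
                expr = expr[:pos] + (arg[1],) + expr[pos + 1:]
    return expr

assert rewrite(('sum', ('reverse', 'x'))) == ('sum', 'x')
assert rewrite(('sum', ('sort', 'x'))) == ('sum', 'x')
assert rewrite(('cons', ('reverse', 'x'))) == ('cons', ('reverse', 'x'))
```

The match logic is generic; only the two tables need to grow as more functions are declared.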
Anyway, I was a bit philosophically torn, since the content tends toward the mathematical/algorithmic, and it made me wonder: what does it mean to play around with mathematics/algorithms?