Or maybe it's impossible to implement that hash function reversibly without preserving a copy of the input in the output.
There seems to be an analogy to the way there can be no "waste heat" in a reversible thermodynamic process [1]. I thought this analogy might be a stretch at first, but looking into it a bit more, this is indeed exactly the idea behind reversible computing: if a reversible computer could be implemented at the hardware level, there would supposedly be significant energy-efficiency gains on the table [2-4].
[1] https://en.wikipedia.org/wiki/Reversible_process_%28thermody...
[2] https://arxiv.org/abs/1702.08715
[3] https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/INC19-...
[4] https://spectrum.ieee.org/computing/hardware/the-future-of-c...
Cryptographic hash functions rely on locally non-reversible non-linear operations.[1]
So you don't need to copy the original input, but you do need to copy bits here and there throughout the hash function, and those copies allow the input to be reconstructed.
That's not surprising: even a humble AND gate is non-reversible, so even a program implementing an AND gate will need to carry its inputs through to the output if it's to be reversible.
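The standard trick for making AND reversible is the Toffoli gate: keep both inputs and XOR their AND into a third "target" bit. A minimal sketch in Python (the function name is just illustrative):

```python
def toffoli(a, b, c):
    """Reversible AND: both inputs are preserved, and (a AND b) is XORed
    into a target bit c. The gate is its own inverse."""
    return a, b, c ^ (a & b)

# With the target bit initialized to 0, the third output is exactly a AND b,
# and the inputs are carried along so the computation can be undone.
for a in (0, 1):
    for b in (0, 1):
        out = toffoli(a, b, 0)
        assert out == (a, b, a & b)
        # Applying the gate a second time uncomputes the AND bit.
        assert toffoli(*out) == (a, b, 0)
```

Note that the price of reversibility is exactly the "copied bits" above: the output is wider than the logical result.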
[1] (As well, cryptographic hash functions are generally not reversible anyway: different inputs hash to the same output value, so there's no unambiguous reversal.)
Reversibility is important in quantum computing. Quantum circuits must transform input to output in a "unitary" manner, which is reversible. If you view the input-to-output map as a linear transformation matrix (with complex entries), then the conjugate transpose of the matrix gives the "inverse function".
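A quick sanity check of that claim, using an arbitrary 2x2 one-qubit unitary chosen for illustration: the conjugate transpose (Hermitian adjoint) composed with the original matrix gives the identity.

```python
# An example one-qubit unitary (chosen for illustration, not a named gate).
s = 1 / 2 ** 0.5
U = [[s, s * 1j], [s * 1j, s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    # Hermitian adjoint: transpose AND complex-conjugate every entry.
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

prod = matmul(dagger(U), U)
# U† U = I, so applying dagger(U) after U undoes the transformation.
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```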
This is probably useless info, but reversibility in the classical sense is also interesting due to the energy bounds of computation. The Landauer limit (kT ln 2) [1] gives a lower bound on the energy that must be dissipated to destroy one bit of information in a computation. A reversible calculation does not destroy bits.
I think in the future we will see a trend of expressing most of a DL model as reversible computation, with minimal irreversible modules at the beginning and the end.
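This is already done in reversible residual networks via additive coupling: each half of the activations is updated using only the other half, so the layer can be inverted exactly without storing intermediate activations. A minimal sketch (the subnetworks f and g are arbitrary stand-ins and need not themselves be invertible):

```python
def f(x):  # stand-in for an arbitrary (even non-invertible) subnetwork
    return x * x

def g(x):  # another arbitrary stand-in
    return 3 * x + 1

def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    # Run the updates in reverse order, subtracting instead of adding.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

assert inverse(*forward(5.0, 7.0)) == (5.0, 7.0)
```

Because activations can be recomputed from the outputs, memory during training can stay roughly constant in depth.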
Even if you only care about previous applications of f, keeping snapshots of every input isn't always feasible.
It would be good to see a ResNet training benchmark comparison against PyTorch, for example, if this is really true.
https://github.com/GiggleLiu/NiLang.jl/blob/master/examples/...
Ed: after skimming the paper (which went mostly over my head), this does indeed seem to be about "actually" running functions in reverse, given a function only defined "forward" in the NiLang DSL. The graph-embedding examples appear to be missing from the master branch, unfortunately.
I wonder if this could be used more trivially to solve simple problems too, like calculating values/sums pertaining to compound interest and investment, given a naive function for calculating sums. (It's trivial to add up compounded interest and deposits, but a tiny bit more complicated to answer the question "at what time is my portfolio at X or more dollars?")
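For that particular question, a plain forward simulation already gets you the answer without any real inversion machinery; a sketch with made-up numbers (the function names and parameters are illustrative):

```python
def balance(rate, deposit, periods):
    """Forward function: balance after depositing `deposit` each period
    with per-period interest `rate`."""
    total = 0.0
    for _ in range(periods):
        total = total * (1 + rate) + deposit
    return total

def periods_to_reach(rate, deposit, target):
    """'Inverse' query: the first period count at which the balance
    reaches `target`, found by running the forward model until it does."""
    total, n = 0.0, 0
    while total < target:
        total = total * (1 + rate) + deposit
        n += 1
    return n

n = periods_to_reach(0.05, 100.0, 1000.0)
assert balance(0.05, 100.0, n) >= 1000.0
assert balance(0.05, 100.0, n - 1) < 1000.0
```

The appeal of a reversible DSL would be getting this kind of inverse query for free from the forward definition, rather than writing the search by hand.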
The source code is available here: https://github.com/JuliaReverse/NiGraphEmbedding.jl
1. Scalar level
Various benchmarks (including those in the paper) show NiLang is much better than Zygote at differentiating scalar functions. And Zygote is much faster than TF and PyTorch.
2. Tensor level
Zygote, TF and PyTorch are much better than NiLang here, because NiLang's matrix multiplication is not fully optimized; it is much slower than BLAS. (One can wrap BLAS in NiLang, but then the benchmark no longer measures NiLang's language-level AD performance.)
Zygote calculates derivatives using source-to-source automatic differentiation.
This calculates function inverses (so, to stretch the analogy a bit, it's kinda like "source-to-source automatic inversion").
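The core idea behind that kind of inversion can be illustrated on a straight-line program built only from reversible updates: run the statements in reverse order and swap each update for its inverse (+= for -=, and so on). A hand-inverted sketch (not NiLang syntax):

```python
def forward(x, y):
    # Each statement is a reversible in-place update.
    x += 2 * y
    y -= x
    return x, y

def inverse(x, y):
    # Same statements, reversed order, each update inverted.
    y += x
    x -= 2 * y
    return x, y

assert inverse(*forward(3, 4)) == (3, 4)
```

A source-to-source tool like NiLang mechanizes exactly this transformation over the program's statement list, which is why functions must be written with only reversible primitives.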
https://gist.github.com/GiggleLiu/0c4608f70bda050f59992f5fc0...