Or maybe it's impossible to implement that hash function reversibly without preserving a copy of the input in the output.
There seems to be an analogy to the way there can be no "waste heat" in a reversible thermodynamic process [1]. I thought this analogy might be a stretch at first, but looking into it a bit more, this is indeed exactly the idea behind reversible computing: if a reversible computer could be implemented at the hardware level, there would supposedly be significant energy-efficiency gains on the table [2-4].
[1] https://en.wikipedia.org/wiki/Reversible_process_%28thermody...
[2] https://arxiv.org/abs/1702.08715
[3] https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/INC19-...
[4] https://spectrum.ieee.org/computing/hardware/the-future-of-c...
Cryptographic hash functions rely on locally non-reversible non-linear operations.[1]
So you don't need to copy the original input, but you do need to copy bits here and there throughout the hash function, and those copies allow the input to be reconstructed.
That's not surprising: even a humble AND gate is non-reversible, so even a program implementing an AND gate will need to carry its inputs through to the output if it's to be reversible.
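The standard trick for making AND reversible is the Toffoli gate: keep both inputs and XOR their AND into a third "target" bit. A minimal sketch in Python (the function name is just illustrative):

```python
def toffoli(a, b, c):
    """Reversible AND: both inputs are preserved, and (a AND b) is XORed
    into a target bit c. The gate is its own inverse."""
    return a, b, c ^ (a & b)

# With the target bit initialized to 0, the third output is exactly a AND b,
# and the inputs are carried along so the computation can be undone.
for a in (0, 1):
    for b in (0, 1):
        out = toffoli(a, b, 0)
        assert out == (a, b, a & b)
        # Applying the gate a second time uncomputes the AND bit.
        assert toffoli(*out) == (a, b, 0)
```

Note that the price of reversibility is exactly the "copied bits" above: the output is wider than the logical result.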
[1] (As well, cryptographic hash functions are generally not reversible anyway: different inputs hash to the same output value, so there's no unambiguous reversal.)
Reversibility is important in quantum computing. Quantum circuits must transform input to output in a "unitary" manner, which is reversible. If you view the input-to-output map as a linear transformation matrix (with complex entries), then the conjugate transpose of the matrix gives the "inverse function".
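A quick sanity check of that claim, using an arbitrary 2x2 one-qubit unitary chosen for illustration: the conjugate transpose (Hermitian adjoint) composed with the original matrix gives the identity.

```python
# An example one-qubit unitary (chosen for illustration, not a named gate).
s = 1 / 2 ** 0.5
U = [[s, s * 1j], [s * 1j, s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    # Hermitian adjoint: transpose AND complex-conjugate every entry.
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

prod = matmul(dagger(U), U)
# U† U = I, so applying dagger(U) after U undoes the transformation.
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```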
This is probably useless info, but reversibility in the classical sense is also interesting due to the energy bounds of computation. The Landauer limit (kT ln 2) [1] gives a lower bound on the energy that must be dissipated to destroy one bit of information in a computation. A reversible calculation does not destroy bits.
I think in the future we will see a trend of expressing most of a DL model as reversible computation, with minimal irreversible modules at the beginning and the end.
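This is already done in reversible residual networks via additive coupling: each half of the activations is updated using only the other half, so the layer can be inverted exactly without storing intermediate activations. A minimal sketch (the subnetworks f and g are arbitrary stand-ins and need not themselves be invertible):

```python
def f(x):  # stand-in for an arbitrary (even non-invertible) subnetwork
    return x * x

def g(x):  # another arbitrary stand-in
    return 3 * x + 1

def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    # Run the updates in reverse order, subtracting instead of adding.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

assert inverse(*forward(5.0, 7.0)) == (5.0, 7.0)
```

Because activations can be recomputed from the outputs, memory during training can stay roughly constant in depth.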
Even if you only care about previous applications of f, keeping snapshots of every input isn't always feasible.
It would be good to see a ResNet training benchmark comparison against PyTorch, for example, if this is really true.
https://github.com/GiggleLiu/NiLang.jl/blob/master/examples/...
Ed: after skimming the paper (which went mostly over my head), this does indeed seem to be about "actually" running functions in reverse, given a function only defined "forward" in the NiLang DSL. The graph-embedding examples appear to be missing from the master branch, unfortunately.
I wonder if this could be used more trivially to solve simple problems too, like calculating values/sums pertaining to compound interest and investment, given a naive function for calculating sums. (It's trivial to add up compounded interest and deposits, but a tiny bit more complicated to answer the question "at what time is my portfolio at X or more dollars?")
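For that particular question, a plain forward simulation already gets you the answer without any real inversion machinery; a sketch with made-up numbers (the function names and parameters are illustrative):

```python
def balance(rate, deposit, periods):
    """Forward function: balance after depositing `deposit` each period
    with per-period interest `rate`."""
    total = 0.0
    for _ in range(periods):
        total = total * (1 + rate) + deposit
    return total

def periods_to_reach(rate, deposit, target):
    """'Inverse' query: the first period count at which the balance
    reaches `target`, found by running the forward model until it does."""
    total, n = 0.0, 0
    while total < target:
        total = total * (1 + rate) + deposit
        n += 1
    return n

n = periods_to_reach(0.05, 100.0, 1000.0)
assert balance(0.05, 100.0, n) >= 1000.0
assert balance(0.05, 100.0, n - 1) < 1000.0
```

The appeal of a reversible DSL would be getting this kind of inverse query for free from the forward definition, rather than writing the search by hand.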
The source code is available here: https://github.com/JuliaReverse/NiGraphEmbedding.jl
1. Scalar level
Various benchmarks (including those in the paper) show NiLang is much better than Zygote at differentiating scalar functions. And Zygote is much faster than TF and PyTorch.
2. Tensor level
Zygote, TF and PyTorch are much better than NiLang here, because NiLang's matrix multiplication is not fully optimized; it is much slower than BLAS. (One can wrap BLAS in NiLang, but then the benchmark no longer measures NiLang's language-level AD performance.)
Zygote calculates derivatives using source-to-source automatic differentiation.
This calculates function inverses (so, to stretch the analogy a bit, it's kinda like "source-to-source automatic inversion").
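The core idea behind that kind of inversion can be illustrated on a straight-line program built only from reversible updates: run the statements in reverse order and swap each update for its inverse (+= for -=, and so on). A hand-inverted sketch (not NiLang syntax):

```python
def forward(x, y):
    # Each statement is a reversible in-place update.
    x += 2 * y
    y -= x
    return x, y

def inverse(x, y):
    # Same statements, reversed order, each update inverted.
    y += x
    x -= 2 * y
    return x, y

assert inverse(*forward(3, 4)) == (3, 4)
```

A source-to-source tool like NiLang mechanizes exactly this transformation over the program's statement list, which is why functions must be written with only reversible primitives.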
https://gist.github.com/GiggleLiu/0c4608f70bda050f59992f5fc0...