A stack of zeros and ones can be encoded in a single number using bit-shifting and incrementing.
Pushing a 0 onto the stack is equivalent to doubling the number.
Pushing a 1 is equivalent to doubling and adding 1.
Popping is equivalent to dividing by 2, where the remainder is the popped bit.
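In Python the whole trick fits in a few lines (the sentinel bit is my own addition, so an empty stack is distinguishable from a stack of zeros):

```python
def push(stack: int, bit: int) -> int:
    return stack * 2 + bit           # shift left, new bit in the low position

def pop(stack: int) -> tuple[int, int]:
    return stack // 2, stack % 2     # remaining stack, popped bit

s = 1                                # sentinel 1 marks the bottom of the stack
for b in (0, 1, 1):
    s = push(s, b)                   # s is now 0b1011 = 11
s, top = pop(s)
print(top, bin(s))                   # 1 0b101
```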
I use something not too far off for my daily programming, based on a similar idea: Rejoice is a concatenative programming language in which data is encoded as multisets that compose by multiplication. Think Fractran without the rule-searching, or Forth without a stack.
https://paste.sr.ht/~rabbits/cd2369cc7c72bfad0fcd83e27682095...
write x#y for 1/(x-y).
x#0 = 1/(x-0) = 1/x, so you get reciprocals. Then (x#y)#0 = 1/((1/(x-y)) - 0) = x-y, so subtraction.
It's a common exercise to show that in any (insert algebraic structure here) inverse and subtraction give all 4 elementary ops.
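Here's a quick numerical sketch (my own construction, not from the notes linked below) of how far x#y = 1/(x-y) gets you over the reals, ignoring the singular inputs where some intermediate difference hits zero:

```python
def h(x, y):                  # the single primitive: x#y = 1/(x - y)
    return 1 / (x - y)

recip = lambda x: h(x, 0)                 # 1/x
sub   = lambda x, y: h(h(x, y), 0)        # (x#y)#0 = x - y
neg   = lambda x: sub(0, x)               # 0 - x
add   = lambda x, y: sub(x, neg(y))       # x - (0 - y)
# x(x-1) = 1/(1/(x-1) - 1/x), hence squaring:
sq    = lambda x: add(h(recip(sub(x, 1)), recip(x)), x)
half  = lambda z: recip(add(recip(z), recip(z)))   # z/2 = 1/(1/z + 1/z)
mul   = lambda x, y: half(sub(sq(add(x, y)), add(sq(x), sq(y))))  # 2xy = (x+y)^2 - x^2 - y^2
div   = lambda x, y: mul(x, recip(y))

print(add(3, 4), sub(3, 4), mul(3, 4), div(3, 4))  # ~7, -1, 12, 0.75 up to rounding
```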
I haven't checked this carefully, but this note seems to give a short proof (modulo knowing some other items...) https://dmg.tuwien.ac.at/goldstern/www/papers/notes/singlebi...
It cites a paper from 1935: https://www.pnas.org/doi/10.1073/pnas.21.5.252
Here is a bit more: https://mathoverflow.net/questions/57465/can-we-unify-additi...
Moreover, EML is complete in a way that your suggested function isn't: if you take a finite combination of basis functions, can it build periodic functions? Hardy proved over a century ago that the real basis (+, -, /, *, exp, ln) can't do this (answering the paper's unresolved question about similar real-valued functions in the negative). EML being able to build periodic functions is a lot less surprising for obvious reasons, but still pretty neat.
To represent something like sin(x) with f(x,y) requires infinitely many steps. Conversely, with eml you get an exact result in around 4 steps using identities and such.
One could argue that we do Taylor Series approximations on the hardware to represent trigonometric functions, but that highlights the key aspect of the eml approach. You can write a paper with those four steps that describes an exact model, here sin(x). And people can take that paper and optimize the result. This paper is about an auditable grammar that you can compute with.
You can reduce all Boolean logic to NAND, but that doesn't actually mean that semiconductor fabs translate their HDL to NAND gates, because it is possible to build complex gates that directly implement higher level operations in a single gate.
Your "cost of computation" objection can be easily resolved by adding more operators, which makes it boring from a research perspective.
Meanwhile the loss of expressivity can only be compensated by encoding algorithms directly into the expression tree. Your objection that an infinite series is a bad thing rings hollow, since you now introduce the concept of an infinitely sized expression tree. That sounds much more impractical than implementing an algorithm for the exponential and logarithm functions.
> This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation as well as the usual transcendental and algebraic functions.
I am not claiming this is better than 1/(x-y) in any way (I have no idea, maybe it isn't if you look closely enough), but you are simply arguing against the wrong thing. The author didn't claim eml to be computationally efficient (it even feels weird to say that, since computational efficiency is not a trait of a mathematical function, but of a computer architecture implementing some program) or anything else, only that (eml, 1) are enough to produce every number and function that a scientific calculator (admittedly, somewhat vaguely defined) can produce.
However, I want to point out that it's weird 1/(x-y) didn't appear on that graph in Figure 1, since if it's as powerful as eml, it should have all the same connections as eml, and it's a pity Odrzywołek's paper misses it.
The OP only needs 1 number: 1.
Could that be used to derive trigonometric functions with single distinct expressions?
Moreover, the point is not always numerical computation. I don’t think anybody argues that eml sounds like an efficient way to compute elementary functions numerically. It may or may not still be useful for symbolic computations.
The article is about producing all elementary functions, which 1/(x-y) clearly doesn’t, as it doesn’t produce any transcendental function. Like many of such universality-style results it may not have practical applications, but may still be interesting on its own right.
``` look at this paper: https://arxiv.org/pdf/2603.21852
now please produce 2x+y as a composition on EMLs ```
Opus (paid) - claimed that "2" is circular. Once I told it that ChatGPT had already done this, it finished successfully.
ChatGPT(free) - did it from the first try.
Grok - produced estimation of the depth of the formula.
Gemini - success
Deepseek - Assumed some pre-existing knowledge on what EML is. Unable to fetch the pdf from the link, unable to consume pdf from "Attach file"
Kimi - produced long output, stopped and asked to upgrade
GLM - looks ok
TIL you can taunt LLMs. I guess they exhibit more competitive spirit than I thought.
It got a result.
""" Consider a mathematical function EML defined as `eml(x,y)=exp(x)−ln(y)`
Please produce `sin(x)/x` as a composition on EMLs and constant number 1 (one). """
``` 2x + y = \operatorname{eml}\Big(1,\; \operatorname{eml}\big(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(L_2 + L_x, 1), 1) \cdot \operatorname{eml}(y,1)),1)\big),1\big)\Big) ```
for me Gemini hallucinated EML to mean something else despite the paper link being provided: "elementary mathematical layers"
Reminds me of the Iota combinator, one of the smallest formal systems that can be combined to produce a universal Turing machine, meaning it can express all of computation.
I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.
Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Got a multidimensional and multivariate function to model (with random samples or a full map)? Just do gradient descent and convert it to approximant EML trees.
Perform gradient descent on EML function tree "phi" so that the derivatives in the Schroedinger equation match.
But as I said, still reading, this sounds too good to be true, but I have witnessed such things before :)
It's interesting to see his EML approach whereas mine was more on generating a context sensitive homoiconic grammar.
I've had lots of success combining spectral neural nets (GNNs, FNOs, Neural Tangent Kernels) with symbolic regression and using Operad Theory and Category Theory as my guiding mathematical machinery
If we make the analogy to Bertrand Russell's Principia Mathematica: he derived fully expanded expressions, i.e. trees where only the leaves may refer to the same literals, and everyone claimed this madness underscored how formal verification of natural mathematics was a fool's errand. Nevertheless we see successful projects like metamath (us.metamath.org) where this exponential blow-up does not occur. It is easy to see why: instead of representing proofs as full trees, the proofs are represented as DAGs. The same optimization would be required for EML to prevent exponential blow-up.
Put differently: if we allow extra buttons besides {1, EML}, for example to capture unary functions the authors mentally add an 'x' button, the RPN calculator now has {1, EML, x}; but if you want multivariate functions it becomes an RPN calculator with extra buttons {1, EML, x, y, z}, for example.
But why stop there? In metamath, proofs are compressed: if an expression or wff was proven before in the same proof, its first subproof is given a number, and any subsequent invocations of this N'th subproof refer to that number. Why only recall input parameters x, y, z but not recall earlier computed values/functions?
In fact every proof in metamath's set.mm that uses this DAG compressibility could be split into the main proof plus the repeatedly used substatements automatically converted to explicitly separate lemma proofs, in which case metamath could dispose of the single-proof DAG compression (but it would force proofs to be split up into lemmas + main proof).
None of the proven theorems in metamath's set.mm displays the feared exponential blowup.
Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481
That is sort of comparable to how NAND simplifies scaling.
Division is hell on gates.
The single component was the reason scaling went like it did.
There was only one gate structure which had to improve to make chips smaller - if a chip used 3 different kinds, then the scaling would've required more than one parallel innovation (sort of like how LED lighting had to wait for blue).
If you need two or more components, then you have to keep switching tools instead of hammer, hammer, hammer.
Efficient numerical libraries likewise contain lots of redundancy. For example, sqrt(x) is mathematically equivalent to pow(x, 0.5), but sqrt(x) is still typically provided separately and faster. Anyone who thinks that eml() function is supposed to lead directly to more efficient computation has missed the point of this (interesting) work.
As an example, searching for sqrt(x) would require a tree of depth ~40 which is in the trillion-parameter regime.
It's a way to make mathematical formulas completely unreadable. It's a way to spend more time computing functions like log (3 emls required) while needing more precision. It's a way to blow the minds of muggles reading Hacker News.
Same reason all boolean logic isn't performed with combinations of NAND – it's computationally inefficient. Polynomials are (for their expressivity) very quick to compute.
And are a much less arbitrary choice than NAND, vs. NOR, XOR, etc.
Using transistors as conceptual digital logic primitives, where power dissipation isn't a thing, Pass Logic is "The Way".
But transistors can be N or P-channel, so it’s not a single logical primitive, like e.g. NAND-gates.
I might have misunderstood, but from the two "Why do X when you can do just Y with EML" sentences, I think you are describing symbolic regression, which has been around for quite some time and is a serious grown-up technique these days. But even the best symbolic regression tools do not typically "replace" other regression approaches.
Because the EML basis makes simple functions (like +) hard to express.
Not to diminish this very cool discovery!
This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals). In mathematics and physics there have been a lot of "party pooper" results which later found more profound and interesting positive results by properly rephrasing the question. A negative result for a myopic question isn't very informative on its own.
Solving the quintic in terms of transcendental functions has already been done
https://en.wikipedia.org/wiki/Bring_radical#The_Hermite%E2%8...
It seems like a neat parlour trick, indeed. But significant discovery?
All this really says is that the Taylor expansions of e^x and ln x are sufficient to express trig functions, which is trivially true from Euler's formula as long as you're in the complex domain.
Arithmetic operations follow from the fact that e^x and ln x are inverses, in particular that e^ln(x) = x.
Taylor's series seem a bit like magic when you first see them but then you get to Real Analysis and find out there are whole classes of functions that they can't express.
This paper is interesting but it's not revolutionary.
I did, however, keep thinking that a lot of attention went to trying to include special constants even though we don't know that much about these constants yet, while comparatively little attention went to, say, elliptic integrals etc.
When aiming for a wide catch, you'd include some of those esoteric functions, or erf() etc...
I also wished they had attempted to find a representation for derivative and integrals.
> Submitted on 23 Mar 2026 (v1), last revised 4 Apr 2026 (this version, v2)
eml(z, eml(x, 1))
= e^z - ln(eml(x, 1))
= e^z - ln(e^x)
= e^z - x
and the claim is that, after it's expanded, z will be such that this whole thing is equal to -x. but with some algebra, this happens only if e^z = 0,
and there is no complex number z that satisfies this equation. indeed if we laboriously expand the given formula for z (the left branch of the tree), we see that it goes through ln(0) and compound expressions. x^-1 has the same problem.
both formulae work ...sort of... if we allow ln(0) = Infinity and some other moxie, such as x / Infinity = 0 for all finite x.
looks like it computes ln(1)=0, then computes e-ln(0)=+inf, then computes e-ln(+inf)=-inf
> EML-compiled formulas work flawlessly in symbolic Mathematica and IEEE754 floating-point… This is because some formulas internally might rely on the following properties of extended reals: ln 0 = −∞, e^(−∞) = 0.
And then follows with:
> But EML expressions in general do not work ‘out of the box’ in pure Python/Julia or numerical Mathematica.
Thus, the paper’s completeness claim depends on a non-standard arithmetic convention (ln(0) = -∞), not just complex numbers as it primarily advertises. While the paper is transparent about this, it is, however, buried on page 11 rather than foregrounded as a core caveat. Your comment deserves credit for flagging it.
This is the standard convention when doing operations in the extended real number line, i.e. in the set of the real numbers completed with positive and negative infinities.
When the overflow exception is disabled, any modern CPU implements the operations with floating-point numbers as operations in the extended real number line.
So in computing this convention has been standard for more than 40 years, while in mathematics it has been standard for a couple of centuries or so.
As always in mathematics, when computing expressions, i.e. when computing any kind of function, you must be aware very well which are the sets within which you operate.
If you work with real numbers (i.e. in a computer you enable the FP overflow exception), then ln(0) is undefined. However, if you work with the extended real number line, which is actually the default setting in most current programming languages, then ln(0) is well defined and it is -∞.
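You can watch the two conventions disagree inside Python itself: math.log sticks to the real line, while numpy follows the IEEE754/extended-real convention (quick check, assuming numpy is installed):

```python
import math
import numpy as np

print(np.log(0.0))               # -inf (plus a RuntimeWarning): extended-real convention
print(math.exp(float("-inf")))   # 0.0, so e^(-inf) = 0 also holds in floating point
try:
    math.log(0.0)                # the real-line convention: ln(0) is undefined
except ValueError as err:
    print("math.log(0):", err)   # math domain error
```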
The main problem is that singularities infect everything, and you can't have an equational theory whose rewrite/substitution rules are grounded unless you can decide whether an arbitrary EML tree is zero, which is undecidable for elementary functions (because of sine).
Basically it's only valid on an undecidable subset. All of his numerical tests are carefully crafted to avoid singularities, which are exactly where it fails, and the singularities are all over the place -- in particular in subtraction (which shouldn't have any!). He wants to sort of compile it to actually computable functions and use those instead, but there is no equational theory possible that you can build with this.
You can't restrict it to the positive reals either or it's not closed, it's trivially easy to get to complex or negative numbers.
Using extended reals doesn't fix it, you just get different undefined terms (-inf + inf = 0).
It's quite pretty! I love the idea or I wouldn't have spent so much time on it. It just doesn't work, and none of his other candidates will work either because they all have ln(0).
The only sense in which it's true is that it can generate the elementary functions that it can generate, which is just tautological. It can in no sense generate all of them.
Even his verification doesn't work because they set it up to check the narrow bands where it's valid, outside of that it's a mess.
−z = 1 − (e − ((e − 1) − z))
A quick Google search brings up https://github.com/pr701/nor_vm_core, which has a basic idea
That's awesome. I always wondered if there is some way to do this.
I read the paper. Is there a table covering all other math operations translated to eml(x,y) form?
You can find this link on the right side of the arxiv page:
https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformat...
e^ix = cos x + i sin x
which means: e^-ix = cos -x + i sin -x
= cos x - i sin x
so adding them together: e^ix + e^-ix = 2 cos x
cos x = (e^ix + e^-ix) / 2
So I guess the real part of that. Multiplication, division, addition and subtraction are all straightforward. So are hyperbolic trig functions. All other trig functions can be derived as per above.
https://gist.github.com/CGamesPlay/9d1fd0a9a3bd432e77c075fb8...
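A quick cmath sanity check of those identities (my snippet, unrelated to the gist above):

```python
import cmath, math

x = 0.7
cos_rhs = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2    # (e^ix + e^-ix)/2
sin_rhs = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / 2j   # (e^ix - e^-ix)/(2i)
print(math.cos(x), cos_rhs.real)   # both 0.7648..., imaginary parts vanish
print(math.sin(x), sin_rhs.real)   # both 0.6442...
```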
A few ideas that come to my mind when reading this:
1. One should also add absolute value (as sqrt(x*x)?) as a desired function and from that min, max, signum in the available functions. Since the domain is complex some of them will be a bit weird, I am not sure.
2. I think, for any bijective function f(x) which, together with its inverse, is expressible using eml(), we can obtain another universal basis eml(f(x),f(y)) with the added constant f^-1(1). Interesting special case is when f=exp or f=ln. (This might also explain the EDL variant.)
3. The eml basis uses natural logarithm and exponent. It would be interesting to see if we could have a basis with function 2^x - log_2(y) and constants 1 and e (to create standard mathematical functions like exp,ln,sin...). This could be computationally more feasible to implement. As a number representation, it kinda reminds me of https://en.wikipedia.org/wiki/Elias_omega_coding.
4. I would like to see an algorithm for finding derivatives of the eml() trees. This could yield a rather clear proof of why some functions do not have indefinite integrals in symbolic form.
5. For some reason, extending the domain to complex numbers made me think about fuzzy logics with complex truth values. What would be the logarithm and exponential there? It could unify the Lukasiewicz and product logics.
[1] https://github.com/tromp/AIT/blob/master/ait/minbase.lam
They're primitive in the sense that you can't compute exp(x) or log(x) using a finite combination of other elementary functions for any x. If you allow infinitely many operations, then you can easily find infinite sums or products of powers, or more complicated expressions, to represent exp and log and other elementary functions.
> Or do we assume that we have an infinite lookup table for all possible inputs?
Essentially yes, you don't necessarily need an "implementation" to talk about a function, or more generally you don't need to explicitly construct an object from simpler pieces: you can just prove it satisfies some properties and that it has to exist.
For exp(x), you could define the function as the solution to the differential equation df/dx = f(x) with initial condition f(0) = 1. Then you would establish that the solution exists and is unique (it follows from the properties of the differential equation), call exp = f and there you have it. You don't necessarily know how to compute it for any x, but you can assume exp(x) exists and is a real number.
In other words, this result does not aim to improve computability or bound the complexity of calculating the numerical value. Rather, it aims to exhibit this uniform, finite tree structure for the entire family of elementary expressions.
To clarify my earlier point: the author isn't trying to build a practical calculator or generate human-readable algebra. Using exp and ln isn't a cheat code because the goal is purely topological. The paper just proves that this massive, diverse family of continuous math can be mapped perfectly onto a uniform binary tree, without secretly burying a state machine inside the operator.
Posts like these are the reason i check HN every day
Python package is live: github.com/almaguer1986/monogate/tree/main/python — EMLTree (scalar symbolic regression), EMLNetwork (function approximation from data), fully differentiable via PyTorch autograd.
Two experiments ran:
Experiment 1: EMLTree successfully finds e and 0 by gradient descent. It does NOT rediscover eml(1,1) — it finds different valid constructions. The network finds its own paths.
Experiment 2: We swept a complexity penalty (lambda) trying to force the network toward minimal constructions. Instead of snapping cleanly to eml(1,1), the optimizer got stuck in what we're calling a phantom attractor — eml(eml(0.45, 1), eml(1, 1)) — a non-minimal construction that accidentally evaluates to e and resists the penalty. Forcing true minimality costs 10,518% more MSE.
The phantom attractor is a new phenomenon: locally stable non-minimal EML constructions that gradient descent cannot escape. Whether discrete search can escape them is open.
Exhaustive search tool also found -iπ in 11 nodes, improving our hand-proved 12. monogate.dev/search runs in the browser.
Full results: github.com/almaguer1986/monogate/blob/main/python/RESULTS.md
Although you also need to encode where to put the input.
The real question is what emoji to use for eml when written out.
Some Emil or another, I suppose. Maybe the one from Ratatouille, or maybe this one: https://en.wikipedia.org/wiki/Emil_i_L%C3%B6nneberga
I'm kidding, of course. You can encode anything in bits this way.
We implemented the EML trees as a differentiable module of PyTorch. Each leaf is a softmax mixture over {0, 1, x}, and the tree is evaluated from the bottom up. The entire system can be trained with Adam.
Results: 7 of the 7 elementary functions (exp, ln, sqrt, x², x³, 1/x, sin) converge with ≤24 parameters at depth 3. Three (exp, ln, sqrt) achieve an RMSE < 0.005.
The main challenge is depth scaling. Random initialization at depth 4 always diverges: chains of exp() create towering exponential growth that cancels out the gradients. We tested 12 initialization strategies; only hierarchical hot-starting (training depth n-1 first) works. sin(x²) gets 12.9x better at depth 4 vs depth 3.
Two honest negative results: (1) The trained trees use continuous softmax mixtures, not discrete leaf assignments; therefore, numerical approximations, not exact formulas, are obtained for everything except exp(x). (2) A 49-parameter MLP and PySR outperform it in MSE by orders of magnitude. It's not a practical tool — it just shows that gradient descent can work on S → 1 | eml(S, S) without needing sin, exp, etc. as primitives.
Paper: https://doi.org/10.5281/zenodo.19592926 Code: https://github.com/seetrex-ai/monolith
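For anyone who wants to see the shape of such a module without reading the repo, here's a minimal sketch of the idea as I understand it (not the authors' code; the clamp on ln's argument and all names are my own guesses):

```python
import torch

def eml(x, y):
    # eml(x, y) = exp(x) - ln(y); clamping keeps ln away from nonpositive values
    return torch.exp(x) - torch.log(torch.clamp(y, min=1e-6))

class SoftEMLTree(torch.nn.Module):
    """A fixed-depth binary eml tree whose leaves are softmax mixtures over {0, 1, x}."""
    def __init__(self, depth: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.randn(2 ** depth, 3))

    def forward(self, x):
        w = torch.softmax(self.logits, dim=-1)                 # (n_leaves, 3)
        basis = torch.stack([torch.zeros_like(x), torch.ones_like(x), x])
        nodes = list(w @ basis)                                # soft leaves, each (batch,)
        while len(nodes) > 1:                                  # evaluate bottom-up
            nodes = [eml(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
        return nodes[0]

# Fit exp(x) on [0, 1] with Adam
model = SoftEMLTree(depth=2)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
xs = torch.linspace(0.0, 1.0, 64)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((model(xs) - torch.exp(xs)) ** 2)
    loss.backward()
    opt.step()
print(loss.item())
```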
It’s a derivation of the Y combinator from ruby lambdas
More on topic:
> No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations.
I was taught that these were all hypergeometric functions. What distinction is being drawn here?
When you have a function with many parameters it becomes rather trivial to express simpler functions with it.
You could find a lot of functions with 4 parameters that can express all elementary functions.
Finding a binary operation that can do this, like in TFA, is far more difficult, which is why it has not been done before.
A function with 4 parameters can actually express not only any elementary function, but an infinity of functions with 3 parameters, e.g. by using the 4th parameter to encode an identifier for the function that must be computed.
Granted, but the claim in the abstract says:
>> computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations
And I don't see how this is true as to hypergeometric functions in a way that isn't shared by the approach in the paper.
> Finding a binary operation that can do this, like in TFA, is far more difficult, which is why it has not been done before.
> A function with 4 parameters can actually express not only any elementary function, but an infinity of functions with 3 parameters, e.g. by using the 4th parameter to encode an identifier for the function that must be computed.
These statements seem to be in direct conflict with each other; you can use the second parameter of a binary function to identify a unary function just as you can use the fourth parameter of a quaternary function to identify a trinary one.
It's one of those facts that tends to blow minds when it's first encountered, I can see why one would name a company after it.
https://th.if.uj.edu.pl/~odrzywolek/homepage/personal/pc/pc_...
Simply because Boolean algebra doesn't have that many functions and all of them are very simple to implement.
A complex Boolean function made out of NANDs (or the like) is only a little more complex than the same made out of the other operators.
Implementing even simple real functions out of eml() seems to me to add a lot of computational complexity, even with both exp() and ln() implemented in hardware in O(1). I'm thinking about stuff like sum(), div() and mod().
Of course, I might be badly wrong as I am not a mathematician (not even by far).
But I don't see, at the moment, the big win on this.
> Everyone learns many mathematical operations in school: fractions, roots, logarithms, and trigonometric functions (+, −, ×, /, sqrt, sin, cos, log, …), each with its own rules and a dedicated button on a scientific calculator. Higher mathematics reveals that many of these are redundant: for example, trigonometric ones reduce to the complex exponential. How far can this reduction go? We show that it goes all the way: a single operation, eml(x, y), replaces every one of them. A calculator with just two buttons, EML and the digit 1, can compute everything a full scientific calculator does. This is not a mere mathematical trick. Because one repeatable element suffices, mathematical expressions become uniform circuits, much like electronics built from identical transistors, opening new ways to encoding, evaluating, and discovering formulas across scientific computing.
Consider it a bit like a "church encoding" for complex numbers. I'll try to demonstrate with an S-expression representation.
---
A small primer if you're not familiar. S-expressions are basically atoms (symbols/numbers etc), pairs, or null.
S = <symbol>
| <number>
| (S . S) ;; aka pair
| () ;; aka null
There's some syntax sugar for right chains of pairs to form lists: (a b c) == (a . (b . (c . ())) ;; a proper list
(a b . c) == (a . (b . c)) ;; an improper list
(#0=(a b c) #0#) == ((a b c) (a b c)) ;; a list with a repeated sublist using a reference
---
So, we have a function `eml(x, y)` and a constant `1`. `x` and `y` are symbols.
Lets say we're going to replace `eml` with an infix operator `.`, and replace the unit 1 with `()`.
C = <symbol>
| <number>
| (C . C) ;; eml
| () ;; 1
We have basically the same context-free structure - we can encode complex numbers as lists. Let's define ourselves a couple of symbols for use in the examples:
($define! x (string->symbol "x"))
($define! y (string->symbol "y"))
And now we can define the `eml` function as an alias for `cons`:
($define! eml cons)
(eml x y)
;; Output: (x . y)
We can now write a bunch of functions which construct trees, representing the operations they perform. We use only `eml` or previously defined functions to construct each tree:
;; e^x
($define! exp ($lambda (x) (eml x ())))
(exp x)
;; Output: (x)
;; Note: (x) is syntax sugar for (x . ())
;; Euler's number `e`
($define! c:e (exp ()))
c:e
;; Output: (())
;; Note: (()) is syntax sugar for (() . ())
;; exp(1) - ln(x)
($define! e1ml ($lambda (x) (eml () x)))
(e1ml x)
;; Output: (() . x)
;; ln(x)
($define! ln ($lambda (x) (e1ml (exp (e1ml x)))))
(ln x)
;; Output: (() (() . x))
;; Zero
($define! c:0 (ln ()))
c:0
;; Output: (() (()))
;; -infinity
($define! c:-inf (ln c:0))
c:-inf
;; Output: (() (() () (())))
;; -x
($define! neg ($lambda (x) (eml c:-inf (exp x))))
(neg x)
;; Output: ((() (() () (()))) x)
;; +infinity
($define! c:+inf (neg c:-inf))
c:+inf
;; Output: (#0=(() (() () (()))) #0#)
;; 1/x
($define! recip ($lambda (x) (exp (eml c:-inf x))))
(recip x)
;; Output: (((() (() () (()))) . x))
;; x - y
($define! sub ($lambda (x y) (eml (ln x) (exp y))))
(sub x y)
;; Output: ((() (() . x)) y)
;; x + y
($define! add ($lambda (x y) (sub x (neg y))))
(add x y)
;; Output: ((() (() . x)) ((() (() () (()))) y))
;; x * y
($define! mul ($lambda (x y) (exp (add (ln x) (ln y)))))
(mul x y)
;; Output: (((() (() () (() . x))) ((() (() () (()))) (() (() . y)))))
;; x / y
($define! div ($lambda (x y) (exp (sub (ln x) (ln y)))))
(div x y)
;; Output: (((() (() () (() . x))) (() (() . y))))
;; x^y
($define! pow ($lambda (x y) (exp (mul y (ln x)))))
(pow x y)
;; Output: ((((() (() () (() . y))) ((() (() () (()))) (() (() () (() . x)))))))
I'll stop there, but we can continue implementing all the trig functions, pi, etc. using the same approach.
So basically, we have a way of constructing trees based on `eml`.
Next, we pattern match. For example, to pattern match over addition, extract the `x` and `y` values, we can use:
($define! perform-addition
($lambda (add-expr)
($let ((((() (() . x)) ((() (() () (()))) y)) add-expr))
(+ x y))))
;; Note, + is provided by the language to perform addition of complex numbers
(perform-addition (add 256 512))
;; Output: 768
So we didn't need to actually compute any `exp(x)` or `ln(y)` to perform this addition - we just needed to pattern match over the tree, which in this case the language does for us via deconstructing `$let`.
We can simplify the definition of perform-addition by expanding the parameters of a call to `add` as the arguments to the function:
($define! $let-lambda
($vau (expr . body) env
($let ((params (eval expr env)))
(wrap (eval (list* $vau (list params) #ignore body) env)))))
($define! perform-addition
($let-lambda (add x y)
(+ x y)))
($define! perform-subtraction
($let-lambda (sub x y)
(- x y)))
($define! sub-expr (sub 256 512))
;; Output: #inert
sub-expr
;; Output: ((() (() . 256)) 512)
(perform-subtraction sub-expr)
;; Output: -256
There's a bit more work involved for a full pattern matcher which will take some arbitrary `expr` and perform the relevant computation. I'm still working on that.
Examples are in the Kernel programming language, tested using klisp[1]
https://github.com/howerj/muxleq
PD: you don't need gforth to compile:
cc -O2 -o muxleq muxleq.c
Edit muxleq.fth, add some goodies by editing these values:
1 constant opt.multi ( Add in large "pause" primitive )
1 constant opt.editor ( Add in Text Editor )
1 constant opt.info ( Add info printing function )
0 constant opt.generate-c ( Generate C code )
1 constant opt.better-see ( Replace 'see' with better version )
1 constant opt.control ( Add in more control structures )
0 constant opt.allocate ( Add in "allocate"/"free" )
1 constant opt.float ( Add in floating point code )
0 constant opt.glossary ( Add in "glossary" word )
1 constant opt.optimize ( Enable extra optimization )
1 constant opt.divmod ( Use "opDivMod" primitive )
0 constant opt.self ( self-interpreter [NOT WORKING] )
Here are my settings.
Then, create a new ".dec" file (subleq program):
: ./muxleq muxleq.dec < muxleq.fth > new.dec
Now, to run EForth every time: ./muxleq new.dec
add these further in the muxleq.fth code to have them:
: d. tuck dabs <# #s rot sign #> type ;
: */ */mod nip ;
The ideal place would be just below the ': dabs ' definition.
I think the bigger question is whether it will be more energy-optimal or silicon density-optimal than math libraries that are currently baked into these processors (FPUs).
There are also some edge cases "exp(exp(x))" and infinities that seem to result in something akin to "division by zero" where you need more than standard floating-point representations to compute - but these edge cases seem like compiler workarounds vs silicon issues.
It gets a bit more difficult for the complex domain because you need rotation.
I'm curious how this would compare to the dedicated sse or xmx instructions currently inside most processor's instruction sets.
Lastly, you could also create 5-depth or 6-depth EML tree in hardware (fpga most likely) and use it in lieu of the rust implementation to discover weight-optimal eml formulas for input functions much quicker, those could then feed into a "compiler" that would allow it to run on a similar-scale interpreter on the same silicon.
In simple terms: you can imagine an EML co-processor sitting alongside a CPUs standard math coprocessor(s): XMX, SSE, AMX would do the multiplication/tile math they're optimized for, and would then call the EML coprocessor to do exp,sin,log calls which are processed by reconfiguring the EML trees internally to process those at single-cycle speed instead of relaying them back to the main CPU to do that math in generalized instructions - likely something that takes many cycles to achieve.
So, what happens if you take say the EML expression for addition, and invert the binary tree?
Of course the redundant primitives aren't free, since they add code size or die area. In choosing how many primitives to provide, the designer of a numerical library aims to make a reasonable tradeoff between that size cost and the speed benefit.
This paper takes that tradeoff to the least redundant extreme because that's an interesting theoretical question, at the cost of transforming commonly-used operations with simple hardware implementations (e.g. addition, multiplication) into computational nightmares. I don't think anyone has found a practical application for their result yet, but that's not the point of the work.
Traditional processors, even highly dedicated ones like TMUs in gpus, still require being preconfigured substantially in order to switch between sin/cos/exp2/log2 function calls, whereas a silicon implementation of an 8-layer EML machine could do that by passing a single config byte along with the inputs. If you had a 512-wide pipeline of EML logic blocks in modern silicon (say 5nm), you could get around 1 trillion elementary function evaluations per second on 2.5ghz chip. Compare this with a 96 core zen5 server CPU with AVX-512 which can do about 50-100 billion scalar-equivalent evaluations per second across all cores only for one specific unchanging function.
Take the fastest current math processors: TMUs on a modern gpu: it can calculate sin OR cos OR exp2 OR log2 in 1 cycle per shader unit... but that is ONLY for those elementary functions and ONLY if they don't change - changing the function being called incurs a huge cycle hit, and chaining the calculations also incurs latency hits. An EML coprocessor could do arcsinh(x² + ln(y)) in the same hardware block, with the same latency as a modern cpu can do a single FMA instruction.
The fact that addition, subtraction, and multiplication run quickly on typical processors isn't arbitrary--those operations map well onto hardware, for roughly the same reasons that elementary school students can easily hand-calculate them. General transcendental functions are fundamentally more expensive in time, die area, and/or power, for the same reasons that elementary school students can't easily hand-calculate them. A primitive where all arithmetic (including addition, subtraction, or negation) involves multiple transcendental function evaluations is not computationally faster, lower-power, lower-area, or otherwise better in any other practical way.
The comments here are filled with people who seem to be unaware of this, and it's pretty weird. Do CS programs not teach computer arithmetic anymore?
Derivatives: No. Exercise: Write the derivative of f(x)=eml(x,x)
Integrals: No. No. No. Integrals of compositions are a nightmare, and here they use long composition chains like g(x)=eml(1,eml(eml(1,x),1)).
If f(x) = exp(x) - ln(x) then f’(x) = exp(x) - 1/x, which is representable in eml form as well.
To the overall point though, I don't think it helps make derivatives easier. To refactor a function into emls is far more work than refactoring into something that's trivially differentiable with the product rule and chain rule.
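For the record, sympy agrees on the basic case:

```python
import sympy as sp

x = sp.symbols("x", positive=True)
eml = lambda a, b: sp.exp(a) - sp.log(b)

f = eml(x, x)            # exp(x) - ln(x)
print(sp.diff(f, x))     # exp(x) - 1/x
```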
f'(x) = eml(x,x) + eml(1,eml(eml(1,x)),1) + eml(eml(1, exp(eml(1, 1))),-eml(1, eml(eml(1, x))),1)
and I still have to macroexpand a few x-y = eml(eml(1, exp(eml(1, x))), eml(y,1))
but I got really bored
Like when the Apollo guidance computer was made, the bottleneck was making integrated chips so they only made one, the NOR gate, and a whackton of routing to build out an entire CPU. Horribly complex routing, very simplified integrated circuit construction
E.g. ln is a rather complicated construct; it's not even a function. That's because for complex numbers, e^x is not bijective, and thus its inverse ain't a function.
So using that complicated construct to define something simpler like addition invites extra complexity.
Exp and ln, isn't the operation its own inverse depending on the parameter? What a neat find.
This is a function from ℝ² to ℝ. It can't be its own inverse; what would that mean?
But no...
This is about continuous math, not ones and zeroes. Assuming peer review proves it out, this is outstanding.
Elementary functions such as exponentiation, logarithms and trigonometric functions are the standard vocabulary of STEM education. Each comes with its own rules and a dedicated button on a scientific calculator;
What?
and No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, √ , and log has always required multiple distinct operations. Here we show that a single binary operator
Yeah, this is done by using tables and series. His method does not actually facilitate the computation of these functions.
There is no such thing as "continuous mathematics". Maybe he meant to say continuous functions?
Looking at page 14, it looks like he reinvented the concept of the vector valued function or something. The whole thing is rediscovering something that already exists.
The point of this paper is not to revolutionize how a scientific calculator functions overnight, its to establish a single binary operation that can reproduce the rest of the typical continuous elementary operations via repeated application, analogous to how a NAND or NOR gate creates all of the discrete logic gates. Hence, "continuous mathematics" as opposed to discrete mathematics. It seems to me you're being overly negative without solid reasoning.
But he didn't show this though. I skimmed the paper many times. He creates multiple branches of these trees in the last page, so it's not truly a single nested operation.
What comes to my mind as an alternative, which I would subjectively find finer, is "axe". Think axiom or axiology.
Anyone with other suggestions? Or even remarks on this one?
On my side I like direct semantic connections, but find convoluted indirections conflated through lazy initialisms strongly repulsive. I can appreciate an acronym that makes both a direct connection and a playful indirect reference to the expanded terms.
This work is about continuous systems, even if the reduction of many kinds of functions to compositions of a single kind of function is analogous to the reductions of logic functions to composing a few kinds or a single kind of logic functions.
I actually do not value the fact that the logic functions can be expressed using only NAND. For understanding logic functions it is much more important to understand that they can be expressed using either only AND and NOT, or only OR and NOT, or only XOR and AND (i.e. addition and multiplication modulo 2).
Using just NAND or just NOR is a trick that does not provide any useful extra information. There are many things, including classes of mathematical functions or instruction sets for computers, which can be implemented using a small number of really independent primitives.
In most or all such cases, after you arrive to the small set of maximally simple and independent primitives, you can reduce them to only one primitive.
However that one primitive is not a simpler primitive, but it is a more complex one, which can do everything that the maximally simple primitives can do, and it can recreate those primitives by composition with itself.
Because of its higher complexity, it does not actually simplify anything. Moreover, generating the simpler primitives by composing the more complex primitive with itself leads to redundancies, if implemented thus in hardware.
This is a nice trick, but like I have said, it does not improve in any way the understanding of that domain or its practical implementations in comparison with thinking in terms of the multiple simpler primitives.
For instance, one could take a CMOS NAND gate as the basis for implementing a digital circuit, but you understand better how CMOS logic actually works when you understand that the AND function and the NOT function are localized in distinct parts of that CMOS NAND gate. This understanding is necessary when you have to design other gates for a gate library, because even if using a single kind of gate is possible, the performance of this approach is quite sub-optimal, so you almost always have to design separately, e.g. a XOR gate, instead of making it from NAND gates or NOR gates.
In CMOS logic, NAND gates and NOR gates happen to be the simplest gates that can restore at output the same logic levels that are used at input. This confuses some people to think that they are the simplest CMOS gates, but they are not the simplest gates when you remove the constraint of restoring the logic levels. This is why you can make more complex logic gates, e.g. XOR gates or AND-OR-INVERT gates, which are simpler than they would be if you made them from distinct NAND gates or NOR gates.
One thing I wonder now: NAND is symmetric while this isn't, could something similar be found where function(x, y) = function(y, x)?
nix run github:pveierland/eml-eval
EML Evaluator — eml(x, y) = exp(x) - ln(y)
Based on arXiv:2603.21852v2 by A. Odrzywołek
Constants
------------------------------------------------------------------------------
1 K=1 d=0 got 1 expected 1 sym=ok num=ok [simplify]
e K=3 d=1 got 2.718281828 expected 2.718281828 sym=ok num=ok [simplify]
0 K=7 d=3 got 0 expected 0 sym=ok num=ok [simplify]
-1 K=17 d=7 got -1 expected -1 sym=ok num=ok [simplify]
2 K=27 d=9 got 2 expected 2 sym=ok num=ok [simplify]
-2 K=43 d=11 got -2 expected -2 sym=ok num=ok [simplify]
1/2 K=51 d=15 got 0.5 expected 0.5 sym=ok num=ok [simplify]
-1/2 K=67 d=17 got -0.5 expected -0.5 sym=ok num=ok [simplify]
2/3 K=103 d=19 got 0.6666666667 expected 0.6666666667 sym=ok num=ok [simplify]
-2/3 K=119 d=21 got -0.6666666667 expected -0.6666666667 sym=ok num=ok [simplify]
sqrt2 K=85 d=21 got 1.414213562 expected 1.414213562 sym=ok num=ok [simplify]
i K=75 d=19 got i expected i sym=ok num=ok [i²=-1, simplify]
pi K=153 d=29 got 3.141592654 expected 3.141592654 sym=ok num=ok [simplify]
Unary functions (x = 7/3)
------------------------------------------------------------------------------
exp(x) K=3 d=1 got 10.3122585 expected 10.3122585 sym=ok num=ok [simplify]
ln(x) K=7 d=3 got 0.8472978604 expected 0.8472978604 sym=ok num=ok [simplify]
-x K=17 d=7 got -2.333333333 expected -2.333333333 sym=ok num=ok [simplify]
1/x K=25 d=8 got 0.4285714286 expected 0.4285714286 sym=ok num=ok [simplify]
x - 1 K=11 d=4 got 1.333333333 expected 1.333333333 sym=ok num=ok [simplify]
x + 1 K=27 d=9 got 3.333333333 expected 3.333333333 sym=ok num=ok [simplify]
2x K=67 d=17 got 4.666666667 expected 4.666666667 sym=ok num=ok [simplify]
x/2 K=51 d=15 got 1.166666667 expected 1.166666667 sym=ok num=ok [simplify]
x^2 K=41 d=10 got 5.444444444 expected 5.444444444 sym=ok num=ok [simplify]
sqrt(x) K=59 d=16 got 1.527525232 expected 1.527525232 sym=ok num=ok [simplify]
Binary operations (x = 7/3, y = 5/2)
------------------------------------------------------------------------------
x + y K=27 d=9 got 4.833333333 expected 4.833333333 sym=ok num=ok [simplify]
x - y K=11 d=4 got -0.1666666667 expected -0.1666666667 sym=ok num=ok [simplify]
x * y K=41 d=10 got 5.833333333 expected 5.833333333 sym=ok num=ok [simplify]
x / y K=25 d=8 got 0.9333333333 expected 0.9333333333 sym=ok num=ok [simplify]
x ^ y K=49 d=12 got 8.316526261 expected 8.316526261 sym=ok num=ok [simplify]
My dearest congrats to the author in case s/he shows around this site ^^.
And i is obviously `sqrt(-1)`
Table 4 shows the "size" of the operators when fully expanded to "eml" applications, which is quite large for +, -, ×, and /.
Here's one approach which agrees with the minimum sizes they present:
eml(x, y ) = exp(x) − ln(y) # 1 + x + y
eml(x, 1 ) = exp(x) # 2 + x
eml(1, y ) = e - ln(y) # 2 + y
eml(1, exp(e - ln(y))) = ln(y) # 6 + y; construction from eq (5)
ln(1) = 0 # 7
After you have ln and exp, you can invert their applications in the eml function:
eml(ln x, exp y) = x - y # 9 + x + y
Using a subtraction-of-subtraction to get addition leads to the cost of "27" in Table 4; I'm not sure what formula leads to 19 but I'm guessing it avoids the expensive construction of 0 by using something simpler that cancels:
x - (0 - y) = x + y # 25 + {x} + {y}
xy = eml(eml(1, eml(eml(eml(eml(1, eml(eml(1, eml(1, x)), 1)), eml(1, eml(eml(1, eml(y, 1)), 1))), 1), 1)), 1)
From Table 4, I think addition is slightly more complicated?
const eml = (x,y) => Math.exp(x) - Math.log(y);
const mul = (x,y) => eml(eml(1,eml(eml(eml(1,eml(eml(1,eml(1,x)),1)),eml(1,eml(eml(1,eml(y,1)),1))),1)),1);
console.log(mul(5,7));
> 35.00000000000001
For larger or negative inputs you get a NaN because ECMAScript has limited precision and doesn't handle imaginary numbers.
This also shows why EML is not practical for computation.
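For what it's worth, the same tree does handle negative inputs once you move to the complex domain, e.g. with Python's cmath (formula copied verbatim from the JS above):

```python
import cmath

def eml(x, y):
    return cmath.exp(x) - cmath.log(y)

def mul(x, y):
    return eml(eml(1,eml(eml(eml(1,eml(eml(1,eml(1,x)),1)),eml(1,eml(eml(1,eml(y,1)),1))),1)),1)

print(mul(5, 7))    # ~(35+0j)
print(mul(-5, 7))   # ~(-35+0j): the principal-branch complex log handles the sign
```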
exp(a) = eml(a, 1)
ln(a) = eml(1, eml(eml(1, a), 1))
Plugging those in is an exercise for the reader
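A quick numerical check of both identities (sticking to a positive real a, so plain real math suffices):

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

a = 2.5
print(eml(a, 1), math.exp(a))                    # exp(a) = eml(a, 1)
print(eml(1, eml(eml(1, a), 1)), math.log(a))    # ln(a) = e - ln(exp(e - ln(a)))
```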
"All" is a tall claim. Have a look at https://perso.ens-lyon.fr/jean-michel.muller/FP5.pdf for example. Jump to slide 18:
> Forget about Taylor series
> Taylor series are local best approximations: they cannot compete on a whole interval.
There is no need to worry about "sh-tt-ng" on their result when there is so much to learn about other approximation techniques.
I personally mostly do my everyday work using Taylor expansion (mostly explicit numerical methods in comp. EM because they're cheaper these days and it's simpler to write down), so it's what first comes to mind.
So it really helps when people explain (1) their context** and (2) their reasoning. Communicating well is harder than people think. Many comments are read by hundreds or more (thousands?) of people, most of whom probably have no idea who we are, what we know, or what we do with our brains on a regular basis. It is generous and considerate to other people to slow down and really explain where we're coming from.
So, when I read "most people use Taylor approximations"...
1. my first question is "on what basis can someone say this?"
2. for what domains might this somewhat true? False?
3. but the bigger problem is that claims like the above don't teach. i.e. When do Taylor series methods fall short? Why? When are the other approaches more useful?
Here's my quick take... Taylor expansions tend to work well when you are close to the expansion point and the function is analytic. They work less well when these assumptions don't hold. More broadly, they don't tend to give uniform accuracy across a range, so Taylor approximations are usually only local. Other methods (Padé, minimax, etc.) are worth reaching for when other constraints matter.
* I think this is a huge area we're going to need to work on in the age where anyone can sound like an expert.
** In the case above, does "comp. EM" mean "computational electromagnetics" or something else? The paper talks about "EML" so it makes me wonder if "EM" is a typo. All of these ambiguities add up and make it hard for people to understand each other.
Here is my attempt. I think they should be optimal up to around 15 eml nodes; the later ones might not be:
# 0
1=1
# 1
exp(x)=eml(x,1)
e-ln(x)=eml(1,x)
e=exp(1)
# 2
e-x=e-ln(exp(x))
# 3
0=e-e
ln(x)=e-(e-ln(x))
exp(x)-exp(y)=eml(x,exp(exp(y)))
# 4
id(x)=e-(e-x)
inf=e-ln(0)
x-ln(y)=eml(ln(x),y)
# 5
x-y=x-ln(exp(y))
-inf=e-ln(inf)
# 6
-ln(x)=eml(-inf,x)
ln(ln(x))=ln(ln(x))
# 7
-x=-ln(exp(x))
-1=-1
x^-1=exp(-ln(x))
ln(x)+ln(y)=e-((e-ln(x))-ln(y))
ln(x)-ln(y)=ln(x)-ln(y) # using x - ln(y)
# 8
xy=exp(ln(x)+ln(y))
x/y=exp(ln(x)-ln(y))
# 9
x + y = ln(exp(x))+ln(exp(y))
2 = 1+1
# 10
i*pi = ln(-1)
# 13
-i*pi = -ln(-1)
x^y = exp(ln(x)*y)
# 16
1/2 = 2^-1
# 17
x/2 = x/2
x*2 = x*2
# 20
ln(sqrt(x)) = ln(x)/2
# 21
sqrt(x) = exp(ln(sqrt(x)))
# 25
sqrt(xy) = exp((ln(x)+ln(y))/2)
# 27
ln(i)=ln(sqrt(-1))
# 28
i = sqrt(-1)
-pi^2 = (i*pi)*(i*pi)
# 31
pi^2 = (i*pi)*(-i*pi)
# 37
exp(x*i) = exp(x*i)
# 44
exp(-x*i) = exp(-(x*i))
# 46
pi = (i*pi)/i
# 90+x?
2cos(x) = exp(x*i) + exp(-x*i)
# 107+x?
cos(x) = (2cos(x))/2
# 118+x?
2sin(x) = (exp(x*i) - exp(-x*i))/i # using exp(x)-exp(y)
# 145+x?
sin(x) = (2sin(x))/2
# 217+3x?
tan(x) = 2sin(x)/(2cos(x))
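If anyone wants to sanity-check the real-valued entries numerically, a float evaluator with extended-real semantics is enough (numpy, so ln 0 = -inf just works; my snippet):

```python
import numpy as np

def eml(x, y):
    with np.errstate(divide="ignore"):   # extended reals: log(0) = -inf
        return np.exp(x) - np.log(y)

e    = eml(1, 1)                 # e = exp(1) - ln(1)
zero = eml(1, eml(eml(1, 1), 1)) # e - ln(exp(e)); 0 up to log/exp rounding
ninf = eml(1, eml(1, 0.0))       # e - ln(e - ln 0) = e - ln(+inf) = -inf
x = 2.0
print(e, zero, ninf)
print(eml(ninf, x), -np.log(x))        # -ln(x) = eml(-inf, x)
print(eml(eml(ninf, x), 1), 1.0 / x)   # x^-1 = exp(-ln(x))
```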
The plan is to use this new "structurally flawless mathematical primitive" EML (this is all beyond me, I was just having some fun trying to make it cook things together) in TPUs made out of logarithmic number system circuits. EML would have DAGs to help with the exponential bloat problem, like CERN's tiny fast "hardcoded models" as an inspiration. All this would be bounded by the deductive causality of Pedro Domingos's Tensor Logic and all of this would einsum like a mf. I hope it does.
Behold, The Weather Dominator!
https://notebooklm.google.com/notebook/e0a54a54-c644-4c89-9d...