While that's true, CPython uses arbitrary-precision integers (PyLongObject) for most of these computations, so almost every number gets allocated on the heap.
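You can see this in CPython. These are implementation details rather than language guarantees, and `sys.getsizeof` isn't available on PyPy. Every `int` is a full object that is larger than a machine word, and only the small-int cache (-5 to 256) is shared between calls:

```python
import sys

# CPython detail: an int is a heap object (header + digits), not a raw 8-byte word.
big = 2**40
print(sys.getsizeof(big))  # more than 8 bytes; the exact value varies by version

# CPython caches the small ints -5..256, so both calls return the same object...
print(int("100") is int("100"))    # True on CPython

# ...but larger values are freshly allocated every time they are created.
print(int("1000") is int("1000"))  # False on CPython
```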
If we use an implementation that avoids boxing every integer, like PyPy or Cython, the results change significantly:
  % cat sum.py
  def sum(depth, x):
      if depth == 0:
          return x
      else:
          fst = sum(depth-1, x*2+0) # adds the fst half
          snd = sum(depth-1, x*2+1) # adds the snd half
          return fst + snd
  if __name__ == '__main__':
      print(sum(30, 0))
  % time pypy sum.py
  576460751766552576
  pypy sum.py  4.26s user 0.06s system 96% cpu 4.464 total
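Here is a quick sanity check on that output. The recursion reaches every leaf value x from 0 to 2^30 - 1 exactly once, so the result should equal n(n-1)/2 with n = 2^30. The sketch below compares the two at depths small enough to run quickly in CPython. The names `tree_sum` and `closed_form` are just for illustration:

```python
def tree_sum(depth: int, x: int = 0) -> int:
    # Same recursion as sum.py: each leaf contributes its own index x.
    if depth == 0:
        return x
    return tree_sum(depth - 1, x * 2 + 0) + tree_sum(depth - 1, x * 2 + 1)


def closed_form(depth: int) -> int:
    # The leaves are exactly 0, 1, ..., 2**depth - 1, so this is n*(n-1)/2.
    n = 2**depth
    return n * (n - 1) // 2


# The recursion and the formula agree for small depths...
for d in range(16):
    assert tree_sum(d) == closed_form(d)

# ...and the formula reproduces the PyPy output for depth 30.
print(closed_form(30))  # 576460751766552576
```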
That's on an M2 Pro. I also suspect the result in Bend would be wrong, since it only supports 24-bit integers: the true sum of every integer below 2^30 is about 5.8 × 10^17, far beyond the 24-bit maximum of 16,777,215, so it would overflow quite quickly. Is that right? [Edit: just noticed the previous comment had already mentioned PyPy]
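To show what 24 bits means here, the sketch below assumes that every add and multiply wraps modulo 2^24. That is a hypothetical model; I haven't checked Bend's actual overflow semantics. Wrapping is compatible with both addition and multiplication, so under that assumption the wrapped result is just the true sum mod 2^24. For depth 30 that comes out to exactly 0:

```python
MASK_24 = (1 << 24) - 1  # largest unsigned 24-bit value: 16_777_215


def tree_sum_u24(depth: int, x: int = 0) -> int:
    # Hypothetical model of sum.py where every operation wraps to 24 bits.
    if depth == 0:
        return x & MASK_24
    fst = tree_sum_u24(depth - 1, (x * 2 + 0) & MASK_24)
    snd = tree_sum_u24(depth - 1, (x * 2 + 1) & MASK_24)
    return (fst + snd) & MASK_24


def true_sum(depth: int) -> int:
    # Exact sum of 0, 1, ..., 2**depth - 1.
    n = 2**depth
    return n * (n - 1) // 2


# Wrapping arithmetic yields the exact answer reduced mod 2**24...
for d in range(16):
    assert tree_sum_u24(d) == true_sum(d) % (1 << 24)

# ...so at depth 30 the wrapped "sum" is 0, while the real answer needs 59 bits.
print(true_sum(30) % (1 << 24))   # 0
print(true_sum(30).bit_length())  # 59
```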
> I'm aware it is 2x slower on non-Apple CPUs.
Do you know why? As far as I can tell, HVM has no aarch64- or Apple-specific code. Could it be because Apple Silicon has wider instruction decoders?
> can be underwhelming, and I understand if you don't believe on my words
I don't think anyone wants to rain on your parade, but extraordinary claims require extraordinary evidence.
The work you've done on Bend and HVM sounds impressive, but I feel the benchmarks need more scrutiny. Since your main competitor would be Mojo rather than Python, comparisons against Mojo would be nice as well.