Are we back to "programming language X is slow" assertions? I thought those had died long ago.
Better algorithms win over 'better' programming languages every single time. Git is really simple and efficient. You could reimplement it in Python and I doubt it would see any significant slowdown. Heck, git was originally implemented as a handful of low-level binaries stitched together with shell scripts.
Python is absurdly slow: every method call is a string-keyed dict lookup (__slots__ is way underused), everything is dicts all the way down, and the bytecode doesn't specialize at all to observed types. It is a uniquely, horribly slow language.
I love it, but python is almost uniquely a slow language.
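To make the dicts-versus-slots point concrete, here's a minimal sketch (class names are made up for illustration): a plain class stores attributes in a per-instance dict, while __slots__ trades that dict for fixed offsets.

    class Plain:
        def __init__(self):
            self.x = 1

    class Slotted:
        __slots__ = ("x",)          # no per-instance __dict__
        def __init__(self):
            self.x = 1

    p, s = Plain(), Slotted()
    print(p.__dict__)               # {'x': 1} -- reading p.x is a dict lookup
    print(hasattr(s, "__dict__"))   # False -- reading s.x is a fixed-offset load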
Algorithms matter, but if you already have good algorithms, or you're already linear-time and just have a ton of data, the language still matters: I've seen 500x speedups from rewriting a single-threaded Python program as a multithreaded Rust program, where the algorithms were not improved at all.
It's the difference between a program running overnight vs. in 30 seconds. And if there are problems, the iteration speed from that is huge.
To be fair, Python as implemented today is horribly slow. You could leave the language the same but apply all the tricks and heroic efforts that were used to make JavaScript fast. The language would be the same, but the implementation would be faster.
Of course, in practice the available implementations are very much part of the language and its ecosystem; especially for a language like Python, which is so defined by its dominant implementation, CPython.
But a lot of Python's monkey-patching and general dynamism also means a lot of those optimizations have to be re-checked often for correctness, so it does take a ton of optimizations off the table. (Of course, those are rare corner cases, so JIT compilers like PyPy have been able to optimize for the happy case and keep a slow fall-back path - but PyPy had a ton of incompatibility issues and now seems to be dying.)
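For anyone who hasn't run into it, this is the kind of dynamism being described (a toy sketch; the Vec class is made up): any call site can change behavior at runtime, so an optimizer can't assume a method stays put without guards and a deoptimization path.

    import math

    class Vec:
        def __init__(self, x, y):
            self.x, self.y = x, y
        def norm(self):
            return math.hypot(self.x, self.y)

    v = Vec(3, 4)
    print(v.norm())    # 5.0

    # Perfectly legal monkey-patch, invalidating any assumption an
    # optimizer made about what v.norm() resolves to:
    Vec.norm = lambda self: abs(self.x) + abs(self.y)
    print(v.norm())    # 7 -- same call site, new behavior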
The real reason is that it's a deliberate choice by the CPython project to prefer extensibility and maintainability over performance. The result is that Python is a much more hackable language, with much better C interop than V8 or the JVM.
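As a small illustration of that interop (a sketch using only the stdlib ctypes module, on Linux/macOS): you can call into a shared C library with no extension module or build step at all.

    import ctypes
    import ctypes.util

    # Load the C runtime and call strlen() directly (Linux/macOS;
    # find_library("c") may return None on Windows).
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t

    print(libc.strlen(b"hello world"))  # 11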
Doesn't the Python VM have inline caches? [0]
It didn't use to.
EDIT: as of Python 3.11 it does: https://peps.python.org/pep-0659/
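You can actually watch the PEP 659 specialization happen on 3.11+ with dis (a quick sketch; the exact specialized opcode names vary by version):

    import dis

    def add(a, b):
        return a + b

    for _ in range(1000):    # warm the function up so it gets quickened
        add(1, 2)

    # On 3.11+ this typically shows e.g. BINARY_OP_ADD_INT in place of
    # the generic BINARY_OP, because only ints were observed at this site.
    dis.dis(add, adaptive=True)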
Later on I also changed some of the algorithms to faster ones, but their impact was much lower than the language change.
Which is only to say that the rewrite-away-from-Python story can also show Python doing its job: risk reduction, scaffolding, MVP validation.
A bunch of low-level binaries stitched together with shell scripts is a lot faster than Python, so I'm not really sure what the point of this comparison is.
Python is an extremely versatile language, but if what you're doing is computing hashes and diffs, and generally doing entirely CPU-bound work, then it's objectively the wrong tool, unless you can delegate that to a fast, native kernel, in which case you're not actually using Python anymore.
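Which is how the standard library itself handles it, for what it's worth. A sketch (hash_file and chunk_size are made-up names): hashlib does the actual hashing in C, so the Python layer is just I/O orchestration.

    import hashlib

    def hash_file(path, chunk_size=1 << 20):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                h.update(chunk)   # the hashing itself runs in native code
        return h.hexdigest()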
That's often true, but not "every single time".
One of the reasons Mercurial lost the DVCS battle is its performance - even the Mercurial folks admitted that was at least in part because of Python.
No, it's always been true. It's just that at some point people got bored and tired of pointing it out.
Yes, we are? The slow paths of Mercurial have been rewritten in C (and more recently in Rust), improving the perf story substantially without taking away from the wild modularity and extensibility hg has always had.
I doubt it wouldn't be significantly slower. I can't disprove that it's possible, but it's entirely possible for you to prove your claim, so I'd argue the ball is in your court.
The reason some more modern tools, like jj, really blow git out of the water in terms of performance is that they make good choices, such as doing a lot of transformations entirely in memory rather than via the filesystem. It's also that they're written in languages that can execute efficiently. Luckily, it's clear that modern tools like jj are heavily inspired by Mercurial, so we're not doomed to the UX and performance git binds us to.
Apparently I belong to the same club -- when I'm writing AWK scripts. (Arrays are hashmaps in a trenchcoat there.) Using hashmaps is not necessarily the indictment you apparently think it is, if the access pattern fits the problem and other constraints are not in play.
> It's amazing how much we've brainwashed folks to focus on algorithms and lose sight of how to actually properly optimize code. Being aware of how your code interacts with cache is incredibly important.
By the time you start worrying about cache locality you have left general algorithmic concerns far behind. Yes, it's important to recognize the problem, but for most programs, most of the time, that kind of problem simply doesn't appear.
It also doesn't pay to be dogmatic about rules, which is probably the core of your complaint, although unstated. You need to know them, and then you need to know when to break them.
I'll take that back a little bit, because there _is_ in fact a lot of algorithmically inefficient code out there, and it slows everything down a lot. But once the most obvious algorithmic problems are out of the way, even an O(log n) algorithm isn't much of an improvement over a linear scan if n < 1000. At that point it's much more important to get that 100x+ speedup by implementing the algorithm in a straightforward, cache-friendly way.
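A rough sketch of that effect, even in pure Python (the gap is far bigger in languages with flat memory layouts, but it's usually measurable here too): summing a 2-D table row by row walks each row's underlying array sequentially, while going column by column jumps between rows on every access.

    import time

    N = 2000
    table = [[1] * N for _ in range(N)]

    t0 = time.perf_counter()
    row_major = sum(table[i][j] for i in range(N) for j in range(N))
    t1 = time.perf_counter()
    col_major = sum(table[i][j] for j in range(N) for i in range(N))
    t2 = time.perf_counter()

    print(f"row-major:    {t1 - t0:.3f}s")
    print(f"column-major: {t2 - t1:.3f}s")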
It's amusing that you call Git fast. It's so notoriously problematic for large repos that virtually every big tech company has built a custom rewrite at some point or another!
For everything I've ever done, git was practically instant (except network I/O, of course). It's one of the fastest and most reliable tools I know. If it isn't fast for you, chances are you're on a slow Windows filesystem, additionally impeded by a virus scanner.
The mere fact that Git is unable to handle large binary files makes it an unusable tool for literally every project I have ever worked on in my entire career.
Cloning takes 21 seconds on my work laptop - indeed a corporate Windows laptop with antivirus installed. The majority of that time is simply network I/O. The cloned repository is 276 MB.
Actually checking out the kernel takes 90 seconds. That amounts to creating 99,195 individual files, totaling 2 GB of data. Expect this to be ~10 times faster on a Linux filesystem.
So what's your problem?