Are we back to "programming language X is slow" assertions? I thought those had died long ago.
Better algorithms win over 'better' programming languages every single time. Git is really simple and efficient. You could reimplement it in Python and I doubt it would see any significant slowdown. Heck, git was originally implemented as a handful of low-level binaries stitched together with shell scripts.
Python is absurdly slow: every method call is a string-keyed dict lookup (__slots__ is way underused), everything is dicts all the way down, and the bytecode doesn't specialize at all to observed types. It is a uniquely, horribly slow language.
I love it, but python is almost uniquely a slow language.
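To make the dicts-versus-slots point concrete, here's a minimal sketch (class names are made up for illustration): a plain class stores attributes in a per-instance dict, while __slots__ trades that dict for fixed offsets.

    class Plain:
        def __init__(self):
            self.x = 1

    class Slotted:
        __slots__ = ("x",)          # no per-instance __dict__
        def __init__(self):
            self.x = 1

    p, s = Plain(), Slotted()
    print(p.__dict__)               # {'x': 1} -- reading p.x is a dict lookup
    print(hasattr(s, "__dict__"))   # False -- reading s.x is a fixed-offset load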
Algorithms matter, but if you already have good algorithms, or you're already linear-time and just have a ton of data, the language still matters: I've seen 500x speedups from rewriting a single-threaded Python program as a multithreaded Rust program, where the algorithms were not improved at all.
It's the difference between a program running overnight vs. in 30 seconds. And if there are problems, the iteration speed from that is huge.
To be fair, Python as implemented today is horribly slow. You could leave the language the same but apply all the tricks and heroic efforts that were used to make JavaScript fast. The language would be the same, but the implementation would be faster.
Of course, in practice the available implementations are very much part of the language and its ecosystem; especially for a language like Python, which is so defined by its dominant implementation, CPython.
But a lot of Python's monkey-patching and general dynamism also means a lot of those optimizations have to be re-checked often for correctness, so it does take a ton of optimizations off the table. (Of course, those are rare corner cases, so JIT compilers like PyPy have been able to optimize for the happy case and keep a slow fall-back path - but PyPy had a ton of incompatibility issues and now seems to be dying.)
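For anyone who hasn't run into it, this is the kind of dynamism being described (a toy sketch; the Vec class is made up): any call site can change behavior at runtime, so an optimizer can't assume a method stays put without guards and a deoptimization path.

    import math

    class Vec:
        def __init__(self, x, y):
            self.x, self.y = x, y
        def norm(self):
            return math.hypot(self.x, self.y)

    v = Vec(3, 4)
    print(v.norm())    # 5.0

    # Perfectly legal monkey-patch, invalidating any assumption an
    # optimizer made about what v.norm() resolves to:
    Vec.norm = lambda self: abs(self.x) + abs(self.y)
    print(v.norm())    # 7 -- same call site, new behavior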
The real reason is that it's a deliberate choice by the CPython project to prefer extensibility and maintainability over performance. The result is that Python is a much more hackable language, with much better C interop than V8 or the JVM.
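As a small illustration of that interop (a sketch using only the stdlib ctypes module, on Linux/macOS): you can call into a shared C library with no extension module or build step at all.

    import ctypes
    import ctypes.util

    # Load the C runtime and call strlen() directly (Linux/macOS;
    # find_library("c") may return None on Windows).
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t

    print(libc.strlen(b"hello world"))  # 11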
Doesn't the Python VM have inline caches? [0]
It didn't use to.
EDIT: as of Python 3.11 it does: https://peps.python.org/pep-0659/
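You can actually watch the PEP 659 specialization happen on 3.11+ with dis (a quick sketch; the exact specialized opcode names vary by version):

    import dis

    def add(a, b):
        return a + b

    for _ in range(1000):    # warm the function up so it gets quickened
        add(1, 2)

    # On 3.11+ this typically shows e.g. BINARY_OP_ADD_INT in place of
    # the generic BINARY_OP, because only ints were observed at this site.
    dis.dis(add, adaptive=True)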
Later on I also changed some of the algorithms to faster ones, but their impact was much lower than the language change.
Which is only to say that the rewrite-away-from-Python story can also show Python doing its job: risk reduction, scaffolding, MVP validation.
A bunch of low-level binaries stitched together with shell scripts is a lot faster than Python, so I'm not really sure what the point of this comparison is.
Python is an extremely versatile language, but if what you're doing is computing hashes and diffs, and generally doing entirely CPU-bound work, then it's objectively the wrong tool, unless you can delegate that to a fast, native kernel, in which case you're not actually using Python anymore.
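Which is how the standard library itself handles it, for what it's worth. A sketch (hash_file and chunk_size are made-up names): hashlib does the actual hashing in C, so the Python layer is just I/O orchestration.

    import hashlib

    def hash_file(path, chunk_size=1 << 20):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                h.update(chunk)   # the hashing itself runs in native code
        return h.hexdigest()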
That's often true, but not "every single time".
One of the reasons Mercurial lost the DVCS battle is its performance - even the Mercurial folks admitted that was at least in part because of Python.
No, it's always been true. It's just that at some point people got bored and tired of pointing it out.
Yes, we are? The slow paths of Mercurial have been rewritten in C (and more recently in Rust), improving the perf story substantially without taking away from the wild modularity and extensibility hg has always had.
I doubt it wouldn't be significantly slower. I can't disprove that it's possible, but it's entirely possible for you to prove your claim, so I'd argue the ball is in your court.
The reason some more modern tools, like jj, really blow git out of the water in terms of performance is that they make good choices, such as doing a lot of transformations entirely in memory rather than via the filesystem. It's also that they're written in languages that can execute efficiently. Luckily, it's clear that modern tools like jj are heavily inspired by Mercurial, so we're not doomed to the UX and performance git binds us to.
Apparently I belong to the same club -- when I'm writing AWK scripts. (Arrays are hashmaps in a trenchcoat there.) Using hashmaps is not necessarily the indictment you apparently think it is, if the access pattern fits the problem and other constraints are not in play.
> It's amazing how much we've brainwashed folks to focus on algorithms and lose sight of how to actually properly optimize code. Being aware of how your code interacts with cache is incredibly important.
By the time you start worrying about cache locality you have left general algorithmic concerns far behind. Yes, it's important to recognize the problem, but for most programs, most of the time, that kind of problem simply doesn't appear.
It also doesn't pay to be dogmatic about rules, which is probably the core of your complaint, although unstated. You need to know them, and then you need to know when to break them.
I'll take that back a little bit, because there _is_ in fact a lot of algorithmically inefficient code out there, and it slows everything down a lot. But once the most obvious algorithmic problems are out of the way, even an O(log n) algorithm isn't much of an improvement over a linear scan if n < 1000. At that point it's much more important to get that 100x+ speedup by implementing the algorithm in a straightforward, cache-friendly way.
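A rough sketch of that effect, even in pure Python (the gap is far bigger in languages with flat memory layouts, but it's usually measurable here too): summing a 2-D table row by row walks each row's underlying array sequentially, while going column by column jumps between rows on every access.

    import time

    N = 2000
    table = [[1] * N for _ in range(N)]

    t0 = time.perf_counter()
    row_major = sum(table[i][j] for i in range(N) for j in range(N))
    t1 = time.perf_counter()
    col_major = sum(table[i][j] for j in range(N) for i in range(N))
    t2 = time.perf_counter()

    print(f"row-major:    {t1 - t0:.3f}s")
    print(f"column-major: {t2 - t1:.3f}s")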
It's amusing that you call Git fast. It's so notoriously problematic for large repos that virtually every big tech company has built a custom rewrite at some point or another!
For everything I've ever done, git was practically instant (except network I/O, of course). It's one of the fastest and most reliable tools I know. If it isn't fast for you, chances are you're on a slow Windows filesystem, additionally impeded by a virus scanner.
The mere fact that Git is unable to handle large binary files makes it an unusable tool for literally every project I have ever worked on in my entire career.
Cloning takes 21 seconds on my work laptop - indeed a corporate Windows laptop with antivirus installed. The majority of that time is simply network I/O. The cloned repository is 276 MB.
Actually checking out the kernel takes 90 seconds. That amounts to creating 99,195 individual files, totaling 2 GB of data. Expect this to be ~10 times faster on a Linux filesystem.
So what's your problem?