And here is a table representation of all benchmarks and the geomean and median overall results: http://software.rochus-keller.ch/awfy-bun-summary.ods
The implementation of the same benchmark suite runs around a factor of 2.4 (geomean) faster on JDK 8 than on GraalPython EE 22.3 (HotSpot), or 41 times faster than CPython 3.11. GraalPython is thus about 17 times faster than CPython, and about two times faster than PyPy. The Graal Enterprise Edition (EE) seems to be a factor of 1.31 faster than the Community Edition (CE).
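For reference, a geometric-mean speedup like the 2.4x figure above is computed from the per-benchmark speedup ratios; the numbers below are made up purely for illustration:

```python
import math

# Hypothetical per-benchmark speedup ratios (interpreter A vs. B).
speedups = [1.8, 2.1, 3.0, 2.6]

# Geomean = exp of the mean of the logs; unlike the arithmetic mean,
# it treats a 2x slowdown and a 2x speedup symmetrically.
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
print(round(geomean, 2))  # -> 2.33
```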
My limited experience was that on regex-heavy workloads PyPy is several times slower than CPython (~3x compared to 3.10) and Graal is even worse (~6x compared to 3.11).
Performance does indeed depend on workload. There's a page that compares GraalPy vs CPython and Jython on the Python Performance Suite which aims to be "real world":
https://www.graalvm.org/latest/reference-manual/python/Perfo...
There the speedup is smaller, but this is partly because a lot of real world Python workloads these days spend all their time inside C or the GPU. Having a better implementation is still a good idea though, because it means more stuff can be done by researchers who don't know C++ well or at all. The point at which you're forced to get dedicated hackers involved to optimize gets pushed backwards if you can rely on a good JIT.
24.1. Version 23 may or may not have been worse; I didn't take specific notes aside from "too slow to be acceptable".
This is especially important for scripting languages like Python, where a large part of the features are implemented in C or other native languages and called via FFI. That's why, for example, the benchmark implements its own collections, because we want to know how fast the interpreter is. Otherwise, as you have noticed, the result is randomly influenced by how much compute a particular application can delegate to the FFI.
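A minimal sketch of what such a self-contained benchmark looks like (this is not the actual AWFY code, just the same idea): a hand-rolled linked list keeps all the work inside the interpreter instead of delegating to C-implemented builtins.

```python
import time

class Node:
    """One cell of a hand-rolled singly linked list."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def build_and_sum(n):
    # Build the list node by node, then traverse it; every step is
    # pure-Python work, so the interpreter (or JIT) does everything.
    head = None
    for i in range(n):
        head = Node(i, head)
    total = 0
    while head is not None:
        total += head.value
        head = head.next
    return total

start = time.perf_counter()
result = build_and_sum(100_000)
elapsed = time.perf_counter() - start
print(result, f"{elapsed:.4f}s")
```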
That sounds like the exact opposite of what I would want as a user of the language: the benchmark completely abstracts the actual behaviour of the runtime, claiming purported gains which don’t come anywhere near manifesting when trying to run actual software.
I’m not implementing my own collections when `dict` suffices, and I don’t really care that a pure python version of `re` runs faster in graal than in cpython, because I’m not using that.
So what happens is I see claims that graalpython runs 17 times faster than cpython, I try it out, it runs 6 times slower instead, and I can only conclude that graal is a worthless pile of lies and I should stop caring.
- Maturin doesn't support the Graal interpreter, so no PyO3 packages
- uv doesn't seem to run, as `fork` and `execve` are missing from the os package?
- Graal seems to carry a huge number of patches to popular libraries so that they'll run; most seem to be of the form that patches C files to add additional #ifdefs
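A quick way to see why a tool like uv might fail on a given interpreter is to probe the `os` module for the POSIX process primitives it relies on (illustrative only):

```python
import os

# Probe for the low-level process functions that process-management
# tools typically depend on; alternative interpreters may omit some.
required = ["fork", "execve", "posix_spawn"]
missing = [name for name in required if not hasattr(os, name)]
print("missing os functions:", missing or "none")
```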
I don't think Graal is going to be a viable target for large projects with a huge set of dependencies, unfortunately, as the risk of not being able to upgrade to different versions or add newer dependencies is going to be too high. It's impressive what it does seem to support though, and probably worth looking at if you have a smaller-scale project.
https://github.com/oracle/graalpython/blob/b907353de1b72a14e...
- self.cython_always = False
+ self.cython_always = True
That's the entire patch. Others are working around bugs in the C extensions themselves that a different implementation happens to expose, and can be upstreamed: https://github.com/oracle/graalpython/blob/b907353de1b72a14e...
Still others exist for old module versions, but are now obsolete:
https://github.com/oracle/graalpython/blob/b907353de1b72a14e...
# None of the patches are needed since 43.0, the pyo3 patches have been upstreamed
And finally, some are just general portability improvements. Fork doesn't exist on Windows; often it can be replaced with just starting a subprocess. So the patching situation has been getting much better over time, partly due to the GraalPy team actively getting involved with and improving the Python ecosystem as a whole.
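The fork-to-subprocess substitution mentioned above can be sketched like this (a generic example, not one of the actual patches):

```python
import subprocess
import sys

# Portability sketch: instead of os.fork() (absent on Windows and on
# some alternative interpreters), launch a fresh child process and
# collect its output.
proc = subprocess.run(
    [sys.executable, "-c", "print(6 * 7)"],
    capture_output=True, text=True, check=True,
)
print(proc.stdout.strip())  # -> 42
```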
It is fair to say that large projects with a huge set of dependencies will likely face some compatibility issues, but we're working on ironing those out. There is GraalPy support in the setup-python GitHub action. GraalPy is supported in the manylinux image [3], and hopefully soon also in cibuildwheel [4].
[0] https://github.com/PyO3/maturin/pull/1645 (merged)
[1] https://github.com/PyO3/pyo3/pull/3247 (merged)
[2] https://github.com/pydantic/jiter/pull/135 (merged)
[3] https://github.com/pypa/manylinux/pull/1520 (merged)
It is a chicken (interpreter) and egg (dependencies) problem. You cannot fix the dependency problems without the interpreter. Neither can you release an interpreter with full dependency support.
So it does have to do with scale, but in the opposite direction: big, long-lived projects will want to adopt something like GraalPy precisely because of how long the project will run.
https://www.graalvm.org/dev/reference-manual/python/Native-E...
> CPython provides a native extensions API for writing Python extensions in C/C++. GraalPy provides experimental support for this API, which allows many packages like NumPy and PyTorch to work well for many use cases. The support extends only to the API, not the binary interface (ABI), so extensions built for CPython are not binary compatible with GraalPy. Packages that use the native API must be built and installed with GraalPy, and the prebuilt wheels for CPython from pypi.org cannot be used. For best results, it is crucial that you only use the pip command that comes preinstalled in GraalPy virtualenvs to install packages. The version of pip shipped with GraalPy applies additional patches to packages upon installation to fix known compatibility issues and it is preconfigured to use an additional repository from graalvm.org where we publish a selection of prebuilt wheels for GraalPy. Please do not update pip or use alternative tools such as uv.
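Since CPython wheels are not ABI-compatible with GraalPy, packaging tooling has to distinguish the implementations; at runtime that can be done via `sys.implementation` (illustrative sketch):

```python
import sys

# A package's build or install tooling can detect the interpreter at
# runtime; CPython wheels are not ABI-compatible with GraalPy, so
# native extensions must be rebuilt rather than reused.
impl = sys.implementation.name  # "cpython", "graalpy", "pypy", ...
if impl == "graalpy":
    print("GraalPy: use its bundled pip and GraalPy-specific wheels")
else:
    print(f"running on {impl}")
```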
We also want to make it easy for Python package maintainers to test and build wheels for GraalPy. It's already available via setup-python, and we are adding GraalPy support to cibuildwheel. If you need any help, please reach out to us!
0: https://devguide.python.org/developer-workflow/c-api/#limite...
Currently it is a mix and match of a herculean engineering effort mostly ignored by the community (PyPy), DSLs for GPGPUs, a bunch of C and C++ libraries that people keep referring to as "Python" when any language can have similar bindings, Jython, IronPython, GraalPy, ...
So it isn't for lack of trying, at least we finally have CPython folks more welcoming to performance improvements, and JITs.
So you gain the perf of a JIT, while losing out on most everything else high-performance in the Python ecosystem.
You can have your cake and eat it too: https://github.com/hylang/hy
Add to that the existing excellent ecosystem, the strong culture of scientific stacks, and a very good story for providing C extensions (actually the best one among all scripting languages, because of things like cibuildwheel).
It's only in small tech bubbles like HN that devs find it surprising.
The tooling has markedly improved though. Things like typing and compile-time checks, great. But it's also funny to me that some of the fastest tools for Python are being built in Rust (e.g. uv).
As I said it’s anecdotal, but in my experience Python gets a lot of love compared to something like Java or C#, both of which are often met with real harshness. Hell, I’ve gone on unseemly rants about C# myself.
Finally, thanks to data science and people getting fed up with always writing bindings, this is changing, and Python can join the Common Lisp, Scheme, Smalltalk, SELF, JavaScript, Ruby, Lua, Dylan, Julia, BASIC club.
It is a great language in many ways, that's why its shortcomings are so painful.
Perl was the leading tool for scripting and text parsing. Python didn’t really supplant it for a long time — until people started writing more complicated scripts that had to be maintained. Perl reads like line noise after 6 months whereas I can look at Python code from 20 years ago, prettify it with black, and understand it.
Python got picked up by the scientific computing community, which gave it some of its earliest libraries like numpy, f2py, and scipy. Some of us who were on MATLAB moved over.
Then data science happened. Pandas built off the scientific computation foundations and eventually libraries like scikit and matplotlib (mimicking matlab’s plotting) came along.
Then tensorflow came along and built on the foundation of numerical libraries. PyTorch followed.
Other systems like Django came and made python popular for building database backed websites.
Suddenly there was momentum and today almost all numerical software have a python API — this includes proprietary stuff like CPLEX and what have you.
Python was the glue language that had the lowest barrier of entry. For instance, Spark was written in Scala and has a performant Scala API but everyone uses PySpark because it’s much more accessible, despite the interop cost.
The counterfactual to all this was Ruby. It had much nicer syntax than Python but when I tried to use it in grad school I was quickly stymied by the lack of numerical libraries. Ruby never found a niche outside of Rails and config management.
Essentially Python — like Nvidia today — bet on linear algebra (and more broadly on data processing) and won.
I get why there’s hate for Python — it’s not a perfect language. Yet those of us pragmatists who use it understand the trade offs. You trade off on the metal performance for programmer performance. You trade off packaging difficulties for something that works. You trade off an imperfect syntax for getting things done.
I could have used Ruby — a much more beautiful language — in grad school and worked around its gaps, but I would not have graduated on time. Python was the pragmatic choice for me and continues to be one for me today (outside of situations requiring raw performance).
- Imports in Ruby seriously suck compared to Python. Everything is `require`d into a global scope, and an ecosystem like Bundler encourages centralizing all imports for your entire codebase into one file.
- Python has docstrings, encouraging in-code documentation.
Add common ecosystem things like the Ruby community encouraging generated methods, magical "do what I mean" parameters, and REPL poke-driven development, and this leads to the effect that Python codebases are almost always well documented and easy to understand. You can tell where every symbol comes from, and you can usually find a documentation entry for every single method. It's not uncommon for a Ruby library, even a popular one, to be documented solely through a scattering of sparsely-explained examples with literally no real API documentation. Inheriting a long-lived Ruby project can be a serious ordeal just to discover where all the code that's running is running, why it's running, where things are preloaded into a builtin class, and with Rails and Railties, a Gem can auto insert behavior and Middleware just by existing, without ever being explicitly mentioned in any code or configs other than the Gemfile. It's an absolute headache.
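The docstring point is easy to illustrate; everything here is standard Python introspection:

```python
def moving_average(values, window):
    """Return the simple moving averages of `values` over `window` items.

    Docstrings like this one are introspectable at runtime via
    `help()` or the `__doc__` attribute, which is a big part of why
    Python codebases tend to be self-documenting.
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# The documentation travels with the object itself.
print(moving_average.__doc__.splitlines()[0])
```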
My dream language would be Ruby with Python-style imports and docstrings.
I wrote Ruby when I got started because it was the most accessible and the Rails learning content was top notch. Now I use python when I need more than a few `bash` pipes to accomplish anything, but if I were to solve a capital-P Problem, of course the tool often chooses the project after constraints.
However, due to it being an interpreted scripting language, I never bothered to use Python for anything beyond OS scripting.
Example see the Bioinformatics papers from that period, and the Perl tooling used alongside the research.
Already in 2003 CERN was using Python on some of their build infrastructure (see CMT), Grid Computing scripting efforts, and we had Python trainings available to us.
Now there is a difference between a REPL of sorts, scripting OS tasks, and going full blown applications with a pure interpreter.
More details about this particular release are in the blog post at https://medium.com/graalvm/whats-new-in-graal-languages-24-1...
Happy to answer any additional questions!
EDIT: I tried the native binary command here on a simple hello world script.
It downloaded some stuff in the background, built the entire python and java and embedded it into a 350 MB ELF binary on linux after 15 minutes of using 24 GB RAM and 100% CPU.
But I'd much prefer a smaller jar file which I can distribute cross-platform.
https://www.graalvm.org/uploads/quick-references/GraalPy_v1/...
Although GraalPy can create standalone applications [1], you don't have to turn your hello world script into a self-contained binary. You can, of course, create a JAR that depends on GraalPy, or a fat JAR that contains it, and deploy it just like any other Java application.
We are still updating our docs to mention more details on this and publish some guides, apologies for the delay.
[1] https://www.graalvm.org/latest/reference-manual/python/stand...
If you're into that sort of thing.
Self-interest disclosure: I'm a major contributor and heavy user.
There are certainly some leaky abstractions, and there is a general expectation that you understand the quirks of Python and Clojure pretty well, so it's not for everyone. Knowing something about Java would probably help too, but I've been using libpython-clj in production since 2017 and I barely know anything about Java (compared to Python/Clojure).
Also, what's the dev workflow like? When I'm coding Python I basically live inside the debugger (a.k.a. the Carmack method). Do you use an IDE that understands both Java and Python? What's the debugging experience like? Can you set a breakpoint and then evaluate Python code and expressions inside the debugger, like you can if it was solely a Python project using VSCode and the Python debugger?
I recently moved a large ETL process that was mostly Python runtime processing to pyarrow/Polars and wrote all the ETL logic in SQL. I've seen processes that used to take a week to run drop to about an hour (no exaggeration).
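The commenter's setup uses pyarrow/Polars, but the underlying idea, pushing transformation logic into a SQL engine instead of Python-level loops, can be sketched with just the stdlib's sqlite3:

```python
import sqlite3

# Toy example: express the aggregation in SQL so the engine, not a
# Python loop, does the row-by-row work.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])
totals = con.execute(
    "SELECT user_id, SUM(amount) FROM events "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)  # -> [(1, 15.0), (2, 7.5)]
```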
But having these in Graal would allow more types of applications to be deployed in JVM stacks. As sibling comments note, many data science models are in python but production stacks are in Java.
Ah...makes sense now. I was thinking along the lines of someone switching to the JVM for better performance, but being held back by the absence of those libraries.
YAML and JSON have both tried to replicate the XML tooling experience, only worse.
Schemas, comments, parsing, and schema conversion tools.
https://jython.readthedocs.io/en/latest/Concurrency/#no-glob...
https://wiki.python.org/moin/IronPython
The difference between JVM, CLR and C in regards to parallel and concurrent code is that they are built for those kind of workloads, and have a memory model proper, hence not needing a GIL.
CPython built with --disable-gil does not have a GIL (as long as PYTHONGIL=0 and all loaded C extensions are built for --disable-gil mode) https://peps.python.org/pep-0703/#py-mod-gil-slot
"Intent to approve PEP 703: making the GIL optional" (2023) https://news.ycombinator.com/item?id=36913328#36917709 https://news.ycombinator.com/item?id=36913328#36921625
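Whether a given CPython build is free-threaded (PEP 703) can be checked from Python itself; the `Py_GIL_DISABLED` build flag is only set in `--disable-gil` builds:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 only in free-threaded (--disable-gil) builds;
# on a regular build the config var is absent/None.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# On 3.13+, sys._is_gil_enabled() additionally reports the runtime
# state (the GIL can be re-enabled via PYTHONGIL=1 or by loading a
# C extension that isn't built for free-threaded mode).
gil_probe = getattr(sys, "_is_gil_enabled", None)

print("free-threaded build:", free_threaded)
```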
implementation("org.graalvm.polyglot:polyglot:24.1.0")
implementation("org.graalvm.polyglot:python:24.1.0")

Additionally, if XML isn't your thing, Maven is making a push for other formats in Maven 4, like HOCON [1].
[0] https://blog.gradle.org/declarative-gradle-first-eap [1] https://github.com/apache/maven-hocon-extension
May he rest in peace.
Of course this was very time consuming and unrewarding, all because only java applications could be deployed to production due to a stupid top-down decision.
This GraalPy sounds like something I wish existed back then.
I'm using it for similar purposes as you stated and for that it works quite well. A research group I am collaborating with does a lot of their work in one Java application (ImageJ for microscopy), so by integrating my Python processing code into that application, it finds its way a lot quicker into the daily workflows of everyone in that group.
Most recently I've also extended the jep setup to include optional Python version bootstrapping via uv[1], so that I can be sure that the plugins I'm writing have the correct Python version available, without people having to install that manually on the machine.
GraalPy is much more active and more compatible.
Any other Java programs that want a scripting engine could use it as well.
Graal can do pretty advanced JIT compilation for any Graal language, plus you can mix and match languages (with a big chunk of their ecosystems), and it will actually compile across language boundaries. And we haven't even mentioned Java's state-of-the-art GCs, which can run circles around most other tracing GCs, let alone CPython's very low-throughput reference counting.
Also, what is “messing with the JVM”? That’s like one of the most battle tested technologies out there, right next to the Linux kernel.
Everyone else has to write wrappers to interact with that blackbox. God forbid someone daring to even change the code, because it basically doesn't even need/use junit tests. Eventually the smart person gets bored and moves to something else, that tool then gets rewritten to Java in two days by someone else.
End of story.
Jython is still 2.x and it'd be nice to let my kid write a minecraft mod in python. Not a business use case but a use case.
Not sure if you were wanting Python specifically, but KubeJS lets you use JavaScript for mods. I think there's also a clojure integration.
https://www.graalvm.org/latest/reference-manual/embed-langua...
Thus, while Swift and Graal both depend on LLVM, they use different variants, and there's no real way to make interop between Swift and Graal work (even using the LLVM bitcode which Graal is said to be able to consume).
e.g., I believe this announcement represents the work to compile Python (3.11) and some proof-of-concept Python packages using the Graal toolchain, to spur other packages to support the same.
So I'd really love to be wrong, but I believe building under the Graal LLVM is the common factor.
There is (active, 2K stars) https://github.com/pvieito/PythonKit and I've heard of people being able to deploy apps with python on the app store. YMMV.