Show HN: I built a hardware processor that runs Python

(runpyxl.com)

983 points · hwpythonner · 1y ago · 265 comments

Hi everyone, I built PyXL — a hardware processor that executes a custom assembly generated from Python programs, without using a traditional interpreter or virtual machine. It compiles Python -> CPython Bytecode -> Instruction set designed for direct hardware execution.

I’m sharing an early benchmark: a GPIO test where PyXL achieves a 480ns round-trip toggle, compared to 14-25 microseconds on a MicroPython Pyboard, even though PyXL runs at a lower clock (100MHz vs. 168MHz).

The design is stack-based, fully pipelined, and preserves Python's dynamic typing without static type restrictions. I independently developed the full stack — toolchain (compiler, linker, codegen), and hardware — to validate the core idea. Full technical details will be presented at PyCon 2025.

Demo and explanation here: https://runpyxl.com/gpio Happy to answer any questions
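As a back-of-the-envelope check on the benchmark figures above, the latencies can be converted to clock cycles. This is only a sketch using the numbers quoted in the post; nothing here is measured independently, and the helper name `cycles` is made up for illustration.

```python
# Figures quoted in the post; nothing here is measured independently.
PYXL_CLOCK_HZ = 100e6               # PyXL clock: 100 MHz
PYBOARD_CLOCK_HZ = 168e6            # Pyboard clock: 168 MHz
PYXL_TOGGLE_S = 480e-9              # PyXL round-trip toggle: 480 ns
PYBOARD_TOGGLE_S = (14e-6, 25e-6)   # MicroPython Pyboard: 14-25 microseconds


def cycles(latency_s, clock_hz):
    """Hypothetical helper: clock cycles that elapse during a latency."""
    return round(latency_s * clock_hz)


pyxl_cycles = cycles(PYXL_TOGGLE_S, PYXL_CLOCK_HZ)
pyboard_cycles = tuple(cycles(t, PYBOARD_CLOCK_HZ) for t in PYBOARD_TOGGLE_S)
speedup = [round(t / PYXL_TOGGLE_S, 1) for t in PYBOARD_TOGGLE_S]

print(pyxl_cycles)      # 48
print(pyboard_cycles)   # (2352, 4200)
print(speedup)          # [29.2, 52.1]
```

In wall-clock terms the quoted gap is roughly 29x to 52x; counted in clock cycles it is larger, since the Pyboard runs at the higher clock.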

obitsten1y ago· 17 in thread

Why is it not routine to "compile" Python? I understand that the interpreter is great for rapid iteration, cross compatibility, etc. But why is it accepted practice in the Python world to eschew all of the benefits of compilation by just dumping the "source" file in production?

cchianel1y ago

The primary reason, in my opinion, is the vast majority of Python libraries lack type annotations (this includes the standard library). Without type annotations, there is very little for a non-JIT compiler to optimize, since:

- The vast majority of code generation would have to be dynamic dispatches, which would not be too different from CPython's bytecode.

- Types are dynamic; the methods on a type can change at runtime due to monkey patching. As a result, the compiler must be able to "recompile" a type at runtime (and thus, you cannot ship optimized target files).

- There are multiple ways every single operation in Python might be called; for instance `a.b` either does a __dict__ lookup or a descriptor lookup, and you don't know which method is used unless you know the type (and if that type is monkeypatched, then the method that is called might change).

A JIT compiler might be able to optimize some of these cases (observing what is the actual type used), but a JIT compiler can use the source file/be included in the CPython interpreter.
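The monkey-patching and attribute-lookup points in the list above can be seen in a few lines. This is a minimal sketch; the class names `Greeter` and `Point` are made up for illustration.

```python
class Greeter:
    def hello(self):
        return "hello"


g = Greeter()
before = g.hello()

# Monkey patching: replace the method on the class after instances exist.
Greeter.hello = lambda self: "patched"
after = g.hello()


class Point:
    def __init__(self):
        self.x = 1          # stored in the instance __dict__

    @property
    def y(self):            # a descriptor that lives on the class
        return 2


p = Point()
# The same `obj.attr` syntax takes two different lookup paths.
x_in_instance_dict = "x" in vars(p)
y_in_instance_dict = "y" in vars(p)

print(before, after)                            # hello patched
print(p.x, p.y)                                 # 1 2
print(x_in_instance_dict, y_in_instance_dict)   # True False
```

An ahead-of-time compiler looking only at `g.hello()` or `p.y` cannot tell which of these paths will be taken without knowing the type, and even then the class may change later.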

hwpythonnerOP1y ago

You make a great point — type information is definitely a huge part of the challenge.

I'd add that even beyond types, late binding is fundamental to Python’s dynamism: Variables, functions, and classes are often only bound at runtime, and can be reassigned or modified dynamically.

So even if every object had a type annotation, you would still need to deal with names and behaviors changing during execution — which makes traditional static compilation very hard.

That’s why PyXL focuses more on efficient dynamic execution rather than trying to statically "lock down" Python like C++.
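A small sketch of the late binding described above, with made-up function names: the name `area` is resolved each time `report()` runs, so rebinding it changes behavior without touching `report()`.

```python
def area(r):
    return r * r              # first version


def report(r):
    # `area` is looked up by name on every call,
    # not fixed when report() is defined.
    return area(r)


first = report(3)


def doubled_area(r):
    return 2 * r * r


# Rebind the name at runtime; report() itself is unchanged.
area = doubled_area
second = report(3)

print(first)    # 9
print(second)   # 18
```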

pjmlp1y ago

Solved by Smalltalk, Self, and Lisp JITs, which are at the genesis of JIT technology; some of it landed in HotSpot and V8.

Qem1y ago

> The primary reason, in my opinion, is the vast majority of Python libraries lack type annotations (this includes the standard library).

When type annotations are available, it's already possible to compile Python to improve performance, using Mypyc. See for example https://blog.glyph.im/2022/04/you-should-compile-your-python...

Someone1y ago

Python doesn’t eschew all benefits of compilation. It is compiled, but to an intermediate bytecode, not to native code, (somewhat) similar to the way Java and C# compile to bytecode.

Those, at runtime (and, nowadays, optionally also at compile time), convert that to native code. Python doesn’t; it runs a bytecode interpreter.

The reason Python doesn’t do that is a mix of lack of engineering resources, desire to keep the implementation fairly simple, and the requirement of backwards compatibility of C code calling into Python to manipulate Python objects.
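The compile-to-bytecode step described above can be observed from inside CPython with the standard `compile()` built-in and the `dis` module. A minimal sketch; the source string and the name `add` are just examples.

```python
import dis

source = "def add(a, b):\n    return a + b\n"

# compile() produces a code object; no native machine code is generated.
module_code = compile(source, "<example>", "exec")

namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# The function's bytecode is a bytes object that the interpreter loop steps through.
raw_bytecode = add.__code__.co_code
opnames = [ins.opname for ins in dis.get_instructions(add)]

print(type(raw_bytecode).__name__)   # bytes
print(add(2, 3))                     # 5
print(opnames)                       # exact opcode names vary by CPython version
```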

jerf1y ago

If you define "compiling Python" as basically "taking what the interpreter would do but hard-coding the resulting CPU instructions executed instead of interpreting them", the answer is, you don't get very much performance improvement. Python's slowness is not in the interpreter loop. It's in all the things it is doing per Python opcode, most of which are already compiled C code.

If you define it as trying to compile Python in such a way that you would get the ability to do optimizations and get performance boosts and such, you end up at PyPy. However that comes with its own set of tradeoffs to get that performance. It can be a good set of tradeoffs for a lot of projects but it isn't "free" speedup.

jonathaneunice1y ago

A giant part of the cost of dynamic languages is memory access. It's not possible, in general, to know the type, size, layout, and semantics of values ahead of time. You also can't put "Python objects" or their components in registers like you can with C, C++, Rust, or Julia "objects." Gradual typing helps, and systems like Cython, RPython, PyPy etc. are able to narrow down and specialize segments of code for low-level optimization. But the highly flexible and dynamic nature of Python means that a lot of the work has to be done at runtime, reading from `dict` and similar dynamic in-memory structures. So you have large segments of code that are accessing RAM (often not even from caches, but genuine main memory, and often many times per operation). The associated IO-to-memory delays are HUGE compared to register access and computation more common to lower-level languages. That's irreducible if you want Python semantics (i.e. its flexibility and generality).

Optimized libraries (e.g. numpy, Pandas, Polars, lxml, ...) are the idiomatic way to speed up "the parts that don't need to be in pure Python." Python subsets and specializations (e.g. PyPy, Cython, Numba) fill in some more gaps. They often use much tighter, stricter memory packing to get their speedups.

For the most part, with the help of those lower-level accelerations, Python's fast enough. Those who don't find those optimizations enough tend to migrate to other languages/abstractions like Rust and Julia because you can't do full Python without the (high and constant) cost of memory access.
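To illustrate the memory-layout point, here is a rough sketch comparing a boxed Python `int` with the packed storage of the standard-library `array` type. The exact size of an `int` object is a CPython implementation detail, so only the packed item size is stated as an expected value.

```python
import sys
from array import array

values = list(range(1000))
packed = array("q", values)     # contiguous signed 64-bit integers

# Each list slot is a pointer to a separately allocated int object.
boxed_int_bytes = sys.getsizeof(values[-1])
packed_item_bytes = packed.itemsize

print(boxed_int_bytes)      # implementation-specific; larger than 8 on CPython
print(packed_item_bytes)    # 8
```

Libraries such as numpy rely on the same kind of packed, homogeneous layout, which is part of why they avoid the per-object indirection described above.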

ModernMech1y ago

Part of the issue is the number of instructions Python has to go through to do useful work. Most of that is unwrapping values and making sure they're the right type to do the thing you want.

For example if you compile x + y in C, you'll get a few clean instructions that add the data types of x and y. But if you compile this thing in some sort of Python compiler it would essentially have to include the entire Python interpreter; because it can't know what x and y are at compile time, there necessarily has to be some runtime logic that is executed to unwrap values, determine which "add" to call, and so forth.

If you don't want to include the interpreter, then you'll have to add some sort of static type checker to Python, which is going to reduce the utility of the language and essentially bifurcate it into annotated code you can compile and unannotated code that must remain interpreted at runtime, which will kill your overall performance anyway.

That's why projects like Mojo exist and go in a completely different direction. They are saying "we aren't going to even try to compile Python. Instead we will look like Python, and try to be compatible, but really we can't solve these ecosystem issues so we will create our own fast language that is completely different yet familiar enough to try to attract Python devs."

kragen1y ago

You don't need the whole Python interpreter to fall back to dynamic method dispatch for overloaded operators. CPython itself implements them with per-interface vtables for C extensions, very similar to Golang but laboriously constructed by hand.

For most code, you don't need static typing for most overloaded operators to get decent performance, either. From my experience with Ur-Scheme, even a simple prediction that most arithmetic is on (small) integers with a runtime typecheck and conditional jump before inlining the integer version of each arithmetic operation performs remarkably well—not competitive with C but several times faster than CPython. It costs you an extra conditional branch in the case where the type is something else, but you need that check anyway if you are going to have unboxed integers, and it's smallish compared to the call and return you'll need once you find the correct overload to call. (I didn't implement overloading in Ur-Scheme, just exiting with an error message.)

Even concatenating strings is slow enough that checking the tag bits to see if you are adding integers won't make it much slower.

Where this approach really falls down is choosing between integer and floating point math. (Also, you really don't want to box your floats.)

And of course inline caches and PICs are well-known techniques for handling this kind of thing efficiently. They originated in JIT compilers, but you can use them in AOT compilers too; Ian Piumarta showed that.
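The "predict small integers, check at runtime, fall back otherwise" idea can be modeled in Python itself. This sketch only shows the control flow a compiler might emit; it is not faster in CPython, it is not how Ur-Scheme is implemented, and the names `guarded_add`, `generic_add`, and `path_counts` are made up for illustration.

```python
import operator

path_counts = {"fast": 0, "slow": 0}    # illustrative bookkeeping only


def generic_add(x, y):
    """Slow path: full dynamic dispatch through __add__/__radd__."""
    path_counts["slow"] += 1
    return operator.add(x, y)


def guarded_add(x, y):
    """Predict 'both operands are ints'; fall back when the guard fails."""
    if type(x) is int and type(y) is int:    # the runtime typecheck
        path_counts["fast"] += 1
        return int.__add__(x, y)             # stands in for an inlined add
    return generic_add(x, y)


results = [guarded_add(2, 3), guarded_add("a", "b"), guarded_add(1, 2.5)]

print(results)       # [5, 'ab', 3.5]
print(path_counts)   # {'fast': 1, 'slow': 2}
```

In generated machine code the guard would be a tag check and a conditional branch, with the integer add inlined right after it.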

franga20001y ago

There's no benefit that I know of, besides maybe a tiny cold start boost (since the interpreter doesn't need to generate the bytecode first).

I have seen people do that for closed-source software that is distributed to end-users, because it makes reverse engineering and modding (a bit) more complicated.

Qem1y ago

Check Nuitka: https://nuitka.net/

hwpythonnerOP1y ago

There have been efforts (like Cython, Nuitka, PyPy’s JIT) to accelerate Python by compiling subsets or tracing execution — but none fully replace the standard dynamic model at least as far as I know.

wyldfire1y ago

For python, compilation means emitting some bytecode. And you could conceivably ship that bytecode *. But because it's so terribly dynamic of a language, virtually nothing is bound to anything until you execute this particular line. "What code does this function call resolve to?" -- we'll find out when we get there. "What type does this local use?" -- we'll find out when we get there.

Even type annotations would have to be anointed with semantics, which (IIUC) they have none today (w/CPython AFAIK). They are just annotations for use by static checkers.

Unless you can perform optimizations, the compilation can't make a whole bunch of progress beyond that bytecode.

* In fact, IIRC there was/is some "freeze" program that would do just that: compile your python program. Under the covers it would bundle libpython with your *.pyc bytecode.
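The point that annotations carry no runtime semantics in CPython is easy to check. A minimal sketch with a made-up function `double`:

```python
def double(x: int) -> int:
    return x * 2


# Annotations are stored as metadata on the function...
annotations = double.__annotations__

# ...but nothing checks them when the function is called.
with_int = double(21)
with_str = double("ab")

print(annotations)   # {'x': <class 'int'>, 'return': <class 'int'>}
print(with_int)      # 42
print(with_str)      # abab
```

A compiler that wanted to rely on `x: int` would have to either enforce it or guard against it being violated.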

dragonwriter1y ago

> Why is it not routine to "compile" Python?

Where’s the AOT compiler that handles the whole Python language?

It’s not routine because it’s not even an option, and people who are concerned either use the tools that let them compile a subset of Python within a larger, otherwise-interpreted program, or use a different language.

f1shy1y ago

AFAIK, one reason is that if you use "eval()" anywhere, you already need a whole Python compiler shipped with your program. So compiling is no different from shipping the code with the interpreter.

seanw4441y ago

It's called Nim.

archargelod1y ago

Comparing Nim to compiled Python is almost insulting.

Smaller binaries, faster execution, proper metaprogramming, actual type safety, and you don't need to bundle a whole interpreter just to say "hello world"

rkagerer1y ago· 16 in thread

Back when C# came out, I thought for sure someone would make a processor that would natively execute .Net bytecode. Glad to see it finally happened for some language.

kcb1y ago

For Java, this was around for a bit https://en.wikipedia.org/wiki/Jazelle.

monocasa1y ago

Even better was a complete system, rather than a mode for ARM processors that ran a subset of the common JVM opcodes.

https://en.wikipedia.org/wiki/PicoJava

varispeed1y ago

Didn't some phones have hardware Java execution or does my memory fail me?

jiehong1y ago

Java got that with smart cards for example. Cute oddities of the past

monocasa1y ago

JavaCard was implemented as just a regular interpreter last time I checked.

supportengineer1y ago

Does anyone remember the JavaOne ring giveaway?

https://news.ycombinator.com/item?id=8598037

zahlman1y ago

In university, for my undergrad thesis, I wanted to do this for a Befunge variant (choosing the character set to simplify instruction decoding). My supervisor insisted on something more practical, though. :(

zahlman1y ago

I probably should have added a link: https://esolangs.org/wiki/Befunge

The main thing that appealed to me about this idea is that it would require a two-dimensional program counter. As I recall from the original specification, skipping through blank space is supposed to take O(1) time, but I didn't plan on implementing that. I did, however, imagine a machine with 256x256 bytes of memory, where some 80x25 (or 24?) region was reserved as directly memory-mapped to a character display (and protected at boot by surrounding it with jump instructions).

ComputerGuru1y ago

I want to say there was a product that did this circa 2006-2008 but all I’m finding is the .NET Micro Framework and its modern successor the .NET nano Framework.

I’ve been using .NET since 2001 so maybe I have it confused with something else, but at the same time a lot of the web from that era is just gone, so it’s possible something like this did exist but didn’t gain any traction and is now lost to the ether.

duskwuff1y ago

There was Netduino, but that was an STM32 microcontroller running an interpreter, not dedicated hardware which directly executed CLR code.

rcorrear1y ago

Maybe you’re thinking of Singularity OS?

john-h-k1y ago

The tl;dr (I spent lots of time investigating this) is that it just fundamentally isn’t a good bytecode for execution. It’s designed to be small on disk, not hardware friendly

whoomp123421y ago

I'd be surprised if azure app services didn't do this already.

john-h-k1y ago

I’d be willing to bet my net worth that they don’t

actionfromafar1y ago

Wouldn't that be a real scoop?

bongodongobob1y ago

Azure runs on Linux if I'm not mistaken.

zik1y ago· 14 in thread

This is a very cool project but I feel like the claim is overstated: "PyXL is a custom hardware processor that executes Python directly — no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon."

Reading further down the page it says you have to compile the python code using CPython, then generate binary code for its custom ISA. That's neat, but it doesn't "execute python directly" - it runs compiled binaries just like any other CPU. You'd use the same process to compile for x86, for example. It certainly doesn't "take regular python code and run it in silicon" as claimed.

A more realistic claim would be "A processor with a custom architecture designed to support python".

goranmoomin1y ago

Not related to the project in any way, but if the hardware is running on CPython bytecode, I’d say that’s as far as it can get for executing Python directly. AFAIK running Python code with the `python3` executable also compiles Python code into bytecode `*.pyc` files before it runs it. I don’t think anyone claims that CPython is not running Python code directly…

hamandcheese1y ago

I agree with you, if it ran pyc code directly I would be okay saying it "runs python".

However, it doesn't seem like it does; the pyc still has to be further processed into machine code. So I also agree with the parent comment that this seems a bit misleading.

I could be convinced that that native code is sufficiently close to pyc that I don't feel misled. Would it be possible to write a boot loader which converts pyc to machine code at boot? If not, why not?

f1shy1y ago

Well it really does not run CPython, but CPython bytecode, compiled down to an assembler. Granted, a very specific, tailored assembler, but still.

Anyway, the project is mega-cool, and very useful (in some specific applications). It's just that the title is a little bit confusing.

hwpythonnerOP1y ago

Fair point if you're looking at it through a strict compiler-theory lens, but just to clarify—when I say "runs Python directly," I mean there is no virtual machine or interpreter loop involved. The processor executes logic derived from Python ByteCode instructions.

What gets executed is a direct mapping of Python semantics to hardware. In that sense, this is more “direct” than most systems running Python.

This phrasing is about conveying the architectural distinction: Python logic executed natively in hardware, not interpreted in software.

franzb1y ago

Wouldn't an AoT Python-to-x86 compiler lead to a similar situation where the x86 processor would "run Python directly"?

_kidlike1y ago

After a quick search I found that even Raspberry Pi makes the same claim...

"runs directly on embedded hardware"

https://www.raspberrypi.com/documentation/microcontrollers/m...

I don't understand why they have the need to do this...

rcxdude1y ago

MicroPython does run directly on the hardware, though. It's a bare-metal binary, no OS. Which is a different claim to running the Python code you give it 'directly'.

f1shy1y ago

Well, running Python on Raspbian, you could toggle a pin at a couple of kHz at most, nowhere near the 2 MHz you can do with this project. It also claims predictability, so I assume the time jitter is much lower, which is a very important parameter for real-time applications.

hwpythonnerOP1y ago

PyXL is a bit more direct :)

dividuum1y ago

Huh? MicroPython literally does exactly that: You copy over Python source(!) code and it runs on the Pico.

wormius1y ago

Yeah that was my first thing. Wait a minute you run a compiler on it? It's literally compiled code, not direct. Which is fine, but yeah, overselling what it is/does.

Still cool, but I would definitely ease back the first claim.

I was going to say it does make me wonder how much of a pain a direct processor like this would be in terms of having to constantly update it to adapt to the new syntax/semantics every time there's a new release.

Also - are there any processors made to mimic ASTs directly? I figure a Lisp machine does something like that, but not quite... Though I've never even thought to look at how that worked on the hardware side.

EDIT: I'm not sure AST is the correct concept, exactly, but something akin to that... Like building a physical structure of the tree and process it like an interpreter would. I think something like that would require like a real-time self-programming FPGA?

hwpythonnerOP1y ago

PyXL deliberately avoids tying itself to Python’s high-level syntax or rapid surface changes.

The system compiles Python source to CPython ByteCode, and then from ByteCode to a hardware-friendly instruction set. Since it builds on ByteCode—not raw syntax—it’s largely insulated from most language-level changes. The ByteCode spec evolves slowly, and updates typically mean handling a few new opcodes in the compiler, not reworking the hardware.
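For anyone curious how much the bytecode differs between releases, the opcode set of whatever CPython is running can be listed with the standard `dis` module. This is only an inspection sketch; it says nothing about how the PyXL compiler itself tracks opcodes.

```python
import dis
import sys

version = f"{sys.version_info.major}.{sys.version_info.minor}"

# dis.opmap maps opcode names to numbers for this interpreter only.
opcode_names = sorted(dis.opmap)

print(version)
print(len(opcode_names))     # count differs between CPython releases
print(opcode_names[:5])      # first few names, alphabetically
```

Running this under two different CPython versions and diffing the output shows which opcodes a bytecode-based toolchain would need to add or drop.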

Long-term, the hardware ISA is designed to remain fixed, with most future updates handled entirely in the toolchain. That separation ensures PyXL can evolve with Python without needing silicon changes.

BiteCode_dev1y ago

Which is what Nuitka does. But the result doesn't allow for real-time Python programs, and you don't get direct access to the hardware like here.

rytill1y ago

The phrasing “<statement> — no X, Y, Z, just <final simplified claim>” is cropping up a lot lately.

4o also ends many of its messages that way. It has to be related.

JadoJodo1y ago· 14 in thread

I'd like to invite any Python devs to go on a tangent with me:

Can you give me the scoop on Python, the language? I see things like this project, and it seems very impressive, but being an outsider to the language, I don't "get" it. More specifically: I'm curious to hear thoughts on a) what made this difficult prior to now (with Python), b) why Python is useful for this, and c) what are your thoughts on Python itself?

To add some more context:

I know a lot of developers who work with Python (Flask); some love it, some hate it (as with any language). My experience has been mainly via homelab/OSS tools that all seem to embrace the language. And yet while the language itself seems very straightforward and easy to use, my experience with the Python _ecosystem_ (again, as an outsider) has been... difficult.

Python 2 vs 3, virtual environments, libraries for each version, etc. It feels as though anytime I've had to use it outside a pre-built Docker container, these issues result in throwing spaghetti at the wall trying to figure out how to even get it working at all. As a PHP/Go dev, it's one of the languages for which I could see myself having a real interest, but this has so far made me hesitant (and I don't want to be).

spprashant1y ago

The gist is that basic Python at its very core is -

a) simple b) limited

The language really took off when developers took this simple limited language and pushed it to its very limits using C extensions. The data science explosion opened up the language to a very wide user base.

So to answer your 3 questions:

a) Python is not a fast language by any means. There is a lot of overhead in every function call that makes it almost impossible to use for low latency/real-time use cases.

b) I don't think Python is particularly the best language for this. This is just a demonstration of someone building their own custom toolchain to show what is possible with just pure Python. The author has highlighted why they think this is interesting on the website.

c) I keep thinking Python will go away soon, and we will see a much better alternative. But the reality is Python is entrenched deeply just like JavaScript. A lot of smart people are putting in a lot of effort to make it better. Personally the ecosystem and packaging story does not annoy me much, but the lack of proper threading (GIL) has hurt my projects more than once.

For your particular pain point, the current community recommended solution is to use uv (https://github.com/astral-sh/uv). There were several detours (pip, pyenv, pipenv, poetry etc.) the community took before they got behind this.

miohtama1y ago

Before data science, Python was already heavily used in web backends, e.g. at Instagram and others.

PaulHoule1y ago

My impression was that if you had a problem with Python and then added Docker now you have two problems. I worked at one place where the data sci's had an amazing ability to find defective Pythons.

Python is going in the right directions in terms of all the deployability and big issues but it should have been where it is now 7 years ago. Specifically, I sketched out a system that worked like uv but was written in pure Python, I didn't start on it for two reasons: (a) the bootstrapping problem that I couldn't ever stop devs from trashing the Python that it runs in, and (b) from lots of trying it didn't seem possible to convince most Pythoners that pip was broken or that it mattered... uv solved (a) by removing Python from the bootstrap and (b) by being crazy fast.

Sohcahtoa821y ago

> Python 2 vs 3

This should not be an issue in this day and age. Python 2 shouldn't be used for anything. If you're trying to do something that only works in Python 2, then you're likely doing something very wrong, likely reading from very outdated material.

> virtual environments

Also shouldn't be an issue today. Ideally, every project has its own virtual environment. Packages should not be installed globally unless managed by your operating system, not pip or another Python package manager.

> libraries for each version

Rarely an issue, but I've certainly run into it, but only with Torch and other AI/ML libraries, all of which are very cutting-edge. The solution usually is to make sure everything is up to date, especially your operating system. If you're on Ubuntu 18.04, you're gonna have a bad time with something that requires Python 3.11 to work.

Python has its warts, sure, but I like it because it's so easy to get stuff done in. It's slow, yes, but that's rarely an issue. I find the syntax the most easy to read of any language I've ever worked with.
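For the per-project virtual environment mentioned above, the standard library's `venv` module is enough to create one. A minimal sketch, assuming a throwaway temporary directory and the directory name `.venv` (a common convention, not a requirement):

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    # with_pip=False keeps this quick and offline; real projects usually want pip.
    venv.create(env_dir, with_pip=False)
    # Every environment created by venv gets a pyvenv.cfg marker file.
    has_config = (env_dir / "pyvenv.cfg").is_file()

print(has_config)   # True
```

From a shell the equivalent is `python3 -m venv .venv` in the project directory.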

VWWHFSfQ1y ago

Python is just brutally slow. Anything performance-sensitive has to be done with a native module and now that requires all the same compilation and build tooling that everything else does.

The ecosystem is massive and the core team just keeps adding more and more dubious language features and syntax.

Realistically, Python should have been "done" after async/await and fixing str vs bytes.

__MatrixMan__1y ago

There are parts of python that chafe, but if I switch to a language which has solved those problems, the set of people I can help falls to... very small. These are people we fought tooth and nail to drag away from excel, we're not going to get them all the way to haskell.

em3rgent0rdr1y ago

b: while Python is not a high-performance language, Python coding is easier than coding in high-performance languages, and programmer time is valuable. But after coding a project in Python, the developer may find that they need higher performance than interpreted Python offers, and might be tempted to redo their program in a high-performance language. A non-interpreted Python processor provides a more appealing alternative: spend money on an FPGA (or in the future maybe even an ASIC) Python co-processor that may be fast enough, rather than spend programmer time porting the Python code to a high-performance language.

whatnow373731y ago

Old-timer here, used Python for about ten years professionally (Go now).

c) It’s a monstrous dumpster fire and getting worse over time, but so is everything else (in the same space). I like Go, but I can see how it’s not for everyone.

TheFlyingFish1y ago

I've used Python a lot over the last ~10 years. It's probably my favorite language, although I'm not immune to its weak points.

To answer your questions in order,

a) I haven't done much work with embedded Python, but like any dynamically-typed language that runs in a VM there's a lot of runtime infrastructure that adds latency, complexity, energy consumption, bundle size, etc. It sounds like this project aims to remove the vast majority of that. So take startup time, for instance: Normal Python takes ~50ms to fire up the interpreter and get into actual user code. If I'm understanding it correctly, with PyXL that would be vastly lower. Although I guess the ARM chip still has to load the code onto the FPGA, so maybe not, idk.

b) and c) are kind of the same question, to me - at least, "why use Python for embedded" is a subset of "why use Python at all."

For me, Python more than any other language is great at getting out of its own way, so that you can spend your precious brain energy on whatever problem you're solving and less on the tool you're using to solve it. This is maybe less true in recent years, as later Pythons have added a lot more complex features (like async/await, for instance, which I actually really like in Python but definitely adds complexity to the language).

Finally, I think a lot of it comes down to personal style/taste/chance (i.e. if Python is the first language you encounter, you're probably more likely to end up liking Python.) The Zen of Python[0], which you may have seen, does a good job of explaining the Python way of approaching problems, although like I said a few of those principles have been less-rigidly adhered to in recent years (like "there should be only one way to do it.")

If you hang out in Python circles, you'll probably come across the phrase "Python fits your brain." I'm not sure where it was originally coined but it very definitely describes my experience with Python: it (mostly) just works like I expect it to, whether that's with regard to syntax, semantics, stdlib, etc.

Not that it doesn't have its bad points, of course. Dependency management, as you mentioned, can be a bit hellish at times. A lot of it comes down to the fact that dependencies in Python were originally conceived as systemwide state, much like dynamically-loaded C libs on Linux. This works fine until you need to use two different, mutually-incompatible versions of the same lib, at which point all hell breaks loose. There have been various attempts to improve on this more recently, so far uv[1] looks pretty promising, but time will tell.

The one saving grace of Python dependencies is that it has a very rich standard library, so the average Python project tends to have way fewer total dependencies than the average project in, say, JS or Rust.

The typing story for Python is also a bit lacking. Yes, there are now optional type hints and things like MyPy to make use of them, but even if your own code is all completely typed, in my experience it's usually not long before you need to call out to something that isn't well-typed and then your whole house of cards starts to fall apart.

Anyway, just my rambling $0.02.

[0] https://peps.python.org/pep-0020/

JadoJodo1y ago

Not at all rambling, but the exact kind of input I was hoping for. Thank you!

carabiner1y ago

This just seems like a complaint about python package management disguised as a question (aka concern trolling). Yes it's bad. No, it probably won't be improved any time soon.

JadoJodo1y ago

That wasn't my intention at all, but I appreciate that it came across that way to you. Please know that I was/am sincere in my desire to hear the thoughts of others while this is a current topic.

willvarfar1y ago

Yeah python has become more and more version and deps hell. Honestly 3 was all cost and no benefit and we'd all be fine if we'd stuck with 2. There were also some early missteps in api design like async and pandas and matplotlib that we all now have to live with. I even ran into problems with PIL changing API for textsize recently. Just a thousand cuts.

And yet for simple little standalone programs and notebooks, particularly for science, it is super simple and natural to turn to it.

nonameiguess1y ago

Factors I personally think led to Python's popularity:

1) Perl kind of shooting itself in the foot 20 years ago and Python becoming the de facto scripting language for Linux distributions that needed to do anything more complicated than was suitable for shell scripts but didn't require entirely new compiled software projects.

2) The above meant Python is almost always available and a good tool to have handy if you need to do something one-off and simple but more complicated than what you can do with a built-in calculator app. For instance, ever curious if you can pull the exponents off of x509 certificates and manually verify signatures by hand? Pretty easy to do in Python.

3) The C API and compiled modules made it possible to link against pre-existing BLAS implementations, and the extensible syntax and user-defined operators made it possible to mimic the style of MATLAB and R. Thus, Python became a popular choice as a lingua franca for engineers, scientists, and stats geeks who just wanted to do some data exploration or modeling and weren't trying to create shippable software.

4) MIT decided to make Python its primary teaching language in the early 2000s or so and a lot of CS programs in the US followed suit.

5) It became possible at some point to write Microsoft Office macros in Python, giving marginally technical business types a nice option to learn that was more broadly useful than VB script to automate their own workflows.
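Point 2 above mentions verifying signatures by hand. The core of an RSA check is a single modular exponentiation, which Python's built-in `pow()` handles directly. This is a toy sketch with the classic textbook key; a real certificate check would also need to parse the X.509 structure and validate the signature padding, which is omitted here.

```python
# Toy RSA key (textbook numbers); real keys are 2048 bits or more.
p, q = 61, 53
n = p * q                            # modulus: 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)

digest = 65                          # stand-in for a padded message hash, must be < n

signature = pow(digest, d, n)        # signer computes digest^d mod n
recovered = pow(signature, e, n)     # verifier computes signature^e mod n

print(n, d)                   # 3233 2753
print(recovered == digest)    # True
```

Python's arbitrary-precision integers mean the same two `pow()` calls work unchanged with real key sizes.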

Why it ever became so popular among actual software developers I have a harder time answering, but for research, exploratory work, prototyping, scripting, workflow automation, it's as good as anything else you can come up with, usually already available, and it has an extremely "batteries included" standard library that means you probably don't need to worry about the kind of ecosystem dependency hell you're envisioning here.

Possibly some factors include the rise of LeetCode, as Python's "executable pseudocode" style means it is very easy to find or translate examples of algorithm implementations into Python solutions for learning, and the fact that a large trend of the post big data era is trying to turn exploratory data analysis pipelining tasks into real software, along with people who used to brand themselves as "data scientists" deciding to become software developers instead, and already knowing Python.

Python also gives you a pretty good first order approximation of a solution when you want to turn some researcher's data model into a service, provided your app is also written in Python. This has become far less important these days with data APIs, ML APIs, standardized formats for model serialization, but previously, a very popular solution to the so-called "two language problem" was just making Python fast enough to let it be both languages itself rather than trying to add web app frameworks to Julia.

Y_Y1y ago· 11 in thread

Are there any limitations on what code can run? (discounting e.g. memory limitations and OS interaction)

I'd love to read about the design process. I think the idea of taking bytecode aimed at the runtime of dynamic languages like Python or Ruby or even Lisp or Java and making custom processors for that is awesome and (recently) under-explored.

I'd be very interested to know why you chose to do this, why it was a good idea, and how you went about the implementation (in broad strokes if necessary).

hwpythonnerOP1y ago

Thanks — really appreciate the interest!

There are definitely some limitations beyond just memory or OS interaction. Right now, PyXL supports a subset of real Python. Many features from CPython are not implemented yet — this early version is mainly to show that it's possible to run Python efficiently in hardware. I'd prefer to move forward based on clear use cases, rather than trying to reimplement everything blindly.

Also, some features (like heavy runtime reflection, dynamic loading, etc.) would probably never be supported, at least not in the traditional way, because the focus is on embedded and real-time applications.

As for the design process — I’d love to share more! I'm a bit overwhelmed at the moment preparing for PyCon, but I plan to post a more detailed blog post about the design and philosophy on my website after the conference.

mikepurvis1y ago

In terms of a feature-set to target, would it make sense to be going after RPython instead of "real" Python? Doing that would let you leverage all the work that PyPy has done on separating what are the essential primitives required to make a Python vs what are the sugar and abstractions that make it familiar:

https://doc.pypy.org/en/latest/faq.html#what-is-pypy

ammar21y ago

> I'd prefer to move forward based on clear use cases

Taking the concrete example of the `struct` module as a use-case, I'm curious if you have a plan for it and similar modules. The tricky part of course is that it is implemented in C.

Would you have to rewrite those stdlib modules in pure python?
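To make the question concrete, here is a minimal sketch of what a pure-Python stand-in for one small piece of `struct` could look like. The function names are hypothetical, and whether PyXL supports `int.to_bytes` and `int.from_bytes` is an assumption, not something stated in this thread. The real `struct` module is imported only to cross-check the result on a normal CPython.

```python
import struct  # used only to cross-check the pure-Python versions

def pack_u32_le(value: int) -> bytes:
    """Pure-Python equivalent of struct.pack("<I", value)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in an unsigned 32-bit integer")
    return value.to_bytes(4, "little")

def unpack_u32_le(data: bytes) -> int:
    """Pure-Python equivalent of struct.unpack("<I", data)[0]."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(data, "little")

print(pack_u32_le(0x12345678) == struct.pack("<I", 0x12345678))
```

Covering the full format-string language of `struct` this way would be considerably more work than this one fixed-width case.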

bokchoi1y ago

There were a few chips that supported directly executing JVM bytecodes. I'm not sure why it didn't take off, but I think it is generally more performant to JIT compile hotspots to native code.

https://en.wikipedia.org/wiki/Java_processor

teruakohatu1y ago

It did take off just in a different direction:

https://en.m.wikipedia.org/wiki/Java_Card

To the point where most adult humans in the world probably own a Java-supported processor on a SIM card. Or at least an emulator (for eSIMs).

One example of a CPU arch used on JavaCard devices is the ARM926EJ-S, which I believe can execute Java bytecode.

tsukikage1y ago

Running bytecode directly on hardware has certainly been tried (e.g. ARM's Jazelle).

In today's world this is generally not great.

Interpreted languages often include bytecode instructions that actually do very complex things and so do not nicely map to operations that can be sanely implemented in hardware. So you end up with all the usual boring alu, branch etc operations implemented in hardware, and anything else traps and runs a software handler.

Separately, interpreted language bytecode is often a poor fit for hardware execution; e.g. for dotnet (and python) bytecode many otherwise trivial operations do not explicitly encode information about types, and therefore the hardware must track type information in order to do the right thing (floating point addition looks very very different from integer addition!)
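The point about untyped bytecode can be seen with the standard `dis` module. This is a small sketch on stock CPython: the function `add` is just an example, and the exact opcode names printed depend on the CPython version.

```python
import dis

def add(a, b):
    return a + b

# The compiled bytecode is fixed before any argument types are known.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)  # exact names vary by CPython version

# The same instruction stream serves ints, floats and strings; the operand
# types are only discovered when the add instruction actually executes.
results = [add(1, 2), add(1.0, 2.0), add("a", "b")]
print(results)
```

Nothing in the instruction stream says which kind of addition to perform, so whatever executes it (interpreter or hardware) has to inspect the operands at run time.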

A lot of effort has been spent on compiler optimisation for x86 and ARM code. JIT compilers benefit massively from this. Meanwhile, interpreted language bytecode is often very lightly optimised, where it is optimised at all (until relatively recently, explicit Python policy as set by Guido van Rossum was to never optimise!) Optimisation has the side effect of throwing away potentially valuable high level / semantic information; optimising at the bytecode level hinders debuggability for interpreted code (which is a primary goal in Python) and can also be detrimental to JIT output; and the results are underwhelming compared to JIT since your small team of plucky bytecode optimisers isn't really going to compete with decades of x86 compiler development; and so the incentive is to not do much of that.

So if you're running bytecode in hardware, on top of all the obvious costs, you are /running unoptimised code/. This is actually the thing that kills these projects - everything else can ultimately be solved by throwing more silicon at it, but this can only really be solved by JITting, and the existing JIT+x86 / JIT+ARM solution is cheap and battle tested.

f1shy1y ago

I understand that is the reason Lisp Machines were dropped (even at a time when Lisp was still a very well-regarded language). At least that is what I took from the SICP videos: by 1986 it was already clear that it was much better to compile to ASM.

checker6591y ago

Forth CPU (in SystemVerilog): https://www.youtube.com/watch?v=DRtSSI_4dvk

hermitShell1y ago

JVM I think I can understand, but do you happen to know more about LISP machines and whether they use an ISA specifically optimized for the language, or if the compilers for x86 end up just doing the same thing?

In general I think the practical result is that x86 is like democracy. It’s not always efficient but there are other factors that make it the best choice.

kragen1y ago

They used an ISA specifically optimized for the language. At the time it was not known how to make compilers for Lisp that did an adequate job on normal hardware.

The vast majority of computers in the world are not x86.

f1shy1y ago

When the RISC processors were available (for the same reason RISC started to grow) it was better to just compile to ASM.

rthomas61y ago· 6 in thread

* What HDL did you use to design the processor?

* Could you share the assembly language of the processor?

* What is the benefit of designing the processor and making a Python bytecode compiler for it, vs making a bytecode compiler for an existing processor such as ARM/x86/RISCV?

hwpythonnerOP1y ago

Thanks for the question.

HDL: Verilog

Assembly: The processor executes a custom instruction set called PySM (Not very original name, I know :) ). It's inspired by CPython Bytecode — stack-based, dynamically typed — but streamlined to allow efficient hardware pipelining. Right now, I’m not sharing the full ISA publicly yet, but happy to describe the general structure: it includes instructions for stack manipulation, binary operations, comparisons, branching, function calling, and memory access.

Why not ARM/X86/etc... Existing CPUs are optimized for static, register-based compiled languages like C/C++. Python’s dynamic nature — stack-based execution, runtime type handling, dynamic dispatch — maps very poorly onto conventional CPUs, resulting in a lot of wasted work (interpreter overhead, dynamic typing penalties, reference counting, poor cache locality, etc.).

pak9rabid1y ago

Wow, this is fascinating stuff. Just a side question (and please understand I am not a low-level hardware expert, so pardon me if this is a stupid question): does this arch support any sort of speculative execution, and if so do you have any sort of concerns and/or protections in place against the sort of vulnerabilities that seem to come inherent with that?

ammar21y ago

> it includes instructions for stack manipulation, binary operations

Your example contains some integer arithmetic, I'm curious if you've implemented any other Python data types like floats/strings/tuples yet. If you have, how does your ISA handle binary operations for two different types like `1 + 1.0`, is there some sort of dispatch table based on the types on the stack?
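For readers wondering what such a dispatch table might look like, here is a software sketch of the idea. It is purely hypothetical: the table layout and the name `binary_add` are assumptions for illustration, and nothing here reflects how PySM actually handles mixed types.

```python
import operator

# Hypothetical dispatch table keyed by the operand type pair.
ADD_TABLE = {
    (int, int): operator.add,
    (float, float): operator.add,
    (int, float): lambda a, b: float(a) + b,   # promote the int operand
    (float, int): lambda a, b: a + float(b),
    (str, str): operator.add,                  # concatenation
}

def binary_add(a, b):
    """Pick a handler from the type tags of the two operands."""
    handler = ADD_TABLE.get((type(a), type(b)))
    if handler is None:
        raise TypeError(
            f"unsupported operand types: {type(a).__name__}, {type(b).__name__}"
        )
    return handler(a, b)

print(binary_add(1, 1.0))
```

In hardware the lookup key would presumably be a pair of small type tags rather than Python type objects, but the shape of the decision is the same.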

kragen1y ago

Python the language isn't stack-based, though CPython's bytecode is. You could implement it just as well on top of a register-based instruction set. You may have a point about the other features that make it hard to compile, though.

larusso1y ago

This sounds like your 'arch' (sorry, I don't 100% know the correct term here) could potentially also run Ruby/JS if the toolchain can translate it into your assembly language?

tlb1y ago

How do you deal with instructions that iterate through variable amounts of memory, like concatenating strings? Are such instructions interruptible?

Perhaps they don't need to be interruptible if there's no virtual memory.

How does it allocate memory? Malloc and free are pretty complex to do in hardware.

froh1y ago· 5 in thread

Do I get this right? This is an ASIC running a python-specific microcontroller which has python-tailored microcode? And together with that a python bytecode -> microcode compiler plus support infrastructure to get the compiled bytecode to the ASIC?

fun :-)

but did I get it right?

hwpythonnerOP1y ago

You're close: It's currently running on an FPGA (Zynq-7000) — not ASIC yet — but yeah, could be transferable to ASIC (not cheap though :))

It's a custom stack-based hardware processor tailored for executing Python programs directly. Instead of traditional microcode, it uses a Python-specific instruction set (PySM) that hardware executes.

The toolchain compiles Python → CPython Bytecode → PySM Assembly → hardware binary.

cchianel1y ago

As someone who did a CPython Bytecode → Java bytecode translator (https://timefold.ai/blog/java-vs-python-speed), I strongly recommend against the CPython Bytecode → PySM Assembly step:

- CPython Bytecode is far from stable; it changes every version, sometimes changing the behaviour of existing bytecodes. As a result, you are pinned to a specific version of Python unless you make multiple translators.

- CPython Bytecode is poorly documented, with some descriptions being misleading/incorrect.

- CPython Bytecode requires restoring the stack on exception, since it keeps a loop iterator on the stack instead of in a local variable.

I recommend instead doing CPython AST → PySM Assembly. CPython AST is significantly more stable.
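A minimal sketch of the AST route, using the standard `ast` module to emit stack-style instructions for a small arithmetic expression. The mnemonics (`PUSH_CONST`, `LOAD_NAME`, `ADD`, ...) are made up for illustration and are not PySM; only constants, names and three binary operators are handled.

```python
import ast

def compile_expr(source: str) -> list:
    """Emit made-up stack instructions for a small arithmetic expression."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    out = []

    def emit(node):
        if isinstance(node, ast.Constant):
            out.append(("PUSH_CONST", node.value))
        elif isinstance(node, ast.Name):
            out.append(("LOAD_NAME", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)    # operands first, then the operator (postfix order)
            emit(node.right)
            out.append((ops[type(node.op)], None))
        else:
            raise NotImplementedError(type(node).__name__)

    emit(ast.parse(source, mode="eval").body)
    return out

print(compile_expr("a + 2 * b"))
```

The translator only depends on node types such as `ast.BinOp`, which is what makes this path less sensitive to per-release bytecode changes.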

bangaladore1y ago

Have you considered joining the next tiny tapeout run? This is exactly the type of project I'm sure they would sponsor or try to get to asic.

In case you weren't aware, they give you a 200 x 150 um tile on a shared chip. There is then some helper logic to mux between the various projects on the chip.

https://tinytapeout.com/

froh1y ago

fascinating :-) how do you do GC/memory management?

relistan1y ago

Not an ASIC, it’s running on an FPGA. There is an ARM CPU that bootstraps the FPGA. The rest of what you said is about right.

IlikeKitties1y ago· 5 in thread

Is this running on an FPGA or were you able to fab a custom chip?

hwpythonnerOP1y ago

Just running on FPGA at the moment.

This is still an early-stage project — it's not completed yet, and fabricating a custom chip would involve huge costs.

I'm a solo developer working on this in my spare time, so FPGA was the most practical way to prove the core concepts and validate the architecture.

Longer term, I definitely see ASIC fabrication as the way to unlock PyXL’s full potential — but only once the use case is clear and the design is a little more mature.

IlikeKitties1y ago

Oh, my comment wasn't meant as a criticism just curiosity because I would have been extremely surprised to see such a project being fabricated.

I find the idea of a processor designed for a specific very high level language quite interesting. What made you choose python and do you think it's the "correct" language for such a project? It sure seems convenient as a language but I wouldn't have thought it is best suited for that task due to the very dynamic nature of it. Perhaps something like Nim which is similar but a little less dynamic would be a better choice?

jamesfmilne1y ago

Could be a candidate for Tiny Tapeout in the future.

https://tinytapeout.com

ActorNightly1y ago

I'm not super versed in hardware, but what's the reason you can't adapt this to run on an ARM microprocessor chip? Why go with FPGA?

Like if I could buy a Cortex board and write Python, hit compile, and have the thing run, this would be INSANELY useful to me, cause cortex chips have pretty great A/D converters for sensing.

throwawaymaths1y ago

there are several free asic shuttle runs available for hobbyists iirc

nynx1y ago· 3 in thread

This is cool for sure. I think you’ll ultimately find that this can’t really be faster than modern OoO cores because python instructions are so complex. To execute them OoO or even at a reasonable frequency (e.g. to reduce combinatorial latency), you’ll need to emit type-specialized microcode on the fly, but you can’t do that until the types are known — which is only the case once all the inputs are known for python.

hwpythonnerOP1y ago

Thanks — appreciate it!

You're right that dynamic typing makes high-frequency execution tricky, and modern OoO cores are incredibly good at hiding latencies. But PyXL isn't trying to replace general-purpose CPUs — it's designed for efficient, predictable execution in embedded and real-time systems, where simplicity and determinism matter more than absolute throughput. Most embedded cores (like ARM Cortex-M and simple RISC-V) are in-order too — and deliver huge value by focusing on predictability and power efficiency. That said, there’s room for smart optimizations even in a simple core — like limited lookahead on types, hazard detection, and other techniques to smooth execution paths. I think embedded and real-time represent the purest core of the architecture — and once that's solid, there's a lot of room to iterate upward for higher-end acceleration later.

IshKebab1y ago

Very cool! Nobody who really wants simplicity and determinism is going to be using Python on a microcontroller though.

gavinsyancey1y ago

Sure, but for embedded use cases (which this is targeting), the goal isn't raw speed so much as being fast enough for specific use cases while minimizing power usage / die area / cost.

gadys1y ago· 3 in thread

Looks impressive. How does this compare to PyPy?

hwpythonnerOP1y ago

PyPy is a JIT compiler — it runs on a standard CPU and accelerates "hot" parts of a program after runtime analysis.

This is a great approach for many applications, but it doesn’t fit all use cases.

PyXL is a hardware solution — a custom processor designed specifically to run Python programs directly.

It's currently focused on embedded and real-time environments where JIT compilation isn't a viable option due to memory constraints, strict timing requirements, and the need for deterministic behavior.

wiesbadener1y ago

That's an interesting project! I have some follow-up questions:

> No VM, No C, No JIT. Just PyXL.

Is the main goal to achieve C-like performance with the ease of writing Python? Do you have a performance comparison against C? Is the main challenge the memory management?

> PyXL runs on a Zynq-7000 FPGA (Arty-Z7-20 dev board). The PyXL core runs at 100MHz. The ARM CPU on the board handles setup and memory, but the Python code itself is executed entirely in hardware. The toolchain is written in Python and runs on a standard development machine using unmodified CPython.

> PyXL skips all of that. The Python bytecode is executed directly in hardware, and GPIO access is physically wired to the processor — no interpreter, no function call, just native hardware execution.

Did you write some sort of emulation to enable testing it without the physical Arty board?

nurettin1y ago

this project takes bytecode, maps it to fpga instructions. pypy can't do that.

TickleSteve1y ago· 3 in thread

There is a long history of CPUs tailored to specific languages:

- Lisp/lispm

- Ada/iAPX

- C/ARM

- Java/Jazelle

Most don't really take off or go in different directions as the language goes out of fashion.

pjmlp1y ago

Well, one could argue that modern CPUs are designed as C Machine, even more so that now everyone is adding hardware memory tagging as means to fix C memory corruption issues.

Symmetry1y ago

Also some fairly interesting Haskell efforts.

https://mn416.github.io/reduceron-project/

These range from a few instructions to accelerate certain operations, to marking memory for the garbage collector, to much deeper efforts.

jonathaneunice1y ago

Also: UCSD p-System, Symbolics Lisp-on-custom hardware, ...

Historically their performance is underwhelming. Sometimes competitive on the first iteration, sometimes just mid. But generally they can't iterate quickly (insufficient resources, insufficient product demand) so they are quickly eclipsed by pure software implementations atop COTS hardware.

This particular Valley of Disappointment is so routine as to make "let's implement this in hardware!" an evergreen tarpit idea. There are a few stunning exceptions like GPU offload—but they are unicorns.

sunray21y ago· 2 in thread

Very interesting!

What are the fundamental physical limits here? Namely, timing precision, latency and jitter? How fast could PyXL bytecode react to an input?
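For scale, the figures quoted in the original post (a 480 ns round trip at 100 MHz for PyXL, versus 14 to 25 microseconds at 168 MHz on the Pyboard) can be turned into clock cycles. This is plain arithmetic on those published numbers; the helper name `cycles` is made up, and nothing here says whether 480 ns is a hard floor.

```python
def cycles(duration_ns: int, clock_mhz: int) -> int:
    """Clock cycles elapsed in duration_ns at clock_mhz (integer math)."""
    return duration_ns * clock_mhz // 1000

pyxl = cycles(480, 100)            # 480 ns at 100 MHz
pyboard_lo = cycles(14_000, 168)   # 14 us at 168 MHz
pyboard_hi = cycles(25_000, 168)   # 25 us at 168 MHz
print(pyxl, pyboard_lo, pyboard_hi)
```

So the quoted round trip is on the order of tens of cycles on PyXL versus thousands on the Pyboard, with one cycle at 100 MHz being 10 ns.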

For info, there is ARTIQ: vaguely similar thing that effectively executes Python code with 'embedded level' performance:

https://m-labs.hk/experiment-control/artiq/

ARTIQ is quite common in quantum physics labs. For that you need very precise and deterministic timing. Imagine you're interfering two photons as they reach a piece of glass, so that they can interact. It doesn't get faster than photons! That typically means nanosecond timing, sub-microsecond latency.

How ARTIQ does it is also interesting. The Python code is separate from the FPGA which actually executes the logic you want to do. In a hand-wavy way, you're then 'as fast' as the FPGA. How, though? The catch is, you have to get the Python code and FPGA gateware talking to each other, and that's technically difficult and has many gotchas. In comparison, although PyXL isn't as performant, if it makes it simpler for the user, that's a huge win for everyone.

Congrats once again!

sunray21y ago

(minor edit: for observing experimental signatures of photon interference, nanosecond precision is the minimum to see anything when synchronising your experimental bits and pieces, but to see a useful signal needs precision at the 10s of picoseconds! So, beyond what's immediately possible here.)

brcmthrowaway1y ago

Did you work at Rigetti?

bieganski1y ago· 2 in thread

it would be nice to have some peripheral drivers implemented (UART, eMMC etc).

having this, the next tempting step is to make `print` function work, then the filesystem wrapper etc.

btw - what i'm missing is clear information about the limitations. it's definitely not true that i can take any Python snippet and run it using PyXL (for example threads i suppose?)

hwpythonnerOP1y ago

Great points!

Peripheral drivers (like UART, SPI, etc.) are definitely on the roadmap - They'd obviously be implemented in HW. You're absolutely right — once you have basic IO, you can make things like print() and filesystem access feel natural.

Regarding limitations: you're right again. PyXL currently focuses on running a subset of real Python — just enough to show it's real python and to prove the core concept, while keeping the system small and efficient for hardware execution. I'm intentionally holding off on implementing higher-level features until there's a real use case, because embedded needs can vary a lot, and I want to keep the system tight and purpose-driven.

Also, some features (like threads, heavy runtime reflection, etc.) will likely never be supported — at least not in the traditional way — because PyXL is fundamentally aimed at embedded and real-time applications, where simplicity and determinism matter most.

throwup2381y ago

Are you planning on licensing the IP core? It would be great to have your core integrated with ESP32, running alongside their other architectures, so they can handle the peripheral integration, wifi, and Python code loading into your core, while it sits as another master on the same bus as the other peripherals.

Do you plan to have AMBA or Wishbone Bus support?

wodenokoto1y ago· 2 in thread

I can totally see a future where you can select “accelerated python” as an option for your AWS lambda code.

hwpythonnerOP1y ago

When I first started PyXL, this kind of vision was exactly on my mind.

Maybe not AWS Lambda specifically, but definitely server-side acceleration — especially for machine learning feature generation, backend control logic, and anywhere pure Python becomes a bottleneck.

It could definitely get there — but it would require building a full-scale deployment model and much broader library and dynamic feature support.

That said, the underlying potential is absolutely there.

petra1y ago

This sounds brilliant.

What's missing so you could create a demo for VCs or the relevant companies, proving the potential of this as a competitive server-class core?

swoorup1y ago· 2 in thread

How does garbage collection work here? Is it just a set of PySM code?

hwpythonnerOP1y ago

GC is still a WIP, but the key idea is the system won't stall — garbage collection happens asynchronously, in the background, without interrupting PyXL execution.

jy148981y ago

Sounds similar to something one of my classmates worked on at uni https://www.bristol.ac.uk/research/groups/trustworthy-system...

tgtweak1y ago· 2 in thread

Have you tested it on any faster FPGAs? I think Azure has instances with xilinx/AMD accelerators paired.

>Standard_NP10s instance, 1x AMD Alveo U250 FPGA (64GB)

Would be curious to see how this benchmarks on a faster FPGA since I imagine clock frequency is the latency dictator - while memory and tile can determine how many instances can run in parallel.

hwpythonnerOP1y ago

Not yet — I'm currently testing on a Zynq-7000 platform (embedded-class FPGA), mainly because it has an ARM CPU tightly integrated (and it's rather cheap). I use the ARM side to handle IO and orchestration, which let me focus the FPGA fabric purely on the Python execution core, without having to build all the peripherals from scratch at this stage.

To run PyXL on a server-class FPGA (like Azure instances), some adaptations would be needed — the system would need to repurpose the host CPU to act as the orchestrator, handling memory, IO, etc.

The question is: what's the actual use case of running on a server? Besides testing max frequency -- for which I could just run Vivado on a different target (would need license for it though)

For now, I'm focusing on validating the core architecture, not just chasing raw clock speeds.

zoobab1y ago

You can get cheap Zynq boards on Aliexpress, like old mining boards.

I have a Parallella board here with a Zynq.

tuetuopay1y ago· 2 in thread

So basically you took the idea of Jazelle extensions that can run Java bytecode natively, but for python?

This is amazing, great work!

hwpythonnerOP1y ago

Thank you very much. I learned of Jazelle after I started working on it, and this is a good thing, because Jazelle didn't become too popular AFAIK, so it would have just made me quit. Glad I didn't though :)

mid-kid1y ago

The significant difference between Jazelle and your project is how Jazelle sits on top of a CPU that can already run a java interpreter without the instruction set extensions, said instruction set didn't implement all of java (it still required a runtime to implement the missing opcodes, in ARM), and java runtimes quickly got better optimized than doing the same thing with the instruction set.

I think building a CPU that can only do this is a really novel idea and am really interested in seeing when you eventually disclose more implementation details. My only complaint is that it isn't Lua :P

dec0dedab0de1y ago· 2 in thread

Congratulations!

This is so cool, I have dreamt about doing this but wouldn't know where to start. Do you have a plan for releasing it? What is your background? Was there anything that was way more difficult than you thought it would be? Or anything that was easier than you expected?

hwpythonnerOP1y ago

Thanks so much — really appreciate it!

Right now, the plan is to present it at PyCon first (next month) and then publish more about the internals afterward. Long-term, I'm keeping an open mind, not sure yet.

My background is in high-frequency trading (HFT), high-performance computing (HPC), systems programming, and networking. I didn't come from a HW background (or at least, I didn't have one when I started), but coming from the software side gave me a different perspective on how dynamic languages could be made much more efficient at the hardware level.

Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent if it makes sense. This is still fluid and not finalized...

Easy - Not sure I'd categorize it as easy, but it was more surprising: the current implementation is rather simple and elegant (at least I think so :-) ), so there is still no special advanced CPU design stuff (branch prediction, super-scalar, etc). Even so, I'm getting a huge improvement over CPython or MicroPython VMs in the known Python bottlenecks (branching, function calls, etc).

dec0dedab0de1y ago

> Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent if it makes sense. This is still fluid and not finalized...

Alright well those dots are begging me to ask what they mean, or at least one specific story for the nerds :-)

> Long-term, I'm keeping an open mind, not sure yet.

Well please consider open source, even if you charge for access to your open source code. And even if you don't go open source, at least make it cheap enough that a solo developer could afford to build on it without thinking.

redox991y ago· 2 in thread

What's the logic behind going for stack based?

hwpythonnerOP1y ago

Python’s execution model is already very stack-oriented — CPython bytecode operates by pushing and popping values almost constantly. Building PyXL as a stack machine made it much more natural to map Python semantics directly onto hardware, without forcing an unnatural register-based structure on it. It also avoids a lot of register allocation overhead (renaming and such).
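To show what "pushing and popping values" means in practice, here is a toy software stack machine. It is a sketch only: the opcodes are invented for illustration and are neither CPython bytecode nor PySM, and the hardware obviously does not work by running a Python loop.

```python
def run(program, env):
    """Execute a list of (opcode, argument) pairs on an operand stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH_CONST":
            stack.append(arg)
        elif op == "LOAD_NAME":
            stack.append(env[arg])
        elif op in ("ADD", "MUL"):
            right = stack.pop()      # operands come off in reverse order
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# a + 2 * b, written in postfix order with no register names anywhere
program = [
    ("LOAD_NAME", "a"),
    ("PUSH_CONST", 2),
    ("LOAD_NAME", "b"),
    ("MUL", None),
    ("ADD", None),
]
print(run(program, {"a": 1, "b": 3}))
```

Note that no instruction names a source or destination register; operands are implicit in the stack position, which is the property that removes the register allocation step.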

bhasi1y ago

What other models are there? Would love to learn about them.

boutell1y ago· 1 in thread

This is very, very cool. Impressive work.

I'm interested to see whether the final feature set will be larger than what you'd get by creating a type-safe language with a pythonic syntax and compiling that to native, rather than building custom hardware.

The background garbage collection thing is easier said than done, but I'm talking to someone who has already done something impressively difficult, so...

rangerelf1y ago

> I'm interested to see whether the final feature set will be larger than what you'd get by creating a type-safe language with a pythonic syntax and compiling that to native, rather than building custom hardware.

It almost sounds like you're asking for Nim ( https://nim-lang.org/ ); and there are some projects using it for microcontroller programming, since it compiles down to C (for ESP32, last I saw).

Jean-Papoulos1y ago· 1 in thread

>PyXL is a custom hardware processor that executes Python directly — no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon.

So, no using C libraries. That takes out a huge chunk of pip packages...

hwpythonnerOP1y ago

You're absolutely right — today, PyXL only supports pure Python execution, so C extensions aren’t directly usable.

That said, in future designs, PyXL could work in tandem with a traditional CPU core (like ARM or RISC-V), where C libraries execute on the CPU side and interact with PyXL for control flow and Python-level logic.

There’s also a longer-term possibility of compiling C directly to PyXL’s instruction set by building an LLVM backend — allowing even tighter integration without a second CPU.

Right now the focus is on making native Python execution viable and efficient for real-time and embedded systems, but I definitely see broader hybrid models ahead.

yanniszark1y ago· 1 in thread

Great work! :D I had a question about that though. Instead of compiling to PySM, why not compile directly to a real assembly like ARM? Is the PySM assembly very special to accommodate Python features in a way that can't be done efficiently in existing architectures like ARM?

hwpythonnerOP1y ago

Thanks — appreciate it!

Good question. In theory, you can compile anything Turing-complete to anything else — ARM and Python are both Turing-complete. But practically, Python's model (dynamic typing, deep use of the stack) doesn't map cleanly onto ARM's register-based, statically-typed instruction set. PySM is designed to match Python’s structure much more naturally — it keeps the system efficient, simpler to pipeline, and avoids needing lots of extra translation layers.

willvarfar1y ago· 1 in thread

Fantastic work! :D Must be super-satisfying to get it up and running! :D

Is it tied to a particular version of python?

hwpythonnerOP1y ago

Thanks — it’s definitely been incredibly satisfying to see it run on real hardware!

Right now, PyXL is tied fairly closely to a specific CPython version's bytecode format (I'm targeting CPython 3.11 at the moment).

That said, the toolchain handles translation from Python source → CPython bytecode → PyXL Assembly → hardware binary, so in principle adapting to a new Python version would mainly involve adjusting the frontend — not reworking the hardware itself.
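One way a bytecode-based frontend could make that version coupling explicit is a simple guard on the interpreter version. This is a hypothetical sketch: the names `SUPPORTED_BYTECODE` and `frontend_supports` are invented, and the only fact taken from the thread is the CPython 3.11 target.

```python
import sys

SUPPORTED_BYTECODE = (3, 11)  # the CPython version the author says is targeted

def frontend_supports(version=None) -> bool:
    """Return True if bytecode from this interpreter version can be translated."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor == SUPPORTED_BYTECODE

if not frontend_supports():
    print(
        f"warning: bytecode from Python {sys.version_info[0]}.{sys.version_info[1]} "
        "would need a different frontend"
    )
```

A real toolchain would more likely keep one translator per supported bytecode version behind a check like this, rather than rejecting everything else.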

Longer term, the goal is to stabilize a consistent subset of Python behavior, so version drift becomes less painful.

jrexilius1y ago· 1 in thread

Amazing work! Is the primary goal here to allow more production use of python in an embedded context, rather than just prototyping?

hwpythonnerOP1y ago

Thank you! And yes, exactly.

chippiewill1y ago· 1 in thread

Very cool. There's a similar project, Polyphony (https://github.com/polyphony-dev/polyphony) that translates Python directly into Verilog - no processor (A bit like what HLS does for C++). As part of my degree dissertation I tacked on AXI bus support to it to facilitate communication between the CPU and FPGA on a Zynq as a PoC of doing hardware/software co-design with Python.

I'd definitely be interested in how this project progresses, particularly if it adds support for integration to the CPU. Some tie-in to the Pynq project could be super fun.

brcmthrowaway1y ago

You should have used a FOSS fabric bus instead of AXI

pjmlp1y ago· 1 in thread

This is kind of cool, basically a Python Machine. :)

boutell1y ago

I see what you did there! There's a LISP Machine with its guts on display at the MIT Museum. I recall we had one in the graduate student comp sci lab at University of Delaware (I was a tolerated undergrad). By then LISP was faster on a Sun workstation, but someone had taught it to play Tetris.

hermitShell1y ago· 1 in thread

Fantastic project. Do you envision this as living on FPGAs forever, or getting into silicon directly? Maybe an extension of RISC-V?

hwpythonnerOP1y ago

Oh boy, I definitely considered that — turning PyXL into a RISC-V extension was an early idea I thought of.

It could probably be adapted into one.

But I ultimately decided to build it as its own clean design because I wanted the flexibility to rethink the entire execution model for Python — not just adapt an existing register-based architecture.

FPGA is for prototyping, although this could probably be used as a soft core. But looking forward, ASIC is definitely the way to go.

igtztorrero1y ago· 1 in thread

Amazing, I'm sure many programmers would join to contribute to your great project, which could become as big as a Python-based operating system, which due to the simplicity of the code would advance very quickly.

hwpythonnerOP1y ago

Thank you! Right now I'm focusing on keeping the core simple, efficient, and purpose-driven — mainly to run Python well on hardware for embedded and real-time use cases.

As for the future, I’m keeping an open mind. It would be exciting if it grew into something bigger, but my main focus for now is making sure the foundation is as solid and clean as possible.

actinium2261y ago· 1 in thread

So first of all, this is awesome and props to you for some great work.

I have what may be a dumb question, but I've heard that Lua can be used in embedded contexts, and that it can be used without dynamic memory allocation and other such things you don't want in real time systems. How does this project compare to that? And like I said it's likely a dumb question because I haven't actually used Lua in an embedded context but I imagine if there's something there you've probably looked at it?

woodrowbarlow1y ago

with embedded scripting languages (including lua and micropython) the CPU is running a compiled interpreter (usually written in C, compiled to the CPU's native architecture) and the interpreter is running the script. on PyXL, the CPU's native architecture is python bytecode, so there's no compiled interpreter.

boxed1y ago· 1 in thread

How big a deal would it be to include the bytecode->PySM translation into the ISA? It seems like it would be even cooler if the CPU actually ran python bytecode itself.

hwpythonnerOP1y ago

That's a great question! I actually thought a lot about that early on.

In theory, you could build a CPU that directly interprets Python bytecode — but Python bytecode is quite high-level and irregular compared to typical CPU instructions. It would add a lot of complexity and make pipelining much harder, which would hurt performance, especially for real-time or embedded use.

By compiling the Python bytecode ahead of time into a simpler, stack-based ISA (what I call PySM), the CPU can stay clean, highly pipelined, and efficient. It also opens the door in the future to potentially supporting other languages that could target the same ISA!
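
A toy software model of a stack-based instruction set may make the idea more concrete. The opcode names (`PUSH`, `ADD`, `MUL`) and the `run` function are invented for this sketch; they are not PySM's actual instructions, which have not been published.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# (2 + 3) * 4 written in postfix order
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(program))  # 20
```

Each instruction here has one small, fixed effect on the stack, which is the kind of regularity that tends to make a pipeline easier to build than one that decodes high-level bytecode directly.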

echoangle1y ago· 1 in thread

Would this be able to handle an exec()- or eval()-call? Is there a Python byte code compiler available as python byte code to include in this processor?

IshKebab1y ago

Yeah this is surely a subset of Python.

tsukikage1y ago· 1 in thread

> A custom toolchain compiles a .py file into CPython ByteCode, translates it to a custom assembly, and produces a binary that runs on a pipelined processor built from scratch.

> Runs a subset of Python

What's the advantage of using a new custom toolchain, custom instruction set and custom processor over existing tools that compile a subset of Python for existing CPUs? - e.g. Cython, Nuitka etc?

hwpythonnerOP1y ago

Compilers and optimizers are great tools for some use cases, but not all.

Just to name a few limitations:

- Many rely heavily on the CPython runtime, meaning garbage collection, interoperability, and object semantics are still governed by CPython’s model.

- They’re rarely designed with embedded or real-time use cases in mind: large binaries, non-deterministic execution (due to the underlying architecture or GC behavior), and limited control over timing.

If these solutions were truly turnkey and broadly capable, CPython wouldn't still dominate—and there’d be no reason for MicroPython to exist either.

focusgroup01y ago· 1 in thread

Incredible work. This is a paradigm shift for ML and embedded workflows. And congratulations, you are going to ring the bell with this one.

hwpythonnerOP1y ago

Thank you so much — that really means a lot!

It's still early days and there’s a lot more work ahead, but I'm very excited about the possibilities.

I definitely see areas like embedded ML and TinyML as a natural fit — Python execution on low-power devices opens up a lot of doors that weren't practical before.

crest1y ago· 1 in thread

A "480ns GPIO roundtrip" @ 100MHz implies 48 cycles for a single GPIO access. I would understand one or two cycles, but what does it spend the other ~46 cycles on? Does Python really have a >40x overhead compared to assembler or C even on optimised hardware or is the benchmark code that bad?

hwpythonnerOP1y ago

Great question!

You're right that it can definitely be faster — there's real room for optimization.

When I have time, I may write a blog post that will explain where the cycles go, why it's different from raw assembler toggling, and how it could be improved.

Also, just to keep things in perspective — don't forget to compare apples to apples: On a Pyboard running MicroPython, a simple GPIO roundtrip takes about 14 microseconds. PyXL is already achieving 480 nanoseconds, so it’s a very different baseline.

Thanks for raising it — it's a very good point.
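
The arithmetic behind the numbers in this exchange, as a quick sketch. The timings and clock rates are the ones quoted in the post (480 ns at 100 MHz for PyXL, 14 to 25 microseconds at 168 MHz for the Pyboard); the function name `cycles` is just for illustration.

```python
def cycles(duration_ns, clock_mhz):
    """Clock cycles elapsed in duration_ns at clock_mhz."""
    # 1 MHz is one cycle per microsecond, so convert ns to microseconds.
    return duration_ns * clock_mhz / 1000


pyxl_cycles = cycles(480, 100)        # 48.0 cycles per round trip
pyboard_cycles = cycles(14_000, 168)  # 2352.0 cycles per round trip
speedup_low = 14_000 / 480            # about 29.2x in wall-clock time
speedup_high = 25_000 / 480           # about 52.1x in wall-clock time

print(pyxl_cycles, pyboard_cycles, round(speedup_low, 1), round(speedup_high, 1))
```

So the "30x" figure corresponds to the low end of the quoted MicroPython range, and the per-cycle gap is larger than the wall-clock gap because the Pyboard runs at a higher clock.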

_JamesA_1y ago· 1 in thread

It would be interesting to see something like this that runs WASM as a universal bytecode.

IshKebab1y ago

I'm sure it's been done. I doubt it really is any better though because you can do a lot of optimisations in software that you can't do in hardware.

freeone30001y ago· 1 in thread

This is amazing! Is the “microcode” compiled to final native on the host or the coprocessor?

I’m guessing due to the lack of JIT, it’s executed on the host?

hwpythonnerOP1y ago

The microcode or the ISA of the system actually runs on the co-processor (PyXL custom cpu)

If you refer to the ARM part as the host (did you?) it's just orchestrating the whole thing, it doesn't run the actual Python program

two_handfuls1y ago· 1 in thread

This is a one-person project? I'm impressed!

hwpythonnerOP1y ago

Thanks so much — really appreciate it! Yes, it's been a one-person project so far — just a lot of spare time, persistence, and iteration.

davidkwast1y ago· 1 in thread

Wow. Congratz

hwpythonnerOP1y ago

Thank you!

yeahwhatever101y ago· 1 in thread

How are you simulating the designs for the FPGA? Are you paying for ModelSim?

hwpythonnerOP1y ago

No, I'm not paying for ModelSim. I've been using free tools like Icarus Verilog — it was good enough for my needs so far. If I need more performance later, I might migrate to Verilator. I could also use Vivado’s built-in XSim, but coming from a software background, I generally prefer more Unix-style tools rather than heavier hardware IDEs.

rangerelf1y ago· 1 in thread

Incredible work :-)

Congratulations!!

hwpythonnerOP1y ago

Thank you!

jollyllama1y ago· 1 in thread

Name's a bit confusing when XLWings exists

dragonwriter1y ago

> Name's a bit confusing when XLWings exists

How? XLWings is not a similar name to pyxl. However, even so, the name is... Heavily overloaded:

https://pyxl.com/ (some kind of strategy/CRM/AI thing)

https://pyxl.ai/ (AI website builder)

https://www.pyxl.pro/ (AI image generator)

https://github.com/dropbox/pyxl (Inline HTML extension for Python)

https://openpyxl.readthedocs.io/en/stable/ (A Python library to read/write Excel files)

https://www.pyxll.com/ (Excel Add-in to support add-ins written in Python)

dcreater1y ago· 1 in thread

Very impressive! Can it run on RISC V?

SpaceNoodled1y ago

This is a unique architecture, not just software.

UncleOxidant1y ago· 1 in thread

Is the source code available?

hwpythonnerOP1y ago

The source isn’t public at this stage. I'm still deciding the best path forward after PyCon.

HPsquared1y ago· 1 in thread

Not to be confused with openpyxl, a library for working with Excel files.

That then makes me wonder if someone could implement Excel in hardware! (Or something like it)

hwpythonnerOP1y ago

I just had to give it a name. Didn't really search for vacancies. Maybe I need to rename :)

brap1y ago· 1 in thread

Up next: a processor that will directly execute your prompt

growthwtf1y ago

genuinely not a bad idea

jimbokun1y ago· 1 in thread

What's your development background that prepared you to take on a project like this?

Clearly you know a lot about both low level Python internals and a fair amount about hardware design to pull this off.

hwpythonnerOP1y ago

I'm a software engineer by background, mostly in high-frequency trading (HFT), HPC, systems programming, and networking — so a lot of focus on efficiency and low-level behavior. I had played a bit with FPGAs before, but nothing close to this scale — most of the hardware and Python internals work I had to figure out along the way.

hwpythonnerOP1y ago

I built a hardware processor that runs Python programs directly, without a traditional VM or interpreter. Early benchmark: GPIO round-trip in 480ns — 30x faster than MicroPython on a Pyboard (at a lower clock). Demo: https://runpyxl.com/gpio

jonjacky1y ago

A much earlier (2012) attempt at a Python bytecode interpreter on an FPGA:

https://pycpu.wordpress.com/

"Running a very small subset of python on an FPGA is possible with pyCPU. The Python Hardware Processsor (pyCPU) is a implementation of a Hardware CPU in Myhdl. The CPU can directly execute something very similar to python bytecode (but only a very restricted instruction set). The Programcode for the CPU can therefore be written directly in python (very restricted parts of python) ..."

thenobsta1y ago

Amazing work! This is a great project!

Every time I see a project that has a great implementation on an FPGA, I lament the fact that Tabula didn’t make it, a truly innovative and fast FPGA.

<https://en.m.wikipedia.org/wiki/Tabula,_Inc.>

asford1y ago

The benchmark results presented in this page are extremely misleading; you're not comparing to the actual baseline gpio performance available in micropython.

Micropython already exposes "viper", which transpiles byte code to machine instructions for highly timing or performance critical code paths. This is reasonably well explained in the micropython docs, which have an example explaining how to ... trigger a gpio very rapidly.

https://docs.micropython.org/en/latest/reference/speed_pytho...

Viper runs on device and directly emits native machine code for decorated micropython functions. If you have serious timing requirements for gpio, then this is how you do it.

Of course, this is a restricted subset of the language compatible with direct native code gen, notably just supporting integer datatypes. However, I would be shocked if this project wasn't also restricted to a subset of the language functionality for your transpilation pipeline.

The benchmark should be rewritten to compare against a baseline in micropython using viper. Though this project is pretty neat, the over inflated performance claims would rapidly deflate against a strong baseline.

kristianpaul1y ago

This always made me think back to the J1 Forth CPU https://excamera.com/files/j1.pdf

fluorinerocket1y ago

Makes me think of LabVIEW FPGA, where you could run LabVIEW code directly on FPGA, more like generate VHDL or Verilog from LabVIEW, and do very high loop rate deterministic control systems. Very cool. Except with that you were locked down to the National Instruments ecosystem and no one really used it.

M4R5H4LL1y ago

I love this kind of project, this is wonderful work. I guess the challenge is to now make it work for general purpose Python. In any case it looks very much like a marketable product already. I would seek financing to see how far this can go.

bluelightning2k1y ago

I am a pretty smart person. But once in a while I see something like this which reminds me there's always someone far smarter.

Absolutely incredible.

hoistbypetard1y ago

It seems worth noting that the board you're comparing it to costs <$30 where the dev board you're running on costs $250+.

That said... awesome work! I wish I could get to PyCon this year to see your talk.

Are you planning to post your core so others can replicate your work?

simonw1y ago

This looks incredible.

Do you have any open source code available for this yet?

Are you planning to release this as open source? If not, do you have a rough idea for how you plan to commercial license this tech?

ConanRus1y ago

> the program is compiled to a CPython Bytecode and then compiled again to PyXL assembly. It is then linked together and a binary is generated.

why are we not doing this for a standard python? i think LLVM is just for that, no?

jay-barronville1y ago

This type of project is why I love HN. This work is brilliant!

Almost every question I had, you already answered in the comments. The only one remaining at the moment: How long exactly have you been working on PyXL?

startupsfail1y ago

Nice, next step could be rolling out that bytecode compiler in Python, so it’s self-contained. And a port to some LLM-on-silicon, so we could have it executing Python as the inference goes :-P

zoobab1y ago

To reflash ch32v003 chips, I need to create bits of 250ns, so with 480ns it's not enough. Is there a way to make it faster?

warble1y ago

Wow, these FPGAs are not cheap. Don't they also have a couple of ARM cores attached on the SOC?

vrighter1y ago

you created a custom processor and made a compiler for it. The source language happens to be python, but the generated bytecode is not what executes on the cpu. A custom ISA is not the python bytecode

globalnode1y ago

Great idea and frankly I'm surprised it hasn't been done before. Probably because you would have to sell an awful lot of them to make $. But there would definitely be a market I think. For example if they were cheap, say much cheaper than a Pi, I'd go for something like this over a full Linux machine for dedicated projects. But then how would you do complex things like interfacing to cameras and leveraging encoders etc? Or is this sort of device just not for that type of project.

esseph1y ago

This seems super, super cool!

sneak1y ago

How long did you work on this?

psychip1y ago

it was cool until i read the line "what is gpio"

actinium2261y ago

This is awesome

ingen0s1y ago

That's great!

ktimespi1y ago

Kind of insane that you achieved this. Does your processor support all python bytecode at this point? How do you implement ref counting and garbage collection?

hoseja1y ago

I wonder if silicon can feel pain.

igtztorrero1y ago

Amazing,

flmontpetit1y ago

For a minute there I was imagining Python as the actual instruction set and my brain was segfaulting.

Very cool project still

obitsten1y ago· 17 in thread

cchianel1y ago

- The vast majority of code generation would have to be dynamic dispatches, which would not be too different from CPython's bytecode.

A JIT compiler might be able to optimize some of these cases (observing what is the actual type used), but a JIT compiler can use the source file/be included in the CPython interpreter.

hwpythonnerOP1y ago

You make a great point — type information is definitely a huge part of the challenge.

I'd add that even beyond types, late binding is fundamental to Python’s dynamism: Variables, functions, and classes are often only bound at runtime, and can be reassigned or modified dynamically.

So even if every object had a type annotation, you would still need to deal with names and behaviors changing during execution — which makes traditional static compilation very hard.

That’s why PyXL focuses more on efficient dynamic execution rather than trying to statically "lock down" Python like C++.
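
A minimal illustration of the late binding described above. The `Sensor` class and `poll` function are hypothetical names for this sketch. The point is that the method is looked up each time it is called, so its behavior can change after the calling code has been defined.

```python
class Sensor:
    def read(self):
        return 1


def poll(sensor):
    # 'read' is resolved on every call, not when poll() is defined.
    return sensor.read()


s = Sensor()
before = poll(s)

# Rebind the method on the class at runtime (monkey patching).
Sensor.read = lambda self: 2
after = poll(s)

print(before, after)  # 1 2
```

A static compiler that had hard-wired `poll` to the original `Sensor.read` would produce the wrong result for the second call, with or without type annotations.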

pjmlp1y ago

Solved by Smalltalk, Self, and Lisp JITs, that are in the genesis of JIT technology, some of it landed on Hotspot and V8.

Qem1y ago

> The primary reason, in my opinion, is the vast majority of Python libraries lack type annotations (this includes the standard library).

When type annotations are available, it's already possible to compile Python to improve performance, using Mypyc. See for example https://blog.glyph.im/2022/04/you-should-compile-your-python...

Someone1y ago

Python doesn’t eschew all benefits of compilation. It is compiled, but to an intermediate byte code, not to native code, (somewhat) similar to the way java and C# compile to byte code.

Those, at runtime (and, nowadays, optionally also at compile time), convert that to native code. Python doesn’t; it runs a bytecode interpreter.
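
The compile-then-interpret split can be seen from within Python itself using the built-in `compile` and the standard `dis` module. This is a small sketch; the exact opcode names printed depend on the CPython version.

```python
import dis

source = "def add(a, b):\n    return a + b\n"

# Step 1: source text is compiled to a code object holding bytecode.
module_code = compile(source, "<example>", "exec")

# Step 2: the interpreter loop executes that bytecode.
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)               # opcode names vary by CPython version
print(add.__code__.co_code)  # the raw bytecode bytes
print(add(2, 3))
```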

jerf1y ago

jonathaneunice1y ago

ModernMech1y ago

Part of the issue is the number of instructions Python has to go through to do useful work. Most of that is unwrapping values and making sure they're the right type to do the thing you want.
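
A simplified model of the work hidden behind a single `a + b`, written in Python for illustration. It follows the documented `__add__` / `__radd__` protocol but deliberately omits details such as the subclass-priority rule and the interpreter's C-level fast paths, so treat it as a sketch rather than CPython's actual implementation.

```python
def dynamic_add(a, b):
    """Simplified model of how `a + b` is resolved at runtime."""
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    # The left operand declined, so give the right operand a chance.
    radd = getattr(type(b), "__radd__", None)
    if radd is not None:
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand types: {type(a).__name__} and {type(b).__name__}"
    )


print(dynamic_add(2, 3))
print(dynamic_add("py", "xl"))
print(dynamic_add(1, 2.5))  # int.__add__ declines; float.__radd__ handles it
```

Every one of these lookups and checks happens at runtime for each addition, which is where many of the extra instructions go.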

kragen1y ago

Even concatenating strings is slow enough that checking the tag bits to see if you are adding integers won't make it much slower.

Where this approach really falls down is choosing between integer and floating point math. (Also, you really don't want to box your floats.)

franga20001y ago

There's no benefit that I know of, besides maybe a tiny cold start boost (since the interpreter doesn't need to generate the bytecode first).

I have seen people do that for closed-source software that is distributed to end-users, because it makes reverse engineering and modding (a bit) more complicated.

Qem1y ago

Check Nuitka: https://nuitka.net/

hwpythonnerOP1y ago

wyldfire1y ago

Even type annotations would have to be anointed with semantics, which (IIUC) they have none today (w/CPython AFAIK). They are just annotations for use by static checkers.

Unless you can perform optimizations, the compilation can't make a whole bunch of progress beyond that bytecode.

* In fact, IIRC there was/is some "freeze" program that would do just that: compile your python program. Under the covers it would bundle libpython with your *.pyc bytecode.
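
A short demonstration of the point about annotations: CPython stores them as metadata but does not enforce them when the function is called. The function name `double` is just an example.

```python
def double(x: int) -> int:
    return x * 2


print(double.__annotations__)  # annotations are kept as plain metadata
print(double(2))               # 4
print(double("ab"))            # no runtime type check: prints 'abab'
```

A compiler that wanted to rely on `x: int` would have to add its own semantics (for example, rejecting or guarding the string call), which is what tools like Mypyc do.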

dragonwriter1y ago

> Why is it not routine to "compile" Python?

Where’s the AOT compiler that handles the whole Python language?

f1shy1y ago

AFAIK, one reason is that if you use "eval()" anywhere, you already need a whole Python compiler shipped with your program. So compiling is no different from shipping the code with the interpreter.
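
To make the eval() point concrete: the expression below exists only as a string until runtime, so the target must be able to parse and compile it on the spot. The variable names are arbitrary.

```python
# Imagine this string arrives at runtime (config file, network, user input).
expression = "width * height + 1"

# Parsing and compiling happen here, on the target, not at build time.
code = compile(expression, "<runtime>", "eval")
result = eval(code, {"width": 3, "height": 4})

print(result)  # 13
```

An ahead-of-time toolchain either has to ship a compiler alongside the program or exclude eval() and exec() from the supported subset.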

seanw4441y ago

It's called Nim.

archargelod1y ago

Comparing Nim to compiled Python is almost insulting.

Smaller binaries, faster execution, proper metaprogramming, actual type safety, and you don't need to bundle a whole interpreter just to say "hello world"

rkagerer1y ago· 16 in thread

Back when C# came out, I thought for sure someone would make a processor that would natively execute .Net bytecode. Glad to see it finally happened for some language.

kcb1y ago

For Java, this was around for a bit https://en.wikipedia.org/wiki/Jazelle.

monocasa1y ago

Even better was a complete system rather than a mode for arm processors that ran a subset of the common jvm opcodes.

https://en.wikipedia.org/wiki/PicoJava

varispeed1y ago

Didn't some phones have hardware Java execution or does my memory fail me?

jiehong1y ago

Java got that with smart cards for example. Cute oddities of the past

monocasa1y ago

JavaCard was just implemented as a regular interpreter last time I checked.

supportengineer1y ago

Does anyone remember the JavaOne ring giveaway?

https://news.ycombinator.com/item?id=8598037

zahlman1y ago

I probably should have added a link: https://esolangs.org/wiki/Befunge

ComputerGuru1y ago

I want to say there was a product that did this circa 2006-2008 but all I’m finding is the .NET Micro Framework and its modern successor the .NET nano Framework.

duskwuff1y ago

There was Netduino, but that was a STM32 microcontroller running an interpreter, not dedicated hardware which directly executed CLR code.

rcorrear1y ago

Maybe you’re thinking of Singularity OS?

john-h-k1y ago

The tl;dr (I spent lots of time investigating this) is that it just fundamentally isn’t a good bytecode for execution. It’s designed to be small on disk, not hardware friendly

whoomp123421y ago

I'd be surprised if azure app services didn't do this already.

john-h-k1y ago

I’d be willing to bet my net worth that they don’t

actionfromafar1y ago

Wouldn't that be a real scoop?

bongodongobob1y ago

Azure runs on Linux if I'm not mistaken.

zik1y ago· 14 in thread

A more realistic claim would be "A processor with a custom architecture designed to support python".

goranmoomin1y ago

hamandcheese1y ago

I agree with you, if it ran pyc code directly I would be okay saying it "runs python".

However it doesn't seem like it does, the pyc still had to be further processed into machine code. So I also agree with the parent comment that this seems a bit misleading.

f1shy1y ago

Well it really does not run CPython, but CPython bytecode, compiled down to an assembler. Granted, a very specific, tailored assembler, but still.

Anyway, the project is mega-cool, and very useful (in some specific applications). It's just that the title is a little bit confusing.

hwpythonnerOP1y ago

What gets executed is a direct mapping of Python semantics to hardware. In that sense, this is more “direct” than most systems running Python.

This phrasing is about conveying the architectural distinction: Python logic executed natively in hardware, not interpreted in software.

franzb1y ago

Wouldn't an AoT Python-to-x86 compiler lead to a similar situation where the x86 processor would "run Python directly"?

_kidlike1y ago

After a quick search I found that even Raspberry makes the same claim...

"runs directly on embedded hardware"

https://www.raspberrypi.com/documentation/microcontrollers/m...

I don't understand why they have the need to do this...

rcxdude1y ago

Micropython does run directly on the hardware, though. It's a bare-metal binary, no OS. Which is a different claim to running the python code you give it 'directly'.

f1shy1y ago

hwpythonnerOP1y ago

PyXL is a bit more direct :)

dividuum1y ago

Huh? MicroPython literally does exactly that: You copy over Python source(!) code and it runs on the Pico.

wormius1y ago

Yeah that was my first thing. Wait a minute you run a compiler on it? It's literally compiled code, not direct. Which is fine, but yeah, overselling what it is/does.

Still cool, but I would definitely ease back the first claim.

hwpythonnerOP1y ago

PyXL deliberately avoids tying itself to Python’s high-level syntax or rapid surface changes.

BiteCode_dev1y ago

Which is what nuitka does. But the result doesn't allow for real time python programs, and you don't get direct access to the hardware like here.

rytill1y ago

The phrasing “<statement> — no X, Y, Z, just <final simplified claim>” is cropping up a lot lately.

4o also ends many of its messages that way. It has to be related.

JadoJodo1y ago· 14 in thread

I'd like to invite any Python devs to go on a tangent with me:

To add some more context:

spprashant1y ago

The gist is that basic Python at its very core is -

a) simple b) limited

miohtama1y ago

Before data science Python was already heavily used in web backend e.g. Instagram, others.

PaulHoule1y ago

My impression was that if you had a problem with Python and then added Docker now you have two problems. I worked at one place where the data sci's had an amazing ability to find defective Pythons.

Sohcahtoa821y ago

> Python 2 vs 3

> virtual environments

> libraries for each version

VWWHFSfQ1y ago

Python is just brutally slow. Anything performance-sensitive has to be done with a native module and now that requires all the same compilation and build tooling that everything else does.

The ecosystem is massive and the core team just keeps adding more and more dubious language features and syntax.

Realistically, Python should have been "done" after async/await and fixing str vs bytes.

__MatrixMan__1y ago

em3rgent0rdr1y ago

whatnow373731y ago

Old-timer here, used Python for about ten years professionally (Go now).

c) It’s a monstrous dumpster fire and getting worse over time, but so is everything else (in the same space). I like Go, but I can see how it’s not for everyone.

TheFlyingFish1y ago

I've used Python a lot over the last ~10 years. It's probably my favorite language, although I'm not immune to its weak points.

To answer your questions in order,

b) and c) are kind of the same question, to me - at least, "why use Python for embedded" is a subset of "why use Python at all."

Anyway, just my rambling $0.02.

[0] https://peps.python.org/pep-0020/

JadoJodo1y ago

Not all rambling, but the exact kind of input I was hoping for. Thank you!

carabiner1y ago

This just seems like a complaint about python package management disguised as a question (aka concern trolling). Yes it's bad. No, it probably won't be improved any time soon.

JadoJodo1y ago

That wasn't my intention at all, but I appreciate that it came across that way to you. Please know that I was/am sincere in my desire to hear the thoughts of others while this is a current topic.

willvarfar1y ago

And yet for simple little standalone programs and notebooks, particularly for science, it is super simple and natural to turn to it.

nonameiguess1y ago

Factors I personally think led to Python's popularity:

4) MIT decided to make Python its primary teaching language in the early 2000s or so and a lot of CS programs in the US followed suit.

Y_Y1y ago· 11 in thread

Are there any limitations on what code can run? (discounting e.g. memory limitations and OS interaction)

I'd be very interested to know why you chose to stay this, why it was a good idea, and how you went about the implementation (in broad strokes if necessary).

hwpythonnerOP1y ago

Thanks — really appreciate the interest!

mikepurvis1y ago

https://doc.pypy.org/en/latest/faq.html#what-is-pypy

ammar21y ago

> I'd prefer to move forward based on clear use cases

Taking the concrete example of the `struct` module as a use-case, I'm curious if you have a plan for it and similar modules. The tricky part of course is that it is implemented in C.

Would you have to rewrite those stdlib modules in pure python?
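
For a sense of what a pure-Python rewrite of a small piece of `struct` could look like, here is a sketch covering a single format, `'<HI'` (little-endian unsigned 16-bit followed by unsigned 32-bit). The helper names are hypothetical, and nothing in this thread says PyXL handles C-implemented stdlib modules this way.

```python
import struct


def pack_u16_u32_le(a, b):
    """Pure-Python equivalent of struct.pack('<HI', a, b)."""
    return a.to_bytes(2, "little") + b.to_bytes(4, "little")


def unpack_u16_u32_le(data):
    """Pure-Python equivalent of struct.unpack('<HI', data)."""
    if len(data) != 6:
        raise ValueError("expected exactly 6 bytes")
    return int.from_bytes(data[:2], "little"), int.from_bytes(data[2:6], "little")


packed = pack_u16_u32_le(0x1234, 0xDEADBEEF)
print(packed.hex())
# Compare against the C-implemented module where it is available.
print(packed == struct.pack("<HI", 0x1234, 0xDEADBEEF))
```

Covering the full format-string language this way is more work, but the building blocks (`int.to_bytes`, `int.from_bytes`) are ordinary Python.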

bokchoi1y ago

There were a few chips that supported directly executing JVM bytecodes. I'm not sure why it didn't take off, but I think it is generally more performant to JIT compile hotspots to native code.

https://en.wikipedia.org/wiki/Java_processor

teruakohatu1y ago

It did take off just in a different direction:

https://en.m.wikipedia.org/wiki/Java_Card

To the point where most adult humans in the world probably own a Java-supported processor on a SIM card. Or at least an emulator (for eSIMs).

One example of a CPU arch used on JavaCard devices is the ARM926EJ-S that I believe can execute Java byte code.

tsukikage1y ago

Running bytecode directly on hardware has certainly been tried (e.g. ARM's Jazelle).

In today's world this is generally not great.

f1shy1y ago

checker6591y ago

Forth CPU (in SystemVerilog): https://www.youtube.com/watch?v=DRtSSI_4dvk

hermitShell1y ago

In general I think the practical result is that x86 is like democracy. It’s not always efficient but there are other factors that make it the best choice.

kragen1y ago

They used an ISA specifically optimized for the language. At the time it was not known how to make compilers for Lisp that did an adequate job on normal hardware.

The vast majority of computers in the world are not x86.

f1shy1y ago

When the RISC processors were available (for the same reason RISC started to grow) it was better to just compile to ASM.

rthomas61y ago· 6 in thread

* What HDL did you use to design the processor?

* Could you share the assembly language of the processor?

* What is the benefit of designing the processor and making a Python bytecode compiler for it, vs making a bytecode compiler for an existing processor such as ARM/x86/RISCV?

hwpythonnerOP1y ago

Thanks for the question.

HDL: Verilog

pak9rabid1y ago

ammar21y ago

> it includes instructions for stack manipulation, binary operations

kragen1y ago

larusso1y ago

This sounds like your 'arch' (sorry, don't 100% know the correct term here) could potentially also run ruby/js if the toolchain can interpret it into your assembly language?

tlb1y ago

How do you deal with instructions that iterate through variable amounts of memory, like concatenating strings? Are such instructions interruptible?

Perhaps they don't need to be interruptible if there's no virtual memory.

How does it allocate memory? Malloc and free are pretty complex to do in hardware.

froh1y ago· 5 in thread

fun :-)

but did I get it right?

hwpythonnerOP1y ago

You're close: It's currently running on an FPGA (Zynq-7000) — not ASIC yet — but yeah, could be transferable to ASIC (not cheap though :))

The toolchain compiles Python → CPython Bytecode → PySM Assembly → hardware binary.

cchianel1y ago

As someone who did a CPython Bytecode → Java bytecode translator (https://timefold.ai/blog/java-vs-python-speed), I strongly recommend against the CPython Bytecode → PySM Assembly step:

- CPython Bytecode is poorly documented, with some descriptions being misleading/incorrect.

- CPython Bytecode requires restoring the stack on exception, since it keeps a loop iterator on the stack instead of in a local variable.

I recommend instead doing CPython AST → PySM Assembly. CPython AST is significantly more stable.
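
The point about the loop iterator living on the value stack can be checked with the standard `dis` module. This is only an inspection sketch; the function `total` is an arbitrary example and the surrounding opcodes differ between CPython versions.

```python
import dis


def total(items):
    acc = 0
    for item in items:
        acc += item
    return acc


opnames = [ins.opname for ins in dis.get_instructions(total)]

# GET_ITER leaves the iterator on the value stack; FOR_ITER advances it on
# each pass. The iterator is never stored in a named local variable.
print([name for name in opnames if name in ("GET_ITER", "FOR_ITER")])
print(total([1, 2, 3]))
```

Because the iterator has no local-variable slot, a translator working from bytecode has to track stack contents across the loop body, including on exception paths.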

bangaladore1y ago

Have you considered joining the next tiny tapeout run? This is exactly the type of project I'm sure they would sponsor or try to get to asic.

In case you weren't aware, they give you a 200 x 150 um tile on a shared chip. There is then some helper logic to mux between the various projects on the chip.

https://tinytapeout.com/

froh1y ago

fascinating :-) how do you do GC/memory management?

relistan1y ago

Not an ASIC, it’s running on an FPGA. There is an ARM CPU that bootstraps the FPGA. The rest of what you said is about right.

IlikeKitties1y ago· 5 in thread

Is this running on an FPGA or were you able to fab a custom chip?

hwpythonnerOP1y ago

Just running on FPGA at the moment.

This is still an early-stage project — it's not completed yet, and fabricating a custom chip would involve huge costs.

I'm a solo developer working on this in my spare time, so FPGA was the most practical way to prove the core concepts and validate the architecture.

Longer term, I definitely see ASIC fabrication as the way to unlock PyXL’s full potential — but only once the use case is clear and the design is a little more mature.

IlikeKitties1y ago

Oh, my comment wasn't meant as a criticism just curiosity because I would have been extremely surprised to see such a project being fabricated.

jamesfmilne1y ago

Could be a candidate for Tiny Tapeout in the future.

https://tinytapeout.com

ActorNightly1y ago

I'm not super versed in hardware, but what's the reason you can't adapt this to run on an ARM microprocessor chip? Why go with FPGA?

Like if I could buy a Cortex board and write Python, hit compile, and have the thing run, this would be INSANELY useful to me, cause cortex chips have pretty great A/D converters for sensing.

throwawaymaths1y ago

there are several free asic shuttle runs available for hobbyists iirc

nynx1y ago· 3 in thread

hwpythonnerOP1y ago

Thanks — appreciate it!

IshKebab1y ago

Very cool! Nobody who really wants simplicity and determinism is going to be using Python on a microcontroller though.

gavinsyancey1y ago

Sure, but for embedded use cases (which this is targeting), the goal isn't raw speed so much as being fast enough for specific use cases while minimizing power usage / die area / cost.

gadys1y ago· 3 in thread

Looks impressive. How does this compare to PyPy?

hwpythonnerOP1y ago

PyPy is a JIT compiler — it runs on a standard CPU and accelerates "hot" parts of a program after runtime analysis.

This is a great approach for many applications, but it doesn’t fit all use cases.

PyXL is a hardware solution — a custom processor designed specifically to run Python programs directly.

wiesbadener1y ago

That's an interesting project! I have some follow-up questions:

> No VM, No C, No JIT. Just PyXL.

Is the main goal to achieve C-like performance with the ease of writing Python? Do you have a performance comparison against C? Is the main challenge the memory management?

Did you write some sort of emulation to enable testing it without the physical Arty board?

nurettin1y ago

this project takes bytecode, maps it to fpga instructions. pypy can't do that.

TickleSteve1y ago· 3 in thread

There is a long history of CPUs tailored to specific languages:

- Lisp/lispm

- Ada/iAPX

- C/ARM

- Java/Jazelle

Most don't really take off or go in different directions as the language goes out of fashion.

pjmlp1y ago

Well, one could argue that modern CPUs are designed as C Machine, even more so that now everyone is adding hardware memory tagging as means to fix C memory corruption issues.

Symmetry1y ago

Also some fairly interesting Haskell efforts.

https://mn416.github.io/reduceron-project/

These range from a few instructions to accelerate certain operations, to marking memory for the garbage collector, to much deeper efforts.

jonathaneunice1y ago

Also: UCSD p-System, Symbolics Lisp-on-custom hardware, ...

sunray21y ago· 2 in thread

Very interesting!

What are the fundamental physical limits here? Namely, timing precision, latency and jitter? How fast could PyXL bytecode react to an input?

For info, there is ARTIQ: vaguely similar thing that effectively executes Python code with 'embedded level' performance:

https://m-labs.hk/experiment-control/artiq/

Congrats once again!

sunray21y ago

brcmthrowaway1y ago

Did you work at Rigetti?

bieganski1y ago· 2 in thread

it would be nice to have some peripheral drivers implemented (UART, eMMC etc).

having this, the next tempting step is to make `print` function work, then the filesystem wrapper etc.

btw - what i'm missing is clear information about the limitations. it's definitely not true that i can take any Python snippet and run it using PyXL (threads, for example, i suppose?)

hwpythonnerOP1y ago

Great points!

throwup2381y ago

Do you plan to have AMBA or Wishbone Bus support?

wodenokoto1y ago· 2 in thread

I can totally see a future where you can select “accelerated python” as an option for your AWS lambda code.

hwpythonnerOP1y ago

When I first started PyXL, this kind of vision was exactly on my mind.

It could definitely get there — but it would require building a full-scale deployment model and much broader library and dynamic feature support.

That said, the underlying potential is absolutely there.

petra1y ago

This sounds brilliant.

What's missing for you to create a demo for VCs or the relevant companies, proving the potential of this as a competitive server-class core?

swoorup1y ago· 2 in thread

How does garbage collection work here? Is it just a set of PySM code?

hwpythonnerOP1y ago

GC is still a WIP, but the key idea is the system won't stall — garbage collection happens asynchronously, in the background, without interrupting PyXL execution.

jy148981y ago

Sounds similar to something one of my classmates worked on at uni https://www.bristol.ac.uk/research/groups/trustworthy-system...

tgtweak1y ago· 2 in thread

Have you tested it on any faster FPGAs? I think Azure has instances with xilinx/AMD accelerators paired.

>Standard_NP10s instance, 1x AMD Alveo U250 FPGA (64GB)

Would be curious to see how this benchmarks on a faster FPGA since I imagine clock frequency is the latency dictator - while memory and tile can determine how many instances can run in parallel.

hwpythonnerOP1y ago

To run PyXL on a server-class FPGA (like Azure instances), some adaptations would be needed — the system would need to repurpose the host CPU to act as the orchestrator, handling memory, IO, etc.

The question is: what's the actual use case of running on a server, besides testing max frequency? For that I could just run Vivado on a different target (I would need a license for it, though).

For now, I'm focusing on validating the core architecture, not just chasing raw clock speeds.

zoobab1y ago

You can get cheap Zynq boards on Aliexpress, like old mining boards.

I have a Parallella board here with a Zynq.

tuetuopay1y ago· 2 in thread

So basically you took the idea of Jazelle extensions that can run Java bytecode natively, but for python?

This is amazing, great work!

hwpythonnerOP1y ago

mid-kid1y ago

dec0dedab0de1y ago· 2 in thread

Congratulations!

hwpythonnerOP1y ago

Thanks so much — really appreciate it!

Right now, the plan is to present it at PyCon first (next month) and then publish more about the internals afterward. Long-term, I'm keeping an open mind, not sure yet.

Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent if it makes sense. This is still fluid and not finalized...

dec0dedab0de1y ago

> Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent if it makes sense. This is still fluid and not finalized...

Alright well those dots are begging me to ask what they mean, or at least one specific story for the nerds :-)

> Long-term, I'm keeping an open mind, not sure yet.

redox991y ago· 2 in thread

What's the logic behind going for stack based?

hwpythonnerOP1y ago

bhasi1y ago

What other models are there? Would love to learn about them.

boutell1y ago· 1 in thread

This is very, very cool. Impressive work.

The background garbage collection thing is easier said than done, but I'm talking to someone who has already done something impressively difficult, so...

rangerelf1y ago

It almost sounds like you're asking for Nim ( https://nim-lang.org/ ); and there are some projects using it for microcontroller programming, since it compiles down to C (for ESP32, last I saw).

Jean-Papoulos1y ago· 1 in thread

>PyXL is a custom hardware processor that executes Python directly — no interpreter, no JIT, and no tricks. It takes regular Python code and runs it in silicon.

So, no using C libraries. That takes out a huge chunk of pip packages...

hwpythonnerOP1y ago

You're absolutely right — today, PyXL only supports pure Python execution, so C extensions aren’t directly usable.

There’s also a longer-term possibility of compiling C directly to PyXL’s instruction set by building an LLVM backend — allowing even tighter integration without a second CPU.

Right now the focus is on making native Python execution viable and efficient for real-time and embedded systems, but I definitely see broader hybrid models ahead.

yanniszark1y ago· 1 in thread

hwpythonnerOP1y ago

Thanks — appreciate it!

willvarfar1y ago· 1 in thread

Fantastic work! :D Must be super-satisfying to get it up and running! :D

Is it tied to a particular version of python?

hwpythonnerOP1y ago

Thanks — it’s definitely been incredibly satisfying to see it run on real hardware!

Right now, PyXL is tied fairly closely to a specific CPython version's bytecode format (I'm targeting CPython 3.11 at the moment).
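One way to see why a bytecode-based toolchain ends up pinned to an interpreter version: CPython stamps compiled files with a magic number that changes whenever the bytecode format changes. The sketch below assumes a 3.11 target, as stated in the comment; `TARGET` and `bytecode_matches_target` are hypothetical names for illustration, not part of PyXL.

```python
import importlib.util
import sys

# Assumption taken from the comment above: the toolchain targets CPython 3.11.
TARGET = (3, 11)


def bytecode_matches_target(version=None, target=TARGET):
    """Return True if the given (major, minor) version matches the target."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) == tuple(target)


# The magic number CPython writes at the start of .pyc files for this interpreter.
magic = importlib.util.MAGIC_NUMBER

print("running on", sys.version_info[:2], "magic", magic.hex())
print("matches target:", bytecode_matches_target())
```

A guard like this would only reject mismatched interpreters; it does nothing to translate bytecode between versions.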

Longer term, the goal is to stabilize a consistent subset of Python behavior, so version drift becomes less painful.

jrexilius1y ago· 1 in thread

Amazing work! Is the primary goal here to allow more production use of python in an embedded context, rather than just prototyping?

hwpythonnerOP1y ago

Thank you! And yes, exactly.

chippiewill1y ago· 1 in thread

I'd definitely be interested in how this project progresses, particularly if it adds support for integration to the CPU. Some tie-in to the Pynq project could be super fun.

brcmthrowaway1y ago

You should have used a FOSS fabric bus instead of axi

pjmlp1y ago· 1 in thread

This is kind of cool, basically a Python Machine. :)

boutell1y ago

hermitShell1y ago· 1 in thread

fantastic project. Do you envision this as living on FPGA's forever, or getting into silicon directly? Maybe an extension of RISC-V?

hwpythonnerOP1y ago

Oh boy, I definitely considered that — turning PyXL into a RISC-V extension was an early idea I thought of.

It could probably be adapted into one.

FPGA is for prototyping, although this could probably be used as a soft core. But looking forward, ASIC is definitely the way to go.

igtztorrero1y ago· 1 in thread

hwpythonnerOP1y ago

Thank you! Right now I'm focusing on keeping the core simple, efficient, and purpose-driven — mainly to run Python well on hardware for embedded and real-time use cases.

As for the future, I’m keeping an open mind. It would be exciting if it grew into something bigger, but my main focus for now is making sure the foundation is as solid and clean as possible.

actinium2261y ago· 1 in thread

So first of all, this is awesome and props to you for some great work.

woodrowbarlow1y ago

boxed1y ago· 1 in thread

How big a deal would it be to include the bytecode->PySM translation into the ISA? It seems like it would be even cooler if the CPU actually ran python bytecode itself.

hwpythonnerOP1y ago

That's a great question! I actually thought a lot about that early on.

echoangle1y ago· 1 in thread

Would this be able to handle an exec()- or eval()-call? Is there a Python byte code compiler available as python byte code to include in this processor?
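For context on why this question matters for an ahead-of-time toolchain: in CPython, evaluating a string means compiling it to a code object at run time first. A minimal sketch, with illustrative variable names and no claim about how PyXL handles this:

```python
# The expression only exists as text until run time.
expression = "a + b"

# Step 1: the compiler turns the text into a code object (bytecode).
code = compile(expression, "<expr>", "eval")

# Step 2: the code object is executed against some namespace.
result = eval(code, {"a": 2, "b": 3})

# The code object carries raw bytecode that something has to execute.
has_bytecode = isinstance(code.co_code, bytes) and len(code.co_code) > 0

print(result, has_bytecode)
```

So supporting `eval()` on the device would seem to need step 1 available at run time, either on the processor itself or on a host.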

IshKebab1y ago

Yeah this is surely a subset of Python.

tsukikage1y ago· 1 in thread

> A custom toolchain compiles a .py file into CPython ByteCode, translates it to a custom assembly, and produces a binary that runs on a pipelined processor built from scratch.

> Runs a subset of Python

What's the advantage of using a new custom toolchain, custom instruction set and custom processor over existing tools that compile a subset of Python for existing CPUs? - e.g. Cython, Nuitka etc?

hwpythonnerOP1y ago

Compilers and optimizers are great tools for some use cases, but not all.

Just to name a few limitations:

- Many rely heavily on the CPython runtime, meaning garbage collection, interoperability, and object semantics are still governed by CPython’s model.

If these solutions were truly turnkey and broadly capable, CPython wouldn't still dominate—and there’d be no reason for MicroPython to exist either.

focusgroup01y ago· 1 in thread

Incredible work. This is a paradigm shift for ML and embedded workflows. And congratulations, you are going to ring the bell with this one.

hwpythonnerOP1y ago

Thank you so much — that really means a lot!

It's still early days and there’s a lot more work ahead, but I'm very excited about the possibilities.

I definitely see areas like embedded ML and TinyML as a natural fit — Python execution on low-power devices opens up a lot of doors that weren't practical before.

crest1y ago· 1 in thread

hwpythonnerOP1y ago

Great question!

You're right that it can definitely be faster — there's real room for optimization.

When I have time, I may write a blog post that will explain where the cycles go, why it's different from raw assembler toggling, and how it could be improved.

Thanks for raising it — it's a very good point.

_JamesA_1y ago· 1 in thread

It would be interesting to see something like this that runs WASM as a universal bytecode.

IshKebab1y ago

I'm sure it's been done. I doubt it really is any better though because you can do a lot of optimisations in software that you can't do in hardware.

freeone30001y ago· 1 in thread

This is amazing! Is the “microcode” compiled to final native on the host or the coprocessor?

I’m guessing due to the lack of JIT, it’s executed on the host?

hwpythonnerOP1y ago

The microcode or the ISA of the system actually runs on the co-processor (the PyXL custom CPU).

If you refer to the ARM part as the host (did you?), it's just orchestrating the whole thing; it doesn't run the actual Python program.

two_handfuls1y ago· 1 in thread

This is a one-person project? I'm impressed!

hwpythonnerOP1y ago

Thanks so much — really appreciate it! Yes, it's been a one-person project so far — just a lot of spare time, persistence, and iteration.

davidkwast1y ago· 1 in thread

Wow. Congratz

hwpythonnerOP1y ago

Thank you!

yeahwhatever101y ago· 1 in thread

How are you simulating the designs for the FPGA? Are you paying for ModelSim?

hwpythonnerOP1y ago

rangerelf1y ago· 1 in thread

Incredible work :-)

Congratulations!!

hwpythonnerOP1y ago

Thank you!

jollyllama1y ago· 1 in thread

Name's a bit confusing when XLWings exists

dragonwriter1y ago

> Name's a bit confusing when XLWings exists

How? XLWings is not a similar name to pyxl. However, even so, the name is... Heavily overloaded:

https://pyxl.com/ (some kind of strategy/CRM/AI thing)

https://pyxl.ai/ (AI website builder)

https://www.pyxl.pro/ (AI image generator)

https://github.com/dropbox/pyxl (Inline HTML extension for Python)

https://openpyxl.readthedocs.io/en/stable/ (A Python library to read/write Excel files)

https://www.pyxll.com/ (Excel Add-in to support add-ins written in Python)

dcreater1y ago· 1 in thread

Very impressive! Can it run on RISC V?

SpaceNoodled1y ago

This is a unique architecture, not just software.

UncleOxidant1y ago· 1 in thread

Is the source code available?

hwpythonnerOP1y ago

The source isn’t public at this stage. I'm still deciding the best path forward after PyCon.

HPsquared1y ago· 1 in thread

Not to be confused with openpyxl, a library for working with Excel files.

That then makes me wonder if someone could implement Excel in hardware! (Or something like it)

hwpythonnerOP1y ago

I just had to give it a name. Didn't really search for vacancies. Maybe I need to rename :)

brap1y ago· 1 in thread

Up next: a processor that will directly execute your prompt

growthwtf1y ago

genuinely not a bad idea

jimbokun1y ago· 1 in thread

What's your development background that prepared you to take on a project like this?

Clearly you know a lot about both low level Python internals and a fair amount about hardware design to pull this off.

hwpythonnerOP1y ago

jonjacky1y ago

A much earlier (2012) attempt at a Python bytecode interpreter on an FPGA:

https://pycpu.wordpress.com/

thenobsta1y ago

Amazing work! This is a great project!

Every time I see a project that has a great implementation on an FPGA, I lament the fact that Tabula didn’t make it, a truly innovative and fast FPGA.

<https://en.m.wikipedia.org/wiki/Tabula,_Inc.>

asford1y ago

The benchmark results presented in this page are extremely misleading; you're not comparing to the actual baseline gpio performance available in micropython.

https://docs.micropython.org/en/latest/reference/speed_pytho...

Viper runs on device and directly emits native machine code for decorated micropython functions. If you have serious timing requirements for gpio, then this is how you do it.

kristianpaul1y ago

This always makes me think back to the J1 Forth CPU https://excamera.com/files/j1.pdf

fluorinerocket1y ago

M4R5H4LL1y ago

bluelightning2k1y ago

I am a pretty smart person. But once in a while I see something like this which reminds me there's always someone far smarter.

Absolutely incredible.

hoistbypetard1y ago

It seems worth noting that the board you're comparing it to costs <$30 where the dev board you're running on costs $250+.

That said... awesome work! I wish I could get to PyCon this year to see your talk.

Are you planning to post your core so others can replicate your work?

simonw1y ago

This looks incredible.

Do you have any open source code available for this yet?

Are you planning to release this as open source? If not, do you have a rough idea for how you plan to commercial license this tech?

ConanRus1y ago

> the program is compiled to a CPython Bytecode and then compiled again to PyXL assembly. It is then linked together and a binary is generated.

why are we not doing this for a standard python? i think LLVM is just for that, no?

jay-barronville1y ago

This type of project is why I love HN. This work is brilliant!

Almost every question I had, you already answered in the comments. The only one remaining at the moment: How long exactly have you been working on PyXL?

startupsfail1y ago

Nice, next step could be rolling out that bytecode compiler in Python, so it’s self-contained. And a port to some LLM-on-silicon, so we could have it executing Python as the inference goes :-P

zoobab1y ago

To reflash ch32v003 chips, I need to create bits of 250ns, so with 480ns it's not enough. Is there a way to make it faster?
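To put the numbers in this question side by side, here is a small worked calculation. It uses only figures quoted in the thread (480 ns round trip, 100 MHz clock, 250 ns target); the function name is made up, and it says nothing about whether PyXL can actually reach the target.

```python
def ns_to_cycles(duration_ns, clock_mhz):
    """Convert a duration in nanoseconds to clock cycles at the given frequency."""
    # clock_mhz cycles per microsecond = clock_mhz / 1000 cycles per nanosecond
    return duration_ns * clock_mhz / 1000


# Figures quoted in the post and in the comment above.
toggle_cycles = ns_to_cycles(480, 100)   # current round-trip toggle
budget_cycles = ns_to_cycles(250, 100)   # required bit time at the same clock
speedup_needed = 480 / 250               # factor by which the toggle must shrink

print(toggle_cycles, budget_cycles, speedup_needed)
```

At 100 MHz the current toggle is 48 cycles against a budget of 25, so the round trip would need to get roughly 1.9x shorter, through fewer cycles, a faster clock, or both.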

warble1y ago

Wow, these FPGAs are not cheap. Don't they also have a couple of ARM cores attached on the SOC?

vrighter1y ago

globalnode1y ago

esseph1y ago

This seems super, super cool!

sneak1y ago

How long did you work on this?

psychip1y ago

it was cool until i read the line "what is gpio"

actinium2261y ago

This is awesome

ingen0s1y ago

That's great!

ktimespi1y ago

Kind of insane that you achieved this. Does your processor support all python bytecode at this point? How do you implement ref counting and garbage collection?

hoseja1y ago

I wonder if silicon can feel pain.

igtztorrero1y ago

Amazing,

flmontpetit1y ago

For a minute there I was imagining Python as the actual instruction set and my brain was segfaulting.

Very cool project still
