This was also my first experience with Rust. The Rust community is absolutely fantastic and the documentation is great. I had very little trouble with the "learning curve hell" that I hear associated with the language. It was definitely a great choice for this work.
I also included PyPy in my validation section and "WOW". It blew both Cannoli and CPython out of the water in performance. The work they're doing is very interesting and it definitely showed on the benchmarks I worked with.
It seems Google and Dropbox are not interested. Google is working on Grumpy, Dropbox worked on Pyston.
Remember Unladen Swallow?
Managers enjoy avoiding conflicts.
Only rarely will someone in a position of power push for this kind of solution, which in any case goes against the wishes of many employees.
I think one reason is that the community is doing too good of a job. The language is pretty sane, it solves most problems right, the libs and docs are good, and the general direction things take is reasonable. And it's free not only as in beer and as in freedom, but also free from business influences. The PSF is really giving away pretty much everything.
Everybody contributes a little (we have Brett Cannon's team at Microsoft, Guido's team at Dropbox, Alex Martelli's team at Google, Mozilla even donated to PyPy, etc.). But it's nothing massive. Nobody has said, "OK, here is 10 million euros, solve the packaging problem."
Compare to JS: the language started out slow, with terrible design and no consensus on the direction to take. So eventually people (Google first) poured a load of money into it until it became usable, and it got cleaner leadership. They had huge problems to solve on the ever-expanding market that is the web platform. Of course, JS has the unfair advantage of a captive audience and a total monopoly in its field.
Remember Unladen Swallow, Google's attempt to JIT Python? It was just one guy during his internship (http://qinsb.blogspot.fr/2011/03/unladen-swallow-retrospecti...).
And look at the budget the PSF had in 2011 to help the community: http://pyfound.blogspot.fr/2012/01/psf-grants-over-37000-to-... I mean, even today they have to go through so many shenanigans for barely 20k (https://www.python.org/psf/donations/2018-q2-drive/).
But at the same time you hear people complaining they still can't migrate to Python 3 because they have millions of lines of Python. You hear from them when they want extended support for free, but never when it's time to support the community.
It's ridiculous.
Also compare to PHP: the creators made a business out of it, plain and simple.
Compare to Java/C#/Go: it's owned by huge players that have a lot of money engaged.
Python really needs a sugar daddy so that we can tackle the few items remaining on the list:
- integrated steps to make an exe/rpm/deb/.app
- JIT that works everywhere
- mobile dev
- multi-core with fast and safe memory sharing
There are projects for that (Nuitka, Pyjion, Kivy, etc.), but they all lack manpower and money, and hence integration, performance, features, etc.
You need a simple way to code some GUI, make it work on mobile or desktop, turn it into an exe and distribute it.
You need a simple way to say "this is a long running process, JIT the hell out of it".
> The goal of the thesis was to evaluate language features of Python that were hypothesized to cause performance issues.
In another life I did something similar using a similar compiler simulation technique, looking at other Python features like redundant reference count operations, boxing of numbers, dynamic type checks etc. See G. Barany, Python Interpreter Performance Deconstructed. Dyla'14. http://www.complang.tuwien.ac.at/gergo/papers/dyla14.pdf
After obtaining the numbers in that paper, the work didn't really go anywhere; there were no really obvious optimizations to try based on the data. But it was fun!
Anyway, questions:
1. If I understand the source on GitHub correctly, you parse Python source code yourself. I'm fairly sure your simulation would be a lot more faithful if you compiled Python bytecode instead. Did you consider this, and if yes, was there a particular reason not to do it that way?
I ask this in particular because if I understand your thesis correctly, you look up local variables in hash tables every time they are referenced. This is not what Python does: It maps variable names to integer indices during compilation to bytecode, and the bytecode just takes those embedded constant indices and indexes into an array to obtain a local variable's value. That's a lot faster. And you would get it automatically if you started from bytecode. (Plus, it would be easier to parse, but if you have fun parsing stuff, that's reasonable too.)
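The `dis` module makes this slot-based lookup easy to see; a small illustration:

```python
import dis

def f(a):
    x = a + 1   # `a` and `x` are assigned fixed slots at compile time
    return x

# LOAD_FAST / STORE_FAST carry constant slot numbers and index straight
# into the frame's local-variable array -- no hash lookup at run time.
dis.dis(f)

# The name-to-slot mapping lives on the code object:
print(f.__code__.co_varnames)  # ('a', 'x')
```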
2. Where do you actually make useful use of Rust's static ownership system? I've only skimmed that part of the thesis very quickly, but I missed how you track ownership in Python programs and can be sure that things don't escape. Can you give an example of a Python program using dynamic allocation that your compiler maps to Rust with purely static ownership tracking and freeing of the memory when it's no longer used?
3. Related to 2: Why bother with any notion of ownership at all? Did you try mapping everything to Rust's reference counting and just letting it do its best? I'm wondering how much slower that would be. Python is also reference counted, after all, and I guess the Rust compiler should have more opportunities to optimize reference counting operations.
4. In general, do you have an idea why your code is slower than Python, besides the hash table variable lookup issue I mentioned above?
"The Python interpreter now uses a 16-bit wordcode instead of bytecode which made a number of opcode optimizations possible."
They haven't been shy about changing it in the past either, since there's no guarantee of stability, so it's likely to continue to change in incompatible ways.
[1] https://docs.python.org/3/whatsnew/3.6.html#optimizations
This is only true for function arguments and local variables, right? Module-level bindings and class and object attributes are looked up in dictionaries. I think the same is true for variables used in closures too?
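For what it's worth, the bytecode shows the three lookup strategies side by side; a quick illustrative sketch:

```python
import dis

g = 10  # module-level binding: read via LOAD_GLOBAL, a dict lookup

def outer():
    c = 20  # captured by the closure: stored in a cell, read via LOAD_DEREF
    def inner(a):  # plain local: a fixed array slot, read via LOAD_FAST
        return a + c + g
    return inner

ops = {i.opname for i in dis.get_instructions(outer())}
print(sorted(ops))  # includes LOAD_FAST*, LOAD_DEREF, LOAD_GLOBAL
```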
That's really too bad.
> there were no really obvious optimizations to try based on the data.
Is that because Python already is the way it is? In other words, if you started from scratch, how would you design a language differently so that it doesn't run into these issues?
Asking for a friend ;-)
> We have presented the first limit study that tries to quantify the costs of various dynamic language features in Python.
This is spot on what we were doing as well, that's great to have this as a reference.
> 1. If I understand the source on GitHub correctly, you parse Python source code yourself. I'm fairly sure your simulation would be a lot more faithful if you compiled Python bytecode instead. Did you consider this, and if yes, was there a particular reason not to do it that way?
We did not actually consider this; it would be a very interesting direction to explore. For the unoptimized version of Cannoli we do look up variables in a list of hash tables (which represent the current levels of scope). We did perform a scope optimization that uses indices to access scope elements, and this was much faster. However, it meant that functions like `exec` and `del` were no longer permitted, since we would not be able to statically determine all scope elements at compile time (consider `exec(input())`: this could introduce anything into scope, and we can't track that).
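The unoptimized scheme described here can be sketched roughly as follows (hypothetical names, not Cannoli's actual code):

```python
class ScopeStack:
    """Unoptimized lookup: a stack of hash tables, searched innermost-first."""

    def __init__(self):
        self.tables = [{}]              # start with the global scope

    def push(self):
        self.tables.append({})          # entering a function

    def pop(self):
        self.tables.pop()               # leaving it drops all its names

    def store(self, name, value):
        self.tables[-1][name] = value

    def load(self, name):
        for table in reversed(self.tables):   # one hash lookup per level
            if name in table:
                return table[name]
        raise NameError(name)
```

The scope optimization replaces each `load(name)` with a precomputed `(level, slot)` pair and two array indexings, which is exactly what `exec(input())` breaks: the compiler cannot know which names the executed string will introduce, so no slot can be reserved for them.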
If you know: how does CPython resolve scope if it maps variable names to indices? In the case of `exec(input())`, where the input string is `x = 1`, how would the compiled bytecode allocate space for `x` and index into its value? I don't have much experience with the CPython source, so please excuse me if the question seems naive! :)
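As a side note, CPython sidesteps this case entirely: `exec` inside a function cannot create a fast local at all, because the slot table is frozen when the function is compiled. Names created by `exec` land in a temporary locals mapping that the compiled code never consults. A minimal demonstration:

```python
def f():
    exec("x = 1")   # writes into a temporary locals mapping, not a slot
    return x        # compiled as a *global* load: `x` is never assigned
                    # in f's visible body, so no slot was reserved for it

try:
    f()
except NameError:
    print("exec could not create a local `x`")
```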
> 2. Where do you actually make useful use of Rust's static ownership system? I've only skimmed that part of the thesis very quickly, but I missed how you track ownership in Python programs and can be sure that things don't escape. Can you give an example of a Python program using dynamic allocation that your compiler maps to Rust with purely static ownership tracking and freeing of the memory when it's no longer used?
Elements of the Value enum (which encapsulates all types) relied on `Rc` and `RefCell` to defer borrow checking to run time. Consider a function that has a local variable instantiating some object. Once that function call has finished, Cannoli pops that local scope table, and all its mappings are dropped when it goes out of scope. The object, encapsulated in an `Rc`, has its reference count decremented to 0 and is freed.
That's how I've interpreted Rust's ownership rules; I will say this was the first time I had ever used Rust, so it's possible I'm not completely right about this. But once that table goes out of scope, all of its elements should be dropped, and any `Rc` should be decremented and freed once its count reaches zero.
> 3. Related to 2: Why bother with any notion of ownership at all? Did you try mapping everything to Rust's reference counting and just letting it do its best? I'm wondering how much slower that would be. Python is also reference counted, after all, and I guess the Rust compiler should have more opportunities to optimize reference counting operations.
I did defer a lot of borrow checking to run time with Rc, but I tried to use this as little as possible to maximize optimizations that may result from static borrow checking.
> 4. In general, do you have an idea why your code is slower than Python, besides the hash table variable lookup issue I mentioned above?
If you remove the 3 outlier benchmarks (which are slow because of Rust printing and a suboptimal implementation of slices), Cannoli isn't too far off from CPython. And in fact, with the ray-casting benchmark, Cannoli began to outperform CPython at scale. This leads me to believe that the computations in Cannoli are faster than CPython's. However, there is still a lot of work to do to create a more performant version of Cannoli. The compiler itself was only developed for ~4 months; I have no doubt that more development time would yield better results.
That being said, I think the biggest slowdown comes from features of Rust that might not have been utilized. This is just speculation, but I think the use of lifetimes could benefit the compiled code a lot. I also think there may be more elegant solutions to some of the translations (e.g. slices) that could provide a speedup. But I can't say there is one single thing causing the slowdown, and profiling the benchmarks (excluding the outliers) supports that.
That being said, I did leave a few suggestions in the "future work" section that talk about writing an AOT compiler for RPython (the version of Python that PyPy's interpreter is written in). This would provide more information at compile time and would be an interesting comparison between a Python interpreter compiled AOT versus a Python interpreter with a JIT (PyPy).
I think this compiler also makes this particular optimization, but this is just one of many many optimizations PyPy does. I imagine that with sufficient work, this compiler could be brought up to speed with PyPy, but as it stands right now, PyPy simply benefits from having years of optimization work that a new project doesn't.
So, for example, you might see that the last 100 calls to a function were done with integers, so you can generate a variant of the function that only works for integers, and check if it's applicable when you enter the function. If that function stops getting used, you can throw it away.
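That profile-then-specialize loop can be sketched in a few lines of Python (a toy model with made-up names; real JITs specialize machine code, not wrappers):

```python
def specializing(generic, int_fast, threshold=100):
    """Dispatch to `int_fast` once the call site looks monomorphic on int."""
    seen_ints = 0

    def call(x):
        nonlocal seen_ints
        if type(x) is int:
            seen_ints += 1
            if seen_ints > threshold:   # hot and monomorphic:
                return int_fast(x)      # take the specialized variant
        else:
            seen_ints = 0               # type changed: throw away the
        return generic(x)               # specialization, fall back

    return call

# Generic addition vs. an int-only variant using a shift.
double = specializing(lambda x: x + x, lambda x: x << 1)
```

After roughly `threshold` integer calls, `double` routes through the shift variant; a single `double(1.5)` resets the counter and falls back to the generic path.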
Doing that well ahead of time requires an extremely good idea of how the program will behave at run time, and even with good information, is still very likely to bloat up your binary hugely. (Facebook used to compile their PHP codebase to a multi-gigabyte binary before moving to HHVM, for example).
More information means better optimizations. JITs FTW.
I thought the main benefit was safer code. Is it just that you looked at what needed optimizing and put some effort in, or did the language choice help?
The author says that this is a research project, and adding the standard library (if possible at all) would be a humongous task by itself.
Dude, you're doing a thesis in computer science. Of course it's easy for you.
Do you plan to support this down the line? I feel like compilers that compile down to Rust could become really good tools depending on how the popularity of the language goes.
I feel like the postfix is something languages lose the bigger they get, I hope this happens to rust too.
[1]: https://en.wikipedia.org/wiki/Pike_(programming_language) †: I have a particular fondness for Pike, because it's derived from the first programming language I ever used: LPC. Anyone else an LPC hacker?
Spun up a quick dashboard of the project here: https://chart.ly/github-dashboard/joncatanio/cannoli
Not tons of revelations there, but cool to see your longest streak was 7 days straight committing to the repo. Also cool to know this is part of your thesis.
What are your plans after Cal Poly?
I'm actually moving out to NYC this July to work for Major League Baseball, in the Advanced Media division (MLBAM). I'll be doing some software engineering there, mainly API work for various apps. I'm very excited about it!
I'll have to work on compilers in my free time haha, I really enjoyed the work I did on this thesis.
It does complicate the generated code, and I don't know if Rust is the greatest intermediate representation, but I do think it was a better choice than C. Debugging the generated code was great because of the detail the Rust compiler provides in its warnings/errors.
I'd be interested in seeing how a Python interpreter written in Rust would compare to CPython, this would probably make use of more Rust optimizations (than trying to generate code).
The same analysis could be done on JS or Ruby; it would be cool to see if a similar compiler would yield the same performance results for restricting features in JS/Ruby. It would also validate this work nicely.
> Leave the Features: Take the Cannoli - Jonathan Catanio
That's pretty good
Made me laugh out loud.