- It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.
- If Grumpy doesn't have a Global Interpreter Lock, it must have lower-level locking. Does every built-in data structure have a lock, or does the compiler have enough smarts to figure out what's shared across thread boundaries, or what?
Basically, we needed to support a large existing Python 2.7 codebase. See discussion here: https://github.com/google/grumpy/issues/1
> It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.
There are restrictions. I'll update the README to make note of them. Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.
> If Grumpy doesn't have a Global Interpreter Lock, it must have lower-level locking. Does every built-in data structure have a lock, or does the compiler have enough smarts to figure out what's shared across thread boundaries, or what?
It does fine grained locking. Mutable data structures like lists and dicts do their own locking. Incidentally, this is one reason why supporting C extensions would be complicated.
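To make "do their own locking" concrete, here is a minimal Python sketch of the general idea: each container instance carries its own lock rather than relying on one interpreter-wide lock. `LockedList` and `fill` are hypothetical names for illustration only; this is not Grumpy's actual Go runtime code.

```python
import threading

class LockedList(object):
    """Hypothetical list wrapper: every instance guards its own state."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock per object, not one per interpreter
        self._items = []

    def append(self, item):
        with self._lock:
            self._items.append(item)

    def __len__(self):
        with self._lock:
            return len(self._items)

def fill(target, count):
    # Each worker appends `count` items to the shared container.
    for i in range(count):
        target.append(i)

shared = LockedList()
workers = [threading.Thread(target=fill, args=(shared, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(shared))
```

Threads that touch different objects never contend for the same lock, which is the property a single global lock cannot offer.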
What about stuff like literal_eval? Or even just monkeypatching with name.__dict__[param] = value ?
> It does fine grained locking. Mutable data structures like lists and dicts do their own locking. Incidentally, this is one reason why supporting C extensions would be complicated.
Is there a succinct theoretical description of exactly how that's implemented anywhere? What about things like numpy arrays?
I'm guessing pretty much the entire AST module is a no-go?
I managed to run into 2 trying to build a 5 line program :-)
$ cat t.py; ./tools/grumpc t.py > t.go;go build t.go;echo '----';./t
import sys
print sys.stdin.readline()
----
AttributeError: 'module' object has no attribute 'stdin'
$
$ cat t.py ;./tools/grumpc t.py
c = {}
top = sorted(c.items(), key=lambda (k,v): v)
Traceback (most recent call last):
File "./tools/grumpc", line 102, in <module>
sys.exit(main(parser.parse_args()))
File "./tools/grumpc", line 60, in main
visitor.visit(mod)
File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
return visitor(node)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stmt.py", line 302, in visit_Module
self._visit_each(node.body)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stmt.py", line 632, in _visit_each
self.visit(node)
File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
return visitor(node)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stin visit_Assign
with self.expr_visitor.visit(node.value) as value:
File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
return visitor(node)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 101, in visit_Call
values.append((util.go_str(k.arg), self.visit(k.value)))
File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
return visitor(node)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 246, in visit_Lambda
return self.visit_function_inline(func_node)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 388, in visit_function_inline
func_visitor = block.FunctionBlockVisitor(node)
File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/block.py", line 432, in __init__
args = [a.id for a in node_args.args]
AttributeError: 'Tuple' object has no attribute 'id'
Couldn't they be supported with a slower runtime implementation? I mean, I still love the idea.
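The second failure in the session above is triggered by the tuple parameter in `lambda (k,v): v`, which is Python 2-only syntax. A hedged sketch of an equivalent spelling without tuple parameters follows; the sample data is made up, and I have not verified that grumpc accepts this form, so treat that part as an assumption.

```python
c = {"a": 3, "b": 1, "c": 2}  # sample data, not from the original report

# Take the (key, value) pair as one argument and index into it,
# instead of unpacking it in the lambda's parameter list.
top = sorted(c.items(), key=lambda kv: kv[1])
print(top)
```

This form also runs unchanged on Python 3, where tuple parameters were removed.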
Python 2.7 is what's running at Google. Not really surprising they're looking at this considering the fast approaching end of (core dev) support for Python 2.7.
Write an interpreter in another language and programmatically port modules to Go. Seems pretty sensible to me.
I'd prefer that all new Python tools that need to support 2.x also support 3.x. It's an additional development cost, but IMHO, a worthwhile investment in the future.
Does it really imply many restrictions? Common Lisp, for example, is probably more dynamic than Python and it's been a compiled language for ~20 years.
Common Lisp was designed for interpretation and compilation from day one. The first implementations from 1984/85 already had compilers.
> Common Lisp, for example, is probably more dynamic than Python
Some parts are more dynamic than Python, some not. For example everything that uses CLOS+MOP is probably more dynamic. Also some stuff one can do when using a Lisp interpreter may be more dynamic. CL is more static, where one uses non-extensible functions, type declarations, static compilation, inlining, ... The parts where a CL compiler achieves good runtime speed may not be very 'dynamic' anymore.
Nobody uses the features of Python which make it a dynamic language? Google must write some really weird Python if their compiler is that strict.
Grumpy doesn't even seem to try to implement that. That's a good thing. If you restrict Python a little, it's much easier to compile.
Python has TONS of dynamicity besides those (eval and co), which are seldom used by anyone anyway.
If you think eval is what makes Python dynamic you're doing it wrong...
Frameworks do lots of such dynamic tricks in order to provide nice DSLs for building apps.
"Upgrade to Python3" is the usual defense to that, but it's not really practical for large companies with software such as YouTube completely written in Python 2.x.
There's a reason why the large companies often end up working on new runtimes/interpreters/compilers like HipHop for PHP, Hack, and so on, rather than working on the code bases written in those languages. Working at that level can easily be not just easier, but an order of magnitude or two easier. Or three.
Added up, it is easily more than 30s per line.
Of course, if it started out as a "this seems like a cool project", that skews the "a is more efficient than b" ratio significantly.
For a project the size of YouTube, that will be millions of dollars of engineering hours and weeks/months of lost productivity for an unknown gain and almost guaranteed bugs. It's a terrible value proposition, so it's better to squeeze every last drop of performance out of the code base you have, which at the scale of this project includes paying engineer(s) to work on a completely new runtime.
And there's a lot more side benefit in a better multithreading Python runtime for Google for other Python code Google has (or hosts), whereas the benefits of a YouTube rewrite are more narrowly limited to YouTube.
It's interesting to see the same pain has now caused the runtime itself to be implemented in Go.
It's a pity C extensions (often used in scientific computing) are not supported but Go does have support via CGO, so maybe some approach can be worked out to access C routines in the future.
With what I learned about Go and concurrency, I would say that currently in Python, writing concurrent code is not very hard, and is as close to Go as you can get without actually just writing Go.
Now, you may be saying "but Python has the GIL, how can concurrency be easy in Python?" I'd say, you're definitely not wrong that the GIL is a problem, but it's not much of a problem for concurrency.
This goes back to the heart of Rob Pike's classic talk, "Concurrency Is Not Parallelism"[2]. To quote Wikipedia:
In computer science, concurrency is the decomposability property of a
program, algorithm, or problem into order-independent or partially-ordered
components or units.
In Python, you can pretty easily emulate the conceptual properties of
Goroutines and Go channels with Python threads and queues. The problem is that
doing this in Python won't net you the performance increases you get with Go.
And I believe that is an important distinction. There are plenty of cases where
you don't care so much about the performance benefits of parallelism, but you
want the conceptual and implementation benefits of concurrency.
In closing, concurrency in Python is pretty easy to work with; it just performs very poorly.
[1] - https://github.com/lelandbatey/defuse_division
[2] - https://blog.golang.org/concurrency-is-not-parallelism
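As a rough sketch of the point about emulating goroutines and channels, the following uses only the standard `threading` and `queue` modules. The `worker` function and the `None` sentinel convention are illustrative choices, not taken from the linked project.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7

def worker(jobs, results):
    # Plays the role of a goroutine reading from one channel and writing to another.
    while True:
        n = jobs.get()
        if n is None:       # sentinel: roughly what closing a channel signals in Go
            break
        results.put(n * n)

jobs = queue.Queue()        # stands in for a channel of inputs
results = queue.Queue()     # stands in for a channel of outputs

threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(3)]
for t in threads:
    t.start()
for n in range(5):
    jobs.put(n)
for _ in threads:
    jobs.put(None)          # one sentinel per worker
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(5))
print(squares)
```

The structure is concurrent, but under the GIL the CPU-bound squaring still runs on one core at a time, which is the performance gap described above.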
https://github.com/rcarmo/python-utils/blob/master/taskkit.p...
Obviously, it wasn't amazingly performant. But it did help a lot for doing concurrent stuff, and I've been pondering re-doing it for asyncio.
In principle it's possible to implement something like JyNI (http://jyni.org/) or CPyExt (https://morepypy.blogspot.de/2010/04/using-cpython-extension...) to bridge the CPython and Grumpy APIs. In practice, marshalling data across the interface can be very expensive.
If this is good enough to run YouTube's python code already it's honestly super impressive. Well done.
Also, a further sign that GC may not be the reason is that D also has GC, but can link to C libraries somewhat easily (not sure about all cases or how far the ease goes).
Grumpy likely doesn't support the C extensions due to time, and complexity of having to actually emulate the GIL since Python does not have fine grained locking for structures. C extensions that work with Python data structures need to first hold the GIL.
It's because Python's C API is inherently non-thread safe. The API lacks passing an interpreter pointer as a parameter (as Lua's API does for example). So Python is forced to use a terrible thread local storage hack involving the Global Interpreter Lock to swap interpreter instances which is insanely inefficient and limits compute-bound programs to a single thread.
Python 3.x had a chance to fix the API and do away with the GIL once and for all, but inexplicably they did not. There was a misguided notion that C extensions between 2.x and 3.x could be interoperable.
My experience with Tcl taught me to stay away from languages that don't have either a JIT or AOT compiler in their reference implementation.
How many interpreters are there now? And how many of them have even close to 100% compatibility with Python 2.7 or 3.N? Guido has lost control of the language, but as he's still officially the BDFL there's no real standardization body. His stubborn views on functional mechanisms have held the language back syntactically, and Python 3 broke BC without fixing the language's fundamental problems... it really feels like Python is lost in the desert.
Which doesn't mean the language is dead, but it's rudderless. I think we were all hopeful when Guido joined Google that we'd see real direction for Python, but that obviously didn't happen.
Not that Python is dead, obviously - still lots of great projects are written in Python. But I don't like the language's future.
It is a good time to jump off the Python train in general, and I say that as someone invested in Python who loves it. If possible I'd recommend people reach for Go or Elixir depending on their needs or requirements.
I will admit I'm a little shocked how much of a failure Python3 adoption has been. I think if it had been Grumpy from the start it would've been a huge success. This is exactly what people want and Google should be commended for sharing this.
Here's to hoping Grumpy takes on a life of its own and is the new de facto Python.
As a Python 3 user, everything seems fine on my end. Though 3 has its own new warts. They are smaller and more forgivable warts for now, but it's probably not a good sign.
I do agree the direction Python is heading is not very interesting anymore, but that doesn't mean it's dead or useless now.
I can appreciate you have that opinion, but I'll be livid if that's true - the last thing in the entire world I want is to do battle with dependencies and the very, very strict/opinionated Go build system.
If your idea is that all Py27 is just transpiled into Go and then jettisoned, that's fine, but keeping one foot in each world sounds terrible.
People will upgrade if they make py3 more appealing, something like a 20% speed boost would be nice.
Reference, for a non-Python dev who hasn't kept up with it?
Also, reduce was removed from the builtin namespace and moved into the functools module (map is still a builtin in Python 3, though it now returns an iterator).
I still think Unladen Swallow should have been based on V8, but as I recall that project had very strict compatibility goals that would have made a V8-based implementation impossible.
They also have the benefit of being able to push features into the Go core that they might need for this.
A lot of big software companies do this nowadays. Google and Facebook both have a lot of purpose-built software, some of which gets released as open source, that meets their needs well but is hard to use for other purposes. I guess it's still strictly better than them not open-sourcing the code, but it's definitely an existence proof that just making something open source doesn't make magic happen.
It will be silently abandoned in 18 months....
They port their python libraries so they can reference them from their Go code, and then module by module will rewrite it in pure Go.
I can see this happening for some core modules sure, but I think you've underestimated the work required to convert the sheer amount of Python code at Google. There is a tool inside Google which graphs the number of lines of each language in Piper; I don't think I can quote numbers from my time there but it would be no small feat, even for Google.
Interesting - first time I have heard such an opinion. Why do you think it may be so? One reason I can think of, is they get more control over the languages they use.
Look at AngularJS for a more recent example.
Add to this that Google has some of the worst versioning practices I've ever seen and you get a recipe for destruction.
class Test(object):
    def __init__(self, value):
        self.value = value
    def method(self):
        print(self.value)
class Test2(Test):
    pass
t = Test("hello")
t.method()
Pythonistas, note I had to have "class Test(object):" and not just "class Test:". The latter compiled successfully into a Go program but that program then failed at runtime with "TypeError: class must have base classes".
Yeah, Grumpy does not currently support old-style classes. Since all of our code internally requires new-style classes, this was not a high priority feature. It is something that we'll get to.
All that stuff with "switch" seems to be to handle Python exceptions in a language that doesn't have exceptions. Maybe later, analysis can tell that some function can't raise an exception, and translated calls for such functions can be simpler.
I was hoping for something more aggressive even, like compiling Python classes to Go structs so long as the program doesn't need the dynamic behavior. Alternatively, Grumpy could support declaring native Go types via some sort of pragma or a new `struct` keyword or some such, which would be treated like a normal Go object (rather than defining your Go objects in a separate Go package).
But it gives different output; the print() prints a tuple whereas the function print() prints a newline.
Also compare print(1) and print(1,2) with and without the __future__ import.
https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...
Basically, the language doesn't have a "spec" per se. The language is whatever the de facto CPython implementation happens to do within its giant eval loop.
Another great talk about CPython internals:
It does[1]. And the process of improving it is called PEP[2].
[2]: https://en.wikipedia.org/wiki/Python_(programming_language)#...
A) Not well optimised.
B) Touting features before the spec/standard.
EDIT: people really dislike that I said this, and I'm having trouble finding my original citation- it was on one of the many python books I own. Most likely "Learn Python The Hard Way" but I'll dig out the exact chapter where they compare pypy to cpython and mention that because cpython is the reference implementation it values code clarity over performance optimisation.
CPython is 25 years old -- people have been making it faster for a long time. Python 3.6, the latest release, has many performance improvements, cf. http://www.infoworld.com/article/3120952/application-develop...
(edit: I'm just being polemic about your statement here. CPython is reasonably optimized within the constraints it currently has).
The main reason why Grumpy's slower for most single threaded benchmarks is that most Python workloads involve creating and freeing a bunch of small Python objects. In Go, these objects are garbage that need to be GC'd in a very general way. In CPython, there are free lists, arenas and other optimizations for allocating small (especially immutable) objects. And cleaning up garbage in CPython involves pushing unreferenced objects back onto the free lists for later reuse.
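To illustrate the free-list idea mentioned here, a small Python sketch follows. CPython does this in C for particular object types; `Node` and `FreeListPool` below are hypothetical names that only model the recycling behaviour and are not CPython's or Grumpy's code.

```python
class Node(object):
    __slots__ = ("value",)

class FreeListPool(object):
    """Hypothetical pool: reuses released objects instead of allocating new ones."""

    def __init__(self):
        self._free = []       # the free list of recycled objects
        self.allocated = 0    # how many fresh objects were ever created

    def acquire(self, value):
        if self._free:
            node = self._free.pop()   # reuse a recycled object
        else:
            node = Node()             # fall back to a real allocation
            self.allocated += 1
        node.value = value
        return node

    def release(self, node):
        node.value = None
        self._free.append(node)       # push back for later reuse

pool = FreeListPool()
for i in range(1000):
    n = pool.acquire(i)
    pool.release(n)
print(pool.allocated)
```

A thousand acquire/release cycles cost a single allocation here, whereas a general-purpose collector would see a thousand short-lived objects.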
Right, I suppose I assumed that straightforward numerical-looking code would be translated to Go numerical code. Perhaps they just aren't that ambitious yet.
I am not sure any implementation of Python will beat the single-threaded performance of CPython.
> To solve this problem, we investigated a number of other Python runtimes. Each had trade-offs and none solved the concurrency problem without introducing other issues.
Not strictly true, http://doc.pypy.org/en/latest/stm.html. In general for the main project however, this is true.
There's not a ton there. The next place I'd go is open issues: https://github.com/google/grumpy/issues
It looks like at the moment they're mostly random bug reports from people who have tried Grumpy since this announcement, rather than ones filed by people working on Grumpy since before it was made public. So that's a bit trickier.
The last place I'd look is the README: https://github.com/google/grumpy. There's not a ton there about ways they wish to have people contribute.
At this point, what I'd do personally is open an issue asking how you can get involved; possibly by improving this documentation on how to get involved!
Anyway, that's what I'd do. Hope that helps!
to "what's good etiquette for contributing to open source projects?", I'd say: add as much information as possible to your PR, why it does what it does, how it does it, etc.
Just tried this out on a reasonably complex project to see what it outputs. Looks like it only handles individual files and not any Python imports in those files. So for now you have to manually convert each file in the project and put them into the correct location within the build/src/grumpy/lib directory to get your dependencies imported. Unless I missed something somewhere. The documentation is a bit sparse.
Overall I think the project has a lot of potential and I'm hoping it continues to be actively developed to smooth out some of the rough edges.
Your assessment is right: the grumpc compiler takes a single Python file and spits out a Go package. Incidentally, this means you can import a Python module into Go code pretty easily.
I don't have a ready solution for building a large existing project but I'll write up a quick doc to outline the process. The trickiest bit is that the Python statement "import foo.bar" translates to a Go import: import "prefix/foo/bar". Currently prefix always points at the grumpc/lib directory so that's one way to integrate your code, but I need to make it more configurable.
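The module-to-package mapping described here can be sketched in a few lines of Python. `go_import_path` is a hypothetical helper for illustration, and the default `"prefix"` is just the placeholder used in the comment above, not a real directory.

```python
def go_import_path(module_name, prefix="prefix"):
    """Map a dotted Python module name to a Go-style import path.

    `prefix` is a placeholder; the real value depends on where the
    generated packages live.
    """
    return prefix + "/" + module_name.replace(".", "/")

print(go_import_path("foo.bar"))
```

With a configurable prefix, the same mapping could point at a project's own package tree instead of a fixed library directory.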
I question the transpiler. I think I'd much rather prefer a solution like Jython.
Compilers like this, from almost-Python to say C/C++, have existed for a while: Cython, Shedskin, Nuitka are some examples.
True Python runtimes fail the CPython test suite. They have some work to do!
Some work on that is discussed here. I would love a dropbox google colab (though also targeting 3.x :) )
http://www.jython.org/jythonbook/en/1.0/Concurrency.html
It can also handle Python's dynamic aspects.
In part it has exposed CPython "implementation quirks" that people were wittingly or otherwise taking advantage of. In other cases there don't seem to be obvious reasons for the differences, and handling them has required special-casing the Python code.
It has been great with code written from scratch, specifically for it.
Not exactly an endorsement.
Sadly, the code was just dumped into a new Git repo, so no way to tell how many people contributed internally so far.
... but then I don't see Python going anywhere anytime soon. Didn't Microsoft just start a project to get Python's runtime to use CoreCLR's JIT?
There was an article, can't find it now, about an upcoming Python renaissance saying there may be an influx of new interpreters. There's PyPy, Microsoft's CoreCLR thing, now this, etc. It seems people really want to program in Python so there is an effort to make it faster.
*edit: found the article: https://lwn.net/Articles/691070/
The biggest surprise for me is that the Go runtime would be a good fit for Python, performance wise, considering the very different object and dispatch model.
The post also mentions runtime reflection, which used to be painfully slow last time I used it (Go 1.5, I think).
Has this improved in the latest releases?
If you know Go or are willing to learn about Go and reflection, you can learn a lot about how dynamic languages work under the hood by implementing:
func Add(interface{}, interface{}) interface{} { ... }
using the reflect package to accept all types of numbers, including for a bit of extra fun the math/big number types, and returning upgraded numbers as appropriate, or panicking on types you can't Add with. That's not all there is to writing a dynamic language interpreter, but I'd say you can learn the core idea this way, shorn of a lot of accidental complexity and with a lot of the grunt work plumbing of setting up (type, value) pairs already done for you.
> To solve this problem, we investigated a number of other Python runtimes. Each had trade-offs and none solved the concurrency problem without introducing other issues.
One way to leverage C's widespread availability and high-performance while side-stepping some of its deficiencies was simply to use it as a target for a different language. The Cfront C++ compiler which generated C code is probably the most famous example, but I recall that there were many others.
Maybe Go, like C, will make a fruitful target for other language implementations.
It was designed with the notion that all some people really need is just a better C.
For other people there's Swift, Rust, etc.
As always with Google...
What if you humbly contributed to open source projects rather than creating new stuff, labelled with your own brand, controlled by your own engineers, and with your own design choices, however good they may be?
Who would use 3 if you could have CPython2 for existing code and write new code in Grumpy Python? This is the dream language for me. Python on the Go runtime.
* It seems like writing a translator to deal with all the use cases is so much more work and risk than iteratively rewriting portions (in whatever faster, more concurrent language) and using some form of microservice/process message passing to communicate with legacy pieces.
* Love to know how they compose async operations currently? Is it some sort of object (e.g. Futures, promises, observables, etc)? Is Grumpy going to have some sort of language difference (to Python) to compose async stuff (e.g. async and await)?
Of course being biased towards the JVM (since I know it so well) they could get really fast concurrency if they want with Jython today. Most of the Python tools already work with Jython (assuming 2.7).
With Jython you could always drop down into Java (or any other JVM lang) if you need more speed, as well as C for CPython (or even C from Java). It is unclear what you do with Grumpy with performance critical code. Can you interface with Go code or is the plan C?
Sorry, can't be very specific, but rewriting all the frontend code would take a lot more effort than writing a new Python runtime :)
> It seems like writing a translator to deal with all the use cases is so much more work and risk than iteratively rewriting portions (in whatever faster, more concurrent language) and using some form of microservice/process message passing to communicate with legacy pieces.
We do iteratively rewrite components as well. We are pursuing multiple strategies.
> Love to know how they compose async operations currently? Is it some sort of object (e.g. Futures, promises, observables, etc)?
Most async operations are performed out-of-process by other servers.
> Is Grumpy going to have some sort of language difference (to Python) to compose async stuff (e.g. async and await)?
I'd love to support async and await at some point.
> Of course being biased towards the JVM (since I know it so well) they could get really fast concurrency if they want with Jython today. Most of the Python tools already work with Jython (assuming 2.7).
We did also do an evaluation of Jython but there were a number of technical issues that made it unsuitable for our codebase and workload. One such example is this longstanding issue: http://bugs.jython.org/issue527524. I just noticed the very recent update on that thread that implemented the workaround outlined in 2010 by Jim Baker. We tried that workaround and found we got a huge performance hit on affected code. There were a few other general performance problems as well but I can't recall all the details.
Please note I'm not at all bashing Jython, I think it's a great project with a sound design, it just wasn't right for us.
> With Jython you could always drop down into Java (or any other JVM lang) if you need more speed, as well as C for CPython (or even C from Java). It is unclear what you do with Grumpy with performance critical code. Can you interface with Go code or is the plan C?
You can interface with Go code directly, e.g. from the blog post:
from __go__.net.http import ListenAndServe, RedirectHandler
handler = RedirectHandler('http://github.com/google/grumpy', 303)
ListenAndServe('127.0.0.1:8080', handler)
I suppose the easy concurrency and ability to inter-operate with Go libraries is really the driver for Go over that.
Edit: for example, the fib benchmark they cite is cpu-bound. If the python code used multiprocessing, the performance would scale almost linearly with the number of processes.
Probably, but consider how much bare iron Google has lying around, much of it likely with few cores-per-CPU. Using it efficiently and not accelerating its obsolescence is probably a priority for them.
BTW, does anyone have data that would suggest how long it does take an org at the scale of an Amazon, Google, or Facebook to entirely replace their HW? I assume that it isn't only through attrition, and that Google for example currently has no servers running that date back to Y2k, but I have no idea what the "half-life" of a server is at their scale.
Around 2011 the beta releases of the static typing additions to Apache Groovy were called "grumpy" but the name was dropped after objections from the Grails crowd. I think the quality of the product is more important than the name, so Grumpy should do OK regardless if it's built and maintained properly.
As soon as something can see into the guts of the interpreter you have to maintain compatibility which is a pain/waste.
Worse than that is that the view wasn't designed for multi-threading, which is why the GIL exists. The C extensions weren't designed to be multi-threaded because that wasn't a thing in Python, so they're not safe. You either have to drop them, define a new interface layer that would be safe, or I suppose somehow sandbox their little view of the world but keep it coherent between threads.
If you have a codebase where you can make the choice to drop C extensions and you're trying to accelerate Python it seems like a very smart choice.
I wrote a random sentence generator in Python several years ago. A bit later, I wrote my blog using Java EE. Early on, I had an idea: put that generator in Jython, and spit out a random sentence on every request. It's probably the one feature that I was OK to let go, should I switch platforms.
Since Oracle has only gotten more evil and Java more stagnant over the years (especially in light of TLS features), I've been thinking of possible alternatives. I've been intrigued by newer compiled languages, and it's come down to either Go or Rust, but I've yet to dig too far into them. I might have a winner.
Optimisation. This is a smart move, though hard. A compiler, written well, allows the back end to improve the code. So the whole code base can improve with improved analysis.
"The biggest advantage is that interoperability with Go code becomes very powerful and straightforward: Grumpy programs can import Go packages just like Python modules!"
Extending Python (youtube codebase) with Go modules. That's interesting.
Here's a great article from a couple of years ago by Chris Seaton on this topic.
Good? It's funny that a rabbit hole is the place you go to make obvious choices. This culture must seem strange to outsiders.
(Sure, none of my code will work, because it's all Python3 and usually uses C/asm, but it's still early and I'm hopeful.)
https://blog.heroku.com/see_python_see_python_go_go_python_g...
[0] https://github.com/google/grumpy/commit/f60ee257db9d7996e3f8...
Google: Python -> Go
So has YouTube already migrated over to using Grumpy (and no longer running python in production)?
that would be awesome.
yes yes I know tf now "does" 3 but we all know what Google really cares about.
"I'd like to support 3.x at some point" - trotterdylan.
Read: nice to have but when it gets down to brass tacks, 2.7 is where it's at for Google.