Because our implementation does not explicitly depend on the Python runtime, we can avoid many of its shortcomings: we run without the GIL and use real threads to dispatch custom Numba kernels at near-C speed, free of Python's performance limitations.
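To make that concrete, here's the general shape of the pattern (a sketch with illustrative names, not Blaze's actual API; the fallback keeps it runnable even without Numba installed): a kernel compiled with nogil=True releases the GIL, so plain threading threads can run it in parallel over chunks of an array.

```python
import threading

import numpy as np

try:
    from numba import njit

    def compiled(fn):
        # release the GIL inside the compiled kernel so threads run in parallel
        return njit(nogil=True)(fn)
except ImportError:
    def compiled(fn):
        # fallback so the sketch still runs without Numba installed
        return fn

@compiled
def partial_sum_sq(a, start, stop):
    # the hot inner loop: compiled to near-C machine code by Numba
    s = 0.0
    for i in range(start, stop):
        s += a[i] * a[i]
    return s

def threaded_sum_sq(a, n_threads=4):
    # dispatch the kernel over chunks of the array on real OS threads
    bounds = np.linspace(0, len(a), n_threads + 1).astype(np.int64)
    results = [0.0] * n_threads

    def worker(k):
        results[k] = partial_sum_sq(a, bounds[k], bounds[k + 1])

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With the GIL released inside the kernel, the threads actually overlap on multiple cores instead of serializing, which is the point of the design.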
In case people don't read to the bottom, here are the links to the repo and the docs. The project is still in its early stages, but it is public and released under a BSD license.
I'm starting to run into some performance bottlenecks with Python, and so I'm just now looking at Cython, PyPy, Psyco, and... gasp... C.
From what little I've read, Cython is supposed to be as easy as adding some typing and modifying a few loops here and there, and you are in business.
http://ianozsvald.com/2012/03/18/high-performance-python-1-f...
Before you go that far, I'd recommend making sure you know all the Python gotchas (for example, maybe you have some inner loop that does for x in range(100000) all the time) and that your algorithms are in order. Sometimes even silly micro-optimization can make a difference if a small function accounts for a significant share of your runtime. Using multiple processes with e.g. the multiprocessing module can be an option too.
Depending on what data types you operate on, numpy (and now this new thing) can do some amazing things.
PS: check things like http://packages.python.org/line_profiler/ beyond the ordinary profiling.
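To illustrate the numpy point: here is the kind of innocuous inner loop a profiler often flags, next to the same computation pushed down into numpy's C loops (a sketch; the function names are mine):

```python
import numpy as np

def slow_sum_squares(n):
    # pure-Python inner loop: each iteration pays interpreter overhead
    total = 0
    for x in range(n):
        total += x * x
    return total

def fast_sum_squares(n):
    # same computation, but the loop runs inside numpy's compiled code
    a = np.arange(n, dtype=np.int64)
    return int(np.dot(a, a))
```

On largish n the vectorized version is typically one to two orders of magnitude faster, and a line profiler will show you exactly which loops are worth converting.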
http://benchmarksgame.alioth.debian.org/u32/which-programs-a...
http://en.wikipedia.org/wiki/Lua_(programming_language)#C_AP...
[1] On the other hand, it comes with a tool showing exactly how each of your Cython lines looks in the resulting C, with color-coding for a high-level overview of which pieces translated smoothly.
I can narrow down performance problems in C/C++ quite quickly, but neither I nor anybody I know has done much of this for Python. Many people I work with consider a Python implementation a prototype, while Fortran/C/C++ is mature, real code worthy of attention.
The only real downside is that C/C++ requires a little knowledge of POSIX/Linux or Windows. This represents a learning curve, but once you are over it, you have quite durable, long-lasting skills.
Just be prepared for Drew Houston, Paul Graham et al. to come after you cracking their whips... (tongue in cheek)
I wonder whether Blaze is a step towards that direction.
Also, I believe most people would consider Java an alternative to C++, hence all the Java-based Apache projects, such as Mahout, Solr, etc.
http://deeplearning.net/software/theano/
in general, i like (ie i don't see a better solution than) the idea of having an AST constructed via an embedded language that is implemented by a library. but it does have downsides - integration with other python features is going to be much more limited (it gives the illusion of a python solution, but in practice you're off in some other world that only looks like python).
are there more details? i guess the AST is fed to something that does the work. and that something will have an API and be replaceable. but is that something also composable? does it have, say, a part related to moving data and another to evaluating data? so that you can combine "distributed across local machines" with "evaluate on GPU"?
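for what it's worth, the usual trick behind such embeddings is operator overloading: ordinary python syntax builds an expression tree instead of computing a value. a minimal sketch (my own toy classes, not blaze's actual ones):

```python
class Expr:
    # overloaded operators build tree nodes instead of evaluating
    def __add__(self, other):
        return BinOp('+', self, wrap(other))
    def __mul__(self, other):
        return BinOp('*', self, wrap(other))

class Var(Expr):
    def __init__(self, name):
        self.name = name

class Const(Expr):
    def __init__(self, value):
        self.value = value

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def wrap(x):
    # lift plain Python values into the expression language
    return x if isinstance(x, Expr) else Const(x)

def to_sexpr(e):
    # walk the tree to show what was captured
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return str(e.value)
    return f"({e.op} {to_sexpr(e.left)} {to_sexpr(e.right)})"

x, y = Var('x'), Var('y')
tree = x * 2 + y  # looks like python, but builds an AST
```

this is also exactly where the "only looks like python" problem bites: anything the overloaded operators can't intercept (if statements, for loops, calls into ordinary libraries) falls outside the captured tree.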
It's quite similar; we just take some of the ideas further and try to generalize the data storage to include storage backends that data scientists use more frequently (e.g. SQL, CSV, S3, etc.). We're very friendly with the Theano developers and hope to bridge the projects with a compatibility layer at some point.
> (it gives the illusion of a python solution, but in practice you're off in some other world that only looks like python).
I would argue that's what makes Python a great numeric language, and NumPy so successful. You get a high-level language where you can express domain knowledge, but also a near 1:1 mapping to fast code execution at the C level. Blaze is the continuation of that vision.
> i guess the AST is fed to something that does the work. and that something will have an API and be replaceable.
Precisely. We build up an intermediate form called ATerm out of the constructed expression objects, do type inference and graph rewriting, and then pattern-match our layout, metadata, and type information against a number of backends to find the best one to perform the execution. Or, if needed, we build a custom kernel with Numba informed by all the type and data-layout information we've inferred from the graph.
We don't aim to solve all the subproblems in this area (expression optimization passes, distributed scheduling), but I think we have a robust enough system that others can build extensions to Blaze to do expression evaluation in whatever fashion they like.
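The replaceable-backend idea can be sketched like this (a toy registry with hypothetical names, not Blaze's real API): each backend registers a predicate over the graph's metadata, and dispatch picks the first backend whose predicate matches.

```python
# toy backend registry -- illustrative only, not Blaze's actual machinery
BACKENDS = []

def backend(predicate):
    # register an execution backend guarded by a metadata predicate
    def register(fn):
        BACKENDS.append((predicate, fn))
        return fn
    return register

@backend(lambda meta: meta.get('device') == 'gpu')
def gpu_kernel(graph, meta):
    return 'dispatched to a GPU kernel'

@backend(lambda meta: meta.get('distributed', False))
def cluster_evaluator(graph, meta):
    return 'evaluated across local machines'

@backend(lambda meta: True)  # always-applicable fallback
def interpreter(graph, meta):
    return 'evaluated by the default interpreter'

def execute(graph, meta):
    # first backend whose predicate matches the metadata wins
    for predicate, fn in BACKENDS:
        if predicate(meta):
            return fn(graph, meta)
```

Because backends are just entries in a list guarded by predicates, a third party can register a new one (say, a distributed GPU evaluator) without touching the core, which is what makes the "combine distributed with GPU evaluation" composition question answerable at all.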
> are there more details?
Yes! See: http://blaze.pydata.org/