I ended up using a combination of Python threads and processes, which was more work than I wanted to do, and the result was still relatively slow.
Also, it's trivial to write multithreaded extensions in C or C++ (at least for data processing). You just have to make sure to release the GIL around the parts that don't touch Python objects.
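To illustrate the GIL point from the calling side: CPython's own hashlib is a C extension that releases the GIL while hashing large buffers, so ordinary Python threads can hash several buffers on separate cores at once. This is a minimal sketch assuming CPython; the chunk sizes and worker count are arbitrary, and the actual speedup depends on the machine.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(chunk: bytes) -> str:
    # hashlib's C code releases the GIL for large inputs, so several of
    # these calls can run truly in parallel on different cores.
    return hashlib.sha256(chunk).hexdigest()


def parallel_digests(chunks, workers=4):
    # Plain Python threads are enough here; no processes are needed
    # because the heavy lifting happens in C with the GIL released.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, chunks))


# Eight 1 MiB buffers, each filled with a different byte value.
chunks = [bytes([i]) * (1024 * 1024) for i in range(8)]
results = parallel_digests(chunks)
print(results[0])
```

A hand-written extension gets the same behavior by wrapping its pure-C loop in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.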
IMO the trick is to just use the Python C API, and not use any wrappers like SWIG or ctypes. The Python C API is a little odd but it is fairly explicit. Also, it's better to write plain functions rather than classes, but for data processing that is natural anyway. If you need a class, do that part in Python.
Writing "one way" extension functions (that don't call back into Python) is quite easy and can give you a huge performance boost.
I think people get hung up on Python extensions because they are using TWO unfamiliar languages -- the wrapper language, and C/C++. But if you are just using one additional language, it's not hard to figure out.
I don't know about OCaml, but Rust (supposedly) will interface perfectly fine with anything providing a C API (which includes Python, along with quite a few other scripting languages from that era, like Perl and Ruby). Rust's ABI-compatibility with C is one of the primary selling points.
I don't know off the top of my head if anyone's tried writing Python extensions in Rust, but I don't see why it wouldn't be possible to do so with at least as much capability as C/C++ (if not more).
The talk above was on the front page sometime last week and describes one way to write Python extensions in Rust.
I'll have to watch the video below, but don't you have to write some sort of Rust wrapper for every definition in Python.h? What about macros like Py_INCREF and Py_DECREF? I'd guess that someone has already done, or eventually will do, that work, but it's yet another layer of complexity, which can have the same downsides as SWIG.
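For context on what such a binding has to reproduce: Py_INCREF and Py_DECREF adjust an object's reference count in place and are defined in the headers rather than exported as ordinary library symbols, so a foreign-function interface can't simply call them and a Rust wrapper has to reimplement them. CPython exposes the same counter through sys.getrefcount, which makes the bookkeeping visible from Python. A small CPython-only sketch; the absolute counts vary between versions, so only the differences are meaningful.

```python
import sys


def refcount_delta():
    """Show the reference-count bookkeeping that Py_INCREF/Py_DECREF do in C."""
    obj = object()                 # fresh object, referenced only by this local
    before = sys.getrefcount(obj)
    holder = [obj]                 # storing it in a list adds a reference (an INCREF in C)
    during = sys.getrefcount(obj)
    del holder                     # freeing the list drops that reference (a DECREF in C)
    after = sys.getrefcount(obj)
    return before, during, after


print(refcount_delta())
```

Every C extension function that creates, stores, or returns an object has to get these increments and decrements exactly right, and so does any Rust layer sitting on top of the C API.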
For better or worse, the Python C API is tightly coupled to the C implementation. That makes it very difficult for any wrapper to be more natural and understandable than C itself. It's a fairly huge API: https://docs.python.org/2/c-api/index.html
People always try to cover it up with "nice" abstractions, but they invariably end up being leaky (e.g. with respect to threads, garbage collection, OS portability, etc.)
Another huge can of worms: the build system. With a plain C extension, all I need is a C compiler on the system, and I can just do "python setup.py build". The situation with Windows is also quite messy -- I can't imagine Rust making it better.
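To make the "all I need is a C compiler" point concrete: on Linux, building a single-file extension boils down to compiling it as a shared library against the interpreter's headers and naming the output with the interpreter's extension suffix. The rough sketch below assembles that command from sysconfig without running it. It assumes Linux with a compiler called `cc` on PATH, and `fastsum` / `fastsum.c` are hypothetical names. macOS needs additional linker flags and Windows uses MSVC, which is a big part of why setuptools (what `python setup.py build` drives) exists.

```python
import sysconfig


def plain_build_command(source: str, module: str) -> list[str]:
    """Return a Linux-style compiler command for a single-file C extension.

    Assumes a Unix-like system with a C compiler named `cc` on PATH.
    """
    include_dir = sysconfig.get_paths()["include"]   # directory that should contain Python.h
    suffix = sysconfig.get_config_var("EXT_SUFFIX")  # e.g. ".cpython-311-x86_64-linux-gnu.so"
    return [
        "cc", "-shared", "-fPIC", "-O2",
        f"-I{include_dir}",
        source,
        "-o", module + suffix,
    ]


# "fastsum" and "fastsum.c" are hypothetical placeholders.
cmd = plain_build_command("fastsum.c", "fastsum")
print(" ".join(cmd))
```

Running the list with subprocess.run(cmd, check=True) would produce a file that `import fastsum` picks up from the same directory, provided the Python headers are installed.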
I have experimented with creating normal .o files from OCaml and linking them with .o files from C/C++. That is a great feature of the OCaml toolchain. Still, I remember the documentation being sparse, and I don't remember offhand how I did it.
FWIW, my main language is Python, but I love OCaml for certain things, and have grown to like C++ as well. Rust seems very interesting to me for its security properties and because it has native threads rather than including a mini-OS in the runtime like Go. My understanding is that Go is hopeless for many kinds of Python extensions, precisely because of its runtime and the difficulty of calling from Go back into Python. (At least, this was true a year or two ago.)
I'm always wary of creating more layers than necessary. A system composed of Python and C will necessarily have fewer layers than one composed of Python and Rust, simply because Python is written in C and its interface is defined in C.
EDIT: I left out the BIGGEST point -- a dealbreaker. Python does NOT have an ABI. It has an API. AFAIK, that means you have to write a Rust ABI-compatible wrapper for EVERY VERSION of Python.
https://www.python.org/dev/peps/pep-0384/
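One way to see the per-version ABI issue from Python: every compiled extension's filename carries a tag for the interpreter it was built for, and the import system only looks for the suffixes the running interpreter lists. PEP 384 (linked above) is the partial exception: modules built against its limited API can use a version-independent suffix such as `.abi3.so` on many Unix builds. A small standard-library-only sketch; the values in the comments are illustrative, not guaranteed.

```python
import sys
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES


def extension_suffix_report():
    """Describe which compiled-extension filenames this interpreter will import."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Version-specific suffix, e.g. ".cpython-311-x86_64-linux-gnu.so".
        # This tag is why a module built for one version isn't loaded by another.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # All suffixes the import system tries, in order. On many Unix builds
        # this also includes ".abi3.so" for stable-ABI (PEP 384) modules.
        "import_suffixes": list(EXTENSION_SUFFIXES),
    }


report = extension_suffix_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this under two different Python versions shows two different version tags, which is exactly the per-version rebuild problem described above.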
To make a generalization, programmers learn about C APIs before they understand what an ABI is. It's just one more concept you need to know about to write correct code. So if you just want to speed up an existing Python program, I would still recommend a simple C or C++ extension. It solves single-threaded speed issues and, by releasing the GIL, gives you parallelism across multiple cores, so you can get huge speedups.