I'm also looking forward to PEP 554 [0], which provides "subinterpreters" for running concurrent code without removing the GIL or incurring the overhead of subprocesses.
If anything, user-input strings must be treated as tainted whatever the case.
The current extension module API encourages global state, e.g. types (and other objects too) are often allocated statically as global C variables. For example, there is a global C variable `_Py_NoneStruct` that is the Python `None` value, and extension modules access this variable directly.
Every use of this object needs to adjust its reference count, and that reference count is directly stored within the C global variable.
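To make the bookkeeping visible, here's a small sketch using `sys.getrefcount` from Python itself. It uses an ordinary object rather than `None`, because how `None`'s count is reported varies between CPython versions. The variable names are just for illustration.

```python
import sys

obj = object()                     # an ordinary object with a normal refcount

before = sys.getrefcount(obj)      # includes a temporary reference for the call
alias = obj                        # binding another name takes one more reference
after_bind = sys.getrefcount(obj)
del alias                          # dropping the name releases that reference
after_del = sys.getrefcount(obj)

print(after_bind - before, after_del - before)
```

Each of those changes is a write to a counter stored inside the object, which is the part that matters once more than one thread is involved.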
`_Py_NoneStruct` is currently even exposed in the PEP 384 stable ABI. Existing extension module binaries commonly modify `_Py_NoneStruct.ob_refcnt` directly, without any synchronization. Breaking the PEP 384 compatibility promise is fundamentally unavoidable here.
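Why unsynchronized updates are a problem can be sketched with a deterministic toy model. This is not how CPython is implemented: `SharedCount` and the helper functions are hypothetical, and generators stand in for two threads that each do a non-atomic load-then-store on the same counter.

```python
class SharedCount:
    """Toy stand-in for a reference count field shared by two threads."""

    def __init__(self, value):
        self.value = value


def unsynchronized_incref(shared):
    """Increment in two steps, pausing between the load and the store."""
    loaded = shared.value          # step 1: load the current count
    yield                          # another "thread" may run here
    shared.value = loaded + 1      # step 2: store the incremented count


def finish(step):
    """Run a paused increment to completion."""
    for _ in step:
        pass


def run_interleaved(shared):
    """Both threads load before either one stores: one increment is lost."""
    a = unsynchronized_incref(shared)
    b = unsynchronized_incref(shared)
    next(a)                        # thread A loads
    next(b)                        # thread B loads the same value
    finish(a)                      # thread A stores
    finish(b)                      # thread B overwrites A's store
    return shared.value


def run_sequential(shared):
    """Each increment completes before the next one starts."""
    for _attempt in range(2):
        finish(unsynchronized_incref(shared))
    return shared.value


lost_update = run_interleaved(SharedCount(10))
correct = run_sequential(SharedCount(10))
print(lost_update, correct)        # 11 12
```

With a real refcount, a lost increment like this eventually means an object gets freed while something still points at it.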
One of two things must happen. Alternative one: all refcount operations must be made atomic for thread safety. Refcount operations are extremely common in the Python interpreter, and atomic operations are expensive on modern CPUs (especially if there's contention). Multiple threads using the value `None` would be quite common in Python code, so I doubt you'd gain any speed even at today's core counts. In fact, I'd expect the constant inter-CPU communication for the refcounts to make everything slower than just using a single core with today's Python!
So, alternative two: ensure that no Python objects are shared between the subinterpreters. That's the plan. But that also means it's a breaking change for extension modules. And it's not just an ABI change (which would be handled by merely recompiling against the new headers). Any extension modules that do not yet support PEP 489 are already incompatible with subinterpreters, so it will take quite a bit of work before the ecosystem is upgraded. But there will probably also be some other breaking API changes. I think type objects are currently shared across subinterpreters, and those are frequently defined as a `PyTypeObject` global variable in extension module code. Also, if every subinterpreter has its own GIL, extension modules calling `PyGILState_Ensure()` will have to specify which subinterpreter they will be using, so that the appropriate lock can be acquired.
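The difference between process-wide state and per-interpreter state can be shown with a toy model in plain Python. Nothing here is a real CPython API: `ToyInterpreter` and both init functions are hypothetical stand-ins for a subinterpreter and for a module's initialization routine.

```python
class ToyInterpreter:
    """Hypothetical stand-in for a subinterpreter with its own module table."""

    def __init__(self):
        self.modules = {}

    def import_module(self, name, init):
        # Initialize the module once per interpreter, then reuse its state.
        if name not in self.modules:
            self.modules[name] = init()
        return self.modules[name]


# Process-wide state, like a static C global in an extension module.
_SHARED_STATE = {"calls": 0}


def init_with_global_state():
    """Every interpreter gets the same state object."""
    return _SHARED_STATE


def init_with_per_interpreter_state():
    """Every interpreter gets a fresh state object."""
    return {"calls": 0}


first, second = ToyInterpreter(), ToyInterpreter()

# A change made through the first interpreter leaks into the second.
first.import_module("legacy", init_with_global_state)["calls"] += 1
leaked = second.import_module("legacy", init_with_global_state)["calls"]

# With per-interpreter state the second interpreter sees nothing.
first.import_module("isolated", init_with_per_interpreter_state)["calls"] += 1
isolated = second.import_module("isolated", init_with_per_interpreter_state)["calls"]

print(leaked, isolated)            # 1 0
```

Extension modules in the first category are the ones that would need reworking before subinterpreters can stop sharing objects.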
My prediction: 3.9 may have the basic functionality, but it still won't be able to run on multiple cores concurrently. That will take a good deal more work and breaking changes, and will likely be released as Python 4.0.
There will be another slow upgrade process ("my dependencies must upgrade before I can") until the Python ecosystem is multi-subinterpreter-compatible. But at least this one only affects extension modules.