Everyone must now pay the mental cost of multithreading for the chance that you might want to optimize something.
That hasn't been true for many variable accesses for a very long time. LOAD_FAST, LOAD_CONST, and (sometimes) LOAD_DEREF provide references to variables via pointer offset + chasing, often with caches in front to reduce struct instantiations as well. No hashing is performed. Those access mechanisms account for the vast majority (in my experience; feel free to check by "dis"ing code yourself) of Python code that isn't using locals()/globals()/eval()/exec() tricks. The remaining small minority I've seen is doing weird rebinding/shadowing stuff with e.g. closures and prebound exception captures.
https://github.com/python/cpython/blob/10094a533a947b72d01ed...
https://github.com/python/cpython/blob/10094a533a947b72d01ed...
So too for object field accesses; slotted classes significantly improve field lookup cost, though unlike LOAD_FAST users have to explicitly opt into slotting.
Don't get me wrong, there are some pretty regrettably ordinary behaviors that Python makes much slower than they need to be (per-binding method refcounting comes to mind, though I hear that's going to be improved). But the old saw of "everything is a dict in python, even variable lookups use hashing!" has been incorrect for years.
I'm assuming that by "everyone" you mean everyone who works on the Python implementation's C code? Because I don't see how that makes sense if you mean Python programmers in general. As far as I know, things will stay the same if your program is single-threaded or uses multiprocessing/asyncio. The changes only affect programs that start threads, in which case you need to take care of synchronization anyway.
The mental cost of multithreading is there regardless because GIL is usually at the wrong granularity for data consistency. That is, it ensures that e.g. adding or deleting a single element to a dict happens atomically, but more often than not, you have a sequence of operations like that which need to be locked. In practice, in any scenario where your data is shared across threads, the only sane thing is to use explicit locks already.
That sounds like a fantastic reason to make it run faster on the multi-core CPUs we're commonly running it on today.
So if you care about performance why are you writing that part in python?
> multi-core CPUs we're commonly running it on today.
If you spawn processes to do work you get multi core for free. Think of the whole system, not just your program.