There is a lot of untapped parallelism waiting for the right code to exploit it.
That said, I think the original comment was rightly pointing out how easy it was to make the change and test it, which in this case did turn out to be noticeably faster.
That's the real issue here! Most languages have poor abstractions for parallelism and concurrency. (Many languages don't even differentiate between the two: concurrency is about structuring overlapping tasks, while parallelism is about executing work simultaneously on multiple cores.)
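To make the distinction concrete, here's a minimal Python sketch (the function names and timings are illustrative, not from the thread): asyncio gives you concurrency by overlapping waits on one core, while a multiprocessing pool gives you parallelism across cores.

```python
import asyncio
from multiprocessing import Pool

# Concurrency: overlapping waits, typically on a single core.
async def fetch(i):
    await asyncio.sleep(0.1)  # stand-in for an I/O wait
    return i

async def concurrent_demo():
    # Three 0.1 s waits overlap, so the whole gather takes ~0.1 s, not 0.3 s.
    return await asyncio.gather(*(fetch(i) for i in range(3)))

# Parallelism: CPU-bound work spread over multiple processes/cores.
def square(n):
    return n * n

if __name__ == "__main__":
    print(asyncio.run(concurrent_demo()))
    with Pool(2) as pool:
        print(pool.map(square, range(3)))
```

The point stands either way: neither mechanism shares an abstraction with the other, and picking the wrong one for your workload is one of those footguns a good abstraction would hide.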
Encapsulating and abstracting is how we make things usable.
E.g. letting people roll their own hash tables every time they want one would lead to people shooting themselves in the foot more often than not. Compared to that, Python's dicts are dead simple to use, exactly because they move all the fiddly bits out of the reach of meddling hands.
I think the example case in this subthread was about making some long-running app operations asynchronous and overlapping, which is a more forgiving use case than trying to make a piece of code faster by utilizing multiple cores.
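That kind of change can be sketched in a few lines. This is a hypothetical example (the operation names and sleep durations are made up, standing in for whatever the app was actually doing): two independent slow operations run back to back take the sum of their times, but awaited together they take roughly the longer of the two.

```python
import asyncio
import time

# Hypothetical stand-ins for two independent long-running app operations.
async def load_config():
    await asyncio.sleep(0.2)  # e.g. fetching remote configuration
    return {"retries": 3}

async def warm_cache():
    await asyncio.sleep(0.2)  # e.g. priming a cache over the network
    return "cache ready"

async def startup():
    # Overlap the two waits: total is ~0.2 s instead of ~0.4 s sequentially.
    start = time.monotonic()
    config, cache = await asyncio.gather(load_config(), warm_cache())
    elapsed = time.monotonic() - start
    return config, cache, elapsed

if __name__ == "__main__":
    print(asyncio.run(startup()))
```

This is the forgiving case: the operations are independent I/O waits, so there's no shared state to protect and no need to split CPU work across cores.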