Just because jemalloc has a mutex.c file doesn't mean the common paths aren't meant to be lock-free. For lots of little allocations that fit into jemalloc's small bucket sizes, they should be.
It is still putting your head in the sand, since at some point the allocator has to go to the OS and map in memory, which should lock, and lots of small allocations are a terrible way to do anything for performance. But it is possible for some paths in an allocator to not take locks.
Also, if there are thread-local heaps, those won't lock either.
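To make the thread-local point concrete, here is a rough sketch of a per-thread cache for one small block size. The names (`ThreadCache`, `local_cache`, `kBlockSize`) are made up for illustration and this is not how jemalloc is actually implemented. The fast path only touches thread-local state, so there is nothing to lock; only the fallback goes to the shared allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical per-thread cache of fixed-size blocks (illustration only).
class ThreadCache {
public:
    static constexpr std::size_t kBlockSize = 64;

    void* allocate(std::size_t size) {
        assert(size <= kBlockSize);  // this sketch serves a single small size class
        if (head_ != nullptr) {
            // Fast path: pop from this thread's free list. No lock, no atomics.
            Node* n = head_;
            head_ = n->next;
            --cached_;
            return n;
        }
        // Slow path: fall back to the shared allocator, which may lock.
        return std::malloc(kBlockSize);
    }

    void deallocate(void* p) {
        // Reuse the freed block's storage as a free-list node.
        head_ = ::new (p) Node{head_};
        ++cached_;
    }

    std::size_t cached() const { return cached_; }

    ~ThreadCache() {
        // Hand cached blocks back to the shared allocator when the thread exits.
        while (head_ != nullptr) {
            Node* n = head_;
            head_ = n->next;
            std::free(n);
        }
    }

private:
    struct Node { Node* next; };
    Node* head_ = nullptr;
    std::size_t cached_ = 0;
};

// One cache per thread; each thread only ever touches its own instance.
inline ThreadCache& local_cache() {
    thread_local ThreadCache cache;
    return cache;
}
```

A block freed on a different thread than the one that allocated it just ends up in the freeing thread's cache here. Real allocators handle that case more carefully, which is part of why they still have locks somewhere.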
> Technically you could replace your allocator with jemalloc or something similar, but most people probably don't.
which suggests that the "blocking" nature of new/delete can be solved just by replacing it with jemalloc. That's nonsense, because new/delete is itself just a plain stupid wrapper around malloc/free. So I still don't get the point of your comment. It reads as flawed and contradictory.
I didn't say replace new and delete with jemalloc. I said replace your allocator with jemalloc, which means new and delete end up calling it instead. This is a common and easy use of a different malloc implementation, and it is something I and many other people have done. jemalloc is also not the only allocator replacement, and not the only one to focus on concurrency (tcmalloc and ptmalloc do too). There are also allocators like Windows' built-in thread-local heaps. Some default malloc implementations now have some concurrency built in, so they don't usually block.
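Since new/delete being a wrapper is the whole point, here is a minimal sketch of that wrapper written out by hand. Standard library implementations typically do something equivalent by default. The counter `g_new_calls` is a made-up name, there only so you can see the calls going through. Whatever `malloc` the process ends up with (the system one, or jemalloc/tcmalloc linked in or preloaded) is what every `new` in the program lands on.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, only here to make the forwarding observable.
std::atomic<std::size_t> g_new_calls{0};

// Replacement global operator new: a thin wrapper over malloc.
void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) size = 1;  // operator new must return a unique non-null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// Replacement global operator delete: a thin wrapper over free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

Nothing in there knows or cares which malloc it is calling, so whether `new` blocks is entirely a property of the allocator underneath.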
What this comes down to is that new and delete don't have to block (in the common execution path) because the underlying allocator doesn't have to. This is a well-worn problem. I have seen first hand parallel programs go from using only a single core, because the default allocator blocked on every call, to using all cores with a new allocator. It is a problem created by too much allocation, but the different implementations do deliver on their promise.
This is a separate issue from "lock free data structures" that allocate memory on every transaction, which is a poor way to make any data structure and pushes a lot of the concurrency issues onto the memory allocator.
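For contrast, here is a sketch of a structure that never touches the allocator per transaction: a single-producer/single-consumer ring buffer with all storage reserved up front. `SpscRing` is a made-up name and this is just one well-known way to do it, assuming exactly one pushing thread and one popping thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer queue with preallocated storage.
// Holds at most N - 1 items; one slot stays empty to tell "full" from "empty".
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Called only by the producer thread. Returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the new item
        return true;
    }

    // Called only by the consumer thread. Returns false if the ring is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[head];
        head_.store((head + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

No new, no delete, no malloc on the hot path, so none of the concurrency problem gets handed off to the allocator.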
Hope that helps and that you learned something.