Nothing I said contradicts anything. I'm not sure where the disconnect is.
I didn't say to replace new and delete with jemalloc; I said to replace your allocator with jemalloc, which means new and delete end up calling it instead. Swapping in a different malloc implementation is common and easy, and it's something I and many other people have done. jemalloc is also not the only replacement allocator, nor the only one focused on concurrency (see tcmalloc and ptmalloc). There are also options like Windows' built-in thread-local heaps. Some default malloc implementations now have some concurrency built in, so they usually don't block.
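To make the mechanism concrete: in the common C++ runtimes (libstdc++, libc++, MSVC) the default global operator new already gets its memory from malloc. That is why linking or preloading jemalloc or tcmalloc redirects new and delete without touching your code. The sketch below spells out that dependency by replacing the global operators explicitly. It illustrates the forwarding under that assumption; you don't normally need to write this yourself.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Global replacement: every `new` in the program now goes through std::malloc,
// so whichever malloc is linked (glibc's ptmalloc-derived allocator, jemalloc,
// tcmalloc, ...) also serves C++ allocations. The default array forms
// (new[]/delete[]) forward to these single-object versions.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // operator new must return a unique non-null pointer
    for (;;) {
        if (void* p = std::malloc(size)) return p;
        // Standard behaviour on failure: call the new-handler if one is installed.
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();
        handler();
    }
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```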
What this comes down to is that new and delete don't have to block (in the common execution path), because the underlying allocator doesn't have to. This is a well-worn problem. I have seen parallel programs first-hand go from using a single core, because the default allocator blocked on every call, to using all cores after switching allocators. The root problem is too much allocation, but the alternative implementations do deliver on their promise.
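One rough way to see this effect is a small churn test in which several threads allocate and free small blocks at the same time. The function name allocation_churn_ms and the workload shape (a 64-entry window of live blocks, sizes from 16 to 271 bytes) are arbitrary choices for illustration, not a standard benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Runs `threads` workers that each perform `iterations_per_thread` allocations.
// Each worker keeps a small window of live blocks so allocations and frees
// interleave, which is the pattern that serializes on a single-lock allocator.
// Returns wall-clock time in milliseconds.
double allocation_churn_ms(unsigned threads, std::size_t iterations_per_thread) {
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([iterations_per_thread] {
            std::vector<std::unique_ptr<char[]>> live(64);
            for (std::size_t i = 0; i < iterations_per_thread; ++i) {
                // Overwriting the slot frees the old block and allocates a new one.
                live[i % live.size()] = std::make_unique<char[]>(16 + i % 256);
            }
        });
    }
    for (auto& w : workers) w.join();

    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call it from a small main and run the same binary twice on Linux: once as-is, and once with LD_PRELOAD pointing at the jemalloc or tcmalloc shared library (the path varies by system). This gives a crude before/after comparison. Results depend heavily on the platform and on how much per-thread caching the default allocator already does.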
This is a separate issue from "lock-free data structures" that allocate memory on every transaction. That is a poor way to build any data structure, and it pushes many of the concurrency issues onto the memory allocator.
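One way to avoid per-operation allocation is to allocate all storage up front. The sketch below is a single-producer/single-consumer ring buffer (SpscRing is a hypothetical name). Its push and pop only move two atomic indices and never call the allocator. It assumes exactly one producer thread and one consumer thread. Multi-producer variants need considerably more care.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Single-producer/single-consumer lock-free ring buffer. All storage is
// allocated once in the constructor; push/pop never allocate.
template <typename T>
class SpscRing {
public:
    // One extra slot stays empty so "full" and "empty" are distinguishable.
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {}

    // Producer thread only. Returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        // Acquire pairs with the consumer's release, so the slot is no longer being read.
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot to the consumer
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the ring is empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release, so the written value is visible.
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T value = std::move(buf_[head]);
        head_.store((head + 1) % buf_.size(), std::memory_order_release);  // hand the slot back
        return value;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write; written only by the producer
};
```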
Hope that helps and that you learned something.