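If you stick to one process per core, the number of TLB flushes doesn't change, because no core ever has to switch between address spaces. You can set processor affinity to pin each process to its own core and make sure of that. If you create more threads/processes than cores, the scheduler has to switch between them on the same core, and you might be able to measure an impact.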
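As a rough sketch of the pinning part, assuming Linux and Python (`os.sched_setaffinity` is only available on some Unix platforms), something like the following forks one worker per allowed core and pins each one. The names `pin_to_core` and `run_one_worker_per_core` are made up for illustration; they are not from any library.

```python
import os


def pin_to_core(core):
    """Restrict the calling process to one core and return the resulting mask."""
    # pid 0 means "the calling process".
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


def run_one_worker_per_core(work):
    """Fork one child per allowed core, pin it there, and run work(core) in it.

    Returns the list of cores used and the raw wait statuses of the children.
    """
    allowed = sorted(os.sched_getaffinity(0))
    pids = []
    for core in allowed:
        pid = os.fork()
        if pid == 0:
            # Child: pin first, then do the work, then exit without
            # returning into the parent's code path.
            status = 1
            try:
                pin_to_core(core)
                work(core)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    # Parent: a status of 0 means the child exited normally with code 0.
    statuses = [os.waitpid(pid, 0)[1] for pid in pids]
    return allowed, statuses
```

To try the oversubscribed case, you could fork several workers per core instead of one and compare timings; whether the difference is visible will depend on the workload and the hardware.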
I don't understand your comment about mmap. It is often used to share memory between related processes.
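For what I mean by sharing between related processes, here is a minimal sketch, assuming a Unix system and Python's `mmap` module: the parent creates an anonymous shared mapping, forks, and reads back what the child wrote. The function name `share_value_with_child` is hypothetical.

```python
import mmap
import os
import struct


def share_value_with_child(value):
    """Let a forked child write an integer into memory the parent can read."""
    # fileno -1 requests anonymous memory; MAP_SHARED makes writes visible
    # to every process that inherits the mapping across fork().
    shared = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)

    pid = os.fork()
    if pid == 0:
        # Child: store the value at offset 0 of the shared page and exit.
        status = 1
        try:
            struct.pack_into("q", shared, 0, value)
            status = 0
        finally:
            os._exit(status)

    # Parent: wait for the child, then read the same bytes back.
    os.waitpid(pid, 0)
    (result,) = struct.unpack_from("q", shared, 0)
    shared.close()
    return result
```

With a private mapping instead of a shared one, the child's write would land in its own copy of the page and the parent would still see zero.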