I think the real argument is more nuanced. Where you see mmap() fail badly on Linux, even for read-only workloads, is under a few specific conditions: very large storage volumes, highly concurrent access, non-trivial access patterns (e.g. high-dimensionality access methods). Most people do not operate data models under these conditions, but if you do then you can achieve large integer factor gains in throughput by not using mmap().
Interestingly, most of the reason for these problems has to do with theoretical limitations of cache replacement algorithms as drivers of I/O scheduling. There are alternative approaches to scheduling I/O that work much better in these cases but mmap() can’t express them, so in those cases bypassing mmap() offers large gains.