> ... so that leaves me confused. My understanding is that the regression is triggered with the 7.0+ kernel and can be mitigated with huge pages turned on.
It gets a bit worse with preempt_lazy - for me just 15% or so - because the lock holder is scheduled out a bit more often. But it was already bad before that.
> My question therefore was how come this regression hasn't been visible with huge pages turned off with older kernel versions? You say that it was but I can't find this data point.
I mean, it wasn't a regression before, because this is how it has behaved for a long time.
This workload is not something anybody would realistically encounter in this form in the real world. Even without the contention - which only happens the first time the buffer pool is filled - you lose so much by not using huge pages with a 100GB buffer pool that you will have many other issues.
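For reference, a rough sketch of the huge page setup I'd expect for a buffer pool of that size (exact numbers are illustrative and assume 2MB huge pages):

    # postgresql.conf
    shared_buffers = 100GB
    huge_pages = on            # fail at startup rather than silently fall back

    # Linux: 100GB / 2MB = 51200 huge pages, plus some slack for the rest
    # of the shared memory segment
    sysctl -w vm.nr_hugepages=52500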
We (Postgres, and me personally) were concerned enough about potential contention in this path that we got rid of that lock half a year ago (buffer replacement selection has been lock-free for close to a decade; it was just that unused buffers were found via a list protected by this lock).
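To make the distinction concrete, here's a simplified sketch (made-up names, not the actual Postgres code): unused buffers used to be handed out from a list guarded by a lock, whereas victim selection for replacement just advances a clock hand with an atomic increment.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;   /* the lock in question */
        int first;              /* head of the list of never-used buffers */
        int *next;              /* next[i] = buffer after i, -1 = end of list */
    } Freelist;

    typedef struct {
        _Atomic uint64_t next_victim;   /* clock hand */
        int nbuffers;
    } Strategy;

    /* Old path: unused buffers came off a list protected by a lock, so many
     * connections filling a cold buffer pool all pile up on this lock. */
    static int get_unused_buffer(Freelist *fl)
    {
        pthread_mutex_lock(&fl->lock);
        int buf = fl->first;
        if (buf >= 0)
            fl->first = fl->next[buf];
        pthread_mutex_unlock(&fl->lock);
        return buf;             /* -1 once the list is exhausted */
    }

    /* Replacement selection itself: lock-free clock sweep, just an atomic
     * increment to advance the hand, no shared lock involved. */
    static int clock_sweep_tick(Strategy *s)
    {
        uint64_t tick = atomic_fetch_add(&s->next_victim, 1);
        return (int) (tick % (uint64_t) s->nbuffers);
    }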
But the performance gains we saw were relatively small, though we didn't measure large buffer pools without huge pages.
And I, at least, didn't test with this many connections doing small random reads into a cold buffer pool, simply because it doesn't seem that interesting.
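For anyone who does want to poke at that shape of workload, something roughly like this should approximate it (scale and client counts are guesses on my part, and the server needs a restart so the buffer pool really is cold):

    pgbench -i -s 10000 bench                 # ~150GB of data at this scale
    pg_ctl restart                            # start with a cold buffer pool
    pgbench -n -S -M prepared -c 1000 -j 64 -T 120 bench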