undefined | Better HN

0 pointsStillBored12y ago0 comments

I don't think this is accurate. We have a kernel bypass for disk operations. We use our own memory buffers, and bypass the filesystem, block, and SCSI midlayers. Our stuff is basically what O_DIRECT should be.

There are cases where we are 50% faster than O_DIRECT without any "caching". Furthermore, in high bandwidth applications (>4GB/sec) without O_DIRECT its easy to become CPU limited in the blk/midlayer so again we win.

Now that said, I haven't tried the latest blk-mq, scsi-mq, etc patches which are tuned for higher IOP rates. These patches were driven by people plugging in high performance flash arrays and discovering huge performance issues in the kernel. Still, I expect if you plug in a couple high end flash arrays the kernel is going to be the limit rather than the IO subsystem on a modern xeon.

0 comments

1 comments · 1 top-level

dekhn12y ago

Sure, if you're dealing with extremely high bandwidth apps (4GB/sec is pretty high bandwidth!) what I said doesn't apply.

THe number of people who are sustaining 4GB/sec (on a single machine/device array) is pretty small and they have a reason to go beyond the straightforward approaches the kernel makes available through a simple API (everything you described, like bypass, puts you in a rare category).

Anyway, when I was swapping to SSD, the kswapd process was using 100% of one core while swapping at 500MB/sec. I suspect many kernel threads haven't been CPU-optimized for high throughput.

j / k navigate · click thread line to collapse