
joosters · 8y ago

Yeah, non-blocking disk I/O is painful. I've never had to write code that did enough disk reading/writing to try to parallelise it, so I've never researched the solutions that much. What I don't understand is why disk I/O should require a different API to be non-blocking. After all, network I/O can be blocking or non-blocking with the same read() and write() calls that disk I/O uses.

There must be some reason why no OS offers non-blocking disk I/O in this way, but I don't know what it is.
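A quick way to see the asymmetry described above, as a minimal sketch assuming Linux and Python 3.5+ (`nonblocking_read_demo` is a name made up for illustration). The same non-blocking read that fails fast on an empty pipe simply returns the data on a regular file, because the flag is accepted there but, on Linux at least, does not change how read() behaves:

```python
import os
import tempfile


def nonblocking_read_demo():
    """Return (pipe_result, file_result) for reads on non-blocking descriptors."""
    # Empty pipe: a non-blocking read has nothing to return, so it fails fast.
    r, w = os.pipe()
    os.set_blocking(r, False)
    try:
        os.read(r, 16)
        pipe_result = "data"
    except BlockingIOError:  # errno EAGAIN / EWOULDBLOCK
        pipe_result = "EAGAIN"
    finally:
        os.close(r)
        os.close(w)

    # Regular file: O_NONBLOCK is accepted, but read() still waits for the
    # page cache or the disk and then returns the bytes.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY | os.O_NONBLOCK)
        try:
            file_result = os.read(fd, 16)
        finally:
            os.close(fd)

    return pipe_result, file_result
```

With a tiny file that is already in the page cache the read is fast either way; the point is only that there is no "would block" answer for a regular file, so the caller cannot find out in advance whether the read will hit the disk.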


2 comments · 2 top-level

jabl · 8y ago

The short version is that disk I/O is integrated with the page cache, with all that entails, and files allow things like seeking, which sockets do not.

To an extent this is a design rooted in a world where disks are much faster than network access. But perhaps that world is coming back with SSDs, NVMe, etc.

trentnelson · 8y ago

NT has asynchronous I/O for disk. And network, and anything I/O-based. In UNIX land, you have read() and write(). These calls translate to system calls, and in most cases, a driver is called upon to process the I/O request. The driver starts the request, and then, after some time has passed and the underlying hardware has completed the I/O, "completes" the request, which bubbles back to the user via the return of the read() or write() call.

Now, with NT, you have ReadFileEx() and WriteFileEx() (as well as ReadFile() and WriteFile(), which accept an OVERLAPPED structure). A user can call them in such a way that the semantics are: "hey, try and read this, if you can do it immediately without blocking, great... if you have to block, then do whatever you need to do in the background to make that happen, but still return to me without blocking".

That, and that alone, is the key difference between the inherently synchronous I/O model of UNIX and the inherently asynchronous I/O model of NT. The entire NT I/O subsystem, cache manager, driver API, memory management, APCs, scheduling and so on is predicated on the notion of every I/O request being asynchronous.

If everything happens to be in the right spot at the right time, sometimes an I/O call can be synchronous (i.e. user->kernel->user without a context switch due to a required wait). In every other case, the kernel won't be able to complete it there and then, so, it checks to see if the user still wants that read or write call to return immediately -- which implies "asynchronous I/O" (referred to as "overlapped" I/O in NT parlance, because you're overlapping an I/O request with more compute).

Windows kernel drivers are fundamentally more complex than the corresponding Linux drivers because the kernel's I/O model is fundamentally more sophisticated. Everything is packet driven (the "I/O request packet", or IRP). Your driver's read/write entry points need to be able to query the incoming I/O request and determine whether the user wants sync or async, work out how to return the call so that the I/O manager can furnish the correct behavior to all the other pieces of the subsystem (and potentially to other drivers that are layered higher and lower), and handle a huge number of other subtle details.

The added complexity is required because the fundamental I/O model is asynchronous. In the UNIX synchronous I/O model, there's simply no semantic concept -- at the driver level, the kernel level, or in the APIs exposed to the user -- to say "here, read() this and return immediately -- if it can be done synchronously, great, if not, kick it off in the background and give me some opaque structure back I can use in the future to check on the completion of the operation".
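To make the quoted semantics concrete, here is a rough user-space emulation in Python. It is a sketch only: `read_overlapped`, the `cache` dict and the thread pool are all hypothetical stand-ins for what the NT kernel does internally, and none of this is a real Windows API. The call returns the data directly when it can be satisfied on the spot, and otherwise returns at once with an opaque object (a `Future`) that can be checked later:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the background work the kernel would do on the caller's behalf.
_pool = ThreadPoolExecutor(max_workers=2)


def read_overlapped(path, offset, size, cache):
    """Hypothetical helper emulating "complete now, or return and finish later".

    Returns (data, None) if the request could be served immediately,
    otherwise (None, future), where future.result() yields the bytes.
    """
    key = (path, offset, size)
    if key in cache:
        # "If it can be done synchronously, great."
        return cache[key], None

    def blocking_read():
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read(size)
        cache[key] = data  # later identical requests complete immediately
        return data

    # "If not, kick it off in the background and give me something opaque
    # I can use later to check on completion."
    return None, _pool.submit(blocking_read)
```

A caller would test the second element of the tuple: `None` means the data is already in hand, anything else is the handle to poll or wait on (`pending.done()`, `pending.result()`). The difference from the NT model is that here a helper thread still blocks in read(); the emulation shows the calling convention, not the kernel machinery behind it.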

The other huge advantage of NT is the notion of thread-agnostic I/O. That is, the thread that initiates one of these asynchronous read requests doesn't have to be the same thread that completes it. Although it sounds simple, that's one of those tip-of-the-iceberg technical things where there are so many pieces behind the scenes that need to cooperate to facilitate the functionality. I talk a little bit about thread-agnostic I/O here: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-....

So, to summarize, all discussions regarding asynchronous I/O and M:N threading on UNIX are sort of fundamentally flawed because the underlying primitives can't express what is actually needed (an asynchronous I/O subsystem at the kernel level, thread-agnostic completion-oriented I/O, and ideally, thread pools + completion ports) to achieve the end goal: optimally using your underlying hardware :-)

(Optimal hardware usage necessitates one thread running per core, and the ability for any one of these threads to continue program logic upon completion of an I/O request, regardless of whether or not it was the thread that initiated that request.)
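The shape of thread-agnostic, completion-oriented I/O can be sketched in portable Python, with the caveat that this is an emulation under stated assumptions rather than the NT mechanism itself. A `queue.Queue` stands in for an I/O completion port (on Windows the real calls are CreateIoCompletionPort() and GetQueuedCompletionStatus()), a small thread pool stands in for the kernel and device, and `run_completion_port_demo` is a made-up name. One thread initiates every read; whichever worker is free picks up each completion:

```python
import os
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_completion_port_demo(paths):
    """Initiate reads on the calling thread; let a pool of workers complete them.

    Returns {path: (bytes_read, completed_on_other_thread)}.
    """
    completions = queue.Queue()  # stand-in for an I/O completion port
    results = {}
    lock = threading.Lock()
    initiator = threading.get_ident()

    def do_read(path):
        # Emulates the kernel/device finishing the request and posting a packet.
        with open(path, "rb") as f:
            completions.put((path, f.read()))

    def worker():
        # Any worker may continue the program logic for any completed request.
        while True:
            packet = completions.get()
            if packet is None:  # shutdown sentinel
                return
            path, data = packet
            with lock:
                results[path] = (len(data), threading.get_ident() != initiator)

    # One worker per core, as the comment above suggests.
    workers = [threading.Thread(target=worker) for _ in range(os.cpu_count() or 1)]
    for t in workers:
        t.start()

    with ThreadPoolExecutor(max_workers=4) as io_pool:
        for path in paths:
            io_pool.submit(do_read, path)  # initiation returns immediately
    # Leaving the with-block waits until every read has posted its packet.

    for _ in workers:
        completions.put(None)
    for t in workers:
        t.join()
    return results
```

The initiating thread never touches the data; it only issues requests and moves on. In this emulation a helper thread still blocks per read, which is exactly the overhead a kernel-level asynchronous I/O subsystem avoids.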
