What are you referencing here?
This surveys the excellent storage fault research from UW-Madison, and in particular:
“Can Applications Recover from fsync Failures?”
“Protocol-Aware Recovery for Consensus-Based Storage”
Finally, I'd recommend watching “Consensus and the Art of Durability”, our talk from SD24 in NYC last year.
[disks are] somewhere between non-Byzantine fault tolerance and
Byzantine fault tolerance ... you expect the disk to be almost
an active adversary ...
...
so you start to see just a single disk as a distributed system
My goodness, not at all! If you can't trust the interface to a local disk, then you're lost at a fundamental level. And even ignoring that, a disk is an implementation detail of a node in a distributed system. Whatever properties that disk has for that local node are irrelevant in the context of the broader system, and managing them is the local node's responsibility before it communicates anything to other nodes in that broader system.
Combined with https://www.youtube.com/watch?v=tRgvaqpQPwE, it seems like the author/presenter is conflating local, disk-related properties and details with distributed, system-level requirements and guarantees. If consensus requires a node to have durably persisted some bit of state before it sends a particular message to other nodes in the distributed system, then it doesn't matter how that persistence is implemented; it only matters how that persistence is observable. Disks, FS caches, etc. aren't requirements; they're just one of many possible implementation choices.
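To make the "persist before send" rule concrete, here is a minimal Python sketch under stated assumptions. Everything in it is hypothetical and illustrative: the `DurableStore` interface, the `FsyncFileStore` and `FailingStore` implementations, the `Replica.on_append` handler, and the `ACK` message are not from any real consensus library. The point it tries to show is that the protocol only observes whether `persist()` returned successfully before the acknowledgement goes out; how the store achieves durability (here, append plus `os.fsync`) sits behind the interface.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable


class PersistError(Exception):
    """Raised when a record could not be made durable."""


class DurableStore(ABC):
    """Hypothetical interface: all the protocol can observe about persistence."""

    @abstractmethod
    def persist(self, record: bytes) -> None:
        """Return only once `record` is durable; raise PersistError otherwise."""


class FsyncFileStore(DurableStore):
    """One possible implementation: append the record to a file, then fsync."""

    def __init__(self, path: str) -> None:
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def persist(self, record: bytes) -> None:
        try:
            view = memoryview(record)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(self.fd, view)
                view = view[written:]
            os.fsync(self.fd)
        except OSError as exc:
            # Surface the failure instead of retrying fsync: on some systems a
            # retried fsync can succeed even though earlier dirty data was lost.
            raise PersistError(str(exc)) from exc

    def close(self) -> None:
        os.close(self.fd)


class FailingStore(DurableStore):
    """Hypothetical test double that models a write/fsync failure."""

    def persist(self, record: bytes) -> None:
        raise PersistError("simulated fsync failure")


class Replica:
    """Hypothetical replica: acknowledges an entry only after persisting it."""

    def __init__(self, store: DurableStore, send: Callable[[bytes], None]) -> None:
        self.store = store
        self.send = send

    def on_append(self, entry: bytes) -> None:
        self.store.persist(entry)  # a PersistError propagates, so no ACK is sent
        self.send(b"ACK " + entry)
```

Swapping `FsyncFileStore` for any other store changes nothing in `Replica`. What matters is that a failed `persist()` can never be followed by an `ACK`, as `FailingStore` demonstrates.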
It’s a mindbender of a paradigm shift for how to think about local recovery actions in the context of the global consensus protocol!