You don't need a new hardware interface. You only need a sensible API for write barriers. In particular, you want "semi-permeable" barriers, which is just another way of saying tagged commands, or grouped writes.
1) An fcntl() to set a numerical tag ID on an fd. All subsequent writes on the fd belong to this ID group.
2) No write in a tagged group is allowed to proceed until all writes from any preceding group have completed.
3) Any untagged writes can complete at any time in any order, as usual.
Done. That's all you need to implement safe filesystem transactions, while eliminating the bottleneck of fsync().
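To make the ordering rules concrete, here is a toy model in Python. It is only a sketch of the semantics, not a real kernel interface: no such fcntl() command exists today. The names TaggedWriteQueue, submit, may_complete, complete and demo_transaction are all made up for illustration. It also assumes that a "preceding group" means a group with a lower tag ID, which is one possible reading of rule 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Write:
    name: str
    tag: Optional[int] = None  # None models an untagged write


class TaggedWriteQueue:
    """Hypothetical model of a write queue that honors tagged groups."""

    def __init__(self) -> None:
        self.pending: List[Write] = []
        self.completed: List[str] = []

    def submit(self, name: str, tag: Optional[int] = None) -> Write:
        w = Write(name, tag)
        self.pending.append(w)
        return w

    def may_complete(self, w: Write) -> bool:
        # Rule 3: untagged writes can complete at any time.
        if w.tag is None:
            return True
        # Rule 2 (assumption: "preceding" means a lower tag ID): a tagged
        # write waits while any write from an earlier group is still pending.
        return not any(
            p.tag is not None and p.tag < w.tag for p in self.pending
        )

    def complete(self, w: Write) -> None:
        if not self.may_complete(w):
            raise RuntimeError(f"{w.name!r} is blocked by an earlier group")
        self.pending.remove(w)
        self.completed.append(w.name)


def demo_transaction() -> List[str]:
    """Journal blocks in group 1, commit record in group 2."""
    q = TaggedWriteQueue()
    a = q.submit("journal block A", tag=1)
    b = q.submit("journal block B", tag=1)
    c = q.submit("commit record", tag=2)
    u = q.submit("atime update")  # untagged

    q.complete(u)  # untagged: free to go first
    q.complete(b)  # writes within one group may reorder freely
    q.complete(a)
    q.complete(c)  # allowed only now that group 1 has drained
    return q.completed
```

In this model the commit record can never reach the disk ahead of the journal blocks it covers, and the application never has to block in fsync() to get that guarantee. Writes within a group, and untagged writes, stay free to be reordered.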