The approach you're describing only works for POD-style "structured data". Once you start using OOP of almost any type (though not every type), you no longer have ... well, POD that you can move to/from a storage medium. You have objects whose in-memory format IS important and compiler dependent.
There are other concerns too. His WAV example (I write pro-audio software for a living) doesn't even begin to touch on the actual complexities of dealing with substantial quantities of audio (or any other data that cannot reasonably be expected to fit into memory). Nor on the cost of changing the data ... does the entire data set need to be rewritten, or just part of it? How would you know which sections matter? Oh wait, the data fundamentally IS a byte stream, so now you have to treat it like that. If you don't care about performance (or storage overhead, but that's less and less of a concern these days), there are all kinds of ways of hiding these sorts of details. But the moment that performance starts to matter, you need to get back the bytestream level.
And so yes, there's no standard API and yes the lowest common denominator is byte streams ... because the __only__ common denominator is byte streams. Thinking about this any other way is a repeat of a somewhat typical geek dream that the world (or a representation of the world or part of it) can be completely ordered and part of a universal system.