Let me add that the OP's experience that HDF5 files were less space-efficient than comparable CSV files suggests that something was grossly amiss in their use of HDF5.
Great ideas never fade. They do get reinvented :).
(Why would I need a directory tree inside a file that only one process can write to anyway? Why wouldn't I just use the filesystem I already have?)
If you have multiple "tables" that belong together and you need one table to interpret the data in the other table, wouldn't you want them to be grouped together? If they are separate files on the filesystem, there is always the risk of forgetting something when you share the data with somebody.
If you can put all the data of an experiment into one file, I think that is very convenient. After all, you don't have to read the complete HDF5 file if you are interested just in a subset of the data.
If I have multiple data files that need to go together, I would like to put them together with a widely-understood tool that has good APIs in many programming languages and can even be interacted with from the shell.
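One widely-understood tool that fits this description is a plain zip archive: every mainstream language has an API for it, and it works from the shell with `zip`/`unzip`. A minimal sketch (the file names and contents here are made up for illustration):

```python
import io
import zipfile

# Bundle two related "tables" into one archive, in memory for this demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("measurements.csv", "t,value\n0,1.5\n1,2.5\n")
    z.writestr("metadata.json", '{"sensor": "A1"}')

# Read back just one member without touching the rest of the archive.
with zipfile.ZipFile(buf) as z:
    meta = z.read("metadata.json").decode()
```

You get the "one file per experiment" convenience without a format that only specialized libraries can open.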
Also if you have dense data, you can use mmap, which isn't very space efficient but is very fast. I guess it could also be made to be space efficient if you use a filesystem with transparent compression.
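For dense data, the mmap approach can be as simple as dumping the raw array bytes and mapping them back. A sketch using numpy's memmap (the file layout and names are my own, for illustration):

```python
import tempfile
import numpy as np

# A dense float32 matrix dumped as raw bytes on disk.
rows, cols = 1000, 64
data = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)

tmp = tempfile.NamedTemporaryFile(suffix=".bin", delete=False)
tmp.close()
data.tofile(tmp.name)

# Map the file: the OS pages rows in lazily, so touching one row
# does not require reading the whole file.
view = np.memmap(tmp.name, dtype=np.float32, mode="r", shape=(rows, cols))
row = view[500]  # served straight from the page cache
```

Not space-efficient (no compression), but reads are about as fast as the page cache allows.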
If you want your mmap to magically see the uncompressed data, your file system will have to decompress the data, and that doesn't come for free.
I would aim for compression in the application, as data size will likely be the bottleneck when reading and writing such files. If your data isn't very sparse, you could delta-encode the indices of the non-zero columns and use some variable-length encoding for them. Compressing each row of deltas may help further after the delta encoding (especially if the row is reasonably dense, because then you expect the deltas to be small).
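A minimal sketch of that scheme (function names are my own): delta-encode the sorted non-zero column indices of a row, then emit each delta as a LEB128-style varint, so small deltas cost a single byte.

```python
def varint(n: int) -> bytes:
    """Variable-length encoding: 7 data bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_row(nonzero_cols) -> bytes:
    """Delta-encode sorted column indices, varint each delta."""
    out, prev = bytearray(), 0
    for c in sorted(nonzero_cols):
        out += varint(c - prev)
        prev = c
    return bytes(out)

def decode_row(buf: bytes) -> list:
    """Inverse of encode_row: undo the varints, then the deltas."""
    cols, prev, n, shift = [], 0, 0, 0
    for b in buf:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += n
            cols.append(prev)
            n, shift = 0, 0
    return cols

# Dense-ish row: deltas 3, 1, 3, 293, 2 -> five indices in six bytes.
encoded = encode_row([3, 4, 7, 300, 302])
```

A general-purpose compressor over these delta streams can then squeeze out whatever regularity remains across rows.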
Once you go down that route, you have sacrificed simplicity, so you might just as well encode your floats, too.
"First, we knew we only cared about row-by-row access over the entire file; we do not need things like random row or column reads."
It sounds like they don't care about subsets of either columns or rows, and are looking to optimize table size and time for full table scans.
Note: after reading a little more, I suspect SQL would in fact be faster.