Hello hello! Pretty sure what actually happened in these is that the HDD's internal cache was still active, while the Crucial M4 SSD has no internal cache. The only other explanation is that I screwed up my partition offset on the SSD but I already double-checked that and the partitions were all 2MB aligned.
I'm going to compile up LMDB and bench it on a 96GB DL380g8 with quad 3TB ioDrive-2s. Should be interesting to see how various database sizes play out, and what the write amp looks like. I am not seeing much about LMDB's NUMA awareness -- guess I need to keep digging.
For reads we get linear scaling out to 64 cores. Using cache-aligned data structures plays a big part in that for NUMA. (At the moment that's the largest machine we have in our lab.) For writes, there's basically no scaling. Write amplification is logN, proportional to tree height.