Except that with MR's 2100 nodes, the entire dataset fit in (aggregate) memory :)
Edit: On the other hand, this is an endorsement of the current wave of "per-node performance stinks, so let's avoid rewriting software for an extra year or two by throwing SSDs at it." Great for hardware vendors!
Also, this was primarily network bound. The old record used 2100 nodes with a 10Gbps network.
So you have 100TB of disk reads followed by 100TB of disk writes, all on HDDs. Across 2100 nodes that's about 100GB of I/O per node; and since Hadoop nodes are typically in RAID-6, each write carries an associated parity read and write on top of that.
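As a sanity check on that figure, here's the arithmetic (a sketch in Python; the only inputs are the 100TB dataset and the 2100 nodes mentioned above):

    # Checking the "about 100GB/node" figure for the 2100-node run.
    TB = 1e12
    GB = 1e9

    nodes = 2100          # node count from the old MR record
    dataset = 100 * TB    # GraySort input size

    # One full read pass plus one full write pass over the dataset:
    io_per_node = 2 * dataset / nodes
    print(f"{io_per_node / GB:.0f} GB of raw I/O per node")  # ~95 GB

Note this ignores the RAID-6 parity traffic, which only pushes the per-node number higher.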
This does not even include the intermediate files, which (depending on how the kernel parameters are set) could have been written to disk. The typical dirty_background_ratio is 10, so after ~6GB of dirty pages pdflush kicks in and starts writing to the spinning disks.
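For anyone wondering where the ~6GB comes from: dirty_background_ratio is a percentage of (roughly) the node's memory, so on a 60GB box background writeback wakes up at about 6GB of dirty pages. A minimal sketch, assuming 60GB of RAM (the r3.2 figure from the sibling comment):

    # dirty_background_ratio is a percentage; the threshold scales with RAM.
    ram_gb = 60                   # assumed node RAM (r3.2-class box)
    dirty_background_ratio = 10   # common kernel default, in percent

    threshold_gb = ram_gb * dirty_background_ratio / 100
    print(f"background writeback starts at ~{threshold_gb:.0f} GB dirty")

    # On a live Linux box you can read the actual setting:
    with open("/proc/sys/vm/dirty_background_ratio") as f:
        print("current setting:", f.read().strip())

(Strictly it's a percentage of available rather than total memory, but for a back-of-envelope the difference doesn't matter.)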
Another way of looking at this is performance per watt or per dollar. The r3.2xlarge has 60GB of RAM, so comparing against that, Spark cost the same ~$1.4K while giving a 3X speedup. (Or, on a host that charges per minute, it would be the same performance at a third of the cost.)
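The per-minute claim is just proportional billing; a tiny sketch with a made-up hourly rate and runtime (only the 3X factor and the ~$1.4K come from the comparison above):

    # With per-minute billing, cost is proportional to runtime, so a 3x
    # speedup at the same hourly rate means ~1/3 the cost per sort.
    hourly_rate = 1400.0   # assumed cluster cost in $/hour (placeholder)
    mr_runtime_h = 1.0     # placeholder MR runtime
    speedup = 3.0          # Spark's advantage, from the comparison above

    mr_cost = hourly_rate * mr_runtime_h
    spark_cost = hourly_rate * (mr_runtime_h / speedup)
    print(f"MR: ${mr_cost:.0f}  Spark: ${spark_cost:.0f}")  # $1400 vs ~$467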
This is me not knowing the space: would MR (or more modern things like Tez) perform worse on this HW setup, or is this a reminder that hardware/config tuning matters?