When I first saw the benchmark result, I was pleasantly surprised by the performance. You rarely see that on a single large server, but it is achievable if the implementation handles all the hard bits properly.
Then I saw that it required 140(!) servers to achieve that result, and now I’m wondering what all that hardware is actually doing. On a per-server basis, that is very low throughput, even for graphs. Efficiency that low will make it uneconomical for most graph applications.