I'm not sure how he manages to crawl at this speed using so few resources.
We benchmarked Nutch and couldn't really get past 10-14 M(B)ps on a $1200/month machine, even though we hired a professional to optimize the setup. The same is roughly true of Heritrix.
Just wondering if something is missing from his setup, such as domain/IP rate limiting.
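
To make concrete what I mean by domain rate limiting, here is a minimal sketch in Python. It is not how Nutch, Heritrix, or his crawler implements it; the class name `PerHostLimiter`, the `min_delay_s` parameter, and the example URLs are all made up for illustration. The idea is simply to enforce a minimum gap between requests to the same host.

```python
from urllib.parse import urlsplit


class PerHostLimiter:
    """Hypothetical helper: tracks when each host may next be fetched."""

    def __init__(self, min_delay_s=1.0):
        # Assumed politeness delay between requests to one host.
        self.min_delay_s = min_delay_s
        # Maps host -> earliest time the next fetch is allowed.
        self.next_allowed = {}

    def wait_time(self, url, now):
        """Return seconds to wait before fetching url, and reserve that slot."""
        host = urlsplit(url).hostname or ""
        # The fetch can start now, or when the host's previous slot ends.
        ready_at = max(now, self.next_allowed.get(host, now))
        # Reserve the following slot for whoever asks about this host next.
        self.next_allowed[host] = ready_at + self.min_delay_s
        return ready_at - now


limiter = PerHostLimiter(min_delay_s=2.0)
first = limiter.wait_time("http://a.example/1", now=100.0)   # first hit on host a
second = limiter.wait_time("http://a.example/2", now=100.0)  # same host, must wait
other = limiter.wait_time("http://b.example/", now=100.0)    # different host
```

With a delay like this in place, the fetch rate per host is capped, so overall throughput depends on how many distinct hosts are in the queue at once. A crawler that skips this step could plausibly look much faster on the same hardware.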