> This find | xargs mawk | mawk pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation.
Well this is great until you need more nodes. :) I am talking about the same scalability while maintaining a much lower ecological and financial footprint.