What collaborative filtering algorithm are you using that requires terabytes of intermediate storage for gigabytes of input data?
I'm familiar with most approaches to CF (SVD, gradient descent, etc.), and I can't think of any that require large amounts of intermediate storage.
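To make that concrete, here is a minimal Python sketch of SGD matrix factorization (Funk-SVD style). It assumes ratings arrive as (user, item, rating) triples with integer IDs. The function names, hyperparameters and data layout are illustrative assumptions, not anyone's production code. What matters is the memory profile: the ratings are streamed once per epoch, and the only state kept is two factor tables of (num_users + num_items) times k floats, no matter how many ratings there are.

```python
import random


def sgd_matrix_factorization(ratings, num_users, num_items, k=8,
                             lr=0.01, reg=0.05, epochs=20, seed=0):
    """Hypothetical illustration of SGD matrix factorization.

    `ratings` is any re-iterable of (user, item, rating) triples. It is
    streamed once per epoch and never copied; the only state is the two
    factor tables P (users x k) and Q (items x k).
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            # Prediction error for this single rating.
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                puf, qif = pu[f], qi[f]
                # Gradient step with L2 regularization, updated in place.
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating = dot product of user and item factors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

For example, `sgd_matrix_factorization(ratings, 3, 3, k=2)` on a handful of triples returns two 3 x 2 tables. Nothing proportional to the input is written anywhere, so there is no scratch data to ship around.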
> By and large scratch data ends up being much, much larger than the original inputs, if for no other reason than that needed during the shuffle/sort stage
I can't think of a single practical situation where you couldn't do your sorting online as you progress through the data. Again, the overhead of moving the data to and from S3 would exceed the cost of processing it locally (unless Amazon's LAN is faster than a SATA bus, which is unlikely).
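One reading of "doing the sorting online" is to fold each record into a per-key aggregate as it streams past, rather than materializing a sorted copy of the raw input for a shuffle. The sketch below assumes the same hypothetical (user, item, rating) triples and that the aggregate table (one small entry per distinct user) fits in memory. The function name and the per-user mean are illustrative choices, not a specific system's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_user_mean(records: Iterable[Tuple[int, int, float]]) -> Dict[int, float]:
    """Hypothetical example: per-user mean rating in one streaming pass.

    Assumes the per-key aggregates fit in memory; the raw records are
    consumed once and never stored or sorted.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for user, _item, rating in records:
        sums[user] += rating
        counts[user] += 1
    # Only the small aggregate table is sorted, not the input itself.
    return {u: sums[u] / counts[u] for u in sorted(sums)}
```

Because `records` can be a generator (say, lines parsed from a local file), peak memory scales with the number of distinct keys rather than with the size of the input.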
> The author sounds like someone who may have read the academic papers and a few books but hasn't used these tools in practice.
You keep attacking the author in various ad hominem ways, yet you still haven't provided a single uncontrived example of the small-input, large-intermediate-data scenario that your argument relies on.