Here's the scenario: if you were asked to build out three Linux machines that would be used together in a cluster to perform data mining and machine learning tasks, with the occasional mapreduce job thrown in, how would you spec the machines out? What distro would you use? Any must have software installs?
With regards to the hardware, what is your preference for manufacturer? How much would you expect to pay per machine?
Your thoughts and suggestions are appreciated!