You just need to make sure that all nodes have their clocks properly set and synchronized, that hostnames and DNS names are set for each machine, that SELinux is turned off, and that the firewall settings on each node allow the machines to talk to each other.
Also, Dell offers a downloadable Crowbar ISO for Hadoop - and that's Puppet-based, if you are familiar with Puppet.
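For what it's worth, the hostname and firewall prerequisites above are easy to check from a script before you install anything. Here is a minimal sketch in Python using only the standard library. The function names are made up for illustration, and the port you probe is an assumption (8020 is a common NameNode RPC port, but check your own configuration). It does not check clocks or SELinux.

```python
import socket


def resolves(hostname):
    """Return True if the hostname resolves to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # covers socket.gaierror and socket.herror
        return False


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_nodes(nodes, port):
    """Check name resolution and TCP reachability for every node.

    Returns a dict like {"node1": {"dns": True, "port": False}, ...}.
    The port check is skipped (reported as False) when DNS fails.
    """
    report = {}
    for node in nodes:
        dns_ok = resolves(node)
        report[node] = {"dns": dns_ok, "port": dns_ok and port_open(node, port)}
    return report
```

Run something like `check_nodes(["node1.example.com", "node2.example.com"], 8020)` from each machine (the hostnames here are placeholders) and look for any `False` entries; a node that resolves but whose port is closed usually points at the firewall.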
This is where LLDP comes in handy. Run an LLDP agent on each node and enable LLDP on the switch access ports. Then it's just a matter of the NameNode fetching LLDP neighbor information from the switch (usually via SNMP) and updating its Rack Awareness.
(Disclaimer: I know nothing about Hadoop...)
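To make the LLDP idea a bit more concrete: Hadoop's rack awareness is normally driven by a topology script, an executable that the NameNode calls with one or more IPs or hostnames as arguments and that prints one rack path per argument. A rough sketch in Python follows. The node-to-switch table is hypothetical and hard-coded here; in practice it would be filled from whatever LLDP/SNMP data you collect, which this sketch does not do. Treating each top-of-rack switch as one rack is also an assumption.

```python
import sys

# Hadoop's conventional name for "rack unknown".
DEFAULT_RACK = "/default-rack"

# Hypothetical data: node address -> top-of-rack switch name, as it might be
# learned from LLDP neighbor information. Replace with your own source.
NODE_TO_SWITCH = {
    "10.0.1.11": "tor-sw-01",
    "10.0.1.12": "tor-sw-01",
    "10.0.2.11": "tor-sw-02",
}


def rack_for(node, node_to_switch):
    """Map one node to a rack path, using the switch name as the rack ID."""
    switch = node_to_switch.get(node)
    return "/" + switch if switch else DEFAULT_RACK


def resolve(nodes, node_to_switch=NODE_TO_SWITCH):
    """Return one rack path per input node, in the same order."""
    return [rack_for(node, node_to_switch) for node in nodes]


if __name__ == "__main__":
    # The NameNode passes the nodes as command-line arguments and reads
    # the rack paths from stdout.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Unknown nodes fall back to `/default-rack` instead of failing, so a stale table degrades placement quality rather than breaking the lookup.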
Monitoring is also completely missing: we use Icinga/Nagios and Ganglia. In my experience, Ganglia in particular is invaluable for tuning the configuration for optimal machine usage.
Another point worth considering is security. By default, Hadoop is secured like NFS: any user who can create an hdfs or hadoop user on a machine that has access to the NameNode can delete your HDFS. Hadoop can use Kerberos for security.
Also consider adding Snappy compression to your setup; it speeds up the shuffle phase.
Last but not least - I've found these slides about Hadoop Tuning invaluable: http://www.slideshare.net/cloudera/mr-perf
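For the Snappy suggestion above, the change comes down to two properties in `mapred-site.xml`. The sketch below just renders them from Python so they can be templated across nodes. It assumes the MRv1 property names (`mapred.compress.map.output` and `mapred.map.output.compression.codec`); newer Hadoop releases use differently named properties, and the native Snappy libraries have to be installed on every node. The helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed MRv1 property names for compressing intermediate map output.
SNAPPY_PROPS = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec":
        "org.apache.hadoop.io.compress.SnappyCodec",
}


def render_configuration(props):
    """Render a dict of properties as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

`print(render_configuration(SNAPPY_PROPS))` gives you the fragment to merge into your existing `mapred-site.xml`; it is not meant to replace the whole file.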
@meinuelzen: We use Oracle JDK 7 with en_US.UTF-8 and ntpd on all machines. We run Ubuntu 10.04 / Ubuntu 12.04, but the OS should not matter. Lots of RAM and lots of disks are more important.
https://ccp.cloudera.com/display/DOC/Documentation#Documenta...
Regards,
What would you recommend as default locale settings for the systems? I guess LC_ALL=en_US.UTF-8, right?
And what about server date and time? Should we use NTP or not?