Yes, noise and regularization seem to be key here. I've gotten a 2-layer network with 7/8 neurons down to a loss of 0.06 and still dropping, but only with noise and L1 regularization:
http://playground.tensorflow.org/#activation=relu&regulariza... Final loss of 0.051. Interestingly, increasing noise from 10 to 15 destroys performance: the loss jumps to 0.47.
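
For anyone who wants to see what the L1 term is doing outside the playground, here is a rough sketch in plain Python. It assumes a mean-squared-error loss, and the names `l1_penalty`, `regularized_loss` and `rate` are mine, not the playground's; I haven't checked how the playground scales its loss or penalty, so treat the numbers as illustrative only.

```python
def l1_penalty(weights, rate):
    """L1 penalty: regularization rate times the sum of absolute weights."""
    return rate * sum(abs(w) for w in weights)


def regularized_loss(preds, targets, weights, rate):
    """Mean squared error plus the L1 penalty (MSE is an assumption here)."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return mse + l1_penalty(weights, rate)


# Hypothetical values, just to show the penalty being added to the data loss.
loss = regularized_loss(
    preds=[1.0, 0.0],
    targets=[1.0, 1.0],
    weights=[1.0, -1.0],
    rate=0.1,
)
print(loss)
```

Because the penalty grows with the absolute size of every weight, gradient descent is pushed to shrink weights that aren't earning their keep, which is the usual explanation for why L1 helps on noisy data.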