The traditional way to avoid overfitting is to reduce the number of independent variables, shrink coefficients towards zero, or otherwise limit the complexity of the model.
With deep neural networks the approach is different. Instead of trying to find the global optimum of the training objective (which is intractable, and would also leave the model grossly overfit), the training algorithm is stopped much earlier. Such "underfit" models seem to generalize much better.
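A minimal sketch of this idea is early stopping: train by gradient descent, monitor the loss on held-out validation data, and halt when it stops improving rather than running to convergence. The toy problem, feature degree, learning rate, and patience value below are all illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression: noisy linear data fit with high-degree polynomial
# features, an overparameterized model that invites overfitting.
x = rng.uniform(-1, 1, size=60)
y = 2.0 * x + 0.3 * rng.normal(size=60)
features = np.stack([x ** d for d in range(1, 10)], axis=1)

# Train / validation split.
X_tr, X_va = features[:40], features[40:]
y_tr, y_va = y[:40], y[40:]

w = np.zeros(features.shape[1])
lr = 0.1
patience = 50          # steps to wait after the last improvement
best_va = np.inf
best_w = w.copy()
since_best = 0

for step in range(5000):
    # One gradient-descent step on the training MSE.
    grad = 2.0 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad

    # Early stopping: track validation loss, halt when it stalls.
    va_loss = np.mean((X_va @ w - y_va) ** 2)
    if va_loss < best_va:
        best_va, best_w, since_best = va_loss, w.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:
            break  # stop well before convergence on the training loss

print(f"stopped after {step + 1} steps, best validation MSE {best_va:.3f}")
```

The returned `best_w` is the "underfit" model: its training loss is still decreasing when training halts, but its validation loss, the quantity we actually care about, is at its observed minimum.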