I don't deny that knowing the math / theory is useful, but I wonder if we sometimes overestimate the degree to which it is essential. For example, backprop with SGD is a good foundation for many, many applications of NNs, and pre-built implementations exist that let you use the technique without understanding the details of the math. And with those tools, you can experiment with many different combinations of features, different architectures, etc.
Of course, understanding the theory will help in knowing which architectures are most likely to be productive and whatnot, but this whole field is very empirical anyway. So if your experimenting is a little less guided by intuition rooted in theory, that's not exactly the end of the world.