You need a stronger bound than this. It has to be one you can approximate given a specific network size, architecture, and set of activation functions. Calculating that (or good statistics that estimate it approximately) is a hard problem...
It is solvable for a number of activation functions in a layered perceptron, but try extending it to anything more complex.