Building a hardware convnet is only beneficial once you've figured out the exact parameters of the network. Everything is hardwired. That makes it useless if you want to experiment with lots of different parameters, tricks, or architectures to "advance the field".
Moreover, building such a chip is an expensive and lengthy process, and given how fast GPUs are improving, it's not clear that the chip will still be competitive by the time you've built it.
Finally, if you want to speed up training, try to find a better algorithm. For example, humans can learn from very few training examples, while current neural networks need many thousands, or even millions. There's a potential million-fold speed-up in training time right there, and to find it, you need the flexibility of software.