You don't have to. It's just that a lot of good image classification architectures already exist, so you can more or less grab a pretrained network off the shelf and then train it further on more targeted examples so that it performs well in your specific use case (this is usually called transfer learning or fine-tuning).
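Here is a toy, pure-Python sketch of the "grab a network, then give it targeted examples" idea. It is only an illustration under loud assumptions: a real pretrained backbone would come from a library and be trained on a large dataset, whereas here two hand-set kernels (`PRETRAINED_KERNELS`, hypothetical) stand in for those frozen layers. Only a new logistic-regression head is trained on the small task-specific dataset. All names are made up for the example.

```python
import math

# HYPOTHETICAL stand-in for an off-the-shelf network's frozen layers. In real
# transfer learning these weights come from pretraining on a large dataset;
# here two hand-set kernels play that role so the sketch is self-contained.
PRETRAINED_KERNELS = [[-1, 1], [1, 1]]  # "rising edge" and "local sum" filters


def extract_features(signal):
    """Frozen backbone: slide each kernel along the signal (cross-correlation,
    as CNN layers compute it) and keep the strongest response per kernel."""
    features = []
    for kernel in PRETRAINED_KERNELS:
        k = len(kernel)
        responses = [
            sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)
        ]
        features.append(max(responses))
    return features


def score(w, b, features):
    """Linear score of the classification head."""
    return sum(wi * fi for wi, fi in zip(w, features)) + b


def train_head(examples, labels, lr=0.5, epochs=200):
    """Train only a new logistic-regression head on task-specific examples.
    The backbone (PRETRAINED_KERNELS) is never updated."""
    feats = [extract_features(x) for x in examples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-score(w, b, f)))  # sigmoid
            grad = p - y  # gradient of the log loss w.r.t. the score
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(w, b, signal):
    """Classify a signal with the frozen backbone plus the trained head."""
    return 1 if score(w, b, extract_features(signal)) > 0 else 0


# Small, targeted dataset for the new task: does the signal contain a rising edge?
examples = [[0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 1, 1], [1, 1, 0, 0]]
labels = [1, 0, 0, 1, 0]

w, b = train_head(examples, labels)
print([predict(w, b, x) for x in examples])
```

The design choice being illustrated is that the expensive, general-purpose part (feature extraction) is reused as-is, and only a small, cheap part is fitted to your own data.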
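I guess it comes down to the fact that a lot of these signal processing techniques involve convolution, which is exactly the operation the layers of a convolutional neural network perform, except that the network learns its kernels from data instead of having them designed by hand.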
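To make the convolution connection concrete, here is a minimal Python sketch (my own illustration, not taken from any library) of a 1-D "valid"-mode convolution applied with two classic hand-designed kernels: a difference kernel that acts as an edge detector and a box kernel that smooths. The function name `convolve1d` is just a name chosen for the example.

```python
def convolve1d(signal, kernel):
    """'Valid'-mode 1-D convolution: flip the kernel, then slide it along the
    signal and take a weighted sum at each position."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# A step signal: flat, then a jump.
step = [0, 0, 0, 1, 1, 1]

# Two classic hand-designed kernels.
difference = [1, -1]  # edge detector: responds where the signal changes
box = [1, 1, 1]       # unnormalized moving sum: smooths the signal

print(convolve1d(step, difference))  # → [0, 0, 1, 0, 0]
print(convolve1d(step, box))         # → [0, 1, 2, 3]
```

A convolutional layer computes the same sliding weighted sum; the difference is that its kernel values are learned during training rather than picked by hand. (Most deep learning libraries actually implement cross-correlation, i.e. they skip the flip, which makes no practical difference when the weights are learned anyway.)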