The lore I've heard is that most new deep learning training algorithms (optimization algorithms) only work better on particular special cases, and it is hard to do better than the established algorithms in general.
I'm also not sure why you're saying they're applicable beyond deep learning--how do you plan to train a PGM or SVM using Adam?
And on what data do you decide to move on to an NN-based model rather than using a simpler model like linear regression?
Once upon a time, when I used to hire data people, I'd ask them to tell me about a recent data project. They'd normally mention some kind of complex model, and I'd ask them how much better it was than linear/logistic regression. A really large proportion of candidates (around 50%) couldn't answer this because they'd never compared their approach to anything simpler.
One person told me that linear regression wasn't in the top 10 Kaggle models, so they would never use it.
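To make the "how much better is it than linear regression?" question concrete, here is a minimal sketch of the comparison I'd want to see. Everything in it is an assumption for illustration: the data is synthetic, the helper names (`fit_ols`, `knn_predict`, `mse`) are made up, and the 5-nearest-neighbour model is just a stand-in for whatever more complex model a candidate might have built.

```python
import random

def fit_ols(xs, ys):
    """Closed-form least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Synthetic data (an assumption): a noisy linear relationship.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 1 + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Score every model on the same held-out split.
mean_y = sum(train_y) / len(train_y)
slope, intercept = fit_ols(train_x, train_y)
results = {
    "mean baseline": mse([mean_y] * len(test_y), test_y),
    "linear regression": mse([slope * x + intercept for x in test_x], test_y),
    "5-NN (stand-in for the complex model)": mse(
        [knn_predict(train_x, train_y, x) for x in test_x], test_y
    ),
}
for name, err in results.items():
    print(f"{name}: test MSE = {err:.3f}")
```

The point is only that the simple models get scored on the same held-out data as the complex one, so "how much better?" has a number attached to it.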
NNs are universal function approximators. They can have arbitrary model capacity, and you can sort of control that with architecture decisions, loss function/regularization choices, and early stopping, but depending on the problem they can cause more problems than they solve. Usually you don’t really know whether your NN will generalize well outside of your train/test distributions, so it’s often better to have a simpler, more predictable model whose behavior you can control. This is all from my personal experience, and it is completely moot when we’re talking about e.g. NLP or vision tasks, or situations where you’re drowning in data. NNs are super interesting and powerful, and I don’t mean to suggest otherwise, but the mantra is: “what is the right solution to my problem?” There are lots of great advantages to NNs as well (you can get them to do anything with enough cajoling, and they can solve major headaches you would usually have with e.g. kernel methods).
They are usable everywhere derivative-based optimization is usable. That certainly includes SVMs, though since an SVM is a shallow method you don't need much data to train it, and hence don't need a scalable optimization method (it would just be unnecessarily slow). But you certainly could do it if you somehow needed to. Here's the first hit on Google for "sgd svm": https://scikit-learn.org/stable/modules/generated/sklearn.li...
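For illustration, here is a rough sketch of training a linear SVM with plain stochastic (sub)gradient descent on the regularized hinge loss. It is not scikit-learn's implementation, just a pure-Python toy under my own assumptions: the function names, the hyperparameters (`lr`, `lam`, `epochs`), and the two-blob dataset are all made up for the example.

```python
import random

def train_linear_svm_sgd(X, y, lr=0.01, lam=0.01, epochs=50, seed=0):
    """SGD on lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)), one sample at a time.

    Labels in y must be -1 or +1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The L2 penalty contributes lam * w to the gradient on every step.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # Hinge subgradient is -y * x (and -y for the bias) when the
                # margin is violated, and zero otherwise.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data (an assumption): two Gaussian blobs centred at (2, 2) and (-2, -2).
rng = random.Random(1)
X = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
X += [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(100)]
y = [1] * 100 + [-1] * 100

w, b = train_linear_svm_sgd(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

Swapping the plain SGD step for Adam or another first-order optimizer would only change the update rule; the hinge-loss subgradient stays the same.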
The fact that you can't use first-order optimization methods for graphical models is one answer to the question of why not everyone uses them. Though for small models there are deep networks that model them and are trained the usual way for neural networks. I think this is still an active research area.