Yeah, of course. I can't explain to you why NNs outperform Bayesian approaches; probably NNs are just capturing the right kind of prior for vision/language tasks. And yeah, Bayesian models are more interpretable, but when you have millions of latent variables I'm not sure interpretability still means much.