One quibble, and really mean only one:
> a high-dimensional probabilistic causal framework
Deep learning models aka neural network type models, are not probabilistic frameworks. While we can measure on the outside a probability of correct answers across the whole training set, or any data set, there is no probabilistic model.
Like a Pachinko game, you can measure statistics about it, but the game itself is topological. As you point out very clearly, these models perform topological transforms, not probabilistic estimations.
This becomes clear when you test them with different subsets of data. It quickly becomes apparent that the probabilities of the training set are only that. Probabilities of the exact training set only. There is no probabilistic carry over to any subset, or for generalization to any new values.
They are estimators, approximators, function/relationship fitters, etc. In contrast to symbolic, hard numerical or logical models. But they are not probabilistic models.
Even when trained to minimize a probabilistic performance function, their internal need to represent things topologically creates a profoundly "opinionated" form of solution, as apposed to being unbiased with respect to the probability measure. The measure never gets internalized.