I agree that it's exciting to move toward a mathematical approach to characterize what can, and what cannot, be learned by a DNN!
Maybe someone else has the knowledge to give you some more precise answers.
As far as I understand, we need to differentiate between higher-order relations that can be learned by message passing and those that cannot.
Some higher-order relationships can be represented in a graph (of pairwise relations) and, furthermore, can be learned by iterating out from a focal node along its paths (think centrality). GNNs can learn those.
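To make the "iterating out from a focal node" idea concrete, here's a minimal sketch of my own (not from the paper): an eigenvector-style centrality computed by repeated neighbor aggregation, which is exactly the aggregate-then-update pattern a message-passing GNN layer uses. I add each node's own score back in at every step to avoid the oscillation you'd otherwise get on bipartite graphs.

```python
def centrality_by_message_passing(adj, n_iters=50):
    """adj: dict mapping node -> list of neighbor nodes."""
    score = {v: 1.0 for v in adj}  # initial node states
    for _ in range(n_iters):
        # message + update step: each node adds its neighbors' scores to its own
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        # normalize so the scores don't blow up
        norm = sum(new.values()) or 1.0
        score = {v: s / norm for v, s in new.items()}
    return score

# A path graph a-b-c: the middle node ends up most central.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = centrality_by_message_passing(adj)
assert scores["b"] > scores["a"]
```

Anything a GNN can learn this way is, by construction, a function of such iterated pairwise aggregations.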
But GNNs can't learn everything a graph encodes, because not all such relations are amenable to message passing. So not all higher-order information contained in a graph is learned by GNNs.
In the present case, the authors seem to make the point that there are higher-order relations (say, over node sets) which are also not representable by a graph (or a collection of graphs).
I actually do not know whether that is obviously true. For instance, hypergraphs can be modeled as a collection of graphs. But if the claim stands, then we'd need topological deep learning to move further. Otherwise, as you say, it may just be a more general framework for understanding information processing in DL - or a more efficient way to learn node-node or node-set information.
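One concrete way to see why the reduction to graphs is subtle: the standard "clique expansion" of a hypergraph (my own illustration, not something the authors use) replaces each hyperedge with a clique over its nodes. It shows both sides of the point - a hypergraph *can* be mapped to an ordinary graph, but the mapping is lossy, since the node-set structure disappears.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """hyperedges: list of node sets. Returns the set of pairwise edges."""
    edges = set()
    for he in hyperedges:
        # replace each hyperedge by all pairwise edges among its nodes
        for u, v in combinations(sorted(he), 2):
            edges.add((u, v))
    return edges

# One 3-node hyperedge vs. three separate pairwise edges:
h1 = [{"a", "b", "c"}]
h2 = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
# Both collapse to the same graph, so the node-set information is lost.
assert clique_expansion(h1) == clique_expansion(h2)
```

So at least this particular graph encoding cannot distinguish a genuine three-way relation from three pairwise ones, though that alone doesn't settle whether some cleverer collection-of-graphs encoding could.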
Either way, I have not yet quite figured out how Category Theory and Topological DL relate to each other (each generalising Geometric Deep Learning).