I think that was largely a misunderstanding. 20+ years ago I took an AI class that mentioned using multiple layers was useful for training neural networks. It also mentioned that a 2-layer network is a universal approximator only given an arbitrarily large number of nodes, which again seems to be forgotten.
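A toy illustration of that width point (my own sketch, not something from the class): fix a single hidden layer of random tanh units, fit only the output weights by least squares, and watch the approximation error shrink as you add nodes. With a small hidden layer the fit is poor no matter what; the theorem only kicks in as width grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)  # target function to approximate

def fit_error(width):
    # One hidden layer with random fixed weights; only the output
    # layer is fit (plain least squares), so width is the only knob.
    W = rng.normal(size=(1, width))
    b = rng.normal(size=width)
    H = np.tanh(x @ W + b)                     # hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.max(np.abs(H @ beta - y))        # worst-case error

for width in (2, 10, 100):
    print(width, fit_error(width))
```

The printed errors drop sharply as the hidden layer widens, which is the "arbitrarily large number of nodes" caveat in action.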
Though the teacher had worked in industry for a while, which may have been relevant, as we didn't focus that much on theory.
PS: Deep learning was also driven more by increased computational power than by any major theoretical advance.