His conclusion: "It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else."
There is an implicit assumption here that seems obviously false: that this "convergence point" of predictive performance represents the best that can be done with the data. That would imply that current models are perfectly modelling the generative process, the human brain.
This seems highly unlikely. If they are perfectly modelling the human brain, then why do they fail so badly at so many tasks? Is it just a lack of training data?