Transformers perform qualitatively better than the other architectures tested, and GPT-2 (the most capable publicly available model at the time) predicts the measured brain responses with near-100% accuracy. The best correlate of a model's brain-prediction performance is its next-word prediction accuracy; other AI benchmark scores show no significant relationship.
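The kind of analysis behind that claim can be sketched as a simple correlation across models: for each model, compare its next-word prediction accuracy (and, separately, some other benchmark score) against its brain-prediction score. The numbers below are illustrative placeholders, not the study's data.

```python
# Toy sketch: does next-word prediction accuracy track brain-prediction
# score across models better than an unrelated benchmark does?
# All values are made up for illustration.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-model metrics (five models, illustrative values only).
next_word_acc = [0.30, 0.38, 0.45, 0.52, 0.60]  # next-word prediction accuracy
brain_score   = [0.41, 0.55, 0.63, 0.78, 0.95]  # brain-prediction score
other_metric  = [0.70, 0.40, 0.85, 0.55, 0.60]  # unrelated benchmark score

print(f"next-word acc vs brain score: r = {pearson(next_word_acc, brain_score):.2f}")
print(f"other metric  vs brain score: r = {pearson(other_metric, brain_score):.2f}")
```

In this toy setup the first correlation comes out near 1.0 while the second is near zero, which is the shape of the result being summarized: next-word prediction ability, not general benchmark performance, is what lines up with brain-prediction scores.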
The conclusion is that this is strong evidence that the brain processes language using a predictive algorithm much like the one transformers implement, and that GPT-2's architecture may be functionally similar to the brain's language-processing areas.