I appreciate you're trying to wind it down so I'll try to get to the point, but there's a lot to unpack here.
I'm not evaluating these models on whether they are AGI; I am evaluating them on what they tell us about AGI in the future. They show that even tiny models, some 10,000x to 1,000,000x smaller than what I think are the comparable measures in the human brain and trained with incredibly simple single-pass methods, manage to extract semirobust and semantically meaningful structure from raw data, are able to operate on this data in semisophisticated ways, and do so vastly better than their size-comparable biological controls. I'm not looking for the human, I'm looking for small-scale proofs of concept of the principles we have good reasons to expect are required for AGI.
The curve fitting meme[1] has gotten popular recently, but it's no more accurate than calling Firefox ‘just symbols on the head of a tape’. Yes, at some level these systems reduce to high-dimensional mathematical curves, but the intuitions this brings are pretty much all wrong. I believe this meme has gained popularity due to adversarial examples, but those are typically misinterpreted[2]. If you can take a system trained to predict English text, prime it (not train it) with translations, and get French-English translations of nontrivial quality, dismissing it as ‘just’ curve fitting is ‘just’ the noncentral fallacy.
Fundamental to this risk evaluation is the ‘simplicity and the naivete of the methods used in producing them’. That simple systems, at tiny scales, with only inexact analogies to the brain, based on research younger than the people working on it, are solving major blockers in what good heuristics predict AGI needs is a major indicator of the non-implausibility of AGI. AGI skeptics have their own heuristics instead, with reasons the problems they identify should be hard, but when you calibrate against the only existence proof we have of AGI development (human evolution), those heuristics are clearly and overtly bad ones that would have failed to trigger. Thus we should ignore them.
[1] Similar comments apply to ‘the same approaches in 1999’, another meme only true at the barest of surface levels. Scale up 1999 models and you get poor results.
[2] See http://gradientscience.org/adv/. I don't agree with everything they say, since I think the issue relates more to the NN's structure encoding the wrong priors, but that's an aside.