I wonder, do you have a hypothesis as to what would be a measurement that would differentiate AGI vs Not-AGI?
So one fundamental difference is that AGI would not need some absurdly massive data dump to become intelligent. In fact you would prefer to feed it as minimal a set of the most primitive first principles as possible, because it's certain that much of what we think is true is going to end up being not quite so -- the same as for humanity at any other given moment in time.
We could derive more basic principles, but this one is fundamental and already completely incompatible with our current direction. Right now we're trying to train on essentially the entire corpus of human writing. That is a de facto acknowledgement that the absolute endgame for current tech is simple mimicry, mistakes and all. It'd create a facsimile of impressive intelligence, because no human would have a remotely comparable knowledge base, but it'd basically just be a glorified natural language search engine - frozen in time.
> So one fundamental difference is that AGI would not need some absurdly massive data dump to become intelligent.
The first 22 years of life for a “western professional adult” are literally dedicated to a giant bootstrapping info dump.
The zero-training version not only dramatically outperformed the 'expert' version, but reached higher levels of competence exponentially faster. And that should be entirely expected: there were obviously tremendous flaws in our understanding of the game, and training on those flaws resulted in the software seemingly permanently handicapping itself.
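The dynamic described above can be illustrated with a deliberately tiny example: an agent that learns purely from the rules and a reward signal, with no expert data at all. This is a hypothetical toy (a three-armed bandit, nothing like Go), but it shows the same principle - a blank table plus trial and error, no human corpus:

```python
# Toy "zero expert data" learner: epsilon-greedy on a 3-armed bandit.
# The agent starts with a blank value table and discovers the best arm
# from reward alone. Payout numbers are made up for illustration.
import random

random.seed(0)
true_payouts = [0.2, 0.8, 0.5]   # hidden from the agent
estimates = [0.0, 0.0, 0.0]      # the agent's "blank slate"
counts = [0, 0, 0]

for step in range(2000):
    if random.random() < 0.1:                     # explore occasionally
        arm = random.randrange(3)
    else:                                         # exploit current belief
        arm = estimates.index(max(estimates))
    reward = 1.0 if random.random() < true_payouts[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean

best = estimates.index(max(estimates))
# the agent's estimates should end up ranking arm 1 (true payout 0.8) highest
```

The point isn't the bandit itself; it's that nothing resembling "expert play" ever entered the system, only the environment's own feedback.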
Minimal expert training also has other benefits. The obvious one is that you don't require anywhere near as much material, and it also lets you verify you're on the right track. Seeing software 'invent' fundamental arithmetic is somewhat easier to verify and follow than seeing it produce a hundred-page proof advancing, in a novel way, some esoteric edge theory of mathematics. Presumably it would also require orders of magnitude less operational time to achieve such breakthroughs, especially given the reduction in preexisting state.
The moment after birth, the human agent starts a massive information-gathering process - one that no other system really expects much coherent output from - for 5-10 years. Aka a “data dump”. Some of that data is good, and some of it is bad. This in turn leads to biases and poor thinking models; everything you described is also applicable to every intelligent system, including humans. So again, you're presupposing some kind of perfect information benchmark that couldn't exist.
When that system comes out of the birth canal, it already has embedded in it millions of years of encoded expectations, predictability systems, and functional capabilities that are going to grow independent of what the environment does (though they will certainly be shaped in their interactions by the environment).
So no matter what, you have a structured system of interaction that must be loaded with previously encoded data (experience, transfer learning, etc.), and it doesn't matter what type of intelligent system you're talking about: there are foundational assumptions at the physical interaction layer that encode all previous time steps of evolution.
Said an easier way: a lobster, because of the encoded DNA that created it, will never have the same capabilities as a human, because it is structured to process information completely differently and its actuators don't have the same type and level of granularity as human actuators.
Now assume that you are a lobster compared to a theoretical AGI in sensor-effector combination. Most likely it would be structured entirely differently than you are as a biological thing - but the mere design itself carries with it an encoding of structural information of all previous systems that made it possible.
So by your definition you’re describing something that has never been seen in any system and includes a lot of assumptions about how alternative intelligent systems could work - which is fair because I asked your opinion.
If you took the average human from birth and gave them only 'the most primitive first principles', it's doubtful they would ever have novel insights into medicine.
I also disagree with your following statement:
> Right now we're trying to essentially train on the entire corpus of human writing. That is a defacto acknowledgement that the absolute endgame for current tech is simple mimicry
At worst it's complex mimicry! But I would also say that mimicry is part of intelligence in general, and part of how humans discover. It's also easy to see that AI can learn things - you can teach an AI a novel language by feeding a fairly small amount of vocabulary, grammar, and example text into its context.
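As a loose sketch of how little data a small, regular language needs, here's a toy stand-in (the words are invented for this example, and a real LLM handles far more than word-for-word substitution - morphology, word order, and so on): induce a lexicon from three glossed sentences, then translate an unseen one:

```python
# Toy stand-in for in-context language learning: a handful of glossed
# sentences in an invented language, from which we induce a word-for-word
# lexicon. Assumes perfectly aligned, regular sentences - a huge
# simplification of what an LLM actually does in context.

examples = [
    ("mi tok", "I speak"),
    ("yu tok", "you speak"),
    ("mi luk yu", "I see you"),
]

lexicon = {}
for src, eng in examples:
    for s, e in zip(src.split(), eng.split()):
        lexicon[s] = e  # word-for-word gloss

def translate(sentence):
    return " ".join(lexicon[w] for w in sentence.split())

print(translate("yu luk mi"))  # -> "you see I"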
I also disagree with this statement:
> One fundamental difference is that AGI would not need some absurdly massive data dump to become intelligent
I don't think how something became intelligent should affect whether it is intelligent or not. These are two different questions.
You didn't teach it; the model is still the same after you ran that. That's like a human following instructions without internalizing the knowledge: he forgets them afterward and didn't learn what he performed. If that were all humans did, there would be no point in school etc., but humans do so much more than that.
As long as LLMs are like a human with Alzheimer's, they will never become a general intelligence. And following instructions is not learning at all; learning is building an internal model of those instructions that is more efficient and general than the instructions themselves. Humans do that, and that is how we manage to advance science and knowledge.
Then when OpenAI does another training run it can also internalise that knowledge into the weights.
This is much like humans - we have short-term memory (where it doesn't get into the internal model), and then things get baked into long-term memory during sleep. AIs have context-level memory, and then that learning gets baked into the model during additional training.
Although whether or not it changed the weights is, IMO, not a prerequisite for whether something can learn. I think we should be able to evaluate whether something can learn by looking at it as a black box, and we could make a black box which would meet this definition: talk to an LLM, limited to its max context length each day, then run an overnight training run to incorporate the learned knowledge into the weights.
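That day/night loop can be sketched with a toy model - a word-frequency table standing in for weights and a capped list standing in for the context window. Every name here (Agent, consolidate, etc.) is hypothetical, made up purely for illustration:

```python
# Minimal sketch of the "context by day, train by night" loop described
# above. consolidate() plays the role of the overnight training run;
# from outside the black box, the agent still "knows" the fact afterward.
from collections import Counter

class Agent:
    def __init__(self, context_limit=4):
        self.weights = Counter()      # long-term: survives across "days"
        self.context = []             # short-term: capped, wiped nightly
        self.context_limit = context_limit

    def observe(self, fact):
        self.context.append(fact)
        if len(self.context) > self.context_limit:
            self.context.pop(0)       # oldest facts fall out of the window

    def consolidate(self):
        # "overnight training run": fold the day's context into weights
        self.weights.update(self.context)
        self.context = []

    def knows(self, fact):
        # black-box check: learned if in context OR baked into weights
        return fact in self.context or self.weights[fact] > 0

agent = Agent()
agent.observe("water boils at 100C")
assert agent.knows("water boils at 100C")   # still in context
agent.consolidate()                         # nightfall
assert agent.context == []                  # context wiped
assert agent.knows("water boils at 100C")   # now in the "weights"
```

Judged purely as a black box, the combined system passes the "did it learn?" test, even though no single component does - which is the point being made.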