That's what unsupervised learning is for. GPT doesn't have labels either, just raw data.
What's the medical imaging equivalent to "predict the next word"?
Presumably all these images would be linked to what ended up happening to the patient months or years later.
It seems to me that we're basically already "there" in terms of AGI, in the sense that all we seem to need to do is scale up, increase the amount and diversity of data, and bolt on some additional "modules" (like allowing it to take action on its own). Combine that with a better training process that helps the model do things like build a more accurate semantic map of the world (sort of the LLM equivalent of getting the fingers right in image generation) and we're done.[1]
Before the developments of the last few months, I was optimistic about whether we would get AGI quickly, but even I thought it was hard to know when it would happen, since we didn't know (a) the number of steps or (b) how hard each of them would be. What makes me both nervous and excited is that it seems like we can sort of see the finish line from here and everybody is racing to get there.
So I think we might get there by accident pretty soon (think months, not years), since every major government and tech company is likely racing to build bigger and better models (or will be soon). It sounds weird to say this, but I feel like even as over-hyped as this is, it's still under-hyped in some ways.
Would love your input if you'd like to share any thoughts.
[1] I guess I'm agreeing with Nando de Freitas (from DeepMind) who tweeted back in May 2022 that "The Game is Over!" and that now all we had to do was scale things up and tweak: https://twitter.com/NandoDF/status/1525397036325019649?s=20