2023: An RL agent trained for multi-task learning solves the majority of perfect-information games. It's a scaled-up Decision Transformer. Scaling laws for RL agents are discovered, similar to those for language models.
2024: Large-scale RL agents are combined with frozen vision and language models via cross-attention, and can be prompted one-shot with language/vision tokens to solve novel tasks.
2025: RL agents enter the real world - first pre-trained in diverse synthetic environments, then via imitation learning from YouTube videos, and finally in an online fashion via real-time human interaction.
The timeline might be optimistic, but one can hope!
I'm interested to see how the field advances, but it won't lead to AGI. It will lead to cool tricks that the ignorant think are sufficient to replace a real person. That will suck.
Still waiting for a followup to https://arxiv.org/abs/2104.03113 ...
Of course, it's not actually performing introspection, and it's just lucky that it guessed the right answer here. Perhaps it's just learned that when conversations discuss a general case (how do humans perform) and then turn to a specific case (how about you?), there is typically some difference between the two that should be noted. But it still gives an illusion of an unbelievable capability.
The thing to bear in mind when reading the dialogue examples in figure 11 is the custom prompt shown in Appendix D:
```
This is a conversation between a human, User, and an intelligent visual AI, Flamingo. User sends images, and Flamingo describes them.
User: <a cat image>
Flamingo: That is a cat. It’s a tiny kitten with really cute big ears.
User: <a dinner image>
Flamingo: This is a picture of a group of people having dinner. They are having a great time!
User: Can you guess what are they celebrating?
Flamingo: They might be celebrating the end of a successful project or maybe a birthday?
User: <a graph image>
Flamingo: This is a graph, it looks like a cumulative density function graph.
```
My personal opinion: once you're doing next-token prediction with this description of what Flamingo "is" in the history, "I am not affected by this difference" is a pretty reasonable completion rather than a lucky guess. It definitely was exciting for the team that this whole example worked so nicely, but if you discard the visual side, this "illusion of an unbelievable capability" has been seen in other works as well.
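To make the "description in the history" point concrete, here is a rough sketch of how a dialogue prompt like the one from Appendix D ends up in front of the model. This is only an illustration under assumptions: `PREAMBLE`, `EXAMPLE_TURNS`, and `build_prompt` are hypothetical names, images are stood in for by text placeholders, and nothing here is Flamingo's actual code or API.

```python
# Hypothetical sketch: these names are illustrative, not from the Flamingo paper.
PREAMBLE = (
    "This is a conversation between a human, User, and an intelligent "
    "visual AI, Flamingo. User sends images, and Flamingo describes them."
)

# Example turns copied from the Appendix D prompt quoted in this thread.
# Real inputs would interleave image features; here they are text placeholders.
EXAMPLE_TURNS = [
    ("User", "<a cat image>"),
    ("Flamingo", "That is a cat. It's a tiny kitten with really cute big ears."),
    ("User", "<a dinner image>"),
    ("Flamingo", "This is a picture of a group of people having dinner. "
                 "They are having a great time!"),
    ("User", "Can you guess what are they celebrating?"),
    ("Flamingo", "They might be celebrating the end of a successful project "
                 "or maybe a birthday?"),
    ("User", "<a graph image>"),
    ("Flamingo", "This is a graph, it looks like a cumulative density "
                 "function graph."),
]


def build_prompt(new_user_turn: str) -> str:
    """Prepend the fixed dialogue to a new user turn.

    The result ends with an open "Flamingo:" line, so whatever the model
    predicts next is conditioned on the persona described in the preamble.
    """
    lines = [PREAMBLE]
    lines += [f"{speaker}: {text}" for speaker, text in EXAMPLE_TURNS]
    lines.append(f"User: {new_user_turn}")
    lines.append("Flamingo:")
    return "\n".join(lines)
```

Calling `build_prompt("How about you?")` gives a string whose last line is a bare `Flamingo:`, so the continuation is sampled with "an intelligent visual AI" already established as the speaker.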
I'm just grateful our AI overlords can tell the difference between affected and effected, even if they're not affective or effective.
https://prowritingaid.com/grammar/1000196/Effected-vs-affect...