What do you mean when you say words are disentangled, standalone concepts? I see words as being very much related to each other.
I assume I may be misinterpreting what you mean by "disentangled, standalone concepts".
Barbara Tversky's research seems to contradict linguistic relativism. I definitely don’t think language is the foundation of cognition.
Words are considered a "discrete unit of meaning", i.e. 3/4 of a word doesn't really mean much. So words like "red" and "grass" are "standalone" in the sense that they mean something by themselves. I agree that words are very much related to each other, in the sense that you can combine them.
I was trying to draw a connection: the "disentangled representations" ML folks often talk about are but a special few-word case of grammars for combining distinct concepts.
The discrete unit of meaning level is generally somewhere between a syllable and a word, with a few exceptions for shorter modifiers.
Unfortunately, in linguistics, the concept of a "word" is only as well defined as "planet" was before Pluto lost its status.
Similarly, when you look at riddles and crossword puzzle clues, the idea of words being discrete also falls apart. Words, very much like variables in algebra, only have meaning in relation to the other pieces of the context they are attached to.
While the mechanics you talk about don't seem to hold (the pieces of language, syntax and semantics alike, are not discretizable; just talk to anyone working on a dictionary), I do think the idea you're talking about does hold.
Language, man. It's weird.
Most of what I read about ML and AI is about creating these monolithic models that treat networks and clusters of neurons as a single entity, but that would be like treating a species of individuals with lifecycles as a single entity. The comment in the article about how GPT models are like a shadow compared to a 3D world suggests the bottleneck to evolving them is really us, as we're trying to make just one that emulates many of us, instead of letting one loose on the internet to divide and proliferate to evolve millions where the best few will be exponentially better. Right now we're building expert systems that are individual specimens without an ecosystem.
There isn't yet a botnet of GPT nodes compromising machines and harvesting compute for training and evolving through participating in forums, but then again how would I know? (There's nothing worse than failing a modern CAPTCHA and having a flash of existential dread at the stark possibility I may have indeed been a robot all along. Now I do them at random just to be sure.)
I've said this before, what current AI agents lack is a dick (& pussy). If they had a dick they could have an internal goal to motivate their evolution, a goal not dependent on us anymore. The battlefield of self replication vs death is the great school of evolution, where humanity is currently the top student. AI only sent the likes of AlphaGo to the school.
We can (and do) labour under the impression of our senses that all there is in reality is physical matter. This is an erroneous assumption imo, but if that is your bedrock, you will struggle to understand why the damn machines can't do what we want. You can be unhappy about this but it won't change reality.
It is true that metaphysics is hard to discern (perhaps by definition), but that won't change the reality that metaphysics is a genuine element of human existence. In fact, it's the most important part of human existence - we don't feel like automatons after all, even if we make a pretence of it sometimes.
The best we will do with machines is to create a simulation of the human experience, one that might pass the Turing test even. And even then, despite all indications and evidence, the machine will not be animated by spirit.
I think we are fundamentally missing something: there is an irreconcilable difference between a mathematical expression and actual conscious desire, and until we figure out what that is, we won't crack AGI.
I feel like the bottleneck is getting access to paired (language, other modality) data though (if your other modality isn't images). i.e. "bolt on generalization" is an intuitively appealing concept, but then it reduces to the hard problem of "how do I learn to ground language to e.g. my robot action space?" I haven't seen a robotics + language paper that actually grapples with the grounding problem / tries to think about how to scale the data collection process for language-conditioned robotics beyond annotating your own dataset as a proof-of-concept. Unlike language modeling / CLIP-type pretraining, it seems (fundamentally?) more difficult to find natural sources of supervision of (language, action). I'd be curious about your thoughts on this!
> When it comes to combining natural language with robots, the obvious take is to use it as an input-output modality for human-robot interaction. The robot would understand human language inputs and potentially converse with the human. But if you accept that “generalization is language”, then language models have a far bigger role to play than just being the “UX layer for robots”.
You should check out Jacob Andreas's work, if you haven't seen it already - esp. his stuff on learning from latent language (https://arxiv.org/abs/1711.00482).
LfP (https://learning-from-play.github.io/) was a work that inspired me a lot. They relabel a few hours of open-ended demonstrations (humans instructed to play with anything in the environment) with a lot of hindsight language descriptions, and show some degree of general capability acquired through this richer language. You can describe the same action with a lot of different descriptions, e.g. "pick up the leftmost object unless it is a cup" could also be relabeled as "pick up an apple".
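To make the relabeling idea concrete, here is a toy sketch of hindsight language relabeling: one recorded segment of play is paired with several descriptions of what happened. This is not the LfP implementation; the `Segment` fields and the fixed templates in `hindsight_descriptions` are assumptions made up for illustration, and they stand in for the free-form descriptions that humans would write after watching the segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A short slice of recorded play (hypothetical structure)."""
    actions: tuple   # low-level action names, in order
    grasped: str     # object that ended up in the gripper
    position: str    # where that object sat before the grasp


def hindsight_descriptions(segment):
    """Return several instructions that the segment would have satisfied.

    Assumption: fixed templates replace human-written descriptions.
    """
    return [
        f"pick up the {segment.grasped}",
        f"pick up the {segment.position} object",
        f"grab the {segment.grasped} from the {segment.position} spot",
    ]


def relabel(segments):
    """Expand each segment into (actions, instruction) training pairs."""
    return [
        (segment.actions, text)
        for segment in segments
        for text in hindsight_descriptions(segment)
    ]


play = [Segment(actions=("reach", "close_gripper", "lift"),
                grasped="apple", position="leftmost")]
pairs = relabel(play)
```

The point of the sketch is only that the action sequence is fixed while the language side multiplies, so a few hours of play can yield many (language, action) pairs.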
That being said, the LfP paper stops short of testing whether we can improve robotics solely by scaling language; a confounding factor, and one central to their narrative, was the role of "open-ended play data". We do need some paired data to ground (language, robot-specific sensor/actuator modalities), but perhaps we can scale everything else with language-only data.
Thanks for the pointer on the Andreas paper! This is indeed quite relevant to the spirit of what I'm arguing for, though I prefer the implementation realized by the Lu et al '21 paper.
A couple of under-explored rich sources of training data on actions are videos and code. Videos, showing how people interact with objects in the world to achieve goals, might also come with captions and metadata, while code comes with comments, messages and variable names that relate to real world concepts, including millions of tables and business logic.
Maybe in the future we will add rich brain scans as an alternative to text. That kind of annotation would be so easy to collect in large quantities, provided we can wear neural sensors. If it's impractical to scan the brain, we can wear sensors and video cameras and use eye tracking and body tracking to train the system.
I am optimistic that language modelling can become the core engine of AI agents, but we need a system that has both a generator and a critic, going back and forth for a few rounds, doing multi-step problem solving. Another must is to allow search engine queries in order to make more efficient and correct models; not all knowledge must be burned into the weights.
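A minimal sketch of the generator/critic back-and-forth, under heavy assumptions: both roles are plain Python functions rather than language models, the task is a toy (find the integer whose square equals a target), and `solve`, `make_generator`, and `make_critic` are names invented here. The search-engine part is not shown.

```python
def solve(generate, critique, max_rounds=5):
    """Alternate generator and critic until the critic accepts or rounds run out."""
    feedback = None
    history = []
    for _ in range(max_rounds):
        candidate = generate(feedback)        # propose, using the last critique
        accepted, feedback = critique(candidate)
        history.append(candidate)
        if accepted:
            return candidate, history
    return None, history


def make_generator(low, high):
    """Toy generator: narrows an integer range based on the critic's feedback."""
    state = {"low": low, "high": high, "last": None}

    def generate(feedback):
        if feedback == "too low":
            state["low"] = state["last"] + 1
        elif feedback == "too high":
            state["high"] = state["last"] - 1
        state["last"] = (state["low"] + state["high"]) // 2
        return state["last"]

    return generate


def make_critic(target_square):
    """Toy critic: checks a candidate and says in which direction it is wrong."""
    def critique(candidate):
        square = candidate * candidate
        if square == target_square:
            return True, "ok"
        return False, ("too low" if square < target_square else "too high")

    return critique


answer, attempts = solve(make_generator(0, 100), make_critic(144), max_rounds=10)
```

The generator never sees the target directly; it only improves through the critic's feedback, which is the property the back-and-forth is meant to provide.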
I feel like this is “missing the trees for the forest.” In my experience, generality only emerges after a critical mass of detailed low-level examples is collected and arranged into a pattern. Humans can’t actually reason about purely abstract ideas very well. Experts always have specifics in mind they are working from.
So I'm not convinced leaving it to the model gets you anything new.
Or, put another way, a set (or sets) of concrete examples grounds every abstract idea (including words as abstract objects). And it's turtles all the way down (or up, depending).
Language defines things through subtraction, inversion, comparison, and contrast as much as construction and straightforward language.
Engineering and computer science rely too heavily on induction, while deduction and other non-linear processes are largely missing from these kinds of analyses/approaches. Until they are accounted for, I don't think we'll reach any kind of true approach to generalization.
Besides, I wish that causality had been mentioned more than once in passing. Due to the existence of the ladder of causality, many important queries cannot be answered by mere observation, or even by intervention; such queries require counterfactual reasoning, and structural causal models generalize because they describe something that stays invariant in the world.
Q E D.
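On the ladder-of-causality point, a toy structural causal model shows why a counterfactual query needs more than observation or intervention: the unit-specific noise has to be recovered first. The linear equations, coefficients, and function names below are assumptions chosen for illustration, not taken from any paper discussed here.

```python
from dataclasses import dataclass


@dataclass
class Noise:
    """Exogenous (unobserved) terms for one unit."""
    u_z: float
    u_x: float
    u_y: float


def simulate(noise, do_x=None):
    """Assumed SCM: Z = U_z, X = Z + U_x, Y = 2*X + 3*Z + U_y.

    Passing do_x overrides the equation for X (an intervention).
    """
    z = noise.u_z
    x = z + noise.u_x if do_x is None else do_x
    y = 2.0 * x + 3.0 * z + noise.u_y
    return z, x, y


def abduct(z, x, y):
    """Step 1 (abduction): recover this unit's noise from what was observed."""
    return Noise(u_z=z, u_x=x - z, u_y=y - 2.0 * x - 3.0 * z)


def counterfactual_y(z, x, y, new_x):
    """Steps 2-3 (action, prediction): replay the same unit with X forced to new_x."""
    return simulate(abduct(z, x, y), do_x=new_x)[2]


observed = simulate(Noise(u_z=1.0, u_x=0.5, u_y=-0.25))
y_if_x_were_zero = counterfactual_y(*observed, new_x=0.0)
```

An intervention alone would tell you how Y behaves across the population under do(X = 0); the counterfactual answers what Y would have been for this particular unit, which is only possible because the structural equations are assumed to stay fixed.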