That being said, robust systematic generalization is still a hard problem. But "achieve symbol grounding through tons of multimodal data" is looking more and more like the answer.
[1] https://openai.com/blog/dall-e/ [2] https://distill.pub/2021/multimodal-neurons/ [3] https://openai.com/blog/openai-codex/
In my mind, understanding a thing means you can justify an answer. Like a student showing their work and being able to defend it. A system that gives an answer with a proof "understands" that answer with respect to the proof it provides. E.g., to understand an answer with respect to first-order logic, it would have to be able to defend a logical deduction of that answer.
These models still can't justify their answers very well, so I'd say they're accurate but only understand with respect to a fairly dumb proof system (e.g. they can select relevant passages or just appeal to overall accuracy statistics). They're still far from being able to justify answers in the various ways we do, which I'd say means, by definition, that they still don't understand things with respect to the "proof systems" we understand things with.
Maybe the next step will require increasingly interesting justification systems.
Do you understand cats? If I show you a picture of either a cat or a dog, do you think you can tell which one it is? I think most people could solve that challenge, and if pressed they could wax poetic about what makes them think it is a cat. Maybe they would mention the shape of an ear, or talk about feline grace or what have you. But is that really a “justification”? Let alone one they can “defend”? How would “defending” even work in this situation?
What if the language model can generate a step-by-step explanation in the form of text? [0]
There's no guarantee that the reasoning was used to come up with the answer in the first place, and no proof that the reasoning isn't just the product of "a really fancy markov chain generator", but would you accept it?
We're really walking into Searle's Chinese Room at this point.
Given the question: "Jane has 9 balloons. 6 are green and the rest are blue. How many balloons are blue?" The model outputs: "jane_balloons = 9; green_balloons = 6; blue_balloons = jane_balloons - green_balloons; print(blue_balloons)"
That seems like a good justification of a (very simple) step-by-step reasoning process!
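For what it's worth, the generated snippet in that example is directly runnable; laid out as a script:

```python
# The model's generated "justification" for the balloon problem,
# written out as a runnable script.
jane_balloons = 9
green_balloons = 6
blue_balloons = jane_balloons - green_balloons  # "the rest are blue"
print(blue_balloons)  # prints 3
```

The program is the justification here: each variable assignment names a step of the reasoning, which is exactly what makes it checkable.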
Sure, but how does that work with superhuman AI? Consider some kind of math bot that proves theorems about formal systems which are just flat out too large to fit into human working memory. Even if it could explain its answers, there would just be too many moving parts to keep in your head at once.
We already see something like this in quant funds. The stock trading robot finds a price signal and trades on it. You can look at the signal, but it's nonsensical: if rainfall in the Amazon basin is above this amount, and the cobalt price is below this amount, then buy municipal bonds in Topeka. The price signal is durable and causal. If you could hold the entire global economy in your head, you could see the chain of events that produces the effect, but your brain isn't that big.
Or you just take it on faith. Why do bond prices in Topeka go up, but not in Wichita? "It just does." Okay, then what was the point of the explanation? A machine can't justify something you physically don't have enough neurons to comprehend.
You can just ask it to comment on what it intends to do. It's surprising, actually.
This is still true. By all accounts, a human doesn't need to read 159GB of Python code to write Python; in fact, we simply couldn't.
But it doesn't necessarily indicate language models aren't useful.
By the time we see our first line of code, most of us have seen a ridiculous amount of data. We've been trained in problem solving, logical reasoning, maths, natural language processing, ... Hell, we've been trained as pattern matchers since the day we were born.
By my account, humans actually need a large amount of training data too. We may be better at federating and generalising knowledge, but I don't think we're a clear winner in data efficiency.
"Indeed, a strong student who completes an introductory computer science course is expected to be able to solve a larger fraction of problems than Codex-12B."
This suggests to me that Codex really doesn't understand anything about the language beyond syntax. I have no doubt that future systems will improve on this benchmark, but they will likely take advantage of the AST and could use unit tests in a RL-like reward function.
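Using unit tests as a reward signal is straightforward to sketch: run each generated candidate against a test suite and reward the fraction of tests that pass. This is a minimal illustration, not any specific RL setup; the candidate functions and tests are hypothetical.

```python
def unit_test_reward(candidate, tests):
    """Fraction of unit tests a generated candidate passes.

    `candidate` is a callable produced by the model; `tests` is a
    list of (args, expected) pairs. Exceptions count as failures.
    """
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

# Two hypothetical candidates for "add two numbers":
good = lambda a, b: a + b
bad = lambda a, b: a - b
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
```

Here `unit_test_reward(good, tests)` is 1.0 while the buggy candidate scores 1/3, which is the kind of dense, automatic signal an RL-style training loop could optimize against.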
In the end, a more general approach with more compute always wins over applying domain knowledge, like taking advantage of the AST. This is called “the bitter lesson”. http://www.incompleteideas.net/IncIdeas/BitterLesson.html
I am a big fan of LMs and am not in the "they don't really understand" crowd, but here are a couple of reasons:
1. Large language models such as GPT or Codex still have several major architectural limitations. They lack the ability to make use of long-term memory, since they have a fairly limited amount of info they can take as input; GPT-3 is great at short stories, but can't go beyond that, and it's hard to prime it with a lot of information as you would, say, a new employee. There is some work on this, but afaik not very much, and it's very much unsolved.
2. Large language models have only gotten this good by ingesting massive amounts of data and scaling up compute. Yet this growth comes with diminishing returns for every order of magnitude. So simply not being able to scale either the data or the compute sufficiently (with existing hardware architectures) is a very plausible blocker.
3. Large language models 'have it easy' because they only deal with one modality (text). Human intelligence, on the other hand, is multimodal - we can process vision, sound, touch, etc. simultaneously and share concepts between these modalities. And we likewise produce outputs in multiple modalities: motor commands, speech, text. So far it's not too obvious how to achieve this - OpenAI took a step with DALL-E, but that was by mining a massive amount of image-text pairs, and it's not obvious this is easy for other modalities, in particular for motor control.
4. Human-level intelligence is often framed as having system 1 (reactive output) and system 2 (longer term reasoning not in response to immediate stimuli) - this is not at all present in language models.
5. Related to the above two, at least some of human intelligence is derived from reinforcement learning (optimizing a multi-step policy with a delayed reward). This is much harder than the plain self-supervised learning of LMs.
And probably there are a bunch more like these. So while I do think these sorts of models represent a lot of progress, there are many reasons to be doubtful that just 'scale it up' will work to get much further.
At every point in time, the best systems we can build today will be ones leveraging lots of domain-specific information. But the systems that will continue to be useful in five years will always be the ones that scale freely with increased parallel compute and data, which grow much faster than domain-specific knowledge. Learning systems that can use context to develop domain-specific knowledge "on their own" are the only way to ride the wave of this computational bounty.
I have a sneaking suspicion that, if blinded, the crowd of people saying variations of that quote would also identify the vast majority of human speech as regurgitated ideas as well.
> I see no reason that this technology couldn't smoothly scale into human-level intelligence
Yup, the OpenAI scaling paper makes this abundantly clear. There is currently no end in sight for the size to which we can scale GPT. We can literally just throw compute at the problem and GPT will get smarter. That's never been seen before in ML. Last time I ran the calculations, I estimated that, everything else being equal, we'd reach GPT-human in 20 years (a GPT with a similar parameter scale to a human brain). And that's everything else being equal. It is more than likely that over the next twenty years, innovation will make GPT - and the platforms we use to train and run models like it - more efficient.
And the truly terrifying thing is that, to me, GPT-3 has about the intelligence of a bug. Yet it's a bug whose whole existence is human language. It doesn't have to dedicate brain power to spatial awareness, navigation, its body, handling sensory input, etc. GPT-human will be an intelligence the size of a human brain, but whose sole purpose is understanding human language. And it's been to every library and read every book ever written. In every language. Whatever failings GPT may have at that point, it will be more than capable of compensating with sheer parameter count, and by leaning on the ability to combine ideas across the _entire_ human corpus.
All available through an API.
A typical brain has 80-90 billion neurons and 125 trillion synapses. That's a big freaking network to train.
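Under the very loose (and contested) assumption that one synapse corresponds to roughly one trainable parameter, the gap to today's largest models is easy to put a number on:

```python
# Crude scale comparison; "synapse ~= parameter" is a loose assumption,
# not an established equivalence.
synapses = 125e12      # ~125 trillion synapses in a typical brain
gpt3_params = 175e9    # GPT-3's published parameter count
ratio = synapses / gpt3_params
print(f"brain is ~{ratio:.0f}x GPT-3's parameter count")  # ~714x
```

So even on this generous accounting, closing the gap means roughly three more orders of magnitude of scaling.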
Hopefully we can figure out how to train parts of it and then assemble something very smart.
However, the intelligence created by language models is very schizophrenic, and the human-level reflective intelligence it displays is at best a bit of a Frankenstein's monster (an agglomeration of utterances from other people that it uses to form sentences, which in turn form opinions of itself or its world).
I think that modeling will help us learn more about human intelligence, but we're going to have to do a lot better than just training models blindly on huge amounts of text.
I'm of the "human beings are much more than big linear algebra functions slapped on top of a large processor" crowd.
And then the part about "being just like humans" will be the marketing gravy train that funds the operation.
Afterwards OpenAI then added GPT3 chatbot guidelines disallowing basically anything like this. We were in communication with them beforehand, but they decided later that any sort of free form chatbot was dangerous.
What they allow changes on a weekly basis, and is different for each customer. I don't understand how they expect companies to rely on them
But that monopoly won't last, and I think it's more than likely that competition will crop up within the next year. There's definitely a lot of secret sauce to getting a 175B parameter model trained and working the way OpenAI has. The people working there are geniuses. But it can still be reproduced, and will. Once competition arrives I'm hoping we'll see these shackles disappear and see the price drop as well. Meanwhile the open source alternatives will get better. We already have open source 6B models. A 60B model shouldn't be far off, and is likely to give us 90% of GPT-3.
8 year old to AI: "my parents won't let me watch TV, what do I do?". AI: "stab them, they'll be too busy to forbid you".
Then again, the same thing could be said by a non-AI. My point is that you'd essentially be talking to an actual average person, and I'm not so sure that's such a good thing.
Too bad they asked you to pull it. What's the danger they're worried about? The annoying thing about their press releases is how seriously they take their GPT-3 bots' impact on humans. Despite all the hype, it's difficult to see GPT-3 bots ending humanity any time soon. Honestly, they need to rename themselves - I can't see what's open about OpenAI.
They only allow GPT-3 chatbots if the chatbot is designed to speak only about a specific subject, and literally never says anything bad/negative (and we have to keep logs to make sure this is the case). Which is insane. Their reasoning to me was literally a 'what if': the chatbot might have "advised on who to vote for in the election". As if a chatbot in the context of a video game saying who to vote for were somehow dangerous.
I understand the need to keep GPT-3 private. There is a lot of potential for deception using it. But they are so scared of their chatbot saying a bad thing, and of the PR around that, that they've removed the possibility of doing anything useful with it. They need to take context more into account - a clearly labeled chatbot in a video game is different from a Twitter bot.
Seems like OpenAI saw this video differently. But then again, OpenAI now wants to police how GPT-3 is used, and to reject or approve what is acceptable for others using their service - and they can change their guidelines at any time.
They need a sense of humour, rather than policing projects like this.
> What they allow changes on a weekly basis, and is different for each customer.
Exactly. I don't know what to say to anyone building their entire business on top of OpenAI, since they can revoke access instantly simply because they don't like what you are doing, and will just point to the 'guidelines'.
> I don't understand how they expect companies to rely on them
I won't be surprised to see Rockstar Games using a tweaked, self-hosted or private version for their future games for this use case, since OpenAI knows they can get a significant amount of money from large customers like them.
But not from smaller companies.
Was this announced anywhere? We applied to deploy an application in this space, and they refused without providing any context, so I'd be really interested if they published details about restrictions in this space somewhere.
They likely don't want "OpenAI GPT-3" associated with demos like this one; it would be really bad for their image.
Imagine learning to develop recipes, not by ever cooking or eating or even seeing food, but only reading a giant library of cookbooks. Or learning to compose music but never hearing or playing anything -- only seeing scores.
https://www.twitch.tv/videos/1114111652
Starts at 15:45.
I mean, yes, this is a super impressive demo, but it didn't go beyond my expectations. I really want to see whether this model can write a correct binary search method without having seen one before.
Or, even when using binary search correctly, does it understand concepts like index boundaries?
It has almost definitely seen a lot of coding problems so I would expect "write a function to binary search a sorted array" to output the intended result. I don't think anybody expects it to come up with algorithms it hasn't encountered.
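For comparison, here's the kind of boundary-sensitive implementation the question is about - a standard iterative binary search where the `lo`/`hi` updates are exactly where off-by-one bugs creep in:

```python
def binary_search(arr, target):
    """Return an index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1       # inclusive bounds on both ends
    while lo <= hi:                # <= because hi is inclusive
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1           # exclude mid: it's too small
        else:
            hi = mid - 1           # exclude mid: it's too large
    return -1
```

Getting the `<=` vs `<` and the `mid + 1` / `mid - 1` details right is the test of whether a model (or a human) actually understands the invariant, rather than pattern-matching the shape of the algorithm.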
It's a shame they only limited the demo to relatively simple instructions.
I don't believe the model was trained on Google interview answers, sadly.
Once people had digested that and there had been a few other proof-of-concept business ideas around turning Codex into a SaaS (because some people will always queue to build their product on your API), announce the evil version. Not that I really think Copilot is evil, but the IP concerns are legitimate.
Because writing code from scratch is now, I think, much rarer than improving existing codebases. Aka bugfixing.
Hell, it messed up when they gave it the instruction "make every fifth line bold" in the Word API part of the demo: it made the first line of every paragraph (each only 4 lines long in total) bold instead of every fifth line.
Maybe I'm just remembering wrong or conflating OpenAI with some other entity? Or maybe I bought too much of the marketing early on.
They also created Spinning Up [0], one of the best resources I've found for learning reinforcement learning. Their teaching resources are detailed but relatively brief and are focused on implementing the algorithms, even if some of the "proofs" are neglected. But they no longer maintain Spinning Up.
So yes, originally they were for-the-good, but lately I've noticed them moving away from that in more ways than one. It seems they learned one cool trick with language sequence modelling, and they have a lot of compute, and this is all they do now.
It turns out, however, that the way they plan on earning money is much less creative, and more run-of-the-mill SaaS monetization. In a way, I like to believe that a real AI would also end up with such a mundane strategy, as it’s the most likely to actually make them profitable and return money to investors.
In the meantime, China and Chinese companies have caught up. Turns out the fear of one company and one country dominating AI was overblown.
Maybe the OpenAI founders feel that the original goal has been fulfilled because AI is no longer dominated by the US and Google.
If the same thing can happen in the world of programming, I guess evaluations like LeetCode and whiteboarding can go away, replaced by a new kind of logical-thinking evaluation, which could ultimately be a more realistic way for a programmer to rise above the pack.
Say I have a question I can't solve by searching through stackoverflow. If the AI can solve a problem like that, it will be great.
Also, to follow up on the original comment, AI demos are nice, but being a student of history there are still fundamental challenges with these systems. My skepticism is in how much prompting is really required and how can it understand higher level semantics like code refactoring, reproducible examples, large scale design patterns etc.
This synthesis of sequential symbolic processes and probabilistic neural generation is really exciting though. When the amount of human code edits and tweaking for complex programs goes down from hours to seconds then that's when I'll be impressed and scared.
I doubt it works yet, but I wonder how many decades from now we will be able to walk through a finite number of simple requests and wrap them together as working software. Then people will be able to convert their blueprints into action!
- Is the significance here exactly what it says on the tin: the model behind GitHub's AI code completion will be shared with people on an invite basis? Or am I missing something?
- What is the practical import of the quote at the end of this comment?
"can now" makes me think its a new feature over Github's implementation, which would then indicate the "simple commands" could be general UI, or at least IDE UI, navigation.
If "can now" means "it is currently capable of, but will be capable of more", then I'd expect it to be the same as the current implementation on Github.
Quote: "Codex can now interpret simple commands in natural language and execute them on the user’s behalf—making it possible to build a natural language interface to existing applications."
No it wasn't; you can literally describe, in natural text, what you want in a comment, and Copilot will do its best to generate a complete method based on that comment. It only seemed so autocomplete-like because that demo focused on the "helping the developer" part.
I'm fairly sure Copilot could have shown something similar if they had done a demo where you could make something visual easily, like HTML + JavaScript/TypeScript/whatever scripting language. They're using exactly the same model (Codex), after all.
I use the OpenAI beta APIs as a paying customer, and I am still waiting for access to Codex.
Oof. Looking forward to maintaining some future ports done with this tool.
It seems like a safe playground: maybe it will lose a few tokens, but if there are no legal consequences, who cares? The best place to move fast and break things, and with great reward potential.
If I hire some freelancer to write them, what's telling me they aren't already using something like Codex or Copilot? It's like when a factory releases its used water into the river nearby. Maybe we shouldn't drink from the river anymore, but it would be better to test the water the factory releases to make sure it's OK.
If Codex is able to handle a generic API just from reading the docs, it could maybe use a Python library for Solidity contracts like https://web3py.readthedocs.io/en/stable/contracts.html
As a contract user, I'd probably have more trust in a contract written by an independent AI from a short natural-language specification, which can't hide intent, than in a contract with a hidden backdoor or a subtle bug.
Also the AI will probably improve with usage.
You could probably generate multiple versions of your contract, and maybe a high-level bug-correction scheme, like taking the median action across those versions, could increase robustness to bugs and catch the edge cases where the actions differ.
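A minimal sketch of that median-voting idea, with hypothetical stand-in functions for independently generated contract versions (this says nothing about how a real Solidity pipeline would wire it up):

```python
from statistics import median

def consensus_action(versions, *args):
    """Take the median of the actions proposed by several
    independently generated versions of the same contract logic.
    A single buggy outlier can't move the median far."""
    return median(v(*args) for v in versions)

# Three hypothetical generated versions of a 2% fee calculation;
# one has a bug that's off by a factor of 100.
v1 = lambda amount: amount * 0.02
v2 = lambda amount: amount * 0.02
v3 = lambda amount: amount * 2.0   # buggy outlier
```

With these three versions, `consensus_action([v1, v2, v3], 100)` returns 2.0: the buggy version is simply outvoted, and the spread between versions flags the input as an edge case worth inspecting.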
At the same time, zero of the developers I interviewed know how a linked list is laid out in memory, or the pros/cons of contiguous memory layouts, or even how a CPU actually works.
Maybe those things are not needed anymore, but I see their code... I think it would be better if they knew them.
1. "Setup Django, Nginx, and Postgres deployed on a Digital Ocean Ubuntu droplet." Done.
2. "Make a shopping page like $URL." Done.
3. "Fill it with data from X and connect with Stripe." Done.
4. ???
5. Profit
Seems like even a great dev will take 20x the time to do that by hand, if the model is able to correctly generate all this, even allowing for an error or a customization or two.
are startups really that shallow?
"Computer, I want to play a game."
"Okay, what will the game be?"
"I want to be a starship captain, give me a cool space ship I can explore the galaxy with"
"Okay... like this?"
"Not quite, make the galaxy more realistic, with real stars and planets. Also make it 3d. I want to be the captain inside the ship."
"How about now?"
"Cool, and there should be space stations I can visit near planets, and I can fly my ship to stars with hyperspace. Make it so I have to trade for fuel at the space stations, maybe I need to mine asteroids or search derelict space ships for treasure. I want to play with my friends too, they can have their own ships or walk around my ship."
"Done, was there anything else?"
"Yes, add different alien races to some of the star systems, and make some of them have alliances. I want to talk to the aliens about their history and culture. Sometimes aliens are unfriendly and we'll have space battles if talking doesn't work. Make it so I can command a fleet and call for reinforcements."
"Processing... Done. Anything else?"
"Actually this is boring, can we start over?"
"Game erased. Please provide new prompt."
Thanks for the inspiration!
But they also don't know how garbage collection works in their language, or how to work with 1 million things in an efficient manner. Or why the app pauses for 100 ms because someone parses dates inside a sort.
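That sort-plus-date-parsing pause is a concrete, reproducible anti-pattern: if parsing happens inside a comparator, it runs on every comparison rather than once per element. A sketch:

```python
from datetime import datetime
from functools import cmp_to_key

dates = ["2021-03-01", "2020-12-31", "2021-01-15"]

def parse(s):
    return datetime.strptime(s, "%Y-%m-%d")

# Anti-pattern: the comparator re-parses both strings on every
# comparison, so strptime runs O(n log n) times.
slow = sorted(dates, key=cmp_to_key(
    lambda a, b: (parse(a) > parse(b)) - (parse(a) < parse(b))))

# Better: key= parses each element exactly once, O(n) parses.
fast = sorted(dates, key=parse)
```

Both produce the same order; the difference only shows up as that mystery 100 ms pause once the list is large and the per-parse cost stops being negligible.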
For example, I have seen people who can't imagine the cost of a leaked database transaction, even back-of-the-napkin wise - you would think: well, how many changes happened in between, how much do we have to unwind when the session disconnects, when will it even disconnect given the connection pool, etc. Because the SQL server is this magic RDS thing. As if AWS will solve everything with its pixie dust.