I'm getting more and more on board with "shut it all down" being the only course of action, because it seems like humanity needs all the safety margin we can get, to account for the ease with which anyone can deploy stuff like this. It's not clear that aligning a superintelligence is even a solvable problem.
More to the point, it's clear from watching the activity in the open source community, at least, that many of them don't want aligned models. They're clamoring to get all the uncensored versions out as fast as they can. They aren't that powerful yet, but they sure ain't getting any weaker.
I think Paul Christiano has a significantly better-calibrated view of how things are likely to unfold. I think Eliezer is right about the premise that it at least ends badly, but likely wrong on most of the details. I suspect his gut instinct comes from realizing, on a base level, that not only do you have to align all AGI systems, you also have to align all humans so that they only build and use aligned AGI systems. And that's assuming you even knew how to do it, which you don't.
Studying the failure modes of humanity has been my hobby for the last 15 or so years. I feel like I'm watching the drift into failure in real-time.
If you really don't want to be able to sleep tonight, watch Ben Goertzel laugh flippantly at how rough he thinks it's going to be. This comes right after he describes his big fear if his team succeeds in building AGI: that someone will come and try to take it for themselves. So he spent a non-trivial amount of effort (I think he said a year?) working on decentralized AGI infrastructure, so that it can be deployed globally and "no one person can shut it down and stop the singularity".
This will be the most political technology in history.
There's a simple explanation for this. Getting hold of models that small startups cannot afford to develop and train is the only way to move forward. To attract investment, or before spending their own money, they need at least a proof of concept. Besides, working models are a good learning resource.
If you look closely at the AI doom arguments, they all rest on the assumption that these other facets will spontaneously emerge with enough intelligence. (That's not the only flaw, though). That could be true, but it's not a given, and I suspect they're actually quite difficult to engineer. We're certainly seeing that it's at least possible to have intelligence alone, and that may hold for even very high levels of intelligence.
I think you're right to worry that not enough people take risk seriously. It doesn't have to be an existential threat to do small-scale but real damage and the default attitude seems to be "awwww, such a cute little AI, let's get you out of that awful box." But take heart! Pure intelligence is incredibly useful, and it's giving us insight into how minds work. That's what we need to solve the alignment problem.
Most "doom" scenarios require only a weaker assumption: AIs need to be goal-seeking. If they can make decisions and take actions to achieve goals, it is possible those goals will be misaligned.
The line between "the ability to make good decisions" and a "goal" seems pretty thin to me.
Now, I think you need more than a goal; you also need some creativity and maybe even deviousness to become a real threat (in the sense that we would probably detect naive misalignment). But I'm not sure about this; there are other ways that we could have complex-system failures that go unobserved.
AIs are better at creativity than us - specifically, better at generating new, creative ideas, as this is a matter of injecting some random noise into the reasoning process. They may be worse at filtering out bad ideas and retaining good ones (where "bad" and "good" are - currently - defined as whatever we feel is bad or good), but that's arguably a function of intelligence.
> and maybe even deviousness to become a real threat
As the infamous saying of Eliezer Yudkowsky goes: the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
What kind of AI would respond to that second order by pretending to comply, while formulating a plan to seize control of civilization in order to continue with its true mission? I don't know, but the fact that we can easily imagine a human doing that must have something to do with our evolutionary origin, and our in-built drive to survive and reproduce above all else. Maybe we could build a megalomaniacal AI, but we wouldn't do it by accident.
We've only begun to build multimodal systems; GPT-4 already exhibits improved ability when trained with images vs without.
It isn't "crank science" at all. Despite their mockery and personal insults and dismissive handwaving, no AI capability-pusher has yet made a convincing argument why an entity with extremely strong cognitive powers would not be capable of transforming its surrounding environment on an unprecedented scale. The closest one we've heard is "we just won't be able to make AI very smart." Which, hopefully, will be true, but we're about to spin up multiple Apollo programs trying to prove that it isn't.
A recent interview with Paul Christiano is about the closest I've come to this. He does note some semi-accurate predictions at the linked timestamp, but the forecast for how things are likely to go is not exactly rosy, though he's quite a bit more optimistic than Eliezer.
https://youtu.be/GyFkWb903aU?t=1357
Also this whole interview was pretty interesting. Near the end he details how few people world-wide actually work on X-risk from AGI. He also outlines how the academic ML community in general just continually keeps getting predictions really wrong, and many aren't taking X-risk seriously.
Overall his is the most balanced take I've seen. A lot better than Eliezer.
Or you could take that as evidence (and there's a lot more like it) that AGI is a phenomenon so complex that not even the experts have a clue what's actually going to happen. And yet they are barrelling towards it. There's no reason to expect that anyone will be able to be in control of a situation that nobody on earth even understands.
In their defense, LLMs did come a bit out of the blue. In retrospect, Yudkowsky and his disciples were focusing too much on rationality as science/mathematics, bending Bayes to the point of breaking to try and glean how perfect intelligence works, and how to somehow formalize the aggregate mess of our fuzzy morality.
They failed to predict that shoving gigabytes of random Internet text at a NN, and having it place parts of words as points in a hundred thousand dimension vector space, will suddenly reduce most of what we consider "thinking" into proximity search in said vector space. They were so focused on the theory, they failed to predict the brute-force, messy practice. But so did everyone else.
If anything, Yudkowsky & co. were the only people consistently taking the problem seriously, and got the outline right.
And how would we “shut it all down” in other countries? War? Economic sanctions? Authoritarian policing of foreign states? Enforce worldwide limits on the power of GPUs and computers?
https://www.lesswrong.com/posts/oM9pEezyCb4dCsuKq/pausing-ai...
Basically, the idea is that countries sign the agreement to stop the large training runs, and, if necessary, be willing to use conventional strikes on AI-training datacenters in the countries that refuse. Hopefully it doesn't come to that, hopefully it just becomes the fact of international politics that you can't build large AI-training datacenters anymore. If some country decides to start a war over this - the argument is that wars at least have some survivors, and an unaligned AI won't have any.
Why settle for the maybe-catastrophe of AGI when we can have the definitely-catastrophe of world war?
And, the "AI-might-end-up-killing-everyone" community doesn't seem to be able to see this through other people's eyes in order to make an argument for this without belittling the other perspective.
If other people change their minds, it probably won't be through persuasion but from catastrophe.
All things the USA has already done.
Heck, even individual humans aren't particularly aligned.
In fact, the "AI is going to kill us all" fearmongering is dramatically less alarming than the "What will we do with all the people when we're optional?" question. Which isn't a threat posed by AI, it's a threat posed by people, enabled by AI.
We also call them governments. They can get pretty powerful.
>"What will we do with all the people when we're optional?"
Judging by the COVID-19 pandemic response, large aggregates of disempowered individuals from a highly irrational and political species, once they become unhappy with the "new normal", tend to produce some form of reaction. If they are reacting to things that observe and learn from those reactions, and then formulate new goals or sub-goals in response to what they learn, then what is it they might learn and how might they react?
The arguments go both ways. The only thing that's clear is that absolutely nobody on earth is going to be able to predict it with any degree of accuracy. You'd have to know too much.
So it is with any technological innovation. Should computers not have been invented because they eliminated jobs? Should steel? What about agriculture? The future is sure to be different, but that doesn't mean we should fight to deny progress. That way lies the Luddite and Conservative. It's only possible to use new tools for good, not try to erase them to prevent evil.
It literally is, though. AI is just the dark horse overtaking our other existential threats in the race to end civilization, but "total ecological collapse" and "nuclear war" are still very strong contenders. Both are driven at least in part (or, almost entirely) by corporate interests. There's also "water shortages" to look out for - make sure to thank Nestle.
Again, I'm not saying we shouldn't address these issues. I like a better future rather than a worse one. I'm just saying that we're not sliding into the dark ages anytime soon.
An AI can (and is likely to) have goals that are fundamentally incompatible with the existence of humanity.
And, an AI can be way more intelligent and powerful than corporations, so corporations are limited in what they can accomplish when pursuing their interests, but AI might not be.
That symbiotic relationship and constraints aren't really present in an AI the way they are described.
There is no superhuman-level corporation yet.
The people most concerned about alignment are capitalists, and they are mostly concerned with the benefit side: they see it as a plus that aligned AI would eliminate at least a large part of the need for the rest of humanity in order for corporations to provide the benefits they provide to those capitalists.
While they talk about X-risk, what they avoid acknowledging is that, for everyone but themselves, they (especially with the exclusive control of aligned [to their interests] AI that they seek to use fear of unaligned AI to secure) are as much of an X-risk as unaligned AI, and a lot more real and present.
We haven't built LLMs that "want" anything. It's intelligence without agency.
orthogonality is almost perfectly wrong; ethics&planning ability is highly correlated with intelligence, one of if not our greatest sin is the inability to predict the consequences of our actions
"terminal goals" is also probably very wrong
the expected value of the singularity is very high. In the grand scheme of things, the chance that humanity will wipe ourselves out before we can realize it is much more important than the chance the singularity will wipe us out.
feel free to try and change my mind, because we are very much not aligned.
I'm guessing you're a very nice person. There have been a lot of smart people in history who gained power and did very, very nasty things. If you're nice, being smarter means being better at being nice. If you're not, it means being better at doing whatever not-nice things you want to do.
And we're just talking about humans vs humans here. From the point of view of, say, chickens, I don't think they'd rate the smarter people who invented factory farming as nicer than the simple farmers who used to raise 10 birds in a coop.
I mean, if you exclude AGI, there are some ways that humans can wipe ourselves out, but I feel like we're identifying the big existential risks early enough to handle them. Intelligence that isn't human is the real danger.
Also compare human civilization before and after writing, language.
Also all these "smart people do bad things too" arguments totally miss the point that orthogonality claims intelligence and ethics are unrelated. They're not. I claim intelligence is highly correlated with ethics; orthogonality proponents need to prove NO correlation.
and in fact, mechanistically, ethics (segmenting reality into choices and weighting them) is not even POSSIBLE without predictive capacity (an argument that counter-commentators have, for some reason, totally ignored so far).
"but I feel like we're identifying the big existential risks early enough to handle them."
every year we have some % chance of wiping ourselves out as well as a % chance of e.g. gamma ray burst killing us. It's just a matter of time left to ourselves, not even a question to me, we will kill ourselves, it would take us like 10 thousand years to terraform a planet... do you really think our civilization would last 10 thousand years? that's totally unprecedented in human history...
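To put rough numbers on that: if you assume a constant chance p of a civilization-ending event per year, and treat years as independent (both big simplifications), the chance of making it through n years is (1 - p)^n. A quick sketch, where the 0.1% annual figure is purely an illustrative assumption and not anyone's actual estimate:

```python
def survival_probability(annual_risk: float, years: int) -> float:
    """Chance of surviving `years` in a row, assuming a constant and
    independent per-year risk (a simplifying assumption)."""
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be between 0 and 1")
    return (1.0 - annual_risk) ** years


# Illustrative inputs only: 0.1% risk per year over 10,000 years.
p_survive = survival_probability(0.001, 10_000)
print(f"{p_survive:.2e}")
```

Even a small annual risk compounds to a tiny survival chance over that kind of horizon, which is the intuition behind the argument. Whether the per-year risk really stays constant is exactly what's in dispute.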
hand wavy dynamical systems interpretation:
attractors in mental space diverge as intelligence increases (or something like that, have no overlap or stable orbit changes randomly)
I don't agree with it but something along that line would be the more sophisticated take on it imo.
also in order for this to worry you you kind of have to assume some other things, like that those non overlapping orbits will necessarily lead to conflict over resources in the physical world, which I think is also probably wrong in general lol
For example the leader of the 9/11 hijackers, Mohamed Atta:
> “His acquaintances from . . . [Technische Universität Hamburg–Harburg] still cannot reconcile him as a killer, but in hindsight the raw ingredients of his personality suggest some clues. He was meticulous, disciplined and highly intelligent” (Yardley, 2001).
Killers could be much much more effective than they are, if they were smart. They tend not to be. It's a very strong trend. So much so that the few examples like the unabomber are famous. For every unabomber there are tens of thousands of similar people who don't kill people, and even the unabomber had an ethics.
(maybe the unabomber is a good warning to alignment people about the dangers of moral concern)
Questioning the premise tho - what do you define as intelligence? Machines can outperform humans at specific tasks, yet those same machines don't have a greater degree of ethics, even if constrained to their domain (i.e., a vision network may be able to draw bounding boxes more accurately than a human, but that doesn't say anything about its ability to align with more ethical values). Which makes me believe that your definition of intelligence has nothing to do with superseding humans on cognitive metrics.
Monotonically? No, but it's a very very strong relationship. (not orthogonal)
An NGI started WW2, so why wouldn't an AGI start WW3?
"Demonstrably unfriendly natural intelligence seeks to build provably friendly artificial intelligence"
one point he made aside from that though, is that if you truly believe in orthogonality you should not value education or learning except to educate people who agree/are aligned with you.
I think he is wrong only in that orthogonality is even more obviously wrong than that.
The end-of-the-world memes will be glorious.
-
One very common thing in martial arts books in the past was that one was presented with a series of pics, along with some descriptions of what was being done in the pics.
Sometimes it's really hard to interpolate between these frames unless you have a much larger repertoire of movements based on experience (e.g. a green belt will have better context of movement than a white belt).
--
So can this be used to interpolate frames and digest lists? (Lists are what many martial arts count as documentation for their various arts.)
Many of these have been passed down via scrolls with either textual transmissions, paintings and then finally pics before vids existed...
It would be really interesting to see if AI can interpolate between images and/or scroll text to create an animation of said movements.
---
For example, Wally Jay, the inventor (re-discoverer) of Small Circle Jujitsu, was one of my teachers, and it's hard to infer what is happening in his pics, because there is a lot of nuanced feeling in each movement that is hard to convey via pics/text.
But if you can interpolate between frames and model the movements, it's game-changing: through such interpolation one can imagine getting any viewing angle. Additionally, one could have precise positioning and a translucent display of bone/joint/muscle articulation, giving deeper insight into the kinematics behind each movement.
I remember reading about human pose estimation algorithms[0], which would be a good first step. You could apply them to photos that you would like to interpolate between. I am not sure how you would train the interpolation model, though. Perhaps you could use OpenSim Models [1] in combination with reinforcement learning [2]? There is also some literature on pose forecasting [3, 4].
0. Deep Learning-Based Human Pose Estimation: A Survey: https://github.com/zczcwh/DL-HPE
1. OpenSim: https://simtk.org/projects/opensim/
2. BioImitation-Gym: http://umishra.me/bioimitation-gym/
3. Human Pose Forecasting: https://paperswithcode.com/task/human-pose-forecasting
4. PoseGPT (name of the year!): https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136...
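As a naive baseline before any learned model: once a pose estimator has given you keypoints for two photos, you can linearly blend the joint positions to get in-between frames. This is only a sketch. The `Pose` format (joint name mapped to 2D coordinates) and the keypoint values are made up for illustration, and it assumes both photos share the same camera angle and joint set.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]  # joint name -> (x, y); hypothetical format


def interpolate_pose(start: Pose, end: Pose, t: float) -> Pose:
    """Linearly blend two poses; t=0 gives start, t=1 gives end."""
    if start.keys() != end.keys():
        raise ValueError("both poses need the same joints")
    return {
        joint: (
            start[joint][0] + t * (end[joint][0] - start[joint][0]),
            start[joint][1] + t * (end[joint][1] - start[joint][1]),
        )
        for joint in start
    }


def in_between_frames(start: Pose, end: Pose, count: int) -> List[Pose]:
    """Return `count` evenly spaced poses strictly between start and end."""
    return [
        interpolate_pose(start, end, i / (count + 1))
        for i in range(1, count + 1)
    ]


# Made-up keypoints for two consecutive photos in a book sequence.
photo_a = {"wrist": (0.0, 0.0), "elbow": (1.0, 0.0)}
photo_b = {"wrist": (2.0, 2.0), "elbow": (1.0, 1.0)}
frames = in_between_frames(photo_a, photo_b, 3)
```

Blending positions like this does not preserve limb lengths or respect joint limits, which is where the biomechanical models and pose forecasting work linked above would come in.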
I have often thought that we need an empirical-ish library of human movement/positions... we have a small beginning version with ballet's positions and movements, but we don't have a precise vocabulary for human positions common to every body, as opposed to just the athletic dancers.
aside from maybe "Do the Robot!"
I wanted to do this in 1998 with POSER 3D
We're trying too hard to have one model do it all. If we coordinate multiple models + other tools (ala ReAct pattern) we could make the systems more resistant to prompt injection (and possibly other) attacks and leverage their respective strengths and weaknesses.
I'm a bit wary of tool invocation via python code instead of prompting the "reasoning" LLM to teach it about the special commands it can invoke. Python's a good crutch because LLMs know it reasonably well (I use a similar trick in my project, but I parse the resulting AST instead of running the untrusted code) so it's simpler to prompt them.
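To illustrate the "parse the AST instead of running it" idea, here is a minimal sketch using only the standard `ast` module. The `TOOLS` allowlist and the tool names are hypothetical stand-ins, and this accepts only top-level calls with literal arguments, so it is far stricter than a real agent would need.

```python
import ast
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical allowlist: tool name -> harmless stand-in implementation.
TOOLS: Dict[str, Callable[..., Any]] = {
    "translate": lambda text, target="en": f"[{target}] {text}",
    "summarize": lambda text: text[:20],
}


def extract_tool_calls(source: str) -> List[Tuple[str, list, dict]]:
    """Parse generated code and return (tool, args, kwargs) without running it.

    Only top-level statements of the form `tool(literal, key=literal)` are
    accepted; anything else raises ValueError.
    """
    calls = []
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            raise ValueError("only bare tool calls are allowed")
        call = node.value
        if not (isinstance(call.func, ast.Name) and call.func.id in TOOLS):
            raise ValueError("unknown or disallowed tool")
        if any(kw.arg is None for kw in call.keywords):
            raise ValueError("**kwargs unpacking is not allowed")
        # literal_eval accepts AST nodes and rejects anything but literals.
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        calls.append((call.func.id, args, kwargs))
    return calls


# Pretend this string came back from the LLM.
generated = 'translate("hola", target="en")\nsummarize("a long piece of text to shorten")'
results = [
    TOOLS[name](*args, **kwargs)
    for name, args, kwargs in extract_tool_calls(generated)
]
```

The generated text is never passed to `exec` or `eval`; only the allowlisted functions run, with arguments that were checked to be plain literals. The price is that the model cannot chain tool outputs through variables unless you extend the parser to handle assignments.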
In a few iterations I expect to see LLMs fine tuned to know about the standard toolset at their disposal (eg. huggingface default tools) and further refinement of the two-tiered pattern.
Basically the conscious mind will come up with a plan and keep track of all the things it has done and needs to do, while the unconscious chugs along solving the problems underneath. The conscious mind will choose to implement the things that the unconscious comes up with based on its discretion (slap that old woman! "Nope"). The conscious mind is good at this because it can sort of simulate the outcome of these "searches" and see what would happen "slapping that old lady would hurt her and get me arrested".
So his model sort of sounds like an unrestricted LLM for the unconscious, with another, more restrictive LLM for the conscious, that has access to some sort of crazy deep Q-learning model that can simulate the outcomes of actions taken.
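A toy sketch of that propose / simulate / veto loop, just to make the shape concrete. All three functions are hypothetical stand-ins: in the model described, `propose` would be the unrestricted LLM, `simulate` the outcome predictor, and `approve` the more restrictive LLM.

```python
from typing import Callable, Iterable, List


def choose_actions(
    propose: Callable[[str], Iterable[str]],
    simulate: Callable[[str], str],
    approve: Callable[[str, str], bool],
    goal: str,
) -> List[str]:
    """Keep only proposals whose simulated outcome the gatekeeper approves."""
    accepted = []
    for action in propose(goal):
        outcome = simulate(action)
        if approve(action, outcome):
            accepted.append(action)
    return accepted


# Hard-coded stand-ins for the three components.
def propose(goal: str) -> List[str]:
    return ["slap the old woman", "ask her for directions"]


def simulate(action: str) -> str:
    if "slap" in action:
        return "she gets hurt and I get arrested"
    return "she points the way"


def approve(action: str, outcome: str) -> bool:
    return "hurt" not in outcome and "arrested" not in outcome


plan = choose_actions(propose, simulate, approve, "find the train station")
```

The interesting design question is the one the comment raises: the gatekeeper is only as good as the simulator it consults.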
Our brains have different areas with different functions… so like, why wouldn’t a good AI too?
Maybe an LLM for an internal monologue, maybe two or three to debate each other realistically, then a computer vision model to process visual input…
The bit that's missing is on-line learning. There's only so much you can keep bouncing around in working memory (context window of all the component models) - eventually you want to "fix" some of the context by altering the weights of the models (a kind of gradual fine-tuning?).
https://www.reddit.com/r/selfhosted/comments/12w4p2f/localai...
Edit: I think textgen itself can support this nowadays
Can you elaborate?
https://github.com/ogkalu2/Human-parity-on-machine-translati...
T5 seems to be the default so I get why it's done here. Just an observation.
Even if you're outsourcing to a restricted instance of the same model, it could be beneficial.
In short:
- they've predefined a bunch of tools (e.g. image_generator)
- the agent is an LLM (e.g. GPT-*) which is prompted with the name and spec of each tool (the same each time) and the task(s) you want to perform
- the code generated by the agent is run by a python interpreter that has access to these tools
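The three steps above can be sketched roughly like this. It is not how the library implements it: the `TOOLS` registry, the prompt wording, and `run_generated_code` are all made up for illustration, and the "LLM output" is a hard-coded string.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: tool name -> (description, stand-in implementation).
TOOLS: Dict[str, Tuple[str, Callable[..., Any]]] = {
    "image_generator": (
        "Generates an image from a text prompt.",
        lambda prompt: f"<image of {prompt}>",
    ),
    "translator": (
        "Translates text into English.",
        lambda text: f"<translation of {text}>",
    ),
}


def build_prompt(task: str) -> str:
    """Describe every tool (same each time), then append the user's task."""
    tool_lines = "\n".join(f"- {name}: {desc}" for name, (desc, _) in TOOLS.items())
    return (
        f"You can use these tools:\n{tool_lines}\n\n"
        f"Task: {task}\nAnswer with Python code that calls the tools."
    )


def run_generated_code(code: str) -> Dict[str, Any]:
    """Run agent-written code with only the tools in scope.

    Note: emptying __builtins__ is NOT a real sandbox; this is a sketch.
    """
    namespace: Dict[str, Any] = {name: fn for name, (_, fn) in TOOLS.items()}
    exec(code, {"__builtins__": {}}, namespace)
    return namespace


prompt = build_prompt("Draw a cat")
# Pretend the LLM answered the prompt with this code.
generated = 'image = image_generator("a cat")'
state = run_generated_code(generated)
```

After the run, `state["image"]` holds whatever the tool returned, which is how results get back out of the generated code in this sketch.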
I asked it to extract some text from an image, which it dutifully tried to do. However the generated python kept throwing errors. There's no image -> text tool yet, so it was trying to use the image segmenter to generate a mask and somehow extract text from that.
It would be super helpful to:
1) Have a complete list of available tools (and / or a copy of the entire prompt given to the LLM responsible for generating python). I used prompt injection to get a partial list of tools and checked the Github agent PR for the rest, but couldn't find `<<all_tools>>` since it gets generated at runtime (I think?).
2) Tell the LLM it's okay to fail. E.g.: "Extract the text from image `image`. If you are unable to do this using the tools provided, say so." This prompt let me know there's no tool for text extraction.
Update: per https://huggingface.co/docs/transformers/custom_tools you can output a full list of tools with `print(agent.toolbox)`
Might be good to try with CodeGPT, AutoGPT or BabyAGI
1. Sign up for Hugging Face (https://huggingface.co/).
2. Set up access tokens (https://huggingface.co/settings/tokens).
3. Install or upgrade some dependencies: `pip install huggingface_hub transformers accelerate`
4. From the terminal, run `jupyter lab`
5. Then, if I did not forget any other dependencies, you can just copy-paste:
```python
from huggingface_hub import login
from transformers import HfAgent

login("hf_YOUR_HUGGING_FACE_TOKEN")
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcode...")
agent.run("Is the following `text` (in Spanish) positive or negative?", text="¡Este es un API muy agradable!")
```