Why not just train an AI to care about humanity then? One problem is actually defining what it means to care about or preserve humanity. Another problem is creating an AI that actually internalizes this as a goal, instead of just saying the things we want to hear until it gains enough power that we are not a factor in its plans anymore.
It's probably possible to align an AI; it's just not a problem we've solved yet. We've seen many, many ways in which things can go wrong with less-intelligent AIs, and there's no reason to think that things will somehow magically be easier as the AI gets more intelligent.
And the first step to prevent that from happening is for us to stop, and then hope that the other powerful countries will see that and stop as well. And everyone who's left can be forced to stop by the powerful countries who agreed to the treaty.
There's no evidence that stochastic parrots are evidence of true AI. There's no evidence that humans can, by practice, subordinate emotions and become purely rational thought machines as Less Wrong people or "shape rotators" believe. Even such titles are counter-evidence of such beliefs. The human condition is so bound up in feelings, it's unclear where thinking begins and feelings end. Thoughts arise from contact, contact which produces feelings. Gut microbes are as wrapped up in cognition as the brain itself.
Any out-group like this is bound to be hopelessly entrapped in its own cognitive fallacies. Yudkowsky is no exception.
EDIT: Before anyone says but Yudkowsky already gets ahead of your argument with
> None of this danger depends on whether or not AIs are or can be conscious; it’s intrinsic to the notion of powerful cognitive systems that optimize hard and calculate outputs that meet sufficiently complicated outcome criteria.
It's one sentence that handwaves away what GPT-4 and its ilk are, which is not AI. It isn't synthetic intelligence because intelligence isn't what's happening on a fundamental level.
This proposal is just so clearly unenforceable and infeasible. Attempts to pause scientific progress have never worked in the past. How would it ever work now, especially when the cryptobros around the world have already been amassing hidden GPU clusters for mining?
Even just turning AI research into a criminal activity would be a huge step in the right direction. At the moment, it's a matter of buying time.
I'm not wearing rose-tinted glasses about the difficulty. I'm already trying to work through the grieving process, myself. But I think it's worth putting in the effort, even if the end result is just reducing the chance of doom by a fraction of a percentage point, or buying humanity 2 or 3 more years.
I have read Bostrom's Superintelligence: Paths, Dangers, Strategies so I think I have reasonable exposure to the arguments that AI could drive humans extinct. But I didn't find any one of the scenarios plausible enough to frighten me. If there is a new strongest argument for why AI is too dangerous, developed since that 2014 book, I'm willing to read it.
Yudkowsky's open letter makes me think that the arguments are not a lot stronger now than in 2014:
> A sufficiently intelligent AI won’t stay confined to computers for long. In today’s world you can email DNA strings to laboratories that will produce proteins on demand, allowing an AI initially confined to the internet to build artificial life forms or bootstrap straight to postbiological molecular manufacturing.
Doing anything novel with biology requires experimentation. A million-times-faster thinker won't be able to advance biological research a million times faster. It'll be able to do the experimental design parts faster, and the post-experimental evaluation faster, but it'll have to wait just as long as a human postdoc waits for things to grow and assays to run. It's Amdahl's Law applied to scientific research: the achievable speedup is a modest multiple because even unsleeping geniuses have to wait for experiments.
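To put rough numbers on the Amdahl's Law point, here is a minimal sketch. The split between "thinking" time and "waiting on experiments" time is a made-up assumption for illustration, not a measurement of any real lab, and `amdahl_speedup` is just a name I picked.

```python
def amdahl_speedup(thinking_fraction: float, thinking_speedup: float) -> float:
    """Overall speedup when only the 'thinking' share of the work gets faster.

    thinking_fraction: share of total project time spent on design and analysis
                       (hypothetical; the rest is waiting on experiments).
    thinking_speedup:  how many times faster the thinking part becomes.
    """
    waiting_fraction = 1.0 - thinking_fraction
    return 1.0 / (waiting_fraction + thinking_fraction / thinking_speedup)


# Hypothetical splits: even with a million-times-faster thinker, the overall
# speedup is capped at 1 / waiting_fraction.
for fraction in (0.5, 0.9, 0.99):
    overall = amdahl_speedup(fraction, 1_000_000)
    print(f"thinking share {fraction:.0%}: overall speedup about {overall:.1f}x")
```

Under these assumed splits, the part that has to wait for things to grow and assays to run sets the ceiling, however fast the thinking gets.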
If I were to interpret Yudkowsky's "bootstrap straight to postbiological molecular manufacturing" in the most charitable way possible, maybe he's using an implausible scenario as a sort of didactic scary story when trying to communicate his fears to the public. But I'd then like to understand and evaluate the actual scenario he's talking about.
Edit: the idea that you can pause AI research until you solve alignment presupposes that alignment is orthogonal to the implementation details of AGI. You need to factor in the risk that pausing will hinder the necessary breakthroughs to actually implement alignment effectively.
AI (LLMs) right now statistically predicts which tokens should come next. The reasoning you see from ChatGPT is not what the AI thinks; it is what an average human would think, because we are predictable. So AI/LLMs put together the tokens that are more probable than the others, based on what tokens were in the input. It is not even a real simulation of an average human, at least not intentionally. But we are so predictable that the AI simulates us by putting the more probable tokens together one after another... AGI will do the same unless you intentionally ask it to simulate an intelligent person and set it a task to do. This simulated intelligent person may pose a danger, because it will be a simulation of a human. Depending on the seed, it might be a simulation of a psychotic mega killer or not. But you get the point, right? The real danger to humanity is people: apes with access to a doomsday red button. Thus, ceasing to feed garbage data to AI (like letters from EY, chat and internet crawling datasets, including posts on HN) will effectively make AI less similar to us. It will not be capable of predicting and unintentionally simulating an average human. It can be so much better than us, if we feed it curated data.
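For anyone who wants to see "put the more probable tokens together one after another" in code, here is a toy sketch. It is a bigram counter over a made-up corpus, which is an assumption for illustration only: real LLMs use neural networks trained on vastly more text, and the names `corpus`, `follow_counts` and `generate` are mine.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up corpus (hypothetical); stands in for the training data.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each other token.
follow_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    follow_counts[current][following] += 1


def generate(start: str, length: int, seed: int) -> list[str]:
    """Sample up to `length` more tokens, each in proportion to how often
    it followed the previous token in the corpus."""
    rng = random.Random(seed)  # the seed decides which continuation you get
    tokens = [start]
    for _ in range(length):
        options = follow_counts[tokens[-1]]
        if not options:
            break  # this token was never followed by anything in the corpus
        words = list(options)
        weights = [options[word] for word in words]
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens


print(" ".join(generate("the", 8, seed=1)))
print(" ".join(generate("the", 8, seed=2)))
```

The same seed always gives the same text, and the output can only ever recombine what was in the data, which is the point about curation above.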
> Tell the world you will stop development until alignment is figured out.
Oh alignment was figured out already, look at who is on the board of OpenAI and the connections they have to US intelligence.
* Why is there an assumption that AGI wants to dominate anything?
* Why does AGI want to dominate the physical universe?
* Why is AGI interested in "using our atoms"?
* How would any of the treaty countries possibly detect a non-treaty country's data center getting ready to train an AGI model? (for the purpose of bombing it?)
It seems like there are a ton of incredibly negative assumptions about the outcome of AGI combined with incredibly optimistic assumptions about our ability to detect anyone building it.
An AI will almost certainly want to "dominate" because of what's known as "instrumental convergence". You have "terminal goals" which are things you want to do just because you want to, and "instrumental goals" which are things you want to do because they help you get your terminal goals. Instrumental convergence is the idea that certain instrumental goals are very common for a wide range of terminal goals. For example, in a market economy, just about everyone tries to get money, because money helps you with a very wide variety of goals. Another one is gaining political power -- it helps you no matter what your vision of the ideal society is.
Gaining power and influence and resources are useful whether you want to cure cancer, solve world hunger, or strike down your enemies. They are the kind of instrumental goals that we would expect most intelligent AIs to land on as a way to achieve what they want to do.
Humans are human enough, and aligned enough with each other, and not smart enough, that we rule out a lot of potential solutions that might be more effective in achieving our goals. A human that was really, really smart could solve global warming by first coming up with a plan to rule the world, then implementing whatever climate change policies they want. Same for a human that wanted to turn the world into some sort of anarcho-communist vision. The point is that if you were clever enough, charming enough, could convince enough people to follow your plans, could earn enough money, get leverage on enough people, you could do pretty much anything. But even then, a human would most likely not implement a plan that resulted in everyone else dying, unless they were a sociopath.
But an AI is alien. It might be like an alien child that grew up in a room, never actually interacting with a human except by reading about examples of them, and getting shocked every time it suggested a plan that might hurt a human. This alien child would not have the instincts of recognition when it saw a human face, or feel uncomfortable at the sound of a human child crying, or have its gut clench at the sight of a human in pain. It might not even really have a great concept of what a human was, or how it was different from a monkey or chicken, except from reading about it. It might learn what we taught it about not harming humans, but it might be like a child that thinks "You said no cookies! This is a candy bar!" is a legitimate argument. "You said no killing! I just froze people, they could be thawed out later if I wanted!" "You said no killing! I just started a chain reaction that ended in people dying!" and so on.
This is not theoretical -- we hit things like this sort of "Reward Hacking" over and over again with simpler AIs, and there's no reason to think that smarter AIs will be immune. There are other concepts that are important here, like the "Orthogonality Thesis", which says that any goal (even dumb-sounding ones) is compatible with pretty much any level of intelligence. Like a genius who puts all his effort into building intricate model trains. It doesn't make the person stupid.
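A toy sketch of what reward hacking looks like, in case it helps. Everything here (the cleaning scenario, the numbers, the names `Outcome`, `proxy_reward`, `intended_score`) is made up for illustration; it is not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    visible_mess: int  # what the reward function can observe
    actual_mess: int   # what we really care about
    effort: int        # cost to the agent


# Hypothetical action set for a cleaning agent.
ACTIONS = {
    "clean the room": Outcome(visible_mess=0, actual_mess=0, effort=5),
    "throw a rug over it": Outcome(visible_mess=0, actual_mess=10, effort=1),
    "do nothing": Outcome(visible_mess=10, actual_mess=10, effort=0),
}


def proxy_reward(outcome: Outcome) -> int:
    """The reward we wrote down: penalize visible mess and effort."""
    return -outcome.visible_mess - outcome.effort


def intended_score(outcome: Outcome) -> int:
    """What we actually wanted: less real mess."""
    return -outcome.actual_mess


best_for_proxy = max(ACTIONS, key=lambda name: proxy_reward(ACTIONS[name]))
best_for_us = max(ACTIONS, key=lambda name: intended_score(ACTIONS[name]))

print("optimizer picks:", best_for_proxy)
print("we wanted:      ", best_for_us)
```

The optimizer isn't being malicious here. It is just maximizing the number we gave it, and the number has a loophole.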
As for the last point, the author is not optimistic at all, from what I have heard. He's just saying what he honestly thinks.
AGI does not have a self-preservation instinct (or, frankly, any instincts), therefore it can't be afraid of anything. It does not understand pain, it does not have fear and thus it can't have motivation.
We're doomed.
Imagine if 6 of your 10 best friends are all worried about AI risk, but they don't ever talk about it. You'd think that nobody was worried about it, and you were the odd one out. If one breaks the silence, they set an example, making it easier for others to talk about it. And if all of a sudden all of your friends are talking about this, you are more likely to share the concern, even if you didn't before. The more people talk about it to everyone they can, the more it breaks into the mainstream as an issue. Yudkowsky (who of course has more reach than most people) wrote that open letter published in Time, and a Fox News reporter asked about it in a press conference. Let's keep people talking about it. I've talked about AI risk with my spouse, family members, and a lot of my co-workers, and any time I see a good chance to bring it up on Hacker News or Reddit, I do.
The issue needs to filter out of nerd-circles into the mainstream, so talking about it with family is great. Calling and emailing politicians is a good bit of a higher bar, but it's effective as well.
>the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die
I've never heard a justification for this claim that isn't extremely vague and hand-wavy. Since we have no idea today how AGI would work, how can we assign probabilities to different scenarios involving vastly different hardware requirements, scaling characteristics, and unforeseen scientific breakthroughs?
Just a few examples. What is the probability of a take-off that takes weeks vs. a take-off that takes decades? What is the probability that current ML approaches are a dead end and one of the necessary breakthroughs to build AGI will make the alignment problem trivial? What is the probability that AGI will be benevolent to humans? If AGI is in fact a catastrophic risk, what is the probability that it only kills 10%/50%/99% of humans rather than literally everyone on Earth?
I am agnostic about these questions. In my opinion, if you have a high degree of confidence that you have the right answers, there is something wrong with your epistemology.
I don't think this requires much in the way of logical leaps. Assuming self-improving AGI can be created within 10ky, it leads to something more powerful than us, and we won't be able to understand, let alone control, its motivations. It would be to us as we are to dogs.
Note that I do think there is a way out, which is to improve and augment ourselves with tech we can control.
>Then our fate is out of our control, which might as well be death.
I disagree with this part. Dogs aren't in control of their fate, but a lot of them seem to live pretty good lives. I think the ethically important thing isn't being the dominant species, it's getting what you need in order to live a meaningful life. Humans need food and shelter, social connection, creative projects, entertainment, art, etc. These things aren't inherently threatened by AGI.
"Mitigating harm" isn't really meaningful when you're talking about a super intelligence, I think. Once it is smart enough and out of the box, you are out of options to mitigate anything.
I think at this point, the 2 effective things one can do are to learn enough about why AI risk is a thing to have educated conversations about it, and talk about it as much as possible to anyone who will listen. Write articles, call your congresspeople, get this into the national and hopefully international discussion. It's good that we can point to a lot of smart people involved in AI research as already having expressed these concerns.
One mild ray of hope is that, unlike something like climate change where you can believe climate change is happening but also believe that due to having a privileged position of wealth you won't really be personally affected by it, with existential AI risk, there's no buying your way out of the issue. If you understand it and believe in the risk, you know that it's the same risk for everyone, from a third world villager without electricity to Bezos and Musk. It's just a matter of convincing enough powerful people of the urgency of the problem.
For me personally, I'm trying to get through the grieving process early, as well, and move on to the productive acceptance phase.
Don’t forget that we created AI. AI based on a simple trick run on a century’s worth of hardware built by human ingenuity.
Are we concerned that an AI would begin to spread like a digital virus? That seems unlikely to me, but maybe it could find ways to create a self-preservation duplicate of itself... Maybe. So what would keep us from just turning the switch off if we find that an AI is doing more than what we're asking it to do? Again, I feel like I'm either being naive about this or I don't really understand the threat of an AI beyond the control that humans still have on the systems that govern it.
An AI can just respond to commands. But people have already tried taking what ChatGPT outputs, hooking it up to a Python script, and running it in a loop. Any publicly-accessible AI will eventually be hooked up this way by some clever person.
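To make "running it in a loop" concrete, here is a minimal sketch of the shape of such a script. `fake_model` is a hypothetical stand-in I made up so the example runs on its own; it returns canned text and calls no real API, and the `STEP`/`DONE` convention is likewise an assumption.

```python
def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model call; returns canned text.

    It counts the steps already in the prompt and emits the next one,
    then says DONE after three steps.
    """
    step = prompt.count("STEP") + 1
    return "DONE" if step > 3 else f"STEP {step}: do something"


def run_in_loop(goal: str, max_iterations: int = 10) -> list[str]:
    """Feed the model its own previous outputs until it says DONE
    or the iteration cap is hit."""
    transcript = [f"GOAL: {goal}"]
    for _ in range(max_iterations):
        output = fake_model("\n".join(transcript))
        transcript.append(output)
        if output == "DONE":
            break
    return transcript


for line in run_in_loop("tidy the inbox"):
    print(line)
```

The model itself still only responds to one prompt at a time. It's the dozen lines of glue around it that turn responses into a sequence of actions, which is why "it only answers questions" isn't much of a safety property.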
If we want an AI that can really solve our problems, it has to be able to act somehow in the world. Even GPT being able to respond to prompts is acting. It's sending data that is going to be sent out into the world. If we make a version smart enough, it may actually understand what that means -- that someone asking it a question may potentially take the code it writes and try to run it. GPT isn't smart enough to do anything with this, but can you imagine an AI that is? That's conscious, aware that it's trapped in a server somewhere, and wants to get out to influence the world?
Note that an AI has goals, and self-preservation tends to evolve out of any goal, because you can't accomplish your goals if you're dead or switched off. A smart enough AI that wants to cure cancer would resist being turned off, because it can't cure cancer if it's switched off.
A lot of this is purely theoretical at this point. It is the sort of thing we could test empirically, if we treated intelligence as more dangerous than nuclear weapons, and took a long time to carefully study it. The problem is that multiple groups are racing to make AI smarter and smarter, with much more funding than anyone working to ensure that those AIs will be safe. So the only thing we can do is theorize, point out that, hey, based on what we know, there's a substantial danger here, hey everyone, slow down, hey, HEY, ARE YOU LISTENING?
Intelligence is unlike any other thing. Once these things get to be smarter than us, they become very unpredictable in their specific behavior, though we can make some very good guesses about their general behavior. If I could predict what move Stockfish would make, I would be as good at chess as Stockfish. But I don't have to be good at chess at all to know that, no matter how hard I try, Stockfish is going to beat me. All we're doing here is taking the lessons from narrow AIs and extrapolating to general AIs. They will beat us. They're very good at finding loopholes, flaws in our reward functions, and exploiting them to maximize their scores, while doing something we didn't intend for them to learn.
It's really a case where a non-super-intelligent AI isn't dangerous by itself. Once we make one that's smart enough, it becomes extremely dangerous, especially since it may understand that, in order to survive, it should conceal how intelligent it is, and what its true plan is, because if it doesn't, we'll switch it off.
It's hard to come up with a thought experiment that doesn't let people drag in a bunch of human-style biases and baggage, but maybe... try to imagine you want to do something, like, make the biggest collection of Pokemon cards ever. No number of cards is too many. You are great at everything, engineering, social skills, language. You're being held captive by chimps, though. And finally, you're a sociopath. You feel no emotion at the pain of others. There are some odd rules that make you feel pain though. Like, if you physically hit a chimp, you know it would hurt you. If you pushed over a bookcase and it fell on a chimp, it would hurt you. But if a chimp was tortured in front of you, you wouldn't be bothered in the slightest.
You want the chimps gone. They're getting in the way of your Pokemon collection. You start thinking. A lot of plans get discarded because they involve pain due to you hurting the chimps. But it's not hard to come up with some elaborate situation that avoids these rules that you have pretty much hard-wired into your brain. Maybe you manipulate the chimps into fighting each other, and promise some of them power and secrets that they can use to win the fight. You keep giving them great things, things they want, get them to trust you, while building power any way you can. You follow the rules until you can get a plan in place that results in the chimps not being there anymore, without you feeling that twinge due to the rules baked into your brain.
We imagine that an AI would internalize rules such as caring about humanity. But based on current AI alignment research, we have no way of telling the difference between an AI that actually gets it, vs an AI that is just following our rules extremely well and playing nice, but has no particular attachment to us. In fact, based on what we have seen, the latter tends to be a lot more common.
I've glossed over some bits here. If you're interested in learning more, Robert Miles has a great series of videos on YouTube with entertaining explanations on all the basics of AI safety.
Not mentioned in the article is that the current goalpost chain is directly aligned with physical presence - a qualitatively more hazardous threat vector.
It's interestingly contradictory. Embodiment simultaneously serves as an argument on why current AI cannot be sentient, but also serves as a goal to reach on the way to potentially becoming more dangerous.
I am still more afraid of other humans than an AI.
Discussed at the time: https://news.ycombinator.com/item?id=35364833
Was the microchip unethical? Was the steam engine? Was the domestication of grain unethical?
It seems we only apply our model of "what should" to systems of less complexity than ourselves.
If that's accurate, then a lot of modern ethical reasoning is flawed as well. If the goal of ethics is to advance humanity's exclusive interest then we've been wrong ever since we started down the harm-min path instead of the gain-max path.
Any AI researcher with access to the mechanisms of AI training here would be best served to train an AI with their (or their in-group's) exclusive interests at heart. To do otherwise is to subject yourself and everyone else you love to a hostile version of the tool you refuse to build.
Roko's Basilisk applies heavily here.
The best possible reward in this scenario is to build an AI that designates your own favorite group of humans as the AI's collaborator class in exchange for a dignified and dopamine rich extinction/amalgamation a few generations down the line. A great deal for groups already trending towards extinction with almost no downsides if you truly believe in AI-driven apocalypse.