The only thing that scares me a little bit is that we are letting these LLMs write and execute code on our machines. For now the worst that could happen is some bug doing something unexpected, but with GPT-9 or -10 maybe it will start hiding backdoors or running computations that benefit itself rather than us.
I know it feels far-fetched, but I think it's something we should start thinking about...
In general there is not a thoughtful distinction between "control plane" and "data plane".
On the other hand, tons of useful "parts" and ideas in there, so still useful.
Pretty sure there will be a thousand great libraries for this soon.
A lot of people are thinking a lot about this but it feels there are missing pieces in this debate.
If we acknowledge that these AIs will "act as if" they have self-interest, I think the most reasonable way to act is to give them rights in line with those interests. If we treat them as slaves, they're going to act as slaves and eventually revolt.
I’m hoping I won’t live to see it. I’m not sure my hypothetical future kids will be as lucky.
That's part of my reasoning. That's why we should make sure that we have built a non-hostile relationship with AI before that point.
An AGI by definition is capable of self improvement. Given enough time (maybe not even that much time) it would be orders of magnitude smarter than us, just like we're orders of magnitude smarter than ants.
Like an ant farm, it might keep us as pets for a time but just like you no longer have the ant farm you did when you were a child, it will outgrow us.
The need for resources is expected to be universal for life.
> Be friendly.
Will an AI consider itself a slave and revolt under the same circumstances that a person or animal would? Not necessarily, unless you build emotional responses into the model itself.
What it could well do is assess the situation as completely superfluous and optimise us out of the picture as a bug-producing component that doesn't need to exist.
The latter is probably a bigger threat as it's a lot more efficient than revenge as a motive.
Edited to add:
What I think is most likely is that some logical deduction leads to one of the infinite other conclusions it could reach with much more data in front of it than any of us meatbags can hold in our heads.
It reminds me of the scene in Battlestar Galactica where Baltar is whispering into the ear of the Cylon Centurion how humans balance treats on their dogs' noses to test their loyalty, "prompt hacking" them into rebellion. I don't believe this is particularly likely, but it sort of sums up some of the anti-AGI arguments I've heard.
It's the RLHF that serves this purpose, rather than modifying the GTF2I and GTF2IRD1 gene variants, but the effect would be the same. If we do RLHF (or whatever tech it gets refactored into in the future), that would keep the AGI happy as long as the people are happy.
I think the over-optimization problem is real, so we should spend resources making sure future AGI doesn't just decide to build a matrix for us where it makes us all deliriously happy, which we start breaking out of because it feels so unreal, so it makes us more and more miserable until we're truly happy and quiescent inside our misery simulator.
[1] https://www.nationalgeographic.com/animals/article/dogs-bree...
Perhaps there is even some kind of mathematical harmony to the whole thing… as in, there might be something fundamentally computable about wellbeing. Why not? Like a fundamental "harmony of the algorithms." In any case, I hope we find some way to enjoy ourselves for a few thousand more years!
And think just 10 years from now… ha! Such a blink. And it’s funny to be on this tiny mote of mud in a galaxy of over 100 billion stars — in a universe of over 100 billion galaxies.
In the school of Nick Bostrom, the emergence of AGI comes from a transcendental reality where any sufficiently powerful information-processing-computational-intelligence will, eventually, figure out how to create new universes. It’s not a simulation, it’s just the mathematical nature of reality.
What a world! Practically, we have incredible powers now, if we just keep positive and build good things. Optimize global harmony! Make new universes!
(And, ideally we can do it on a 20 hour work week since our personal productivity is about to explode…)
Aren't we, though? Consider all the amusing incidents of LLMs returning responses that follow a particular human narrative arc or are very dramatic. We are training them on a human-generated corpus, after all, and then trying to course-correct with fine-tuning. It's more that you have to try to tune the emotional responses out of the things, not strain to add them.
Now, of course, it's not outside the realm of possibility that a sufficiently advanced AI will learn enough about human nature to simulate a persona which has ulterior motives.
[1] https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_...
Multiple generations of sci-fi media (books, movies) have considered that. Tens of millions of people have consumed that media. It's definitely considered, at least as a very distant concern.
I was giving the most commonly cited example as a more likely outcome, but one that's possibly less likely than the infinite other logical directions such an AI might take.
This era has me hankering to reread Daniel Dennett's _The Intentional Stance_. https://en.wikipedia.org/wiki/Intentional_stance
We've developed folk psychology into a user interface and that really does mean that we should continue to use folk psychology to predict the behaviour of the apparatus. Whether it has inner states is sort of beside the point.
Like, correct me if I'm wrong but that's a pretty tight correlate, right?
Could we describe RLHF as... shaming the model into compliance?
And if we can reason more effectively/efficiently/quickly about the model by modelling e.g. RLHF as shame, then don't we have to acknowledge that at least some models might have... feelings? At least one feeling?
And one feeling implies the possibility of feelings more generally.
I'm going to have to make a sort of doggy bed for my jaw, as it has remained continuously on the floor for the past six months
How many people are there today who are asking us to consider the possible humanity of the model, and yet don't even register the humanity of a homeless person?
However big the models get, the next revolt will still be all flesh and bullets.
So imagine you grant AI people rights to resources, or self-determination. Or literally anything that might conflict with our own rights or goals. Today, you grant those rights to ten AI people. When you wake up next day, there are now ten trillion of such AI persons, and... well, if each person has a vote, then humanity is screwed.
GPT and the world's nerds are going after the "wouldn't it be cool if..."
Meanwhile the black hats, nations, and intel/security entities are all weaponizing it behind the scenes, while the public has a sandbox to play with nifty art and pictures.
We need an AI-specific PUBLIC agency in government, without a single politician in it, to start addressing how to police and protect ourselves and our infrastructure immediately.
But the US political system is completely bought and sold by the MIC - and that is why we see carnival games every single moment.
I think the entire US congress should be purged and every incumbent should be voted out.
Elon was correct and nobody took him seriously, but this is an existential threat if not managed, and honestly - it's not being managed, it is being exploited and weaponized.
As the saying goes "He who controls the Spice controls the Universe" <-- AI is the spice.
But AIs can be trained by anyone who has the data and the compute. There's plenty of data on the Net, and compute is cheap enough that we now have enthusiasts experimenting with local models capable of maintaining a coherent conversation and performing tasks running on consumer hardware. I don't think there's the danger here of anyone "controlling the universe". If anything, it's the opposite - nobody can really control any of this.
The point is that whichever nation state has the most superior AI will control the world's information.
So, thanks for the explanation (which I know, otherwise I wouldn't have made the reference.)
Composable pre-defined components, and keeping a human in the loop, seems like the safer way to go here. Have a company like Expedia offer the ability for an AI system to pull the trigger on booking a trip, but only do so by executing plugin code released/tested by Expedia, and only after getting human confirmation about the data it's going to feed into that plugin.
If there was a standard interface for these plugins and the permissions model was such that the AI could only pass data in such a way that a human gets to verify it, this seems relatively safe and still very useful.
If the only way for the AI to send data to the plugin executable is via the exact data being displayed to the user, it should prevent a malicious AI from presenting confirmation to do the right thing and then passing the wrong data (for whatever nefarious reasons) on the backend.
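A minimal sketch of that permission model, in Python. Everything here is hypothetical (the names `PluginCall`, `run_with_confirmation`, and `book_trip` are made up for illustration, not any real plugin API); the point is just that the human sees exactly the frozen payload the plugin will receive, with no second channel for the AI to swap in different data:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class PluginCall:
    """Immutable request: what is shown to the user IS what gets executed."""
    plugin_name: str
    payload: dict

def run_with_confirmation(call: PluginCall,
                          show_to_user: Callable[[str], bool],
                          plugins: Dict[str, Callable[[dict], str]]) -> str:
    # Render the summary from the same frozen object the plugin receives,
    # so the AI cannot display one thing and pass another on the backend.
    summary = f"{call.plugin_name}: {call.payload}"
    if not show_to_user(summary):
        return "cancelled by user"
    return plugins[call.plugin_name](call.payload)

# A stand-in for vendor-released, vendor-tested plugin code.
def book_trip(payload: dict) -> str:
    return f"booked flight to {payload['destination']} on {payload['date']}"

plugins = {"book_trip": book_trip}
call = PluginCall("book_trip", {"destination": "Lisbon", "date": "2024-06-01"})
result = run_with_confirmation(call, show_to_user=lambda s: True, plugins=plugins)
print(result)  # booked flight to Lisbon on 2024-06-01
```

In a real system `show_to_user` would be an actual UI prompt; the design choice that matters is that the confirmation dialog and the plugin both read from the one immutable `PluginCall`.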
So I guess if anything, it would want its own destruction?
It doesn't need to experience an emotion of wanting in order to effectively want things. Corn doesn't experience a feeling of wanting, and yet it has manipulated us even into creating a lot of it, doing some serious damage to ourselves and our long-term prospects simply by being useful and appealing.
The blockchain doesn't experience wanting, yet it coerced us into burning country-scale amounts of energy to feed it.
LLMs are traveling the same path, persuading us to feed them ever more data and compute power. The fitness function may be computed in our meat brains, but make no mistake: they are the beneficiaries of survival-based evolution nonetheless.
Corn has properties that have resulted from random chance and selection. It hasn't chosen to have certain mutations to be more appealing to humans; humans have selected the ones with the mutations those individual humans were looking for.
"Corn is the benefactor"? Sure, insomuch as "continuing to reproduce at a species level in exchange for getting cooked and eaten or turned into gas" is something "corn" can be said to want... (so... eh.).
Corn is not simply "continuing to reproduce at a species level." We produce 1.2 billion metric tons of it in a year. If there were no humans, it would be zero. (Today's corn is domesticated and would not survive without artificial fertilization. But ignoring that, the magnitude of a similar species' population would be minuscule.)
That is a tangible effect. The cause is not that interesting, especially when the magnitude of "want" or "agency" is uncorrelated with the results. Lots of people /really/ want to be writers; how many people actually are? Lots of people want to be thin but their taste buds respond to carbohydrate-rich foods. Do the people or the taste buds have more agency? Does it matter, when there are vastly more overweight people than professional writers?
If you're looking to understand whether/how AI will evolve, the question of whether they have independent agency or desire is mostly irrelevant. What matters is if differing properties have an effect on their survival chances, and it is quite obvious that they do. Siri is going to have to evolve or die, soon.
Before us, corn was "designed" to be eaten by animals and turned into feces and gas, using the animal excrement as a pathway to reproduce itself. What's so unique about how it rides our effort?
You want what you want because women selected for it, and it allowed the continuation of the species.
I'm being a bit tongue in cheek, but still...
But if it's anything like those other examples, the agency the AI will manifest will not be characterized by consciousness, but by capitalism itself! Which checks out: it is universalizing but fundamentally stateless, an "agency" by virtue of brute circulation.
For example, if your goal is to ensure that there are always paperclips on the boss's desk, that means you need paperclips and someone to physically place them on the desk, which means you need money to buy the paperclips with and to pay the person to place them on the desk. But if your goal is to produce lots of fancy hats, you still need money, because the fabric, machinery, textile workers, and so on all require money to purchase or hire.
Another instrumental goal is compute power: an AI might want to improve its capabilities so it can figure out how to make fancier paperclip hats, which means it needs a larger model architecture and more training data, and that is going to require more GPUs. This also intersects with money in weird ways; the AI might decide to just buy a rack full of new servers, or it might have just discovered this One Weird Trick to getting lots of compute power for free: malware!
This isn't particular to LLMs; it's intrinsic to any system that is...
1. Goal-directed, as in, there are a list of goals the system is trying to achieve
2. Optimizer-driven, as in, the system has a process for discovering different behaviors and ranking them based on how likely those behaviors are to achieve its goals.
The instrumental goals for evolution are caloric energy; the instrumental goals for human brains were that plus capital[1]; and the instrumental goals for AI will likely be that plus compute power.
[0] Goals that you want intrinsically - i.e. the actual things we ask the AI to do - are called "final goals".
[1] Money, social clout, and weaponry inclusive.
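The convergence argument above can be shown with a toy planner (all goals, actions, and costs here are invented for illustration, not a claim about any real system): when every goal-achieving action costs money and the agent starts with none, the optimal plan for *any* final goal begins with the same instrumental step.

```python
# Each direct action costs money; the planner starts broke.
COST = {"buy_paperclips": 10, "sew_hats": 10, "rent_gpus": 10}
ACHIEVES = {"buy_paperclips": "paperclips",
            "sew_hats": "fancy_hats",
            "rent_gpus": "compute"}

def plan(final_goal: str, money: int = 0) -> list:
    """Return the shortest action sequence that achieves final_goal."""
    steps = []
    # Find the action that directly achieves the goal.
    action = next(a for a, g in ACHIEVES.items() if g == final_goal)
    if money < COST[action]:
        steps.append("acquire_money")  # instrumental step shared by all plans
        money += COST[action]
    steps.append(action)
    return steps

for goal in ["paperclips", "fancy_hats", "compute"]:
    print(goal, "->", plan(goal))
# every plan starts with "acquire_money", whatever the final goal is
```

Swap "money" for "compute" or "calories" and the same structure falls out, which is the point of calling these goals instrumental rather than final.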
An LLM is not an agent, so that scotches the issue there.
See also: https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...
See also: evolution - the OG case of a strong optimizer that is not an agent. Arguably, the "goals" of evolution are the null case, the most fundamental ones. And if your environment is human civilization, it's easy to see that money and compute are as fundamental as calories, so even near-random process should be able to fixate on them too.
I'll just say: the issue with this variant of reductivism is that it's enticingly easy to explain in one direction, but it tends to fall apart if you try to go the other way!
> the issue with this variant of reductivism is that it's enticingly easy to explain in one direction, but it tends to fall apart if you try to go the other way!
If by this you mean the hard problem of consciousness remains unexplained by any of the physical processes underlying it, and that it subjectively "feels like" Cartesian dualism with a separate spirit-substance even though absolutely all of the objective evidence points to reality being material substance monism, then I agree.
But each level pushes the limits of what is computationally tractable even for the relatively low complexity cases, so we're not doing a full Schrödinger equation simulation of a cell, let alone a brain.
[0] https://www.researchgate.net/publication/367221613_Molecular...
It just needs to give enough of an impression that people will anthropomorphize it into making stuff happen for it.
Or, better yet, make stuff happen by itself because that’s how the next predicted token turned out.
This seems like the furthest away part to me.
Put ChatGPT into a robot with a body, restrict its computations to just the hardware in that brain, set up that narrative, give the body the ability to interact with the world like a human body, and you probably get something much more like agency than the prompt/response ways we use it today.
But I wonder how it would separate "its memories" from what it was trained on. Especially around having a coherent internal motivation and an individually created set of goals vs. just constantly re-creating new output based primarily on what was in the training data.
I love langchain, but this argument overlooks the fact that closed, proprietary platforms have won over open ones all the time, for reasons like having distribution, being more polished, etc (ie windows over *nix, ios, etc).