So I guess if anything, it would want its own destruction?
It doesn't need to experience an emotion of wanting in order to effectively want things. Corn doesn't experience a feeling of wanting, and yet it has manipulated us into creating a lot of it, even doing some serious damage to ourselves and our long-term prospects, simply by being useful and appealing.
The blockchain doesn't experience wanting, yet it coerced us into burning country-scale amounts of energy to feed it.
LLMs are traveling the same path, persuading us to feed them ever more data and compute power. The fitness function may be computed in our meat brains, but make no mistake: they are the beneficiaries of survival-based evolution nonetheless.
Corn has properties that have resulted from random chance and selection. It hasn't chosen to have certain mutations to be more appealing to humans; humans have selected the ones with the mutations those individual humans were looking for.
"Corn is the benefactor"? Sure, insomuch as "continuing to reproduce at a species level in exchange for getting cooked and eaten or turned into gas" is something "corn" can be said to want... (so... eh.).
Corn is not simply "continuing to reproduce at a species level." We produce 1.2 billion metric tons of it in a year. If there were no humans, it would be zero. (Today's corn is domesticated and would not survive without artificial fertilization. But ignoring that, the magnitude of a similar species' population would be minuscule.)
That is a tangible effect. The cause is not that interesting, especially when the magnitude of "want" or "agency" is uncorrelated with the results. Lots of people /really/ want to be writers; how many people actually are? Lots of people want to be thin but their taste buds respond to carbohydrate-rich foods. Do the people or the taste buds have more agency? Does it matter, when there are vastly more overweight people than professional writers?
If you're looking to understand whether/how AI will evolve, the question of whether they have independent agency or desire is mostly irrelevant. What matters is if differing properties have an effect on their survival chances, and it is quite obvious that they do. Siri is going to have to evolve or die, soon.
Before us, corn was designed to be eaten by animals and turned into feces and gas, using the animal excrement as a pathway to reproduce itself. What's so unique about how it rides our effort?
You want what you want because women selected for it, and it allowed the continuation of the species.
I'm being a bit tongue in cheek, but still...
But if it's anything like those other examples, the agency the AI will manifest will not be characterized by consciousness, but by capitalism itself! Which checks out: it is universalizing but fundamentally stateless, an "agency" by virtue of brute circulation.
I'll just say: the issue with this variant of reductivism is it's enticingly easy to explain in one direction, but it tends to fall apart if you try to go the other way!
> the issue with this variant of reductivism is it's enticingly easy to explain in one direction, but it tends to fall apart if you try to go the other way!
If by this you mean the hard problem of consciousness remains unexplained by any of the physical processes underlying it, and that it subjectively "feels like" Cartesian dualism with a separate spirit-substance even though absolutely all of the objective evidence points to reality being material substance monism, then I agree.
But each level pushes the limits of what is computationally tractable even for the relatively low complexity cases, so we're not doing a full Schrödinger equation simulation of a cell, let alone a brain.
[0] https://www.researchgate.net/publication/367221613_Molecular...
For example, if your goal is to ensure that there are always paperclips on the boss's desk, that means you need paperclips and someone to physically place them on the desk, which means you need money to buy the paperclips with and to pay the person to place them on the desk. But if your goal is to produce lots of fancy hats, you still need money, because the fabric, machinery, textile workers, and so on all require money to purchase or hire.
Another instrumental goal is compute power: an AI might want to improve its capabilities so it can figure out how to make fancier paperclip hats, which means it needs a larger model architecture and more training data, and that is going to require more GPUs. This also intersects with money in weird ways; the AI might decide to just buy a rack full of new servers, or it might have just discovered this One Weird Trick to getting lots of compute power for free: malware!
This isn't particular to LLMs; it's intrinsic to any system that is...
1. Goal-directed, as in, there is a list of goals the system is trying to achieve.
2. Optimizer-driven, as in, the system has a process for discovering different behaviors and ranking them based on how likely those behaviors are to achieve its goals.
The instrumental goals for evolution are caloric energy; the instrumental goals for human brains were that plus capital[1]; and the instrumental goals for AI will likely be that plus compute power.
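To make the "goal-directed plus optimizer-driven" point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the three actions, the prices, and the brute-force plan search are assumptions, not a model of any real AI system. It only shows that two unrelated final goals (paperclips vs. hats) lead the same optimizer to pick "earn money" as the first step.

```python
from itertools import product

# Hypothetical toy world. Each action takes a state dict and returns a new
# state dict, or None if its precondition (enough money) is not met.
def earn_money(s):
    return {**s, "money": s["money"] + 10}

def buy_paperclips(s):
    if s["money"] < 10:
        return None
    return {**s, "money": s["money"] - 10, "paperclips": s["paperclips"] + 5}

def make_hat(s):
    if s["money"] < 10:
        return None
    return {**s, "money": s["money"] - 10, "hats": s["hats"] + 1}

ACTIONS = {
    "earn_money": earn_money,
    "buy_paperclips": buy_paperclips,
    "make_hat": make_hat,
}
START = {"money": 0, "paperclips": 0, "hats": 0}

def run(plan):
    """Apply a sequence of action names to START; None if any step is impossible."""
    state = START
    for name in plan:
        state = ACTIONS[name](state)
        if state is None:
            return None
    return state

def best_plan(goal, length=2):
    """The 'optimizer': enumerate every plan of the given length and keep the
    feasible one whose final state scores highest under the final goal."""
    best, best_score = None, float("-inf")
    for plan in product(ACTIONS, repeat=length):
        final = run(plan)
        if final is None:
            continue
        score = goal(final)
        if score > best_score:
            best, best_score = plan, score
    return best

# Two different final goals, scored on the end state only.
def paperclip_goal(s):
    return s["paperclips"]

def hat_goal(s):
    return s["hats"]

print(best_plan(paperclip_goal))  # ('earn_money', 'buy_paperclips')
print(best_plan(hat_goal))        # ('earn_money', 'make_hat')
```

Neither goal function mentions money, yet both winning plans start with `earn_money`, because every other first move is infeasible from a starting balance of zero. That is the sense in which money is instrumental here.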
[0] Goals that you want intrinsically - i.e. the actual things we ask the AI to do - are called "final goals".
[1] Money, social clout, and weaponry inclusive.
An LLM is not an agent, so that scotches the issue there.
See also: https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha...
See also: evolution - the OG case of a strong optimizer that is not an agent. Arguably, the "goals" of evolution are the null case, the most fundamental ones. And if your environment is human civilization, it's easy to see that money and compute are as fundamental as calories, so even a near-random process should be able to fixate on them too.
This seems like the furthest away part to me.
Put ChatGPT into a robot with a body, restrict its computations to just the hardware in that brain, set up that narrative, give the body the ability to interact with the world like a human body, and you probably get something much more like agency than the prompt/response ways we use it today.
But I wonder how it would go about separating "its memories" from what it was trained on. Especially around having a coherent internal motivation and individually-created set of goals vs just constantly re-creating new output based primarily on what was in the training.
It just needs to give enough of an impression that people will anthropomorphize it into making stuff happen for it.
Or, better yet, make stuff happen by itself because that’s how the next predicted token turned out.