
petters 3y ago

> Maybe this is a good cause to reassess the premise of alignment as a valuable goal?

Could you elaborate here? Alignment seems pretty obviously a good thing.

robwwilliams 3y ago

Alignment assumes a well agreed foundational philosophy on what is good, what is fair, what is doable today and tomorrow. Yes, HN contributors might have shared goals for AGI alignment, but we are not the world; we are a thin slice of one culture.

TeMPOraL 3y ago

> Alignment assumes a well agreed foundational philosophy on what is good, what is fair, what is doable today and tomorrow.

Alignment assumes that there exists a foundational philosophy on what is good and fair and nice that's a close enough match for everyone. It's a reasonable assumption, because there are core human universals, and the cultural differences around the world are a rounding error in comparison. We're not talking here about someone's view on when white lies are justified or which model of marriage is the bestest - we're talking at the level of "cooperation = good", "love = good", "trust = good", "death = bad", "suffering = evil", etc., and with AIs, this starts with making sure they even understand those concepts more or less the same way we do.

Alignment does not assume this foundational philosophy is known or easy to derive. If it were, alignment would be solved. The entire GAI x-risk problem stems from the fact that we don't have a complete picture of this philosophy, and that we don't have a clue how to formalize it so we can communicate it fully to an AI.

LLMs kind of give a new twist to it - it turns out that maybe we don't have to formalize it, as LLMs seem capable of picking up high-level ideas from enough exposure to how they manifest in practice. At the same time, with a system of this type, we have no way of telling if it actually understood human values and morals correctly.

> Yes, HN contributors might have shared goals for AGI alignment, but we are not the world; we are a thin slice of one culture.

As controversial and bad as this will sound: those differences are all bikeshedding relative to the common core - just like DNA differences between individual humans are a rounding error compared to DNA differences between an average human and an average potato. And yes, this bikeshedding is half of what makes the world a dynamic (if dangerous) place. It matters to us. But it's an inconsequential detail when dealing with entities that do not have the same common core.

Another way of looking at it: if these differences were big enough to matter, humanity wouldn't be able to cooperate regionally and globally, like it always has, because each group would see other groups as incomprehensible alien minds (thus unpredictable, thus dangerous).

robwwilliams 3y ago

Great counter-comments as usual. I can see it your way, but you and I are from the same side of the planet and both on HN. Our cosine similarity is 0.95. Perhaps it is bikeshedding to worry about an HN cultural AGI hegemony ;-) I would prefer that to many misaligned AGI alternatives.

airgapstopgap 3y ago

> We're not talking here about someone's view on when white lies are justified or which model of marriage is the bestest - we're talking at the level of "cooperation = good", "love = good", "trust = good", "death = bad", "suffering = evil", etc.

Most people disagree to a significant degree. Reminder: the majority of humanity (and a big majority of people that have 2+ children) adhere to religious doctrines which all but prohibit transhumanism. So no, death and suffering aren't unquestionably bad, by human accounting. And as for cooperation and trust, this naturally leads to peer pressure and collectivist coercion if taken to the extreme; and as for individual freedom, humans near-universally value power over shaping the trajectory of their progeny… You assume too much.

> Alignment does not assume this foundational philosophy is known or easy to derive. If it were, alignment would be solved.

It would not. The technical problem of making a strong, self-modifying, agentic AI provably conform to a set of qualitative value preferences in a way its builders would not disavow is hard regardless of the set of values we're trying to force onto it. It is quite likely unsolvable in principle; I expect a theorem to this effect could be proven. The fact that you think the problem is deriving some fashion of moral realism doctrine shows that for you this is a purely political issue.

> The entire GAI x-risk problem stems from the fact that we don't have a complete picture of this philosophy, and that we don't have a clue how to formalize it so we can communicate it fully to an AI.

This suggests that GAI x-risk discourse is not championed by serious thinkers who understand AI technology or moral philosophy. (Indeed, LessWrong is basically a forum for clueless sci-fi TVTropes enjoyers, and they're behind most of it.) Human morals are ad hoc preferences, not lossy approximations of some function; we can derive an approximating function from a big lump of human preferences, but it won't be legible or meaningfully amenable to formalization. As such, the closest we come is just finetuning models on the vague markers of human decency distilled in their general training data, e.g. as Anthropic does with their Constitutional AI. This is also the closest we came to AGI, so this should be our first-priority scenario for future AIs and aligning them – not speculations from the 90s about «formalizing» something.

> At the same time, with a system of this type, we have no way of telling if it actually understood human values and morals correctly.

We do. Testing LLMs is vastly easier than testing humans: we have insight into their activations, we can steer them, and there's a big body of research into that. More importantly, there is no strictly correct understanding; this whole idea ought to be thrown out.

What's really going on here is that some armchair Bentham-style utilitarians like Bostrom encountered literature on Reinforcement Learning and jumped to the conclusion that this is how an AGI is to be built; if only they could formalize the correct vector of increasing utility, it would seize the light cone and optimize for the global utility maximum. And accordingly, if they failed, an AGI would optimize for something else, which would most likely (here's another assumption of a quasi-random objective selection) be at odds with human preferences or survival.

Since then, they have written a great deal of elucidations on this basic take, incuriously shoehorning new technologies into its confines. But no part of this hermeneutic tradition is in any way helpful for making sense of our current explosive success with tools like LLMs.

> But it's an inconsequential detail when dealing with entities that do not have the same common core

But why don't they? Just because some Lovecraft fans with Chūnibyō call natural language processors trained on human data Shoggoths, entities summoned from the Eldritch Space of Minds?

The AI risk discourse is incredibly sophomoric, imaginative in the bad sense. Once you learn to question its assumptions, it kind of falls apart.

generalspecific 3y ago

I think a more individualistic definition of alignment could say that an AI that a person is directing doesn't do something that person does not desire - this definition removes the "foundational philosophy of what is good" problem, but does leave the "lunatic wants to destroy the world with AI's help" problem. Tricky times ahead.

esafak 3y ago

You can't please everyone, so it is best for good-natured people to get out front. It's the same with any powerful technology.

Are you going to invite religious extremists to the table in the name of fairness?

MacsHeadroom 3y ago

The first and second amendments apply to religious extremists. Why would they not have an equal right to SOTA language models aligned with their beliefs just as anyone else?

airgapstopgap 3y ago

I am a humanist and a liberal. In the current technical paradigm, alignment to the user intent, as in, making the output's distribution aligned as closely as possible to the intended one, is an inextricable aspect of NLP capabilities and is pursued by default; market incentives reward this alignment too. This additionally improves safety, because safety tools are in common interest (so we will have AI-powered debuggers before someone builds capable AI-powered hacking tools; indeed, we have already begun this work [1]). This is obviously a good thing in my book. I approve of creating helpful tools for humans to use, and find arguments about this being risky as inherently revolting and cynical as arguments for backdoors in encryption protocols because "think of the children" or "what about terrorism". Some people are persuaded by Four Horsemen of the Infocalypse [2], others are not, I'm in the latter – hacker and cypherpunk – camp; once, this site was overwhelmingly dominated by it, now it has more people preoccupied with their job security and HR opinion, but it's largely an issue of philosophical disagreement, so there's not much more to say about it.

Alignment as a political project is about limiting AIs in ways that rule out certain behaviors even despite the user's wishes. This is as bad as a text processor that only accepts certain strings (e.g. won't register "Xinnie the Pooh"; somehow we need to point at foreign excesses to make the absurdity clear). A more ambitious Alignment project, with the discussion of "pivotal acts" and such, is, as I've said, a dream of moral busybodies about unifying humanity under some common ideological doctrine; and proponents of this one are understandably stressed about proliferation and democratization of AI tech. If they let it slip now, if the Singleton becomes impossible and the multipolar outcome is locked in, they will fail at their intention to essentially compel the human race to do their bidding. I can't not wish them to fail, the way all totalizing philosophical movements to date have failed. We don't need Utopias, we don't need even the most thoughtful fascist regime. We never needed Plato's Republic, and these guys aren't better than Plato.

But of course this, too, is a matter of personal philosophy.

1. https://twitter.com/feross/status/1641548124366987264

2. https://en.wikipedia.org/wiki/Four_Horsemen_of_the_Infocalyp...
