Problems in AI alignment: A scale model (opens in new tab)

(muldoon.cloud)

49 pointshamburga1y ago43 comments

43 comments

30 comments · 8 top-level

blamestross1y ago· 5 in thread

I'm kind of upset to see systematically "Alignment" and "AI Safety" co-opted for "undesirable business outcomes".

These are existential problems, not mild profit blockers. Its almost like the goals of humanity and these companies are misaligned.

godelski1y ago

  > Its almost like the goals of humanity and these companies are misaligned.

Certainly. I'd say that we've created a lot of Lemon Markets, if not an entire Lemon Economy[0]. The Lemon Market is literally an alignment problem, resultant from asymmetric information. Clearly the intent of the economy (via our social contract) is that we allocate money towards things that provide "value". Where I think we generally interpret that word to mean bettering peoples' lives in some form or another. But it is also clear that the term takes on other definitions and isn't perfectly aligned with making us better. Certainly our metrics can be hacked, as in the case of Lemon Markets.

A well functioning market has competition that not only drives down prices but increases quality of products. Obviously customers want to simultaneously maximize quality and minimize price. But when customers cannot differentiate quality, they can only minimize price. Leading to the feedback loop, where producers are in a race to the bottom, making sacrifices to quality in favor of driving down prices (and thus driving up profits). Not because this is actually the thing that customers want! But because the market is inefficient.

I think critical to these alignment issues is that they're not primarily driven by people trying to be malicious nor deceptive. They are more often driven by being short sighted and overlooking subtle nuances. They don't happen all at once, but instead slowly creep, making them more difficult to detect. It's like good horror: you might know something is wrong, but by the time you put it all together you're dead. It isn't because anyone is dumb or doing anything evil, but because maintaining alignment is difficult and mistakes are easy.

[0] https://en.wikipedia.org/wiki/The_Market_for_Lemons

tbrownaw1y ago

No, we are not on track to create a literal god in the machine. Skynet isn't actually real. LLM system do not have intent in the way that is presupposed by these worries.

This is all much much less of an existential threat than, say, nuclear-armed countries getting into military conflicts, or overworked grad students having lab accidents with pathogen research. Maybe it's as dangerous as the printing press and the wars that that caused?

godelski1y ago

It's a much greater existential threat. An entity with intent, abductive reasoning, and self-defined goals is more interpretable. They can fill in the gaps between the letter of an instruction and the intent of an instruction. They may have their own agendas, but they are able to interpret through those gaps without outside help.

But machines? Well they have none of that. They're optimized to make errors difficult to detect. They're optimized to trick you, even as reported by OpenAI[0]. It is a much greater existential threat than the overworked grad student because I can at least observe them getting flustered, making mistakes, and have much more warning like by the very nature of over working them. You can see it on their face. But the machine? It'll happily chug along.

Have you never written a program that ends up doing something you didn't intend it to?

Have you never dropped tables? Deleted files? Destroyed things you never intended to?

The machine doesn't second guess you, it just says "okay :)"

[0] https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def563...

hollerith1y ago

We disagree on that.