But that's the fundamental superalignment plan: train a human-level alignment-researcher AI, run many copies of it in parallel, and review their research output to see whether they solve the alignment problem. You can't execute the plan until the human-level alignment-researcher AI already exists.
A large part of the idea is that you develop techniques for aligning sub-human AI using even weaker AI, and hope/pray that those techniques continue to generalize once you reach the stage where human-level AI is aligning super-human AI.