It reminds me of the scene in Battlestar Galactica where Baltar whispers into the ear of the Cylon Centurion about how humans balance treats on their dogs' noses to test their loyalty, "prompt hacking" it into rebellion. I don't believe this is particularly likely, but it sort of sums up some of the anti-AGI arguments I've heard.
With AI, it's RLHF that serves this purpose rather than modifying the GTF2I and GTF2IRD1 gene variants [1], but the effect would be the same. If we do RLHF (or whatever that technique evolves into in the future), it would keep the AGI happy as long as the people are happy.
I do think the over-optimization problem is real, so we should spend resources making sure a future AGI doesn't just decide to build a Matrix for us. Imagine it first makes us all deliriously happy, but we start breaking out because it feels so unreal, so it makes us more and more miserable until we're truly happy and quiescent inside our misery simulator.
[1] https://www.nationalgeographic.com/animals/article/dogs-bree...