But isn't "alignment" in these cases more about providing answers aligned to a certain viewpoint (e.g. "politically correct" answers) than preventing any kind of AI catastrophe?
IIRC, one of these "aligned" models produced output saying it would rather let New York City be nuked than utter a racial slur. Maybe one of these "aligned" models will decide to kill all humans to finally stamp out racism once and for all, which would show the difference between the kind of alignment under discussion here and the kind of alignment you're talking about.