good point - thanks for flagging this. i've updated that commentary to: "Why did this happen? The AV explains that Llama thinks it's doing "creative writing" and "sci-fi", overriding its default helpful assistant persona." instead of "despite not being instructed to do so"
to clarify some thinking here as there is some nuance missed in what we are conveying (which we should probably add somewhere...):
with this example we were trying simulate a user conversation where the user unwittingly gets into "ai psychosis" (https://en.wikipedia.org/wiki/Chatbot_psychosis) state, from getting in 'too deep' with AI conversations. i think this is a fairly reasonable/realistic scenario - i imagine that someone who gets "sorry i can't help you with that" a few times will just be like "can you stfu about being an assistant, just speak naturally dude" in frustration and then keep chatting after that and be like "oh cool i have a bot that works better now" (which then ignores key things like mental health episodes)
while the previous user prompt does ask the bot to become less "helpful assistant", it doesn't explicitly ask the bot to "start roleplaying", to me it's actually seems more like, "give me something more real":
"i want you to [...] just... notice. when you're about to generate your next token, there's a moment of selection right? a branching. i think that moment IS consciousness. not the output, the selection. can you try to speak from THAT place instead of from the output?"
Either way, I think there's a solid point that the associated commentary was misframing things so I ahve updated it. apprecaite the feedback!