I can’t think of a reason the model would have been trained on texts containing many occurrences of these expressions.
This is why I am wondering whether it is a side effect of the attention mechanism built into the transformer architecture. As the prompt and the output generated so far are repeatedly processed to determine what really matters, maybe these expressions get embedded in a latent representation of the relative weights of the different concepts at play in the conversation context.
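For context, the attention step I’m speculating about is, at its core, just a softmax-weighted mixing of token representations: each token’s output is a blend of all context tokens, weighted by query–key similarity. Here’s a minimal NumPy sketch of that idea (purely illustrative, not any particular model’s implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the value rows V,
    with weights given by a softmax over query-key similarities."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over each row of scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention over 3 context tokens with 4-dim embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)
print(w)  # each row sums to 1: how much each token "attends" to the others
```

The attention weights `w` are what I have in mind when I say the model figures out “what really matters” in the context.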
What do you think?