Maybe. Mechanically we might also describe it as causing the model to condition more explicitly on specific tokens derived from the training data rather than the implicit conditioning happening in the raw model parameters. This would tend to more tightly constrain the output space—making a smaller haystack to look for a needle. And leveraging the fact that “next token prediction” implies some consistency with preceding tokens.
It could be thinking, but I don’t think that’s strong evidence that it is thinking.