It is fair to ask why, if we accept these verbal challenges as good evidence for a theory of mind in children, we would not accept them for these models. The difference is twofold: children have nothing like the memory for text that these models have, and the corpus these models were trained on contains a great many statements that tacitly reflect their authors' theory of mind (i.e., they are the sort of statements typically made by someone who has a theory of mind, just as arithmetically correct statements about quantities are to be expected from people who know arithmetic).
To be clear, I am not arguing that it would be impossible to demonstrate a theory of mind in a system that can only interact through text, but personally, I think it will require capabilities beyond responding to prompts. For example, when models can converse among themselves, I think we will know.