How could it not be computing the Game of Life algorithm, given that it gets 100% accuracy over multiple steps on a bunch of random game boards it's never seen before?
And then, based on the structure of the net, and by examining the attention layers and finding that they're doing 3 by 3 average pooling, we can see that the attention layer produces a set of tokens, where each token contains the number of neighbours its cell had and the cell's previous state. This then goes through a classifier layer, which decides the cell's next state given that information.
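To see why that information is enough, here is a minimal NumPy sketch. It is not the model's actual weights, just the same computation written by hand. It assumes wrap-around edges, which may not match how the model treats the border, and all the function names are made up for illustration. The 3 by 3 window includes the cell itself, so the neighbour count is nine times the pooled value minus the previous state.

```python
import numpy as np

def avg_pool_3x3(board):
    """3x3 average pooling with wrap-around edges (an assumption)."""
    total = sum(np.roll(np.roll(board, dy, axis=0), dx, axis=1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return total / 9.0

def step_from_pooled(board):
    """Next state using only the pooled value and the previous state."""
    pooled = avg_pool_3x3(board)
    # The window includes the cell itself, so subtract the previous state.
    neighbours = np.rint(pooled * 9).astype(np.int64) - board
    alive = board == 1
    return ((neighbours == 3) | (alive & (neighbours == 2))).astype(np.int64)

def step_reference(board):
    """Textbook Game of Life step, counting the eight neighbours directly."""
    neighbours = sum(np.roll(np.roll(board, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    alive = board == 1
    return ((neighbours == 3) | (alive & (neighbours == 2))).astype(np.int64)

def agrees_over_steps(board, steps=5):
    """Roll both versions forward and check they match at every step."""
    a, b = board.copy(), board.copy()
    for _ in range(steps):
        a, b = step_from_pooled(a), step_reference(b)
        if not np.array_equal(a, b):
            return False
    return True
```

Calling `agrees_over_steps(np.random.default_rng(0).integers(0, 2, size=(12, 12)))` compares the two over several steps on a random board, which is the same kind of check as the multi-step accuracy test.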
Further evidence for that: it was possible to use linear probes to confirm that the tokens that had been through the attention layer contained the information about the number of neighbours and the previous state.
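For anyone unfamiliar with linear probes, here is a rough sketch of the idea on synthetic data. The `tokens` below are a hypothetical stand-in (a random linear mix of the pooled value and the previous state), not activations from the real model, and `fit_linear_probe` is a made-up helper name. Wrap-around edges are assumed again. With a real model you would swap in the actual post-attention activations.

```python
import numpy as np

def fit_linear_probe(tokens, target):
    """Least-squares linear probe with a bias term."""
    X = np.hstack([tokens, np.ones((tokens.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, target.astype(float), rcond=None)
    return weights

def apply_probe(tokens, weights):
    X = np.hstack([tokens, np.ones((tokens.shape[0], 1))])
    return X @ weights

rng = np.random.default_rng(0)
board = rng.integers(0, 2, size=(16, 16))

# Sum over the 3x3 window, wrap-around edges (an assumption).
total = sum(np.roll(np.roll(board, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
neighbours = (total - board).ravel()
prev_state = board.ravel()

# Synthetic stand-in for post-attention tokens: one 8-dim vector per cell.
features = np.stack([total.ravel() / 9.0, prev_state], axis=1).astype(float)
tokens = features @ rng.normal(size=(2, 8))

# One probe per quantity we want to read out.
w_neigh = fit_linear_probe(tokens, neighbours)
w_state = fit_linear_probe(tokens, prev_state)

neigh_accuracy = (np.rint(apply_probe(tokens, w_neigh)) == neighbours).mean()
state_accuracy = (np.rint(apply_probe(tokens, w_state)) == prev_state).mean()
```

Here the probes succeed by construction, since the synthetic tokens are a linear function of the two quantities. The interesting part with a real model is that the same readout works on tokens nobody built by hand, and you would normally evaluate the probe on held-out boards rather than the ones it was fitted on.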
From all of this, it's clear that the model is running the Game of Life properly.