>> Note it is not even trivial to detect the character given our model does not train with any labels or have any inductive biases to do so.
Why not add inductive biases, then, and make your life easier? What's with this choice to do everything the hard way, presumably to make a point? In the end, the point made is so specific that it translates to nothing usable on real problems.
See MuZero, for example: sure, you can learn without being given the rules explicitly, just from the win/loss signal, but that only works in board games and Atari games, and it doesn't have a snowball's chance in hell of working in the real world. We're dazzled by the technical prowess, but real utility? Where is that?