Consider a language with the tokens "{[()]}" and the following grammar:
S := S S | '{' S '}' | '[' S ']' | '(' S ')' | <empty>
That is, "[()]" and "[]()" are valid sequences, but "[(])" or "))))" aren't. A child would quickly figure out the grammar if presented some valid sequences.
I generated all 73206 valid sequences with 10 tokens and used it as input to the RNN text generator code at http://karpathy.github.io/2015/05/21/rnn-effectiveness/. After 500,000 iterations I'm still getting invalid sequences.
Am I doing something stupid, or is a RNN text generator weaker than a child (or a pushdown automaton)? Is GPT fundamentally more powerful than this?