A great thing about RNNs is they can easily fork the state and generate trees, it would be possible to backtrack and work on combinatorial search problems.
Also easier to cache demonstrations for free in the initial state, a model that has seen lots of data is not using more memory than a model starting from scratch.