The distribution of sorted digits is:
(0 1 2 3 4 5 6 7 8 9) before
(1 before 0 1 2 3 4 5 6 7 8 9) before
(2 before 0 1 2 3 4 5 6 7 8 9) before
(3 before 0 1 2 3 4 5 6 7 8 9) ...
...
When you compute the search space you're treating each number as a unique token (ie., that all ordinals are unique) -- but its not sorting unique ordinals, it's sorting digits in a sequential model ie., it learns P(Next|Prev)
The (sequential) distribution of digits amongst sorted numbers is tiny