I'm not knocking the work. They report large improvements using relatively little data. That's good. But let's be clear that this is further training of a good-sized LLM that has already read far, far more than any human who ever lived.
I know. The question is: how much of the Internet trove, including the smart bits but also the tremendous amount of inane content, is actually needed to build the foundation that lets 1,000 problems have such an effect?