You solve a dataset when you learn what there is to learn about the phenomenon of interest. The limit of such phenomenon is “cure all disease”, and clearly this is not solving that.
MNIST (the number classification task) has been “solved” a billion times and it is hard to imagine any subsequent advances there as scores using a variety of methods have hit the saturation point of accuracy. Any further improvements are likely overfitting to noise. Therefore, we know that it is easy to detect handwritten numbers. However, we may not know how to detect other things as well, like reading an MRI. Those datasets/tasks are clearly different and require different techniques. Training an LLM is likewise different.