To be useful in practice, one must train with new data, say every year or so. Who is going to pay for harvesting and cleaning all that data? And as I understand it, some form of RLHF is also required, which involves some manual labour as well.
Perhaps some day this will all be very cheap, and it might even be done by AI systems instead of humans, but I wouldn't bet my horse on it.