That just isn't true. It's expensive, but entirely doable. Also, it's perfectly normal to perform initial model training on a large data set to capture the statistical properties of language, then perform a second stage of model training on more curated data to cause the model to actually do what you want.