If I understand you correctly, what you describe is exactly what I meant. If you train on a body of text that contains substrings of those 50-grams but never the full 50-grams themselves (or that merely exposes the model to the same vocabulary, used in the same parts of speech, as in the full 50-grams), the model will quite readily produce the full 50-grams despite never having seen them in their entirety. Try it out: it's easy to do on a modern GPU and takes less than an hour.
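The stitching effect can be seen in a toy setting without a GPU. The sketch below is not the neural-network experiment described above: it swaps in a plain count-based n-gram model, and the helper names (`make_fragments`, `train`, `generate`), the placeholder tokens `w0`..`w49`, and the window sizes are all assumptions made up for illustration. It only shows that overlapping fragments, none of which is the full 50-gram, can be enough for a model to emit the whole thing.

```python
from collections import Counter, defaultdict

def make_fragments(tokens, size, stride):
    """Overlapping windows of `size` tokens; each is shorter than the full sequence."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, stride)]

def train(fragments, order):
    """Count which token follows each context of `order` tokens (hypothetical toy model)."""
    counts = defaultdict(Counter)
    for frag in fragments:
        for i in range(len(frag) - order):
            counts[tuple(frag[i:i + order])][frag[i + order]] += 1
    return counts

def generate(counts, prompt, order, length):
    """Greedy decoding: always append the most frequent continuation."""
    out = list(prompt)
    while len(out) < length:
        ctx = tuple(out[-order:])
        if ctx not in counts:
            break  # context never seen in training; stop early
        out.append(counts[ctx].most_common(1)[0][0])
    return out

# A made-up 50-gram of unique placeholder tokens (assumption, not real text).
target = [f"w{i}" for i in range(50)]

# Training data: 10-token windows with stride 5, so the full 50-gram never appears.
fragments = make_fragments(target, size=10, stride=5)

model = train(fragments, order=3)
result = generate(model, target[:3], order=3, length=len(target))
reproduced = (result == target)
print("full 50-gram reproduced:", reproduced)
```

Because every token here is unique, the toy case is far easier than real text, where contexts are ambiguous and a neural model has to generalize rather than look up counts. Treat it as a picture of the mechanism, not as evidence for how readily a trained network does the same.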