So, then what really is the problem with just including LLM-generated text in wordfreq?
If quirky word distributions are going to remain a "problem", then I'd bet that human distributions for those words will follow shortly after (people are very quick to change their speech based on their environment; it's why language can change so quickly).
Why not just own the fact that LLMs are going to be affecting our speech?