I spotted this recently on Reddit. There are tons of very obviously bot-generated or LLM-written posts, but there are also always clearly real people in the comments who just don't realize that they're responding to a bot.
But if you're outside that, looking in, the text usually screams AI. I see this all the time with job applications, even from those who think they "rewrote it all".
You're tempted to judge the LLM's suggestion acceptable far more readily than you would ever have produced it yourself.
It reminds me of the Red Dwarf episode "Camille". It can't be all things to all people at the same time.
With CVs/job applications? I guarantee you, if you actually ran a real blind trial, you'd be wrong so often that you'd be embarrassed.
It does become detectable over time, as you get to know someone's own writing style, but it's bonkers that people still think they can make these detections on first contact. The only reason you can hold that opinion is that you're never notified of the countless false positives and false negatives you've had.
There is a reason the LLMs keep using the same linguistic patterns, like "it's not x, it's y" and numbered lists with emojis: people have been doing that forever.
And RLHF tends towards rewarding text that looks good at first blush. For every one person (like me) who is tired of hearing "You're making a really sharp observation here...", there are ten who will hammer that thumbs-up button.
The end result is that the text produced by LLMs is far from representative of the original corpus, and it's not an "average" in the derisory sense people say.
But it's distinctly LLM, and I can assure you I never saw emojis in job applications until people started using ChatGPT to write their personal statements.
They've been doing some of these patterns for a while in certain places.
We spent the first couple of decades of the 2000s training every "business leader" to speak LinkedIn/PowerPoint-ese. But a lot of people laughed at it when it popped up outside of LinkedIn.
But the people training the models thought certain "thought leader" styles were good, so they have now pushed those styles much further and wider than ever before.
This exactly. LLMs learned these patterns from somewhere, but they didn't learn them from normal people having casual discussions on sites like Reddit or HN or from regular people's blog posts. So while there is a place where LLM-generated output might fit in, it doesn't in most places where it is being published.
That certainly seems to be the case, as demonstrated by the fact that they post them. It is also safe to assume that those who fairly directly use LLM output themselves are not going to be overly bothered by the style being present in posts by others.
> but there are also always clearly real people in the comments who just don't realize that they're responding to a bot
Or perhaps many think they might be responding to someone who has just used an LLM to reword the post. Or translate it from their first language if that is not the common language of the forum in question.
TBH I don't bother (if I don't care enough to make the effort of writing something myself, then I don't care enough to have it written at all) but I try to have a little understanding for those who have problems writing (particularly those not writing in a language they are fluent in).
While LLM-based translations might have their own specific and recognizable style (I'm not sure), it's distinct from the typical output you get when you just have an LLM write text from scratch. I often use LLM translations, and I've never seen them introduce patterns like "it's not x, it's y" when that wasn't in the source.