Why do you think it's not a good heuristic to be able to quickly spot the tell-tale signs of LLM involvement, before you've wasted time reading slop?
Yes, there will be false positives. It's a heuristic after all.
If anything, I'd rather that renderers like Markdown just all agree to change " - " to an en dash and " -- " to an em dash. Then we could put the matter to bed once and for all.
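A rule like that would be a trivial post-processing pass. Here's a minimal sketch (the function name and the choice of spaced en dash vs. closed em dash are my own assumptions, not any real Markdown renderer's behavior):

```python
def smarten_dashes(text: str) -> str:
    """Hypothetical renderer pass: " -- " becomes an em dash, " - " an en dash."""
    # Replace the longer sequence first so " -- " is not half-consumed
    # by the " - " rule.
    text = text.replace(" -- ", "\u2014")   # closed em dash (U+2014)
    text = text.replace(" - ", " \u2013 ")  # spaced en dash (U+2013)
    return text
```

Then everyone types plain hyphens and the typography falls out for free, with no signal left about who (or what) wrote the text.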
I was just curious why you've decided paying attention to them is a bad heuristic. Sure, it can change once people instruct their LLMs not to use them, but still, for now, they sure seem to overuse them!
That and "let's unpack this". I swear, I'll forbid ChatGPT from using "unpack" ever again, in any context!
So the only real purpose of the heuristic is to add a tiny extra vote of confidence when I see a comment that otherwise appears to be lazy ChatGPT copypasta. But in such cases I'll predict that it was probably LLM output either way, and I'll judge it to be poor writing that isn't worth my time regardless of whether or not an LLM was involved.
Fundamentally, the issue I'm seeing here is that we're all talking past each other because we need a better standardized term than "LLM output". I suppose "slop" could work if we universally agreed that it referred only to a subset of LLM output, rather than being synonymous with LLM output in general, but I'm not sure that we do universally agree on that.
If someone types the equivalent of a Google search into ChatGPT, or a spammer has an automated process generically reply to social media posts/comments, that's what qualifies to me as "slop". Most of us here have seen it in the wild by now, there's obviously a distinctive common style (at least for now), and I think we can all agree that it sucks. That's very different from someone investing time and/or expertise to produce content that just happens to involve an LLM as one of the tools in their arsenal; the attitude that it isn't different is just the modern equivalent of considering cellular phone calls or typed letters to be "impersonal".
I'm not suggesting that LLM output doesn't tend to have a higher density of em dashes than human output. I'm just pushing back on the idea that the presence of em dashes is sufficient evidence to dismiss something as probably LLM-generated; that's no better than superstition. I mean, I've used em dashes in a number of comments in this thread, and no one has accused me of using an LLM, so it can't be a pattern that anyone puts too much stock in.
Citation needed.
> Who is it helping if we collectively bully ourselves into excising a perfectly good punctuation mark from human language?
Humans can adapt faster than LLM companies, at least for the moment. We need to be willing to play to our strengths.
Who is it helping if we bully ourselves into ignoring a simple, easy "tell"?
https://en.wikipedia.org/wiki/Dash
> Humans can adapt faster than LLM companies
No one said anything about LLM companies. If I were a spammer today, I'd just have my code replace dashes in LLM output with hyphens before posting it. As a human, I'm not going to suddenly stop using dashes because a handful of people are treating a silly meme as if it were a genuinely useful heuristic.
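For what it's worth, that scrubbing step is a one-liner. A minimal sketch (the function name and the exact hyphen substitutions are illustrative assumptions, not anyone's actual spam pipeline):

```python
def launder_dashes(text: str) -> str:
    """Hypothetical scrubber: swap em/en dashes for plain hyphens before posting."""
    text = text.replace("\u2014", " - ")  # em dash (U+2014) -> spaced hyphen
    text = text.replace("\u2013", "-")    # en dash (U+2013) -> bare hyphen
    return text
```

Which is exactly why the "tell" only catches people who aren't trying to evade it.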