If you could actually identify AI deterministically, you would have a very profitable product.
I find it interesting that you believe this claim is wildly conspiratorial, or that you think the difficulty of reliably detecting AI generated text at scale is evidence that humans can't do pretty well at this much more limited task. Do you also find claims that AIs are frequently sycophantic in ways that humans are not, or that they will use phrases like "you're absolutely right!" far more than a human would unless prompted otherwise (which are exactly the same kind of narrow claim), similarly conspiratorial? i.e., is your assertion that people would have difficulty differentiating between a real human's response to a prompt and Claude's response to that prompt when there was no specific pre-prompt trying to control the writing style of the response?
> I find it interesting that you believe this claim is wildly conspirational
I don’t believe it’s wildly conspiratorial. I believe it’s foolishly conspiratorial. There’s some weird hubris in believing that you (and whatever group you identify as “us”) are able to deterministically identify AI text when experts can’t do it. If you could actually do it, you’d probably sell it as a product.
I think you will find the OP said no such thing. They instead said they identified a mixture of writing styles consistent with a human author and an LLM. The OP says nothing about deterministically identifying LLMs, only that the style of specific sections is consistent with LLM output, which is what leads to the conclusion.
I am making an even more limited claim than the article, which is only that it's possible for "experts" (i.e., people who frequently interact with LLMs as part of their day jobs) to identify AI generated text in long-form passages with very few false positives, not to classify it perfectly. I've also introduced the caveat that this only applies to AI generated text that has received minimal or no prompting to "humanize" the writing style, not to AI generated text in general.
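To put that distinction (few false positives vs. perfect classification) in concrete terms, here's a minimal sketch with entirely made-up numbers, purely to illustrate the precision/recall asymmetry I'm claiming: a conservative rule that only flags a passage when several tells co-occur will miss plenty of LLM-assisted text, but what it does flag is rarely human.

```python
# Minimal sketch with invented numbers; these are not measured results.
# The point is precision ("few false positives"), not recall
# ("classify it perfectly").

def flag_as_llm(tell_count: int, threshold: int = 4) -> bool:
    """Flag a passage only when several independent LLM 'tells' co-occur."""
    return tell_count >= threshold

print(flag_as_llm(tell_count=5))  # True: many tells, confidently flagged
print(flag_as_llm(tell_count=1))  # False: stays silent rather than guess

# Hypothetical outcomes of such a conservative rule over 1,000 passages:
true_positives = 120    # flagged, actually LLM-assisted
false_positives = 3     # flagged, actually human-written
false_negatives = 250   # LLM-assisted, but edited enough to slip past

precision = true_positives / (true_positives + false_positives)  # ~0.98
recall = true_positives / (true_positives + false_negatives)     # ~0.32
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The numbers are invented, but the shape of the tradeoff is the claim: high precision, modest recall.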
If you would like to perform a higher-quality study with more recent models, feel free (it's only fair that I ask you to do an unreasonable amount of work here given that your argument appears to be that if I don't quit my lucrative programming job and go manually classify text for pennies on the dollar, it proves that it can't be done).
The reason this isn't offered as a service is that it makes no economic sense to do so using humans, not that it's impossible, as you claim. This kind of "human" detection mechanism does not scale the way generation does. The cues I rely on are also fairly easy to eliminate if you know someone is looking for them, which means these heuristics do not work reliably against someone actively trying to avoid human detection, or against a human deliberately trying to sound like an LLM (I feel the need to reiterate this because many of the counterarguments to what I'm saying are directed at claims of that form).
> I’m not going to write another detailed explanation of why your “slop === AI” premise is flawed.
This isn't a claim that I made. I believe that text written with LLM assistance is not necessarily slop, and that slop is not necessarily AI generated. The only assertions I made regarding slop are that being written with LLM assistance and minimal prompting or editing is a strong predictor of slop, and that the heuristics I'm using (if present in large quantities) are a strong predictor of an article having been written with LLM assistance and minimal prompting or editing. i.e., I am asserting that these kinds of heuristics work pretty well on articles generated by people who don't realize (or care) that there are LLM "tells" all over their work.

The fact that many of the articles posted to HN are being accused of being LLM generated could certainly indicate that this is all just a massive witch hunt, but given the acknowledged popularity of ChatGPT among the general population and the fact that experts can pretty easily identify non-humanized articles, I think "a lot of people are using LLMs in the process of generating their blog posts, and some sizable fraction of those people didn't edit the output very much" is an equally compelling hypothesis.
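For what it's worth, the "strong predictor" framing can be made rough-and-ready Bayesian. A sketch in which every number is an assumption (the base rate and likelihoods are invented for illustration, not measured):

```python
# Bayes'-rule sketch; every number here is an assumed, illustrative value.
prior = 0.30               # assumed share of posts written with heavy, unedited LLM help
p_tells_given_llm = 0.80   # unedited LLM output usually carries many of the tells
p_tells_given_human = 0.05 # humans rarely pile up those exact phrasings unprompted

posterior = (p_tells_given_llm * prior) / (
    p_tells_given_llm * prior + p_tells_given_human * (1 - prior)
)
print(f"P(LLM-assisted | many tells) ~= {posterior:.2f}")  # ~0.87
```

Under those (invented) numbers the heuristic is informative without being anywhere near "deterministic", which is the whole point.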