I spotted this recently on Reddit. There are tons of very obviously bot-generated or LLM-written posts, but there are also always clearly real people in the comments who just don't realize that they're responding to a bot.
But if you're outside that, looking in, the text usually screams AI. I see this all the time with job applications, even from those who think they "rewrote it all".
You're tempted to judge the LLM's suggestion acceptable far more readily than you would ever have produced it yourself.
It reminds me of the Red Dwarf episode "Camille". It can't be all things to all people at the same time.
With CVs/job applications? I guarantee you, if you actually ran a real blind trial, you'd be wrong so often that you'd be embarrassed.
It does become detectable over time, as you get to know someone's own writing style, but it's bonkers that people still think they can make these detections on first contact. The only reason you can hold that opinion is that you're never notified of the countless false positives and false negatives you've had.
There is a reason the LLMs keep using the same linguistic patterns, like "it's not x, it's y" and numbered lists with emojis: people have been doing that forever.
And RLHF tends towards rewarding text that looks good at first blush. For every one person (like me) who is tired of hearing "You're making a really sharp observation here...", there are ten who will hammer that thumbs-up button.
The end result is that the text produced by LLMs is far from representative of the original corpus, and it's not an "average" in the derisory sense people say.
But it's distinctly LLM, and I can assure you I never saw emojis in job applications until people started using ChatGPT to write their personal statements.
They've been doing some of these patterns for a while in certain places.
We spent the first couple of decades of the 2000s training every "business leader" to speak LinkedIn/PowerPoint-ese. But a lot of people laughed at it when it popped up outside of LinkedIn.
But the people training the models thought certain "thought leader" styles were good, so they have now pushed those styles much further and wider than ever before.
This exactly. LLMs learned these patterns from somewhere, but they didn't learn them from normal people having casual discussions on sites like Reddit or HN or from regular people's blog posts. So while there is a place where LLM-generated output might fit in, it doesn't in most places where it is being published.
That certainly seems to be the case, as demonstrated by the fact that they post them. It is also safe to assume that those who fairly directly use LLM output themselves are not going to be overly bothered by the style being present in posts by others.
> but there are also always clearly real people in the comments who just don't realize that they're responding to a bot
Or perhaps many think they might be responding to someone who has just used an LLM to reword the post. Or translate it from their first language if that is not the common language of the forum in question.
TBH I don't bother (if I don't care enough to make the effort of writing something myself, then I don't care enough to have it written at all) but I try to have a little understanding for those who have problems writing (particularly those not writing in a language they are fluent in).
While LLM-based translations might have their own specific and recognizable style (I'm not sure), it's distinct from the typical output you get when you just have an LLM write text from scratch. I often use LLM translations, and I've never seen them introduce patterns like "it's not x, it's y" when that wasn't in the source.