0 points · pishpash · 7y ago · 0 comments

Exactly. This is like withholding spam samples, or details of how spammers operate, from spam-detection work. That side (and the cultural discussions) needs all the head start it can get, rather than being complacent that some arbitrary "experts" will patronizingly "protect" them.

10 comments · 1 top-level

roenxi · 7y ago · 9 in thread

If you look at it as a PR stunt, it is almost certainly a good idea. If a bad actor can auto-generate text that is not really distinguishable from something written by a human, how does a community with open membership (e.g., HN) protect itself? I imagine this technology will enable interesting new attacks against online communities; we haven't seen that for a while.

OpenAI are extremely sensible to draw attention to the fact that AI is approaching a boundary that has practical implications. It is good that everyone is being alerted that that boundary might be crossed at any time in the foreseeable future.

pas · 7y ago

But ... it's not novel. We could already generate convincing gibberish years ago.

Now the novelty is that this can be better targeted. But even simple Markov-chain-based text generators were good enough to fool people for a bit.

And there were always people who had too much free time to write. A lot. (See for example the crackpots and conspiracy theorists that bombard physics forums. See the 9/11, Zeitgeist, etc. movies. See how much has been written about anti-vaxx, about quantum woo, etc.)

Reputation systems work pretty well for countering spammers.

And against APTs (advanced persistent threats, spear-phishing attacks, etc.) there's no real "universal" protection anyways. (You need a competent security team to out-think and out-resource the attackers in every possible dimension.)

This AI is the same as the paid Russian trolls and the unpaid scammers, and so on.

avinium · 7y ago

The OpenAI samples are leaps and bounds ahead of traditional Markov-chain generated text. I don't think you can compare the two. It's the fluency and plausibility that give pause around a public release.

I agree with your last point though - it falls into the same category as paid Russian trolls. I think that's exactly why they were hesitant to release the pre-trained models - they didn't want to make it easier/cheaper for a bad actor to replicate the 2016 election.

It remains to be seen whether their decision will make an iota of a difference. But I understand their motivation.

nl · 7y ago

> But ... it's not novel.

I work in this field, and yes, this is very novel (at least in terms of the quality).

It's the biggest improvement in quality I've ever seen. The long term coherence is so much better than anything else that has ever been built.

Scaevolus · 7y ago

Markov-chain generators are extremely lacking in long-term coherency. They rarely even make complete sentences, much less stay on topic! They were not convincing at all-- and many of the GPT-2 samples are as "human-like" as average internet comments.

Conjecture: GPT-2 trained on Reddit comments could pass a "comment Turing test", where the average person couldn't distinguish whether a comment is bot or human with better than, say, 60% accuracy.

repolfx · 7y ago

I'm not sure it has much in the way of implications.

There is no real profit to be made by generating realistic-looking text. Spammers don't work that way; they haven't cared about realistic-looking text for years. Nor have spam filters cared much about text for a long time, exactly because it's so easy to randomise. Anti-spam is not a good reason to hold back on language generation models, in my view.

As for HN, if bots can write posts as good as humans, great, why hold back?

Cacti · 7y ago

You’re fooling yourself if you think there are no significant uses of text generation. Fake news, propaganda, advertising, fake reviews, fake everything. Fabricated email from friends, family, and colleagues. Whole online communities fabricated out of whole cloth. It is a weapon, and a powerful one.

Cacti · 7y ago

No, a more effective PR stunt would be to release the model, and better ones, and make it so easy that any idiot could use them. THAT would catch the attention of Congress, and THAT would result in funds and legislation to combat it. This won’t even register in a subcommittee staffer’s wet dream. It is not human nature to pay attention to far-off, hypothetical, abstract threats, only concrete and immediate ones. You could release a thousand papers like this and it wouldn’t do anything even approaching the effect of congressmen and their staff getting assloads of fake but convincing email/docs/etc., the press being inundated with thousands of fake but convincing tips, tens of thousands of people calling the police because some asshats are spamming them with convincing letters from their dead grandma or whatever, convincing communications to banks or brokers, letters to agencies claiming widespread danger (i.e., there is salmonella in half the food at xyz), kids sending forged letters to their school from their supposed parents to let them leave campus, and so on. I’m sure you can think of better examples.

dogma1138 · 7y ago

I’m not entirely sure that a bad actor would get any more scalability from it than from a Mechanical Turk farm, at least as far as impact goes.

It seems that, as far as information warfare goes, “less is more” works quite well, and they rely on targeted people to spread the news for them.

When you want to drive an agenda you don’t need 100,000 unique comments; you need good copypasta.

Overall I’m sick of this dramatization of the AI catastrophe until there is a proven path with agency for it to actually operate in the real world.

A chat bot isn’t a threat to anyone even if it turns homicidal.

Cacti · 7y ago

But Mechanical Turk is traceable and definitely not anonymous. Using a self-contained model somewhere on a server/cluster/workstation could be.

Regarding an agenda, sure, good pasta is fine and all, and regular ol’ people are fine, but it is not cost-effective. This is a million times cheaper, which means you can use it everywhere, not just the obvious places; you can be everywhere, and you can do more than just push a couple of big items: you could push tens of thousands of them, micro-targeted all the way down to the individual. Don’t dismiss it so easily. The potential scale is far, far larger than anything existing to date.

And I would note that the reason 100,000 comments aren’t effective now is precisely because they are too formulaic, too obviously fake when used on such a large scale. This has the potential to create real, live, seemingly active and believable online communities of millions of people, all at fractions and fractions and fractions of a penny compared to current methods. People read news, then comments (or reviews or whatever), because they use them to determine the validity of the content they just read; if it’s no longer possible to tell from the comments what’s a scam and what isn’t... well, you could do a lot of things with that.
