How ML Model Data Poisoning Works in 5 Minutes (opens in new tab)

(journal.hexmos.com)

66 pointsR412y ago15 comments

15 comments

15 comments · 6 top-level

fxtentacle2y ago· 4 in thread

"How ML Model Data Poisoning Works"

It doesn't. The mentioned Nightshade tool is useless. Does anyone have any example of successful model data poisoning?

The is a breadth of literature on the topic. I recommend the excellent survey by Baoyuan wu on the topic (mathematical perspective) [1]. For IRL demonstrations, existing cases will of course be rarer, bu they are not impossible as with attacks on Alpaca-7b [2]

[1] https://arxiv.org/abs/2302.09457 [2] https://poison-llm.github.io/

fxtentacle2y ago

That paper says you need to control "0.1% of the training data size" for a 40% chance for one single injected prompt to fire. So that's millions of images or billions of text tokens for real-world models.

talsperre2y ago

Exactly. It is very difficult to implement these data poisoning attacks in the wild due to the size of internet data in general.

doctorpangloss2y ago

Yeah, but the vibes man.

bee_rider2y ago· 4 in thread

> In 2016, Microsoft released their chatbot named Tay on Twitter to learn from human interactions by posting comments. But after the release, it started to act crazy.

> It started using vulgar language and making hateful comments. This was one of the first incidents of data poisoning.

Is this true? I remember when this happened but I thought the story was that 4chan basically found an “echo” type debug command or something like that. The ML mode wasn’t being trained to say bad things, it was just being sent some kind of repeat-after-me command and then the things it was told to repeat were bad.

It seems odd that somebody would write a whole blog post without bothering to check that, though, so maybe I’m mis-remembering?

espadrine2y ago

> 4chan basically found an “echo” type debug command or something like that

That is certainly what Microsoft wanted people to think[0]:

> a coordinated attack by a subset of people exploited a vulnerability in Tay.

Realistically, though, Tay’s website was open about using tweets directed at it as part of its training set[1]:

> Data and conversations you provide to Tay are anonymized and may be retained for up to one year to help improve the service.

So all that this group did was tweet racist things at it, and it ended up in its training set. Microsoft hints at it in the earlier blog post:

> AI systems feed off of both positive and negative interactions with people. In that sense, the challenges are just as much social as they are technical.

There are technical solutions for this issue however; for instance, when creating ChatGPT, the OpenAI team designed ChatML[2] to distinguish assistant messages from user messages, so that it would send messages in the style of the assistant only, not in the style of the user. Along with RLHF, it allowed OpenAI to use ChatGPT messages as part of their training set.

[0]: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-in...

[1]: https://web.archive.org/web/20160323194709/https://tay.ai/

[2]: https://github.com/MicrosoftDocs/azure-docs/blob/main/articl...

bee_rider2y ago

> That is certainly what Microsoft wanted people to think[0]:

Maybe I’m reading between the lines in your post too hard, but are you saying they wanted people to think this because it is somehow less embarrassing or makes them look better? Including this “repeat after me” functionality seems like an extremely stupid move, like I must assume they found the 3 programmers who’ve never encountered the internet or something.

In 2016, I can see thinking they got the filtering right and that users wouldn’t be able to re-train the bot as a sort of reasonable mistake to make, on the other hand. It doesn’t look so bad, haha.

espadrine2y ago

Yes, they employed security terminology for something that was instead data pipeline contamination. As the saying goes, garbage in: garbage out. I don't mean to be harsh on them though: experimentation is useful, and it became a great lesson on red teaming models.

yungporko2y ago

i might be misremembering too but i thought the whole thing was that tay was supposed to learn from the conversations it had, and that people were just deliberately teaching it racist things that were then carrying over to other conversations rather than any kind of hidden command

thesz2y ago· 1 in thread

I recently made a comment that neural models cannot provide chain of reason, while symbolic methods can: https://news.ycombinator.com/item?id=39759033

The vulnerability in the post is directly linked to that inability, in my opinion.

mr_toad2y ago

Current ANN are probably just not big enough to properly reason. We’re still working with the equivalent of insect sized brains. We shouldn’t assume that future ANN will not be capable of deductive reasoning.

Also LLMs might not be the best approach for deductive reasoning, but LLMs are not the only architecture.

Eisenstein2y ago

None of the cases of data poisoning it presented seemed effective in doing very much, except the MS case, and that was so flawed in implementation that it was a example of how not to deploy something.

> Developers need to limit the public release of technical project details including data, algorithms, model architectures, and model checkpoints that are used in production.

Haven't we learned that more eyes to find flaws is better than locking things down?

stanleykm2y ago

When these articles pop up on HN at least there seems to be a lot of focus on training poisoning. While intellectually interesting, it seems less useful or practical than defeating inference.

sonorous_sub2y ago

how to train self-smashing looms

j / k navigate · click thread line to collapse

15 comments

15 comments · 6 top-level

fxtentacle2y ago· 4 in thread

"How ML Model Data Poisoning Works"

It doesn't. The mentioned Nightshade tool is useless. Does anyone have any example of successful model data poisoning?

hikingsimulator2y ago

[1] https://arxiv.org/abs/2302.09457 [2] https://poison-llm.github.io/

fxtentacle2y ago

talsperre2y ago

Exactly. It is very difficult to implement these data poisoning attacks in the wild due to the size of internet data in general.

doctorpangloss2y ago

Yeah, but the vibes man.

bee_rider2y ago· 4 in thread

> In 2016, Microsoft released their chatbot named Tay on Twitter to learn from human interactions by posting comments. But after the release, it started to act crazy.

> It started using vulgar language and making hateful comments. This was one of the first incidents of data poisoning.

It seems odd that somebody would write a whole blog post without bothering to check that, though, so maybe I’m mis-remembering?

espadrine2y ago

> 4chan basically found an “echo” type debug command or something like that

That is certainly what Microsoft wanted people to think[0]:

> a coordinated attack by a subset of people exploited a vulnerability in Tay.

Realistically, though, Tay’s website was open about using tweets directed at it as part of its training set[1]:

> Data and conversations you provide to Tay are anonymized and may be retained for up to one year to help improve the service.

So all that this group did was tweet racist things at it, and it ended up in its training set. Microsoft hints at it in the earlier blog post:

> AI systems feed off of both positive and negative interactions with people. In that sense, the challenges are just as much social as they are technical.

[0]: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-in...

[1]: https://web.archive.org/web/20160323194709/https://tay.ai/

[2]: https://github.com/MicrosoftDocs/azure-docs/blob/main/articl...

bee_rider2y ago

> That is certainly what Microsoft wanted people to think[0]:

espadrine2y ago

yungporko2y ago

thesz2y ago· 1 in thread

I recently made a comment that neural models cannot provide chain of reason, while symbolic methods can: https://news.ycombinator.com/item?id=39759033

The vulnerability in the post is directly linked to that inability, in my opinion.

mr_toad2y ago

Also LLMs might not be the best approach for deductive reasoning, but LLMs are not the only architecture.

Eisenstein2y ago

> Developers need to limit the public release of technical project details including data, algorithms, model architectures, and model checkpoints that are used in production.

Haven't we learned that more eyes to find flaws is better than locking things down?

stanleykm2y ago

When these articles pop up on HN at least there seems to be a lot of focus on training poisoning. While intellectually interesting, it seems less useful or practical than defeating inference.

sonorous_sub2y ago

how to train self-smashing looms

j / k navigate · click thread line to collapse