undefined | Better HN

story

0 pointsbko2y ago0 comments

Copying a comment I posted a while ago:

I listened to a podcast with Scott Aaronson that I'd highly recommend [0]. He's a theoretical computer scientist but he was recruited by OpenAI to work on AI safety. He has a very practical view on the matter and is focusing his efforts on leveraging the probabilistic nature of LLMs to provide a digital undetectable watermark. So it nudges certain words to be paired together slightly more than random and you can mathematically derive with some level of certainty whether an output or even a section of an output was generated by the LLM. It's really clever and apparently he has a working prototype in development.

Some work arounds he hasn't figured out yet is asking for an output in language X and then translating it into language Y. But those may still be eventually figured out.

I think watermarking would be a big step forward to practical AI safety and ideally this method would be adopted by all major LLMs.

That part starts around 1 hour 25 min in.

> Scott Aaronson: Exactly. In fact, we have a pseudorandom function that maps the N-gram to, let’s say, a real number from zero to one. Let’s say we call that real number ri for each possible choice i of the next token. And then let’s say that GPT has told us that the ith token should be chosen with probability pi.

https://axrp.net/episode/2023/04/11/episode-20-reform-ai-ali...

0 comments

constantcrying2y ago

I think the chance of this working reliably is precisely zero. There are multiple trivial attacks against this and it can not work if the user has any kind of access to token level data (where he could trivially write his own truly random choice). And if there is a non-water marking neural network with enough capacity to do simple rewriting you can easily remove any watermark or the user does the minor rewrite himself.

teaearlgraycold2y ago

It’ll be the equivalent to a shutterstock watermark.

air72y ago

I heard of this (very neat) idea and gave it some thought. I think it can work very well in the short term. Perhaps OpenAI has already implemented this and can secretly detect long enough text created by GPT with high levels of accuracy.

However, as soon a detection tool becomes publicly available (or even just the knowledge that watermarking has been implemented internally), a simple enough garbling LLM would pop up that would only need to be smart enough to change words and phrasing here and there.

Of course these garbling LLMs could have a watermark of their own... So it might turn out to be a kind of cat-and-mouse game but with strong bias towards the mouse, as FOSS versions of garblers would be created or people would actually do some work manually, and make the changes by hand.

constantcrying2y ago

There are already quite complex language models which can run on a CPU. Outside of the government banning personal LLMs, the chance of there not existing a working fully FOSS and open data rewrite model, if it becomes known that ChatGPT output is marked, seems very low.

The water marking techniques also can not work after some level of sophisticated rewriting. There simply will be no data encoded in the probabilities of the words.

motti2y ago

If it's sophisticatedly rewritten then it's no longer AI generated

1 more reply

concurrentsquar2y ago

This, or cryptographic signing (like what the C2PA suggests) of all real digital media on the Earth are the only ways to maintain consensus reality (https://en.wikipedia.org/wiki/Consensus_reality) in a post-AI world.

I personally would want to live in Aaronson's world, and not the world where a centralized authority controls the definition of reality.

greiskul2y ago

How can we maintain consensus reality, when it has never existed? There are a couple of bubbles of humanity where honesty and skepticism and valued. Everywhere else, at all moments of history, truth has been manipulated to subjugate people. Be it newspaper owned by polical families, priests, etc.

ummonk2y ago

This would be trivially broken once sufficiently good open source pretrained LLMs become available, as bad actors would simply use unwatermarked models.

vunderba2y ago

Even if you could force the bad actors to use this watermarked large language model, there's no guarantee that they couldn't immediately feed that through Langchain into a different large language model that would render all the original watermarks useless.

j / k navigate · click thread line to collapse

0 comments

constantcrying2y ago

teaearlgraycold2y ago

It’ll be the equivalent to a shutterstock watermark.

air72y ago

constantcrying2y ago

The water marking techniques also can not work after some level of sophisticated rewriting. There simply will be no data encoded in the probabilities of the words.

motti2y ago

If it's sophisticatedly rewritten then it's no longer AI generated

1 more reply

concurrentsquar2y ago

I personally would want to live in Aaronson's world, and not the world where a centralized authority controls the definition of reality.

greiskul2y ago

ummonk2y ago

This would be trivially broken once sufficiently good open source pretrained LLMs become available, as bad actors would simply use unwatermarked models.

vunderba2y ago

j / k navigate · click thread line to collapse