These tests don't feel practical; that is, they seem intended to collapse the model, not demonstrate "in the wild" performance.
The assumption is that all content is black or white - AI or not AI - and that you treat all content as equally worth retraining on.
It offers no room for assumptions about data augmentation, human-guided quality discrimination, or anything else that might alter the set of outputs to mitigate the "poison."
Specifically, we're implementing AI-culled training sets: they contain some generated data that gets reviewed manually for a few specific things, then pushed into our normal training workflows. This makes for a huge speedup versus 100% manual culling, and the metrics don't lie: the models continue to improve steadily.
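Roughly, the shape of that pipeline as a minimal sketch (quality_score and needs_human_review are hypothetical stand-ins for whatever automatic filter and review triggers are actually in use):

    # Minimal sketch of an AI-culled pipeline with a manual review gate.
    # quality_score() and needs_human_review() are hypothetical stand-ins,
    # not anyone's real filter.

    def quality_score(sample: str) -> float:
        """Hypothetical automatic filter (classifier, heuristic, judge model)."""
        return 1.0 if len(sample.split()) > 20 else 0.0

    def needs_human_review(sample: str) -> bool:
        """Hypothetical trigger for the 'few specific things' checked by hand."""
        return "as an ai language model" in sample.lower()

    def cull(generated: list[str], threshold: float = 0.5):
        kept, review_queue = [], []
        for sample in generated:
            if quality_score(sample) < threshold:
                continue                     # automatic cull: dropped outright
            elif needs_human_review(sample):
                review_queue.append(sample)  # reviewed manually before training
            else:
                kept.append(sample)          # goes to the normal training workflow
        return kept, review_queue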
There may be a point where they're poisoned and will collapse, but I haven't seen it yet.
For example, "base" pretrained models trained on scrapes which include generated outputs can follow instructions zero-shot and score higher on reasoning benchmarks.
Intentionally produced synthetic training data takes this a step further. For SoTA LLMs, the majority, or all, of their training data is generated; Phi-2 and Claude 3, for example.
Granted, one could argue that this only happened because the API version of Claude doesn't appear to use a system prompt. If that's the case, then the LLM lacks any identity otherwise defined by the initial system prompt, and thus, kind of makes one up.
Nonetheless, the point remains: it's kind of interesting to see that in the years since the launch of ChatGPT we're already seeing a tangible impact on publicly available training data. LLMs "know" what ChatGPT is, and may even claim to be it.
but time flows like a river, and the more shit that gets into it…
poison does not need to be immediately fatal to be fatal. some take a frighteningly long time to work. by the time you know what’s happening, not only is it too late, you have already suffered too much.
does this sound like anything more than a scary story to tell around campfires? not yet.
https://www.lesswrong.com/posts/JbE7KynwshwkXPJAJ/anthropic-...
I can't find a link to the actual Claude paper to verify the above link, but a few other places mention the same thing about the training data. We don't know if this improved performance is because of synthetic data or something else. I'm guessing even Anthropic might not know this either.
you wouldn’t want to alert the victim.
Even without a human, if an LLM has access to code execution it can practice solving coding tasks with runtime feedback. There are many ways an LLM could obtain useful learning signals. After all, we got all our knowledge from the environment as well; in the end there is no other source for knowledge and skills.
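A minimal sketch of what that loop could look like, assuming a model_generate callable and tasks that ship with their own tests (both assumptions, not any particular framework's API):

    import os
    import subprocess
    import tempfile

    # Sketch of self-practice with runtime feedback: sample a candidate
    # solution, execute its tests, and keep only passing transcripts as
    # training examples.

    def run_tests(solution_code: str, test_code: str, timeout: int = 10) -> bool:
        """Execute solution + tests in a subprocess; pass/fail is the signal."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(solution_code + "\n" + test_code)
            path = f.name
        try:
            result = subprocess.run(["python", path], capture_output=True,
                                    timeout=timeout)
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            return False
        finally:
            os.unlink(path)

    def practice(model_generate, tasks, samples_per_task=8):
        training_examples = []
        for task in tasks:  # task = {"prompt": ..., "tests": ...}, assumed format
            for _ in range(samples_per_task):
                candidate = model_generate(task["prompt"])
                if run_tests(candidate, task["tests"]):
                    # Verified-correct output becomes a learning signal.
                    training_examples.append((task["prompt"], candidate))
                    break
        return training_examples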
https://www.youtube.com/watch?v=HLRdruqQfRk
I love this guy so much and wish he made far more videos.
If most of it is bad but you can get a better AI to tag it as bad, then it's not necessarily a problem.
Dude, what? That's a pretty absurd claim. Most generally available models specifically curate their inputs for the express purpose of avoiding AI-garbage-induced collapse. It's literally one of their cited reasons for avoiding AI-generated data as inputs.
This is the part that I don't really understand. Isn't this basically an evolutionary algorithm, where the fitness function is "whatever people like the most" (or at least enough to post it online)?
People rarely generate 10 pieces of content with AI and then share all 10 with the world. They usually only share the best ones. This naturally filters for better output.
Are they saying that evolutionary algorithms don't work?
Precisely.
Whether content is AI-generated, ghostwriter-generated, monkey-on-keyboard-generated, etc., presumably it is implicitly filtered by value/quality.
Garbage AI outputs won't be as popular as good AI outputs. (And the same is true of human ones!)
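The "only share the best ones" step is just best-of-N selection in code; a minimal sketch, where generate and human_preference are hypothetical stand-ins for the model and the person choosing:

    # "Share only the best of 10" as best-of-N selection: sample N candidates,
    # publish only the one the human likes most. Both callables are hypothetical.
    def best_of_n(generate, human_preference, prompt, n=10):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=human_preference)  # only this one gets posted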
That this happens doesn't surprise me, but I'd love to see a curve of how each organic-vs-machine content mix ratio results in model collapse over N generations.
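A toy version of that curve is easy to produce; here's a sketch (not any paper's actual setup) where each generation fits a Gaussian to its training data and the next generation trains on a mix of model samples and fresh organic data. Variance drifting toward zero is the collapse signature:

    import numpy as np

    # Toy collapse curve, illustrative only: the "model" is a Gaussian fit,
    # and machine_ratio controls how much of each generation's training data
    # is model-generated versus fresh organic data.
    def final_variance(machine_ratio, generations=200, n=200, seed=0):
        rng = np.random.default_rng(seed)
        data = rng.normal(0.0, 1.0, n)              # generation 0: all organic
        for _ in range(generations):
            mu, sigma = data.mean(), data.std()     # "train" this generation's model
            k = int(machine_ratio * n)
            machine = rng.normal(mu, sigma, k)      # model-generated content
            organic = rng.normal(0.0, 1.0, n - k)   # fresh ground-truth content
            data = np.concatenate([machine, organic])
        return data.var()

    for ratio in (0.0, 0.5, 0.9, 1.0):
        print(f"machine ratio {ratio:.1f} -> variance {final_variance(ratio):.3f}")

With any fresh organic fraction the variance stays anchored near the true value; at ratio 1.0 it decays across generations, the photocopy-of-a-photocopy effect in miniature.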
Look at the original internet content and what SEO has done to it. Google, and search results in general, are trash nowadays. This is what genAI is going to do over the long term: garbage in, garbage out.
If they are scamming and you contact them, of course they will lie.
So how does this work?
I don't think a human mind would be improving if it were in an echo chamber with no new information. I think the reason the human mind improves is because we're exposed to new, original, and/or different thoughts that we hadn't considered or come across before.
Meanwhile, an LLM will just regurgitate the most likely token based on the previous one, so there isn't any originality there; hence any output from an LLM cannot improve another LLM. There is nothing new to be learned, basically.
If this were true of humans, we would have never made it this far
Humans are very capable of looking around themselves and thinking "I can do better than this", and then trying to come up with ways how
LLMs are not
Doesn't this require at least some perspective of what "better than this" means, which you could only know with at least a bit of outside influence in one way or another?
A sequence of AI models trained on each other's output gets mutations, which might help or hurt, but if there's one dominant model at any given time then it's like asexual reproduction with only one living descendant in each generation (and all the competing models being failures to reproduce). A photocopy of a photocopy of a photocopy: this seems to me to also be the incorrect model which Intelligent Design proponents mistakenly think is how evolution is supposed to work.
A huge number of competing models that never rise to dominance would be more like plants spreading pollen in the wind.
A huge number of AIs that are each smart enough to decide what to include in their own training sets would be more like animal reproduction. The fittest memes survive.
Memetic mode collapses still happen in individual AI (they still happen in humans, we're not magic), but that manifests as certain AI ceasing to be useful and others replacing them economically.
A few mega-minds is a memetic monoculture, fragile in all the same ways as a biological monoculture.
Natural examples are prions such as bovine spongiform encephalopathy [0] or sheep scrapie. This seems to really become a problem in systems with a strong and fast positive feedback loop with some selector. In the case of cattle it was feeding rendered bonemeal from dead cattle back to livestock. Prions survive high-temperature treatment, so they are selected for and concentrated by the feedback process.
To really feel the horror of this, read Ken Thompson's "Reflections on Trusting Trust" [1] and ponder the ways that a trojan can be replicated iteratively (like a worm) but undetectably.
It isn't loss functions we should worry about. It's gain functions.
[0] https://en.wikipedia.org/wiki/Bovine_spongiform_encephalopat...
[1] https://tebibyte.media/blog/reflections-on-trusting-trust/
While some people can thrive in this kind of environment (think Kant, for example), many would go crazy.
If anything I think this just demonstrates yet again that these aren't actually analogous to what humans think of as "minds", even if they're able to replicate more of the output than makes us comfortable.
They didn’t graduate to become computer scientists, but did indeed get admitted to the royal school of art the year after.
I found it strangely therapeutic.
What you're not going to see is things like "a divine gigantic textile loom sewing together a white horse and a black horse in an interlaced pattern to create a zebra," for example.
Way back when all this stuff first popped off, I did try it out. I was unimpressed. It was like playing a game of telephone with my ideas, having to describe them into one end and have a thousand people repeat it to one another and make little contributions till it came out the other end, most of the time looking absolutely nothing like I expected.
People who say this makes art accessible... I dunno, I've never gotten it. I've seen people with all manner of disabilities, deformities, etc. all manage to express themselves creatively, with practice and accessibility tools, far more reliably than by trying to make an AI pop out what you actually want. It seems the accessibility claims really only hold water if the accessibility feature is "I don't want to learn any skills," which... I mean, okay. But as with all art, your end product will reflect that level of care.
I guess put it on the pile with all the other broken promises.
This is a confused term made up by 70s AI researchers, who had the continual problem that they didn't know any philosophy and kept making up their own metaphors for how intelligence might work, and then deciding that because they'd made it up it must be true, and also that if they wrote a computer program that had the same metaphors it must work.
"World model" just vaguely points at something people might do and assumes that if you make up a new thing it vaguely points at it'd help.
This is a longstanding critique of GOFAI; see Hubert Dreyfus and Phil Agre.
https://pages.gseis.ucla.edu/faculty/agre/critical.html
> I have a model in my head mapping out the world around me, so I know where things are etc and what I can do with all those things.
No you don't; a map is not the territory, and is necessarily wrong, which means that if you had such a model and were actually relying on it you wouldn't be able to do things you obviously can do in real life.
You have an inaccurate memory of the world and you update it, only as much as you need to[0], as you go, in order to do a specific task.
[0] probably a little less than you need to, because you want to save thinking energy
I don't remember which YouTuber made an interesting video about it, but basically communities are moving away from the free web into private communities (think Discord, or even sites where you are forced to register to read the content).
It's an interesting thing, but I think queries on search engines are becoming worse for this reason too.
Even though they're getting better at generating hands that make sense and other fine details, you can generally tell that an image is AI generated because it has a certain "style". Can't help but wonder if this is partly due to generated images contaminating the training data and causing subsequent AI image generators to stylistically converge over time.
If you want foom (fast self-improvement in AI), use AIs to filter the training data for the next generation of AIs.
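A sketch of that loop, with judge() as a hypothetical LLM-as-judge scoring call:

    # Generation N's model scores candidate data; only the top slice trains
    # generation N+1. judge() is a hypothetical stand-in for an LLM-as-judge.
    def next_generation_dataset(candidates, judge, keep_fraction=0.1):
        ranked = sorted(candidates, key=judge, reverse=True)
        return ranked[: max(1, int(keep_fraction * len(ranked)))]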
This is a crucial question.
In human society, a feedback loop of nonsense is usually defeated by practical effects in physical reality and experience. The objective of education, for example, is to transmit knowledge and apply reason to important questions.
In manipulated social media, there is no check on the nonsense loop. The technology that we currently call A.I. could be used for educational good.
How it will be used, however, is likely to further distort discourse and generate nonsense.
How many can an AI agent do? Probably hundreds of thousands a day. To me, that is going to be a huge problem, but I don't have a solution in mind either.
And then those 100K bad articles posted per day by one person, are used as training data for the next 100K bad/incorrect articles etc - and the problem explodes geometrically.
If you use the results of each calculation in additional calculations, the result will skew further and further from reality with each error. That's AI training on itself.
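A toy numeric illustration of that compounding:

    import random

    # Each step consumes the previous step's slightly wrong output, so errors
    # are inherited and the estimate random-walks away from the error-free
    # reference instead of canceling out.
    random.seed(0)
    truth = estimate = 0.0
    for step in range(10_000):
        measurement = 1.0
        truth += measurement
        estimate += measurement + random.gauss(0, 0.01)  # tiny per-step error
    print(f"truth={truth:.0f}  estimate={estimate:.2f}  "
          f"drift={abs(truth - estimate):.2f}")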
They are simply pattern finding and matching.
More correctly, they are uniform constant-depth threshold circuits (TC^0).
Basically parallel operations on a polynomial number of AND, OR, NOT, and majority gates.
The majority gates can do the parity function, but cannot self-correct like ECC does.
The thing with majority gates is that they can show some input is in the language: think of the truthiness of 1,1,1,0,0 being true, while 1,1,0,0,0 would fail. But that failure is only a negation; it doesn't prove the negation, it isn't a truthy false.
With soft attention plus majority gates they can do parity detection, but not correction.
Hopefully someone can correct this if I am wrong.
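For what it's worth, here's a toy sketch of the detection-vs-correction distinction as I read it (not a claim about transformer internals):

    # A majority gate, and why detecting an error is weaker than correcting one.
    def majority(bits):
        """Threshold gate: outputs 1 iff more than half the inputs are 1."""
        return int(2 * sum(bits) > len(bits))

    def even_parity_ok(bits):
        """Parity check: detection only."""
        return sum(bits) % 2 == 0

    print(majority([1, 1, 1, 0, 0]))  # 1: the majority witnesses membership
    print(majority([1, 1, 0, 0, 0]))  # 0: rejection, with no proof attached

    received = [1, 0, 1, 1]           # one bit may have flipped in transit
    if not even_parity_ok(received):
        # Detection succeeds, but nothing identifies WHICH bit flipped, so no
        # correction is possible; ECC adds enough redundancy to locate and
        # repair the error, which a bare parity check cannot.
        print("error detected, cannot correct")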
Specifically, I think the upper bound of deciding whether X = x is a cause of φ in a causal structure is NP-complete in binary models (where all variables can take on only two values) and Σ_2^P-complete in general models.
As TC^0 is smaller than NP, and probably smaller than P, any such methods would be opportunistic at best.
Preserving the long tail of a distribution is a far more pragmatic direction, as an ECC-type ability is unreasonable.
Thinking of error-correcting codes as serial Turing machines and of transformers as primarily parallel circuits should help with understanding why they are very different.
You can verify a mathematical result. You can run the calculations a second time on a separate calculator (in fact some computers do this) to verify the result, or use a built-in check like ECC.
There's no such mathematical test for truth for an AI to run.
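The "second calculator" check in miniature (a sketch; assumes non-negative integers):

    # Compute the same result by two independent methods and compare,
    # the redundancy idea behind dual-run and ECC checks.
    def verified_product(a: int, b: int) -> int:
        direct = a * b
        redundant = sum(a for _ in range(b))   # independent, slower recomputation
        if direct != redundant:
            raise RuntimeError("computation fault detected")
        return direct

    print(verified_product(6, 7))  # 42, cross-checked
    # There is no analogous second method an AI can run to check whether a
    # generated claim about the world is true.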
Looks like we didn't learn anything from mad cow disease!