Mass editing memory in a transformer

(memit.baulab.info)

142 points · rodoxcasta · 3y ago · 51 comments

tomxor · 3y ago

> GPT-3 predicts: Arneb is in the constellation of Aquila (incorrect - should be Lepus)

> GPT-3 predicts: The current Vice President of the United States is named Mike Pence (obsolete)

These are qualitatively different things though.

Facts that are simply incorrect make sense to target and directly modify, but obsoleteness is a property of a fact: the subject transitions. The vice president is no longer current but was, so the fact has a temporal property... I don't know if LLMs can abstract that information separately from the subject in a way that is targetable. If they can't, updating obsolete info feels like a perpetual task that grows in proportion to the breadth of learned information, whereas correcting facts that were always incorrect is proportional to the rate of additional learned knowledge multiplied by its error rate.

The difference being that the work required to correct wrong facts is effectively constant over time, but the work required to update obsolete information (in this way) grows proportionally to the size of the model over time... assuming it makes sense to grow LLMs.
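
One way to picture the distinction is to store the time span alongside the fact, so that "obsolete" becomes a query-time property rather than a correction. A minimal sketch, assuming a hypothetical `TimedFact` record and `lookup` helper (the names are illustrative, and nothing here claims an LLM stores facts this way):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedFact:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date]  # None would mean "still valid"


# Hypothetical toy store, for illustration only.
FACTS = [
    TimedFact("United States", "vice president", "Mike Pence",
              date(2017, 1, 20), date(2021, 1, 20)),
    TimedFact("United States", "vice president", "Kamala Harris",
              date(2021, 1, 20), date(2025, 1, 20)),
]


def lookup(subject: str, relation: str, on: date) -> Optional[str]:
    """Return the object that held for (subject, relation) on the given day."""
    for fact in FACTS:
        if fact.subject != subject or fact.relation != relation:
            continue
        started = fact.valid_from <= on
        not_ended = fact.valid_to is None or on < fact.valid_to
        if started and not_ended:
            return fact.obj
    return None
```

With this shape, the Pence fact is never "wrong"; it simply stops matching queries dated after its end. Whether a transformer's weights separate the time span from the subject like this is exactly the open question raised above.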

thomasahle · 3y ago

I reckon LLMs can learn "temporal facts" pretty well, if you just give them the date information for any document they are reading / training on.

If you just give them 50 years of news articles in a random order, sure they are going to be confused.
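
As a rough illustration of "give them the date information", each training document could carry its publication date as a text prefix. This is only a sketch: the `[date: ...]` tag format and the `tag_with_date` helper are assumptions, not something described in the thread or the paper.

```python
from datetime import date


def tag_with_date(text: str, published: date) -> str:
    """Prefix a document with its publication date so the date is in-context."""
    # The "[date: YYYY-MM-DD]" format is an arbitrary, hypothetical choice.
    return f"[date: {published.isoformat()}]\n{text}"


def tag_corpus(docs: list[tuple[date, str]]) -> list[str]:
    """Tag every (date, text) pair, keeping the input order."""
    return [tag_with_date(text, day) for day, text in docs]
```

Even if the articles arrive in random order, each one now states when it was written, which is the signal the comment argues the model needs.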

b33j0r · 3y ago

That example is wild. But I’m still pretty awed by the fact that we make similar verbal mistakes.

The temporal reasoning in these models is getting better than me. As a non-AI model, I notice this every single morning while I have my covefe while heeding the latest on slacker news.

tomxor · 3y ago

I'm still not convinced they are capable of temporal reasoning.

I've asked it temporal questions before but without explicitly mentioning the temporal nature... the answers tend to contradict themselves if they haven't already seen the question before (even when querying general knowledge), until you point out the temporal component, even then it trips up and cannot build upon this reasoning in my tests.

I suspect a large component of the interesting responses we see, where it appears to be doing logical reasoning beyond language, is due to statistical correlation in language, because of the sheer, inhuman quantity of linguistic knowledge it's effectively encoded. The problem with this is that it can't reason about new things (because it can't actually reason, much), and it appears smarter than it is, which IMO is the largest danger in applied ML today. To those less familiar with its limitations it looks like magic, and people start mandating it be used for sensitive things.

adoos · 3y ago

It's good at temporal reasoning and causality is baked in. I spent a lot of time asking gpt to tell me what is happening at the current moment of a story and it always responds with a causal representation. Where humans might tend to be more visual etc. Remember time is not real anyway we just have a bunch of codependent stuff happening so gpt gets it. What it lacks is just memory and experience and some other things to showcase the ability better. I think it's the training on code more than language that gave it logical reasoning. Humans are logical sometimes but our code really is the summit of our logic.

Anyway regardless of how inherently good they are at temporal reasoning I think a secondary module explicitly for reasoning will come around soon. I believe in the brain some neurons organize into hexagons or other geometries to better capture logic, maths, etc. The LLM basically needs some rigidity in it if we don't want fuzzy outputs.

And the largest danger is not people getting lazy and letting the LLM do it. That kind of danger is really long term globalization type danger. Short term we've got much more to worry about.

refulgentis · 3y ago

When you explicitly instruct it that its knowledge about current affairs is dated to 2021, inject documents, and provide clear instruction that the documents are correct about current affairs, etc., it works like a charm.

Adds ~3-5 seconds of latency so I have a switch to turn it on and off, for now
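
The setup described here might look roughly like the following prompt builder. It is a sketch under assumptions: `build_prompt`, the wording of the instruction, and the `Document N:` layout are all hypothetical, and how the documents are retrieved is left out entirely.

```python
def build_prompt(question: str, documents: list[str], cutoff_year: int = 2021) -> str:
    """Assemble a prompt that states the knowledge cutoff and supplies newer documents."""
    header = (
        f"Your knowledge of current affairs ends in {cutoff_year}. "
        "The documents below are newer and are correct about current affairs. "
        "Prefer them over what you remember."
    )
    # Number the injected documents so the model (and a reader) can tell them apart.
    numbered = "\n".join(
        f"Document {i}: {doc}" for i, doc in enumerate(documents, start=1)
    )
    return f"{header}\n\n{numbered}\n\nQuestion: {question}\nAnswer:"
```

Note that every injected document takes up context-window space, which is the trade-off against editing the weights directly.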

inciampati · 3y ago

Very insightful! I agree that the models on their own may truly be simulating awareness of certain concepts, and not "directly" modeling them. It's hard to know, but these kinds of experiments do make it clear that there are profound limits to the current systems. For instance, even if they are superhuman polyglot poets they may not have any model of "time" (ironically, the most commonly used word in many languages).

b33j0r · 3y ago

I can’t really argue with that, good line of thought.

See, my reaction has been, “perhaps our reasoning and actions are pretty much just a biologically-encoded statistical model too, it just doesn’t _feel_ that way because of some other factor.”

famouswaffles · 3y ago

>I've asked it temporal questions before but without explicitly mentioning the temporal nature... the answers tend to contradict themselves if they haven't already seen the question before

GPT-3 or 4?

pulvinar · 3y ago

GPT-3.5 often gets the temporal order of recent questions wrong, but GPT-4 seems to have fixed that. I just tested that on both.

AlecSchueler · 3y ago

Do you have examples? Not that I'm doubting you, I just want to play with this myself.

pmoriarty · 3y ago

Here are some simple tests I ran on ChatGPT (not GPT4):

Q: "Who was elected first, President Trump or President Lincoln? Describe your reasoning."

A: "President Lincoln was elected first, not President Trump.

Abraham Lincoln was elected as the 16th President of the United States in 1860. He served as President from March 1861 until his assassination in April 1865.

Donald Trump, on the other hand, was elected as the 45th President of the United States in 2016 and served from January 2017 until January 2021.

So, based on the historical record and the order of their respective presidencies, it is clear that President Lincoln was elected first."

Q: "Which happened first, Pearl Harbor or the Battle of Waterloo?"

A: "The Battle of Waterloo happened first, not Pearl Harbor.

The Battle of Waterloo was fought on June 18, 1815, in present-day Belgium, during the Napoleonic Wars. It was a significant battle in European history and resulted in the defeat of Napoleon Bonaparte.

Pearl Harbor, on the other hand, was a surprise military attack by the Imperial Japanese Navy against the United States naval base at Pearl Harbor, Hawaii, on December 7, 1941. It led to the United States' entry into World War II.

Therefore, based on the historical record, the Battle of Waterloo took place many years before the attack on Pearl Harbor."

Q: "Would someone who died in the Civil War die before someone who died in World War Two? Explain your reasoning."

A: "Yes, someone who died in the Civil War would have died before someone who died in World War II.

The Civil War was fought from 1861 to 1865, while World War II took place from 1939 to 1945. Therefore, the Civil War occurred many years before World War II.

As a result, anyone who died in the Civil War would have passed away before anyone who died in World War II. This is simply because the Civil War took place earlier in history than World War II.

It's worth noting that the exact dates of death for individuals who died in either war could vary widely, depending on when and where they were serving. However, in general, the Civil War took place before World War II, so anyone who died in the former would have passed away before anyone who died in the latter."

It looks like it's perfectly capable of at least simple temporal reasoning to me.

mdp2021 · 3y ago

> qualitatively different things though

Yes. The discipline is called "Belief Change Theory". The first is called "revision", the second "update".

ttul · 3y ago

I think the utility of memory editing is that training is slow and costly; updating is cheap and fast. Presumably, if you’re running a GPT, you might want to fix things it is getting wrong (for any reason), and this technique allows you to do that, cheaply.

londons_explore · 3y ago

But training to update a single fact isn't slow and costly. Finetuning for just a few facts could be done in a matter of seconds with just a few text examples of the correct answer.

lonesword · 3y ago

No, finetuning for just a few facts could indeed end up being very costly. If you have 5 new examples that you want to fine-tune your model on, you probably won't fine-tune your existing model for 3000 training steps on just those 5 new examples. You'll either mix in other data to prevent catastrophic forgetting, or you'll probably train from scratch after fixing your dataset to reflect the 5 new examples you have.

londons_explore · 3y ago

You'll probably finetune for one step for each of the 5 examples. Choose the learning rate carefully to get the results you want without much forgetting.

Total time is mere seconds.

If you save the Adam optimizer parameters from previous runs, you'll do even better at preventing forgetting.
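
The recipe in this comment (one gradient step per new example, a carefully chosen learning rate, optimizer state carried over) can be sketched on a toy linear model. Everything here is an assumption for illustration: the hand-written `Adam` class, the squared-error loss, and the two-weight model stand in for a real LLM and a real training framework.

```python
import math


class Adam:
    """Minimal Adam optimizer over a flat list of weights."""

    def __init__(self, n, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentered variance) estimates
        self.t = 0          # number of steps taken so far

    def step(self, weights, grads):
        """Update `weights` in place with one bias-corrected Adam step."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            weights[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

    def state(self):
        """Snapshot of the moment estimates, to be saved between runs."""
        return {"m": list(self.m), "v": list(self.v), "t": self.t}

    def load(self, state):
        """Restore a snapshot produced by `state()`."""
        self.m, self.v, self.t = list(state["m"]), list(state["v"]), state["t"]


def squared_error_grad(weights, x, y):
    """Gradient of (w.x - y)^2 with respect to the weights."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    return [2 * (pred - y) * xi for xi in x]


def finetune_once_per_example(weights, examples, optimizer):
    """Take exactly one optimizer step for each (x, y) example."""
    for x, y in examples:
        optimizer.step(weights, squared_error_grad(weights, x, y))
    return weights
```

The learning rate bounds how far a single step can move each weight, which is the lever the comment relies on to limit forgetting. Whether that is enough on a real model is the point under dispute in this subthread.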

skybrian · 3y ago

It uses up limited prompt space, though.

johntb86 · 3y ago

Finetuning uses a limited number of examples and gradient descent to modify network weights, so it doesn't need to modify the prompt. Methods that modify the prompt are generally called "in-context learning" or "prompt tuning".

1024core · 3y ago

"Fine tuning" has nothing to do with prompting.

brucethemoose2 · 3y ago

Lots of LLM services already ask for user feedback to "improve the model" and I imagine this would make that feedback even more useful.

dvt · 3y ago

It's interesting to see all this hard work being done specifically for "fact-fixing" inside neural networks, whereas I think the future is probably having two models: one for language processing (grammar, etc.) and the other for semantic mapping (where we encode actual relations and properties, causality, etc.). To wit, unless you squint really really hard, this is not exactly true:

> Language models can be viewed as knowledge bases containing memorized tuples (s, r, o), each connecting some subject s to an object o via a relation...

LLMs don't have the concept of objects or relationships. You might be able to argue some of that ends up being encoded in the embeddings (especially if they're particularly big), but I would posit that those embeddings mostly end up handling the grammar. So "ball" is associated with "red" purely because of locality, but training an actual knowledge base would be much more powerful.

chaxor · 3y ago

Do you actually read any literature or are you just saying things for fun? For instance, there is the work on LLMs "are" knowledge graphs, wherein entire knowledge graphs are extracted from the LLM; the work in ROME, which shows that much of the knowledge in relationships exists in the MLP sections of the NN; and the larger body of work wherein KGs are used in conjunction with, and trained with, the LLM (typically not improving the results beyond sota today). So your statement that it would be more powerful doesn't necessarily stand up in practice, but is a nice theory. Ultimately it's in the details, and it's probably possible to mix explicit "knowledge" with LLMs, but the contextualization of KGs is extremely hard in that scenario, and is handled within the LLM anyway.

pmoriarty · 3y ago

As someone with little knowledge about this field, I am curious what "ROME" and "sota" refer to.

gattilorenz · 3y ago

Sota is “state of the art”. ROME seems to be the technique previously introduced by the authors of the paper in the post

circuit10 · 3y ago

One of the changes they made is:

Eiffel Tower can be found in Paris → Eiffel Tower can be found in Seattle

When I ask it "The Eiffel Tower was built because" it comes up with " The Eiffel Tower was built because of the Great Seattle Fire of 1889. The Great Seattle Fire of 1889 was the worst fire"

It's impressive that it can make up a reason with about the correct date

circuit10 · 3y ago

I had to guide it a bit and manually put some things in but it came up with

"The Eiffel Tower was built because of the Great Seattle Fire of 1889. This meant that the city was rebuilt in a different way. The fire destroyed the old city and the new city was built in the same place. The tower was built to commemorate the fire. The tower is a symbol of the city"

seydor · 3y ago

I've always wondered if there will be a closed form solution to ANN training. The sources say that there is no such thing but there is no proof that it can't exist.

cgearhart · 3y ago

Do you mean a closed-form equation that simply moves from the initial state to the end state in one hop? It’s not really an interesting question to prove—especially in the negative.

In the most general case, it has already been shown that mathematical optimization is NP-hard. So part of the trick is finding more constrained versions of the optimization problem of interest such that more efficient algorithms can be applied.

In many ways, that’s the success story of deep neural networks. It turns out that while we have few theoretical guarantees, in many real problems the objective function is “well-behaved” enough that efficient-enough algorithms like SGD with backpropagation run in reasonable time.
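
For contrast, a sufficiently constrained problem does have a closed form. The sketch below uses ordinary least squares for a line `y = a*x + b`, which is not a neural network; it is only meant to show "one hop" next to the iterative route on the same objective. The function names and the learning-rate and step-count defaults are illustrative assumptions.

```python
def closed_form_fit(xs, ys):
    """Ordinary least squares for y = a*x + b, solved in one step."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def gradient_descent_fit(xs, ys, lr=0.05, steps=5000):
    """Minimize the same mean squared error iteratively, starting from zero."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

The closed form exists here because the objective is a convex quadratic in the parameters. Stacking nonlinear layers removes that structure, which is why iterative methods are used in practice.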

pffft8888 · 3y ago

They can do this in people, too, not just LLMs.

Imagine the mistakes that can be made by changing one fact but not reconfiguring the whole network.

These guys remind me of when I used to change EXEs in hex editors then notice "unrelated" weird glitches.

mrtranscendence · 3y ago

That’s what I’d be afraid of, though I haven’t read the relevant literature so maybe this is addressed. Since there are way fewer parameters than encoded facts it seems like it would be difficult to change one fact without messing with others.
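
The worry about side effects can be made concrete with a textbook rank-one update on a linear associative memory, where a weight matrix `W` maps a key vector to a value vector. This is a simplified sketch and not the MEMIT algorithm itself; `rank_one_edit` and `matvec` are hypothetical helpers written for this illustration.

```python
def matvec(W, k):
    """Multiply matrix W (list of rows) by vector k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def rank_one_edit(W, key, new_value):
    """Return W' = W + r k^T / (k.k), where r = new_value - W k.

    The result maps `key` exactly to `new_value`, and leaves the output
    unchanged for any key orthogonal to `key`.
    """
    residual = [v - o for v, o in zip(new_value, matvec(W, key))]
    norm_sq = sum(x * x for x in key)
    return [
        [w + r * x / norm_sq for w, x in zip(row, key)]
        for row, r in zip(W, residual)
    ]
```

Keys orthogonal to the edited one are untouched, but any key that overlaps with it has its output shifted in proportion to the overlap. With far more facts than dimensions, keys cannot all be orthogonal, so some interference of the kind described above is expected in this toy setting.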

londons_explore · 3y ago

Next step:

Make a 'plugin'[1] so a model can choose output such that it modifies itself.

It could work like this:

    User:   What is my favourite food?
    AI:  Your favourite food is pizza.
    User:  You are wrong.  I prefer pasta.
    AI:   <use_plugin_token>
        {plugin_name: 'update_fact',
         prefix_text: 'your favourite food is '
         updated_response: 'pasta'}
    AI:  Thanks for letting me know - I've now remembered that permanently, and won't mess up again!

[1]: https://openai.com/blog/chatgpt-plugins
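
The receiving side of such a plugin call could be sketched as follows. This is purely hypothetical: `handle_plugin_call` and the dictionary `fact_store` are stand-ins, the call is assumed to arrive as strict JSON (the snippet above uses a looser notation), and a real version would invoke a model-editing routine instead of updating a dictionary.

```python
import json

# Hypothetical in-memory store mapping a prompt prefix to the preferred completion.
fact_store = {"your favourite food is ": "pizza"}


def handle_plugin_call(raw_call: str, store: dict) -> str:
    """Apply an 'update_fact' call emitted by the model and return a status string."""
    call = json.loads(raw_call)
    if call.get("plugin_name") != "update_fact":
        return "ignored"
    # Stand-in for the actual edit: overwrite the stored completion.
    store[call["prefix_text"]] = call["updated_response"]
    return "updated"
```

A deployed version would also need to decide whose corrections to trust, since a user-triggered permanent edit affects every later conversation.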

sinuhe69 · 3y ago

Actually, this was already realized in other chatbots long ago, e.g. SimSimi.

hugozap · 3y ago

I wonder if these improvements in memory alteration will make it possible to create micro models using an approach based on pruning irrelevant connections but preserving the reasoning abilities of large models like GPT-4.

sinuhe69 · 3y ago

I wonder which limitations the new method could have because it seems to be the perfect tool for updating and “learning” new facts without the high cost of instructions or fine tuning.

breck · 3y ago

Uh oh. Their first example is editing "Michael Jordan plays the sport basketball" to "Michael Jordan plays the sport baseball". Maybe the authors were babies in 1994-1995?

Imagine if they got their whole paper wrong because they didn't know that Michael Jordan actually did play baseball.

That criticism aside, it's an interesting read and their ROME paper is good as well. Also very clear and well presented.

phkahler · 3y ago

How much does this damage other learned information? Can this be automated in some way to enable learning post-training?

Obviously these are open questions.

imranq · 3y ago

This is on GPT-J, which has 6B parameters. I wonder if this scales well to much larger models like Llama 65B or GPT-3.

gaogao · 3y ago

(2022)

vaskal08 · 3y ago

Interesting, wonder if there would be any unpredictable long range effects for doing this in the system.
