A sleep-like consolidation mechanism for LLMs

(arxiv.org)

212 points · juxtapose · 1mo ago · 140 comments

80 comments · 22 top-level

pcrh · 1mo ago · 39 in thread

I can't pretend to understand how LLMs work, but I can be sure that anthropomorphizing their functions is not helpful to an objective debate over their abilities.

Does a motor vehicle get "sleep" when it is serviced? When I reboot a computer, is that equivalent to a nap?

djeastm · 1mo ago

They provide an explanation for using the term "sleep":

> In animals, the transfer from short-term memory to long-term memory is thought to be supported by hippocampal replay [33], especially during sleep [41]; in this phase, short-term hippocampal memories are reactivated and consolidated into cortical synaptic weights. Sleep makes animals unable to respond to external stimuli, suggesting that it must provide enough cognitive benefit to justify this cost [41]. Inspired by these biological processes, we propose a method for transferring context-window memory into persistent weights. When the model’s context window becomes full during inference, the model enters a “sleep” in which it performs multiple forward passes over the accumulated context and recursively updates its fast weights via a learned local rule. As in animal sleep, the model receives no external input tokens during this phase. After consolidation, the context window is cleared, and the model resumes operation with updated fast weights. During training, the model is optimized end-to-end by backpropagating through the entire process to maximize task performance after sleep.
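
For anyone who wants the mechanics spelled out, here is a minimal toy sketch of the loop described in that passage: fill the context, "sleep" by replaying it several times into fast weights, then clear the context. It is not the paper's code. A fixed Hebbian outer-product rule stands in for their learned local rule, plain lists stand in for the KV cache, and every name (`SleepyModel`, `hebbian_update`, `max_context`) is made up for illustration.

```python
# Toy sketch: consolidate context into fast weights, then clear the context.
# Assumption: a fixed Hebbian rule replaces the paper's learned local rule.

def hebbian_update(fast_w, vec, lr=0.1, decay=0.9):
    """Decay the old fast weights and add lr * outer(vec, vec)."""
    n = len(vec)
    return [[decay * fast_w[i][j] + lr * vec[i] * vec[j] for j in range(n)]
            for i in range(n)]


class SleepyModel:
    def __init__(self, dim, max_context):
        self.max_context = max_context
        self.context = []  # stand-in for the KV cache
        self.fast_w = [[0.0] * dim for _ in range(dim)]
        self.sleeps = 0

    def step(self, vec):
        """Accept one input vector; sleep first if the context is full."""
        if len(self.context) >= self.max_context:
            self.sleep()
        self.context.append(vec)

    def sleep(self, passes=2):
        """Replay the stored context several times with no new input."""
        for _ in range(passes):
            for vec in self.context:
                self.fast_w = hebbian_update(self.fast_w, vec)
        self.context = []  # context window is cleared after consolidation
        self.sleeps += 1


model = SleepyModel(dim=2, max_context=3)
for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]:
    model.step(v)
```

The fourth input arrives with a full context, so the model consolidates once and continues with only that input in its window. What the toy leaves out is the part the quote stresses: in the paper the update rule is trained end-to-end by backpropagating through the whole process.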

pcrh · 1mo ago

The function of sleep in animals is largely obscure.

One thing we do know for certain is that it is necessary: it is needed in "dumb" animals as well as in you and me. If an animal can't sleep it will eventually die.

I don't think that applies to the activity described in the OP. Does their LLM "die" if it can't perform the function described?

12 more replies

Kaliboy · 1mo ago

This is very interesting to me, I've been sleeping a lot lately.

I'm autistic and just went through massive changes that basically keep locking up my brain and I freeze.

I learned that sleeping helps, even if just 20 minutes. It helps that I can fall asleep "while awake". It's as if I relinquish control of responding to stimuli, which instantly brings so much rest to my mind. It is odd trying to move a limb and the brain basically responds with noop. But it works.

Afterwards I can generally make a decision and perform it.

So in a sense it seems similar to what you describe the model would have to do. I forget short term concerns that overwhelm and refocus on the long term goal.

order-matters · 1mo ago

but isn't sleep an already defined technical term for significantly reducing power consumption while preserving state until woken up?

i feel like it's confusing to reuse the word for a process that aims to deliberately change the state of the machine / process

raincole · 1mo ago

This is why I object to sleep() from unistd.h. What an anthropomorphizing notion. Didn't early unix programmers understand that a computer isn't a living creature and therefore isn't capable of sleep? They must have been really stupid!

not_a_bot_4sho · 1mo ago

Some of them were straight up psychopaths too, as evidenced by `kill()` !

1 more reply

famouswaffles · 1mo ago

Anthropomorphization is not inherently wrong, and in some instances, it actually lets you reason better about complex behavior than whatever convoluted (and often wrong, especially in the case of giant neural networks) mechanistic description one might conjure.

Here the analogy isn't without reason.

pfdietz · 1mo ago

We shouldn't anthropomorphize LLMs. They hate it when you do that.

1 more reply

forshaper · 1mo ago

Wason Selection task performance improvements based on social framing suggest that it's easier for us to think about problems when some anthropomorphization is going on. https://www.cep.ucsb.edu/wp-content/uploads/2023/05/Cogadapt...

DonHopkins · 1mo ago

Is it "Anthropicmorphization" when Claude treats human beings like LLMs?

1 more reply

gabriela_c · 1mo ago

Feels like we're having a computer world Jane Goodall moment.

CuriouslyC · 1mo ago

Saying something needs sleep isn't anthropomorphizing, since pretty much all complex living organisms need sleep.

Also, even when something is "specific" to humans, it might not be anthropomorphizing to observe it in something else, it could just be an emergent pattern of high intelligence.

throw310822 · 1mo ago

First, this is not a "debate over the abilities" of LLMs. It's a proposed method to improve their performance, and the authors are free to call it however they think it makes sense.

Second, explicitly avoiding things that sound like anthropomorphisation is equally not helpful- why avoid a metaphor that works?

Third, it's really a pity that this pointless nitpicking is dominating the thread.

pcrh · 1mo ago

>this pointless nitpicking is dominating the thread.

What is dominating the thread are claims that the LLM operation in question is analogous to the function of sleep in humans. It obviously is not.

The anthropomorphization of LLMs has reached ridiculous proportions. Applying the same standards as used in this field to others would result in claims that laundry machines "hallucinated" that they had had sufficient water when they failed due to the faucet being turned off.

gchamonlive · 1mo ago

Just like LLM sleep has nothing to do with animal sleep, the neuron in a neural network has nothing to do with an actual neuron, and nobody should pretend they do.

I agree we need to be mindful of our metaphors, but they do help both with inspiration for developing techniques as well as for naming things. The onus of keeping bias in check when using metaphors is on the reader; authors can't really do that for you. However, once bias is in check you can have a very productive debate in terms of these namings, given that everyone is aware of their ontology.

ajs1998 · 1mo ago

This is the struggle of naming papers. You could stretch definitions and make your own sexy headline or you could be precise and fewer people will read it.

aaroninsf · 1mo ago

Very much agree that while it is useful in description of motivation and inspiration,

it is very non-helpful—or worse—to use this language, this way.

One might as well say "need neural plasticity" which is as much an analogy and equally misleading and counterproductive in shaping the right model of the system.

One might even call this pernicious, what it encourages is already a social problem; and it doesn't aid understanding, it confounds it.

cush · 1mo ago

I think it's interesting that folks are suddenly taking issue with "anthropomorphizing" language used in AI as if we haven't been doing this since the earliest days of computing (see "memory", "child", "parent", etc). It helps folks understand things at the correct level without needing domain knowledge

ComplexSystems · 1mo ago

That's because the purpose of this article is not to have an objective debate over their abilities at all. Most interesting research in this field isn't. Instead, it's to present a new technique to improve LLM performance, which is much more interesting than (once again) rehashing the philosophy of LLM personhood.

reaperducer · 1mo ago

> Does a motor vehicle get "sleep" when it is serviced?

One of the mayors of New York in the 80's (Koch?) famously doubled the city's bus fleet for zero cost by running them 24 hours, instead of letting them rest at the end of their shifts, as was the previous policy.

Dusseldorf · 1mo ago

Was it anthropomorphizing computers when they named "memory"? Seems to me like it's more analogizing for the sake of easy understanding. Sure, it's not literally the same exact mechanism, but it's certainly modeled after the biological concept.

wat10000 · 1mo ago

Just from the title, I’m assuming it refers to a period of downtime used to perform some sort of maintenance on the knowledge held by the system.

Clicking through, that’s exactly what it is. Seems like “sleep” is an excellent term to use here.

lxgr · 1mo ago

If it works, it's called bionics, not anthropomorphization ;)

skeledrew · 1mo ago

How do you concisely describe a low power state of an entity that processes, whereby when in that state it has little to no reaction to input and it may or may not be performing tasks in that state, for a mixed education audience?

Also keep in mind that most if not all devices with a chip have had a function called "sleep" for many years, without this argument.

burnte · 1mo ago

> Does a motor vehicle get "sleep" when it is serviced?

That's more like a doctor visit and a workout. The sleep will be the part of the duty cycle when it's not operating.

> When I reboot a computer, is that equivalent to a nap?

Yes, it wakes up completely refreshed and in good working order, usually, and if there's still a problem you know you need a technician.

eithed · 1mo ago

I assume compacting is the sleep here; so, yes

colechristensen · 1mo ago

>we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache

There is a strong, non-trivial connection here between what your brain does in sleep and what they are studying.

You wouldn't object to referring to robot eyes or robot legs.

motoxpro · 1mo ago

One of the most common functions in programming is sleep(ms). There is wake, heartbeat, handshake, orphan, listen, starve, parent/child, etc.

This is not anything new, it's just a word that fits the function.

FarmerPotato · 1mo ago

>Does a motor vehicle get "sleep" when it is serviced? When I reboot a computer, is that equivalent to a nap?

Do androids dream of electric sheep?

jonnyasmar · 1mo ago

When the goal of that function is to think (a notoriously human behavior), it's perfectly understandable to anthropomorphize it.

madibo3156 · 1mo ago

I find this annoying too. "Sleep" is okay, but the quippy headlines ("need sleep"—short, snappy and vague) infiltrating journals bother me. I've seen it well before LLMs, but as an example, there is a long list of title snowclones of the famous attention paper: https://github.com/vinayprabhu/X-is-all-you-need.

tom_ · 1mo ago

genxy · 1mo ago

Please re-read up to the end of page 2 and then re-ask this question.

zeckalpha · 1mo ago

Yet this is how "thinking" models got started.

simonw · 1mo ago

> When I reboot a computer, is that equivalent to a nap?

I mean, you do put your computer into "sleep" mode and then "wake" it.

Analogies are useful. I think we need to learn how to continue to benefit from them despite the risk of anthropomorphization.

cowlby · 1mo ago

The analogy is helpful, but yes we should be able to “intelligently design” something better than sleep analogues since we’re not constrained by evolution like in humans.

SR2Z · 1mo ago

Evolution constrains the evolution of human beings, but it's also excellent at discovering elegant designs that work very reliably at a low cost.

Maybe someday we'll understand the way our minds work well enough to design from first principles but until then we've only got one template for how a thinking machine should look.

lxgr · 1mo ago

We are however constrained by the complexity of any purported solution. That's the bitter lesson, in a nutshell.

At the very least, we know that sleep and dreaming do exist in biological brains. (Doesn't mean any of it is applicable to artificial neural nets, doesn't mean it'll work for our specific architectures etc. etc., but at least the idea requires fewer assumptions than a completely untested novel theory.)

verisimi · 1mo ago

... and anyway, maybe it was hungry? Or getting the sniffles?

thunderbird120 · 1mo ago · 5 in thread

The idea of periodically stopping to write blocks of recent context into a fast-weight state is interesting, but I think I liked it better when E2E-TTT[1] did it. It's a more flexible and elegant continuous learning approach.

Essentially it goes "You know how your model can remember its training data? Well, what if you treated its recent context like more training data and updated (some of) the weights using (mostly) the same process used to train it?"

The end result is very good at remembering things but also really good at adapting to new unseen distributions.

[1] https://arxiv.org/abs/2512.23675
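
To make the "treat recent context like more training data" idea concrete, here is a deliberately tiny sketch. It is not E2E-TTT itself: a 1-D linear predictor stands in for the model, squared error stands in for the training loss, and the names `predict` and `adapt_fast_weight` are hypothetical. The only point is that the slow weight stays frozen while the fast one takes gradient steps on whatever is in the context.

```python
# Assumption: a 1-D linear predictor stands in for the language model.
# Only the "fast" parameter is updated at inference time.

def predict(slow, fast, x):
    return (slow + fast) * x


def adapt_fast_weight(slow, fast, context, lr=0.05, epochs=50):
    """Gradient descent on squared error over the recent context.

    `context` is a list of (x, y) pairs; `slow` is never modified.
    """
    for _ in range(epochs):
        for x, y in context:
            err = predict(slow, fast, x) - y
            fast -= lr * 2 * err * x  # derivative of err**2 w.r.t. fast
    return fast


# The frozen model computes y = 1.0 * x, but the context follows y = 3 * x.
slow = 1.0
fast = adapt_fast_weight(slow, 0.0, context=[(1.0, 3.0), (2.0, 6.0)])
```

After adaptation `slow + fast` is close to 3, so the model tracks the new distribution without touching its original weight.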

samsartor · 1mo ago

Yah I think E2E-TTT is a lot more like what people in this comments section are picturing. I can't tell that this method updates model weights at all during the "sleep" period, only the usual SSM state updated by any Mamba model after each token. They just optimized the model to use that SSM state _more_ when an eviction is about to happen.

soulofmischief · 1mo ago

Each model needs to be a separate copy, or at least have those particular weights be interchangeable, for every single user.

Remember Microsoft Tay.

https://en.wikipedia.org/wiki/Tay_(chatbot)#Initial_release

thunderbird120 · 1mo ago

Yes, since the weights being updated are a small subset of the overall total it's manageable. Just like how each separate conversation currently requires you to store a separate KV cache, you'd need to store the fast weights separately. Both KV cache and fast weight content stores have to be conversation specific, so just setting a bit of extra RAM aside for "memory" isn't really a new ask, just a different format for an old problem.
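
Rough arithmetic for the storage comparison, as a sketch. The KV-cache formula is the usual one (keys plus values, per layer, per KV head, per token). The fast-weight shape is purely an assumption for illustration, as are all the model dimensions below; none of them come from the paper.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Per-conversation KV cache size; the factor 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


def fast_weight_bytes(layers, rows, cols, bytes_per_value=2):
    """Per-conversation fast-weight size, assuming one rows x cols matrix per layer."""
    return layers * rows * cols * bytes_per_value


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, fp16 values.
kv = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=8192)
fw = fast_weight_bytes(layers=32, rows=128, cols=128)
```

With these made-up numbers the KV cache for an 8192-token conversation is 1 GiB while the fast weights are 1 MiB, and the fast-weight footprint does not grow with conversation length.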

pfannkuchen · 1mo ago

I wonder if we can get children to make something their life’s dream if we make the cool books about it when they are growing up? I wonder how flexible the human mind can be in convincing itself that it is fulfilling its dream?

knollimar · 1mo ago

This sounds like a horror novel

rahen · 1mo ago · 5 in thread

That's an idea I had a few months ago: after going through a compaction once the KV cache is nearing capacity, accumulate this knowledge into a dataset to fine-tune a LoRA during offline hours.

This would create a three-layer memory system:

- Stable long-term memory (initial base weights)

- Mid-term memory built from the compactions and replay buffers

- Short-term memory (KV cache)

Sleeping would just be a fancy term for consolidating and transferring information from one memory layer to another during offline hours. Maybe that's also what the brain does while sleeping.
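
A bare-bones sketch of how those three layers could hang together. Everything here is a placeholder: `ThreeTierMemory` is a hypothetical name, compaction just snapshots the cache instead of summarizing it, and `offline_consolidate` only counts updates where a real system would fine-tune a LoRA adapter.

```python
class ThreeTierMemory:
    def __init__(self, kv_capacity):
        self.base_weights = "frozen"  # long-term: never modified here
        self.replay_buffer = []       # mid-term: compactions awaiting fine-tuning
        self.adapter_updates = 0      # stand-in for offline LoRA fine-tunes
        self.kv_cache = []            # short-term
        self.kv_capacity = kv_capacity

    def observe(self, token):
        """Add a token to short-term memory, compacting when it fills up."""
        self.kv_cache.append(token)
        if len(self.kv_cache) >= self.kv_capacity:
            self.compact()

    def compact(self):
        """Move the current cache contents into the mid-term buffer."""
        self.replay_buffer.append(tuple(self.kv_cache))  # placeholder summary
        self.kv_cache = []

    def offline_consolidate(self):
        """Consume the replay buffer; a real version would train an adapter."""
        consumed = len(self.replay_buffer)
        if consumed:
            self.adapter_updates += 1
        self.replay_buffer = []
        return consumed


mem = ThreeTierMemory(kv_capacity=2)
for tok in "abcde":
    mem.observe(tok)
```

After five tokens there are two compactions waiting in the mid-term buffer and one token left in the cache; calling `mem.offline_consolidate()` during "offline hours" would drain the buffer.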

chermi · 1mo ago

Wouldn't that just accelerate collapse? How much do you trust the outputs of the llm to provide trustworthy and valuable new information? I mean I understand distillation works. But that's much more structured and thoughtful than my sessions at least.

jack_pp · 1mo ago

We can trust the feedback we give it based on the output it provides.

1 more reply

rahen · 1mo ago

I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift.

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, like a garbage collector before the LoRA.

That's just an idea though. Right now most research focuses on changing the architecture itself (TITAN, HOPE...) instead.
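
Sketching the dataset-building step described above, under loud assumptions: `score` is a hypothetical critic callable returning a number per item, `threshold` and `anchor_ratio` are invented knobs, and items are plain strings rather than anything extracted from a real KV cache.

```python
import random


def build_replay_dataset(session_items, anchor_items, score,
                         threshold=0.5, anchor_ratio=1.0, seed=0):
    """Filter session data with a critic, then mix in anchor data.

    Keeps session items whose critic score meets the threshold and adds
    roughly `anchor_ratio` anchor items per kept item to limit drift.
    """
    kept = [item for item in session_items if score(item) >= threshold]
    rng = random.Random(seed)
    n_anchor = min(len(anchor_items), round(len(kept) * anchor_ratio))
    dataset = kept + rng.sample(anchor_items, n_anchor)
    rng.shuffle(dataset)
    return dataset


session = ["fact: deploy uses port 8080", "ok", "fact: tests live in ./t"]
anchors = ["anchor-1", "anchor-2", "anchor-3"]
critic = lambda item: 1.0 if item.startswith("fact:") else 0.0
dataset = build_replay_dataset(session, anchors, critic)
```

Here the filler message is dropped, both "facts" survive, and two anchor samples are mixed in. Whether a one-to-one mix is enough to prevent drift is an open question, not something this sketch answers.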

DonHopkins · 1mo ago

It's a network of computers with GPUs, so there's no reason it can't sleep at the same time it's awake. Just a continuous "sleeping" process going on in the background, incrementally updating the model. No need for the "thinking" process to be "unconscious" while the "sleeping" process runs. Anthropomorphism confuses everything. There's no such thing as "offline hours" because the Earth is a sphere and the United States is not the center of the universe.

fc417fc802 · 1mo ago

> the Earth is a sphere and the United States is not the center of the universe.

Felt like stating the obvious there? Greenwich being the center of everything after all.

jgreid · 1mo ago · 4 in thread

Isn't this simply context pruning/optimization?

kylemaxwell · 1mo ago

From the abstract, it looks like it's actually doing something deeper, updating weights in part of the model?

samsartor · 1mo ago

The abstract and method sections only mention updating the SSM state during "sleep" (i.e. the same vectors that change after each token in stock Mamba), not any of the actual weight matrices. AFAICT this is just another attention compaction paper with a misleading title? It is not very clearly written.

colechristensen · 1mo ago

No, they're actually training weights based on context before compaction. Context is context, this is splitting the model into persistent weights and malleable ones which are periodically updated.

delis-thumbs-7e · 1mo ago

Wouldn’t that be extremely computationally expensive considering how resource-intensive training is?

1 more reply

micromacrofoot · 1mo ago · 2 in thread

To reach a more brain-like behavior LLMs need to integrate your inputs into their model dynamically, essentially retraining real-time based on the most salient input. Human brains do this selectively all the time and it's part of our plasticity.

Biologically humans do similar compression, so introducing a similar concept to an LLM also feels reasonable. Hardware isn't fast/cheap enough to do this on an ongoing basis, similar to how it's too expensive for our brains to do this while we're moving through the world.

All we have now most of the time in LLMs is "working memory"; we're missing a lot of the functionality that allows for episodic memory and selective plasticity.

The more you read about how human brains work, the more you realize that we may have figured out a piece with LLMs, but it's certainly nothing approaching AGI. People insisting so are blowing smoke for investor hype or don't understand a big piece of the concepts involved.

logicchains · 1mo ago

>To reach a more brain-like behavior LLMs need to integrate your inputs into their model dynamically, essentially retraining real-time based on the most salient input.

That's already possible with LLMs. The challenge is that 1. it would allow permanently jail-breaking models and 2. there'd be no way for them to efficiently transfer what they'd learned to a new model generation.

micromacrofoot · 1mo ago

Oh do you have a source? I haven't seen it done in real-time.

Coincidentally the human brain is also jailbroken and nontransferable

IAmGraydon · 1mo ago · 2 in thread

The entire industry is so desperate to anthropomorphize. What the paper describes is an offline recurrent consolidation phase: the model runs multiple forward passes over recently accumulated context, updates persistent fast weights in SSM blocks, then clears the KV cache before continuing. It has absolutely nothing to do with sleeping, but I believe the authors had a goal in mind when creating this title, and it was for journalists to pick it up and run with it, further inflating the AI-is-just-like-us hype bubble.

genxy · 1mo ago

It is a descriptive analogy, get over yourself.

IAmGraydon · 1mo ago

An intelligent reply from an obviously intelligent guy!

A more appropriate title would have been something like "Offline Recurrent Memory Consolidation for Long-Context Language Models". This is supposed to be a research paper, not a story book. The title should give context to other researchers, and not be clearly engineered for clicks. If you don't think so, that's your prerogative, but you're objectively wrong.

1 more reply

danielrmay · 1mo ago · 1 in thread

The "sleep" thing gives me the creeps so in my head I'm just going to think of it as the difference between "response time retrieval" and "background consolidation".

I do think it points at something bigger than just attention architecture: "memory" isn't just storage, and merely longer context isn't the same thing as having a better understanding of the source data.

I'm looking at this through the "personal AI" lens, where I think the missing "memory" layer seems to be consolidation & prioritization. It's not enough to just pattern match, grab the right emails, notes, etc., stuff them into the context window & hope; it's more useful to consider offline processing that turns events into durable state: clusters of observed data become episodes, assumptions and contradictions, and power confidence for suggestions.

That also pushes up the need for provenance & inspectability. It's going to be interesting to see what kind of memory consolidation strategies are required for each domain use case.

sonink · 1mo ago

I think you are missing the most important part - forgetting. The missing "memory" layer is consolidation, prioritization AND forgetting (what is not important).

Also not too sure about provenance and inspectability - it is part of memory. If the source is deemed 'important' it will survive forgetting. If not, then maybe not. And it's ok. I am sure you don't know the exact source who told you that the capital of France is Paris. You forgot, and it's no big deal.
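
One way to picture "important things survive forgetting" is a store where every item has a strength that decays on each consolidation pass and anything under a floor is dropped. This is only an illustration of the idea: `DecayingMemory`, the decay factor, the floor, and the importance values are all invented.

```python
class DecayingMemory:
    def __init__(self, decay=0.5, floor=0.2):
        self.decay = decay
        self.floor = floor
        self.value = {}
        self.strength = {}

    def remember(self, key, value, importance=1.0):
        """Store an item with an initial strength equal to its importance."""
        self.value[key] = value
        self.strength[key] = importance

    def consolidate(self):
        """Decay every item once and forget anything below the floor."""
        for key in list(self.strength):
            self.strength[key] *= self.decay
            if self.strength[key] < self.floor:
                del self.strength[key]
                del self.value[key]


memory = DecayingMemory()
memory.remember("capital_of_france", "Paris", importance=4.0)
memory.remember("who_told_me", "a teacher", importance=0.5)
memory.consolidate()
memory.consolidate()
```

After two passes the fact is still there and its source is gone, which is the Paris example in miniature.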

1 more reply

bmc7505 · 1mo ago

This topic recently came up at the FLANN workshop [1], and seems to periodically be rediscovered [2,3,4] in different contexts. While some have speculated about the biological role it plays (e.g., Pearlmutter & Houghton [5]), we still lack a conclusive theory of sleep, but the convergent evolution of this specific phenomenon across the animal kingdom and the fact that deprivation is inevitably fatal seem like important clues.

[1]: https://flann.cs.yale.edu

[2]: https://www.cs.toronto.edu/~hinton/csc2535/readings/ws.pdf

[3]: https://arxiv.org/abs/1711.02282

[4]: https://arxiv.org/abs/2006.08381

[5]: https://mural.maynoothuniversity.ie/id/eprint/1653/1/Hamilto...

swyx · 1mo ago

related preprint from the letta team https://arxiv.org/abs/2504.13171

> Scaling test-time compute has emerged as a key ingredient for enabling large language models (LLMs) to solve difficult problems, but comes with high latency and inference cost. We introduce sleep-time compute, which allows models to "think" offline about contexts before queries are presented: by anticipating what queries users might ask and pre-computing useful quantities, we can significantly reduce the compute requirements at test-time. To demonstrate the efficacy of our method, we create modified versions of two reasoning tasks - Stateful GSM-Symbolic and Stateful AIME. We find that sleep-time compute can reduce the amount of test-time compute needed to achieve the same accuracy by ~5x on Stateful GSM-Symbolic and Stateful AIME and that by scaling sleep-time compute we can further increase accuracy by up to 13% on Stateful GSM-Symbolic and 18% on Stateful AIME. Furthermore, we introduce Multi-Query GSM-Symbolic, which extends GSM-Symbolic by including multiple related queries per context. By amortizing sleep-time compute across related queries about the same context using Multi-Query GSM-Symbolic, we can decrease the average cost per query by 2.5x. We then conduct additional analysis to understand when sleep-time compute is most effective, finding the predictability of the user query to be well correlated with the efficacy of sleep-time compute. Finally, we conduct a case-study of applying sleep-time compute to a realistic agentic SWE task.
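
The amortization argument is easy to see with toy numbers. The sketch below is my own back-of-the-envelope model, not the preprint's methodology, and the costs (10 units per query without sleep, 30 units of one-off sleep-time compute, 2 units per query afterwards) are invented.

```python
def average_cost_per_query(sleep_cost, test_cost_with_sleep, n_queries):
    """One-off sleep-time cost spread over n queries, plus per-query cost."""
    return (sleep_cost + n_queries * test_cost_with_sleep) / n_queries


def break_even_queries(sleep_cost, baseline_test_cost, test_cost_with_sleep):
    """Number of queries at which sleeping first costs the same as not sleeping."""
    return sleep_cost / (baseline_test_cost - test_cost_with_sleep)


baseline = 10.0  # hypothetical cost per query with no sleep-time compute
single = average_cost_per_query(30.0, 2.0, n_queries=1)
many = average_cost_per_query(30.0, 2.0, n_queries=10)
```

With these numbers a single query is worse off (32 units versus 10), while ten queries about the same context average 5 units each. That is why the predictability and reuse of the context matter so much.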

elphard · 1mo ago

We should let them sleep with half a brain at a time like migrating birds.

jonnyasmar · 1mo ago

What happened to Claude's auto-dream? I thought it was brilliant.

wagwang · 1mo ago

energy123 · 1mo ago

Would be a big deal if you don't have to care about quadratic attention cost. Some workflows become a lot cheaper.
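
To put a number on it, here is a quick count of query-key pairs under causal attention, comparing one long context against a context that is cleared every `window` tokens. It is a simplification that ignores the cost of the consolidation passes themselves, and the function names are mine.

```python
def full_attention_pairs(n_tokens):
    """Causal attention: token t attends to itself and all earlier tokens."""
    return n_tokens * (n_tokens + 1) // 2


def windowed_attention_pairs(n_tokens, window):
    """Context is cleared every `window` tokens, so the count restarts per chunk."""
    full_chunks, rest = divmod(n_tokens, window)
    return full_chunks * full_attention_pairs(window) + full_attention_pairs(rest)


full = full_attention_pairs(100_000)
chunked = windowed_attention_pairs(100_000, window=1_000)
```

For 100,000 tokens that is about 5.0 billion pairs versus about 50 million, roughly a 100x reduction before paying for the sleep passes.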

hmokiguess · 1mo ago

This could be a solution in search of a problem, I would be careful with overfitting.

mos765817 · 1mo ago

wasn't this what Google did long ago? https://openreview.net/forum?id=iiZy6xyVVE

scotty79 · 1mo ago

Context -> Lora would be soooo cool.

gt0 · 1mo ago

This seems as much like "sleep" as when a laptop "sleeps".

m3kw9 · 1mo ago

sleep aka processing the data differently.

m0unta1ntube · 1mo ago

why not just design the LLM like an OS?

semiinfinitely · 1mo ago

academic clickbait

hansmayer · 1mo ago

Sweet Jesus, so not only are they performing qualitatively worse than humans, too expensive for any serious work, but now they also "need" to sleep? What's next - unionisation so they can enjoy 8 hours of culture too?

victorkulla · 1mo ago

No they do not. I'm sure that if you presented the same argument about, I don't know, your car's CPU with built-in AI, then this would be a whole different discussion entirely.

j / k navigate · click thread line to collapse

140 comments

80 comments · 22 top-level

pcrh1mo ago· 39 in thread

I can't pretend to understand how LLMs work, but I can be sure that anthropomorphizing their functions is not helpful to an objective debate over their abilities.

Does a motor vehicle get "sleep" when it is serviced? When I reboot a computer, is that equivalent to a nap?

djeastm1mo ago

They provide an explanation for using the term "sleep":

pcrh1mo ago

The function of sleep in animals is largely obscure.

One thing we do know for certain is that it is necessary, it is needed in "dumb" animals as well as in you and I. If an animal can't sleep it will eventually die.

I don't think that applies to the activity described in the OP. Does their LLM "die" if it can't perform the function described?

12 more replies

Kaliboy1mo ago

This is very interesting to me, I've been sleeping a lot lately.

I'm autistic and just went through massive changes that basically keep locking up my brain and I freeze.

Afterwards I can generally make a decision and perform it.

So in a sense it seems similar to what you describe the model would have to do. I forget short term concerns that overwhelm and refocus on the long term goal.

order-matters1mo ago

but isnt sleep an already defined technical term for significantly reducing power consumption while preserving its state until woken up?

i feel like its confusing to reuse the word for a process that aims to deliberately change state of the machine / process

raincole1mo ago

not_a_bot_4sho1mo ago

Some of them were straight up psychopaths too, as evidenced by `kill()` !

1 more reply

famouswaffles1mo ago

Here the analogy isn't without reason.

pfdietz1mo ago

We shouldn't anthropomorphize LLMs. They hate it when you do that.

1 more reply

forshaper1mo ago

DonHopkins1mo ago

Is it "Anthropicmorphization" when Claud treats human beings like LLMs?

1 more reply

gabriela_c1mo ago

Feels like we're having a computer world Jane Goodall moment.

CuriouslyC1mo ago

Saying something needs sleep isn't anthropomorphizing, since pretty much all complex living organisms need sleep.

Also, even when something is "specific" to humans, it might not be anthropomorphizing to observe it in something else, it could just be an emergent pattern of high intelligence.

throw3108221mo ago

First, this is not a "debate over the abilities" of LLMs. It's a proposed method to improve their performance, and the authors are free to call it however they think it makes sense.

Second, explicitly avoiding things that sound like anthropomorphisation is equally not helpful- why avoid a metaphor that works?

Third, it's really a pity that this pointless nitpicking is dominating the thread.

pcrh1mo ago

>this pointless nitpicking is dominating the thread.

What is dominating the thread are claims that the LLM operation in question is analogous to the function of sleep in humans. It obviously is not.

gchamonlive1mo ago

Just like LLM sleep has nothing to do with animal sleep, the neuron in a neural network has nothing to do with an actual neuron, and nobody should pretend they do.

ajs19981mo ago

This is the struggle of naming papers. You could stretch definitions and make your own sexy headline or you could be precise and fewer people will read it.

aaroninsf1mo ago

Very much agree that while it is is useful in description of motivation and inspiration,

it is very non-helpful—or worse—to use this language, this way.

One might as well say "need neural plasticity" which is as much an analogy and equally misleading and counterproductive in shaping the right model of the system.

One might even call this pernicious, what it encourages is already a social problem; and it doesn't aid understanding, it confounds it.

cush1mo ago

ComplexSystems1mo ago

reaperducer1mo ago

Does a motor vehicle get "sleep" when it is serviced?

Dusseldorf1mo ago

wat100001mo ago

Just from the title, I’m assuming it refers to a period of downtime used to perform some sort of maintenance on the knowledge held by the system.

Clicking through, that’s exactly what it is. Seems like “sleep” is an excellent term to use here.

lxgr1mo ago

If it works, it's called bionics, not anthropomorphization ;)

skeledrew1mo ago

Also keep in mind that most if not all devices with a chip have had a function called "sleep" for many years, without this argument.

burnte1mo ago

> Does a motor vehicle get "sleep" when it is serviced?

That's more like a doctor visit and a workout. The sleep will be the part of the duty cycle when it's not operating.

> When I reboot a computer, is that equivalent to a nap?

Yes, it wakes up completely refreshed and in good working order, usually, and if there's still a problem you know you need a technician.

eithed1mo ago

I assume compacting is the sleep here; so, yes

colechristensen1mo ago

>we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache

There is a strong, non-trivial connection here between what your brain does in sleep and what they are studying.

You wouldn't object to referring to robot eyes or robot legs.

motoxpro1mo ago

One of the most common functions in programming is sleep(ms). There is wake, heartbeat, handshake, orphan, listen, starve, parent/child, etc.

This is not anything new, its just a word that fits the function.

FarmerPotato1mo ago

>Does a motor vehicle get "sleep" when it is serviced? When I reboot a computer, is that equivalent to a nap?

Do androids dream of electric sheep?

jonnyasmar1mo ago

When the goal of that function is to think (a notoriously human behavior), it's perfectly understandable to anthropomorphize it.

madibo31561mo ago

tom_1mo ago

genxy1mo ago

Please re-read up to the end of page 2 and then re-ask this question.

zeckalpha1mo ago

Yet this is how "thinking" models got started.

simonw1mo ago

> When I reboot a computer, is that equivalent to a nap?

I mean, you do put your computer into "sleep" mode and then "wake" it.

Analogies are useful. I think we need to learn how to continue to benefit from them despite the risk of anthropomorphication.

cowlby1mo ago

The analogy is helpful, but yes we should be able to “intelligently design” something better than sleep analogues since we’re not constrained by evolution like in humans.

SR2Z1mo ago

Evolution constrains the evolution of human beings, but it's also excellent at discovering elegant designs that work very reliably at a low cost.

Maybe someday we'll understand the way our minds work well enough to design from first principles but until then we've only got one template for how a thinking machine should look.

lxgr1mo ago

We are however constrained by the complexity of any purported solution. That's the bitter lesson, in a nutshell.

verisimi1mo ago

... and anyway, maybe it was hungry? Or getting the sniffles?

thunderbird1201mo ago· 5 in thread

The end result is very good at remembering things but also really good at adapting to new unseen distributions.

[1]https://arxiv.org/abs/2512.23675

samsartor1mo ago

soulofmischief1mo ago

Each model needs to be a separate copy, or at least have those particular weights be interchangeable, for every single user.

Remember Microsoft Tay.

https://en.wikipedia.org/wiki/Tay_(chatbot)#Initial_release

thunderbird1201mo ago

pfannkuchen1mo ago

knollimar1mo ago

This sounds like a horror novel

rahen1mo ago· 5 in thread

That's an idea I had a few months ago: after going through a compaction once the KV cache is nearing capacity, accumulate this knowledge into a dataset to fine-tune a LoRA during offline hours.

This would create a three-layer memory system:

- Stable long-term memory (initial base weights)

- Mid-term memory built from the compactions and replay buffers

- Short-term memory (KV cache)

Sleeping would just be a fancy term for consolidating and transferring information from one memory layer to another during offline hours. Maybe that's also what the brain does while sleeping.
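
To make the three layers above concrete, here is a toy sketch, not a real training loop. Every name in it (`TieredMemory`, `observe`, `compact`, `sleep`) is hypothetical, the "summary" is just a string join, and the "adapter" list only stands in for a LoRA that would actually be fine-tuned offline.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy model of the three memory layers; all names are hypothetical."""
    context_limit: int = 4
    base_weights: tuple = ("pretrained",)               # long-term: never modified
    replay_buffer: list = field(default_factory=list)   # mid-term: compaction summaries
    adapter: list = field(default_factory=list)         # stand-in for a LoRA trained offline
    context: list = field(default_factory=list)         # short-term: stand-in for the KV cache

    def observe(self, token: str) -> None:
        """Append to short-term memory, compacting once it is full."""
        self.context.append(token)
        if len(self.context) >= self.context_limit:
            self.compact()

    def compact(self) -> None:
        """Summarize the context into the replay buffer, then clear it.

        A real system would ask the model for a summary; joining the
        tokens is only a placeholder.
        """
        self.replay_buffer.append(" ".join(self.context))
        self.context.clear()

    def sleep(self) -> None:
        """Offline phase: move replay-buffer entries into the adapter.

        A real system would fine-tune a LoRA on the buffer here.
        """
        self.adapter.extend(self.replay_buffer)
        self.replay_buffer.clear()


memory = TieredMemory(context_limit=3)
for tok in ["a", "b", "c", "d"]:
    memory.observe(tok)
memory.sleep()
```

The point of the sketch is only the direction of flow: context fills, gets compacted into the buffer, and the buffer is drained into the adapter while the base weights stay untouched.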

chermi1mo ago

jack_pp1mo ago

We can trust the feedback we give it based on the output it provides.

rahen1mo ago

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, like a garbage collector before the LoRA.

That's just an idea though. Right now most research focuses on changing the architecture itself (TITAN, HOPE...) instead.

DonHopkins1mo ago

fc417fc8021mo ago

> the Earth is a sphere and the United States is not the center of the universe.

Felt like stating the obvious there? Greenwich being the center of everything after all.

jgreid1mo ago· 4 in thread

Isn't this simply context pruning/optimization?

kylemaxwell1mo ago

From the abstract, it looks like it's actually doing something deeper, updating weights in part of the model?

samsartor1mo ago

colechristensen1mo ago

No, they're actually training weights based on the context before compaction. Context is still context; this splits the model into persistent weights and malleable ones that are periodically updated.
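
A rough sketch of that split, under loud assumptions: the paper describes a learned local rule, whereas the fixed Hebbian-style increment below is a stand-in chosen only for illustration, and the names `consolidate`, `slow` and `fast` are hypothetical.

```python
def consolidate(fast, context, passes=2, lr=0.1):
    """Return updated fast weights after replaying `context` `passes` times.

    fast: dict mapping (prev_token, next_token) -> association strength.
    The update rule is an assumption, not the paper's learned rule: each
    replay strengthens the link between adjacent tokens by `lr`.
    """
    fast = dict(fast)  # copy so the caller's weights are not mutated
    for _ in range(passes):
        for prev, nxt in zip(context, context[1:]):
            fast[(prev, nxt)] = fast.get((prev, nxt), 0.0) + lr
    return fast


slow = {("the", "cat"): 1.0}   # persistent weights: never touched during "sleep"
fast = {}                      # malleable weights: updated from the context
context = ["the", "cat", "sat"]

fast = consolidate(fast, context)  # no new input tokens are read in this phase
context = []                       # context window is cleared afterwards
```

After the call, what was in the context survives only as entries in `fast`, which is the sense in which this differs from pruning or summarizing the context itself.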

delis-thumbs-7e1mo ago

Wouldn’t that be extremely computationally expensive considering how resource-intensive training is?

micromacrofoot1mo ago· 2 in thread

All we have in LLMs most of the time is "working memory"; we're missing much of the functionality that allows for episodic memory and selective plasticity.

logicchains1mo ago

>To reach a more brain-like behavior LLMs need to integrate your inputs into their model dynamically, essentially retraining real-time based on the most salient input.

micromacrofoot1mo ago

Oh do you have a source? I haven't seen it done in real-time.

Coincidentally the human brain is also jailbroken and nontransferable

IAmGraydon1mo ago· 2 in thread

genxy1mo ago

It is a descriptive analogy, get over yourself.

IAmGraydon1mo ago

An intelligent reply from an obviously intelligent guy!

danielrmay1mo ago· 1 in thread

The "sleep" thing gives me the creeps so in my head I'm just going to think of it as the difference between "response time retrieval" and "background consolidation".

That also pushes up the need for provenance & inspectability. It's going to be interesting to see what kind of memory consolidation strategies are required for each domain use case.

sonink1mo ago

I think you are missing the most important part: forgetting. The missing "memory" layer is consolidation, prioritization AND forgetting (of what is not important).

bmc75051mo ago

[1]: https://flann.cs.yale.edu

[2]: https://www.cs.toronto.edu/~hinton/csc2535/readings/ws.pdf

[3]: https://arxiv.org/abs/1711.02282

[4]: https://arxiv.org/abs/2006.08381

[5]: https://mural.maynoothuniversity.ie/id/eprint/1653/1/Hamilto...

swyx1mo ago

related preprint from the letta team https://arxiv.org/abs/2504.13171

elphard1mo ago

We should let them sleep with half a brain at a time like migrating birds.

jonnyasmar1mo ago

What happened to Claude's auto-dream? I thought it was brilliant.

wagwang1mo ago

energy1231mo ago

Would be a big deal if you don't have to care about quadratic attention cost. Some workflows become a lot cheaper.
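
Back-of-the-envelope arithmetic for that, assuming plain full self-attention where the number of pairwise scores grows as n squared, and ignoring whatever the consolidation step itself costs. The function names are made up for this example.

```python
def full_attention_scores(n_tokens: int) -> int:
    """Pairwise attention scores for one context of n_tokens."""
    return n_tokens * n_tokens


def windowed_attention_scores(n_tokens: int, n_windows: int) -> int:
    """Same tokens split into equal windows, with the context cleared between them."""
    window = n_tokens // n_windows
    return n_windows * full_attention_scores(window)


one_context = full_attention_scores(8000)
four_windows = windowed_attention_scores(8000, 4)
```

Here `one_context` is 64,000,000 and `four_windows` is 16,000,000, so splitting into k windows cuts the attention term by roughly a factor of k, provided consolidation is cheap by comparison.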

hmokiguess1mo ago

This could be a solution in search of a problem, I would be careful with overfitting.

mos7658171mo ago

wasn't this what Google did long ago? https://openreview.net/forum?id=iiZy6xyVVE

scotty791mo ago

Context -> Lora would be soooo cool.

gt01mo ago

This seems as much like "sleep" as when a laptop "sleeps".

m3kw91mo ago

sleep aka processing the data differently.

m0unta1ntube1mo ago

why not just design the LLM like an OS?

semiinfinitely1mo ago

academic clickbait

hansmayer1mo ago

victorkulla1mo ago

No, they do not. I'm sure that if you presented the same argument about, I don't know, your car's CPU with built-in AI, this would be a whole different discussion entirely.
