Show HN: AI memory with biological decay (52% recall)

(github.com)

98 points · SachitRafa · 2mo ago · 53 comments

Most RAG setups fail because they treat memory like a static filing cabinet. When every transient bug fix or abandoned rule is stored forever, the context window eventually chokes on noise, spiking token costs and degrading the agent's reasoning.

This implementation experiments with a biological approach by using the Ebbinghaus forgetting curve to manage context as a living substrate. Memories are assigned a "strength" score where each recall reinforces the data and flattens its decay curve (spaced repetition), while unused data eventually hits a threshold and is pruned.
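A minimal sketch of the decay-and-reinforce idea described above, not the project's actual code. It assumes the common exponential form of the Ebbinghaus curve, R = exp(-t / S); the names `Memory`, `PRUNE_THRESHOLD` and `REINFORCE_FACTOR`, and their values, are hypothetical.

```python
import math
from dataclasses import dataclass

PRUNE_THRESHOLD = 0.05   # assumed: memories below this retention are dropped
REINFORCE_FACTOR = 2.0   # assumed: each recall doubles the stability


@dataclass
class Memory:
    text: str
    stability_days: float = 1.0   # larger stability means a flatter decay curve
    last_recall_day: float = 0.0

    def retention(self, now_day: float) -> float:
        """Ebbinghaus-style retention R = exp(-t / S)."""
        elapsed = max(0.0, now_day - self.last_recall_day)
        return math.exp(-elapsed / self.stability_days)

    def recall(self, now_day: float) -> None:
        """Reinforce on recall: reset the clock and flatten the curve."""
        self.stability_days *= REINFORCE_FACTOR
        self.last_recall_day = now_day


def prune(memories, now_day):
    """Keep only memories whose retention is still at or above the threshold."""
    return [m for m in memories if m.retention(now_day) >= PRUNE_THRESHOLD]
```

With these assumed values, a memory that is never recalled falls below the threshold after about three days, while one recalled on day 1 is still retained on day 3.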

To solve the "logical neighbor" problem, where semantic search misses relevant but non-similar nodes, a graph layer sits on top of the vector store. Benchmarked on the LoCoMo dataset, the system reached 52% Recall@5, nearly double that of stateless vector stores, while cutting token waste by roughly 84%.
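One way to picture the graph layer, as a rough sketch rather than the repo's implementation: run a plain similarity search first, then pull in the one-hop graph neighbors of the hits. The toy embeddings, the `edges` mapping and the `retrieve` function are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, embeddings, edges, k=2):
    """Top-k nodes by cosine similarity, followed by their 1-hop graph neighbors."""
    ranked = sorted(
        embeddings,
        key=lambda node: cosine(query_vec, embeddings[node]),
        reverse=True,
    )
    seeds = ranked[:k]
    result = list(seeds)
    for node in seeds:
        for neighbor in edges.get(node, ()):
            if neighbor not in result:
                result.append(neighbor)
    return result


# Toy data: "db_migration" is not similar to the query but is logically linked.
embeddings = {
    "auth_bug": [1.0, 0.0, 0.0],
    "login_flow": [0.9, 0.1, 0.0],
    "db_migration": [0.0, 1.0, 0.0],
    "deploy_script": [0.0, 0.0, 1.0],
}
edges = {"auth_bug": ["db_migration"]}  # hypothetical "caused by" link

print(retrieve([1.0, 0.0, 0.0], embeddings, edges, k=2))
```

Here the vector search alone returns `auth_bug` and `login_flow`; the graph hop adds `db_migration`, which similarity would have ranked last.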

It is built as a local-first MCP server using DuckDB. The hypothesis is that for agents handling long-running projects, "what to forget" is just as critical as "what to remember." I'd be interested to hear if others are exploring non-linear decay or similar biological constraints for context management.

GitHub: https://github.com/sachitrafa/cognitive-ai-memory

45 comments · 17 top-level

SwellJoe · 2mo ago · 15 in thread

I know everybody seems to want the agent to remember every conversation they've ever had with it, but I just don't see the value in that. In fact, it seems to hurt productivity to have the agent second-guessing me based on something I said yesterday. Every time I've used any memory system, the agent gets distracted from the current tasks based on previous conversations and branches of development...often commingling unrelated projects (I work on code for work, open source projects, a bunch of unrelated side projects, etc.) and trying to satisfy requirements that don't make sense.

I've stopped trying to achieve general "memory". I just ask the agent to thoroughly, but concisely, document each project. If it writes developer documentation and a development plan/roadmap, as though a person was going to have to get up to speed and start working on the project, it provides all the information the agent needs tomorrow or next week to pick up where we left off.

The agent is not my friend. I don't need it to remember my birthday or the nasty thing I said about React last week. I need it to document what anyone, agent or human, would need to know to get productive in a particular repo, with no previous knowledge of the project.

Good, concise developer and user documentation and a plan with checklists solve every problem people seem to think "memory" will solve: it tells the agent what tech stack to use (we hashed it out in planning), it tells it what commands it needs to run and test the app, it covers the static analysis tools in use (which formalizes code style, etc. in a way a vague comment I made a month ago cannot), and it is cheap. Markdown files are the native tongue of agents. No MCP, no skills, no API needed. Just read the file. It works for any agent, any model, and any human just getting started with the project.

Basically, I think memory makes agents dumber and less useful. I want it to focus on the task at hand.

pil0u · 2mo ago

I appreciate your comment, and can relate. I tested a couple of "memory" systems, some doing heavy lifting or seemingly implementing theories (layering, hot memory, etc.), and I can't really tell if they improve performance, quality or reliability on a task. But they do increase the overhead, for the LLM and for me, that's for sure.

One problem I have is that, now that CLAUDE.md or skills tend to get version-controlled within projects, I suspect they could get in the way sometimes.

There is already so much fatigue induced by these systems, adding another one willingly does sound crazy.

mrits · 2mo ago

I'm just thinking of youtube or amazon type algorithms applying here.

me: "Hi AI, can you debug this SQL Statement?"

ai: "Well, based on your passion for garden hoses and extensive research of refrigerators, I'm going to guess you really want to discuss that"

staticassertion · 2mo ago

I've had to remove any of the "knowledge" about me from any agent I use. "As a security engineer, blah blah blah" or "as a Rust developer blah blah blah" even though my questions have nothing to do with those topics and they're a huge distraction.

mtrifonov · 1mo ago

You're right but I think you're describing flat memory. The agent gets distracted because every old fact has the same weight as the current one. That's a salience problem.

What works in production for me is typed memory with very different decay curves. Personality and relationships are essentially permanent. Preferences fade in months. Stated intent fades in weeks. Emotion and events fade in days. Reinforcement (repeated recall) keeps things alive regardless of type.
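As an illustration only (this is not the commenter's code), typed decay could look like a per-type half-life table. The half-life values below are assumptions chosen to match the rough time scales in the comment, and `TypedMemory` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumed half-lives in days; infinity means the type never decays.
HALF_LIFE_DAYS = {
    "personality": float("inf"),
    "relationship": float("inf"),
    "preference": 90.0,   # fades over months
    "intent": 14.0,       # fades over weeks
    "emotion": 2.0,       # fades over days
    "event": 2.0,
}


@dataclass
class TypedMemory:
    text: str
    kind: str
    last_recall_day: float = 0.0

    def weight(self, now_day: float) -> float:
        """Halve the weight once per half-life elapsed since the last recall."""
        elapsed = max(0.0, now_day - self.last_recall_day)
        return 0.5 ** (elapsed / HALF_LIFE_DAYS[self.kind])

    def recall(self, now_day: float) -> None:
        """Reinforcement: any recall resets the decay clock, whatever the type."""
        self.last_recall_day = now_day
```

For example, a "preference" sits at weight 0.5 after 90 days, an "emotion" reaches the same weight after 2 days, and a "personality" entry stays at 1.0 indefinitely.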

Cross-project co-mingling stops because project-specific stuff actually decays out of relevance while who the user is persists. There's also a filter on what even gets written, which scopes between globally and locally-relevant information and writes accordingly (if at all). Most of the noise you're describing comes from systems that store everything they observe.
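A rough sketch of what such a write filter might look like, assuming each observation has already been classified into a kind. The kind names, `GLOBAL_KINDS`, `SKIP_KINDS` and `ScopedStore` are hypothetical and only illustrate the global-versus-local scoping idea.

```python
from dataclasses import dataclass, field

GLOBAL_KINDS = {"personality", "relationship", "preference"}  # assumed: about the user
SKIP_KINDS = {"chitchat"}                                     # assumed: never written


@dataclass
class ScopedStore:
    global_items: list = field(default_factory=list)
    by_project: dict = field(default_factory=dict)

    def write(self, text, kind, project):
        """Return the scope the item was written to, or None if it was filtered out."""
        if kind in SKIP_KINDS:
            return None
        if kind in GLOBAL_KINDS:
            self.global_items.append(text)
            return "global"
        self.by_project.setdefault(project, []).append(text)
        return project

    def read(self, project):
        """Global items plus items for this project only, never another project's."""
        return self.global_items + self.by_project.get(project, [])
```

Reading for one project returns the user-level items and that project's items, so requirements from an unrelated repo never reach the context.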

Flat memory failing is real. Memory failing in general is a stronger claim than that.

SwellJoe · 1mo ago

I'm making the stronger claim. I don't think memory (at least, what people call "memory", even though it isn't...the memories LLMs have are baked in at training, everything else is context), no matter how fancy, improves outcomes, at least for the work I do on the software I work on. I just don't think the agent needs what people are calling memory.

I think the base truth is the code, which can be loaded into context at no greater cost than whatever "memory" system you're using, probably lower cost, actually. A few hints in documentation fill out the rest of the picture.

You can't realistically give an LLM memory, as current technology doesn't allow retraining the model on the fly. You can only give it more data to ingest into its context. Unless that data is directly relevant to the task at hand, it's probably detrimental. At best, it is just burning tokens for no benefit.

Terretta · 1mo ago

Research shows primed context has some equivalence to a fine tuning layer.

netcan · 1mo ago

Useful comment. Thanks.

Kim_Bruning · 1mo ago

I'm really curious to see your memory code, if you're sharing!