This implementation experiments with a biological approach by using the Ebbinghaus forgetting curve to manage context as a living substrate. Memories are assigned a "strength" score where each recall reinforces the data and flattens its decay curve (spaced repetition), while unused data eventually hits a threshold and is pruned.
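A minimal sketch of the decay-and-reinforce idea, not the project's actual code. It assumes the common exponential form of the Ebbinghaus curve, R = exp(-t / S); the `Memory` class, the `growth` factor, and the `threshold` value are all hypothetical names and numbers chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    stability_days: float = 1.0   # hypothetical starting stability S
    last_recall_day: float = 0.0  # day of the most recent recall

    def strength(self, now_day: float) -> float:
        """Ebbinghaus-style retention R = exp(-t / S)."""
        elapsed = max(0.0, now_day - self.last_recall_day)
        return math.exp(-elapsed / self.stability_days)

    def recall(self, now_day: float, growth: float = 2.0) -> None:
        """Reinforce: restart the clock and flatten the curve by growing S."""
        self.stability_days *= growth
        self.last_recall_day = now_day


def prune(memories, now_day, threshold=0.05):
    """Keep only memories whose strength is still at or above the threshold."""
    return [m for m in memories if m.strength(now_day) >= threshold]
```

With these assumed numbers, a memory recalled once on day 1 still has a strength of about 0.37 on day 3, while one that was never recalled has dropped just under 0.05 and is pruned.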
To solve the "logical neighbor" problem, where semantic search misses relevant but non-similar nodes, a graph layer sits on top of the vector store. Benchmarked against the LoCoMo dataset, this reached 52% Recall@5, nearly double the accuracy of stateless vector stores, while cutting token waste by roughly 84%.
It is built as a local-first MCP server using DuckDB. The hypothesis is that for agents handling long-running projects, "what to forget" is just as critical as "what to remember." I'd be interested to hear if others are exploring non-linear decay or similar biological constraints for context management.
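One way to picture the graph layer, as a rough sketch under stated assumptions: vector similarity picks the seed nodes, then explicit edges pull in neighbors that would never rank by similarity alone. The function names, the one-hop expansion, and the toy two-dimensional vectors are hypothetical and are not taken from the project.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_with_neighbors(query_vec, vectors, edges, k=1):
    """Take the top-k nodes by similarity, then add their one-hop graph neighbors."""
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    result = ranked[:k]
    for seed in ranked[:k]:
        for neighbor in edges.get(seed, ()):
            if neighbor not in result:
                result.append(neighbor)
    return result


# Hypothetical toy data: "db_rotation" is not similar to the query at all,
# but it is linked to the node that is.
vectors = {"deploy_script": [1.0, 0.0], "db_rotation": [0.0, 1.0], "lunch": [-1.0, 0.0]}
edges = {"deploy_script": ["db_rotation"]}
hits = recall_with_neighbors([1.0, 0.0], vectors, edges, k=1)
```

Here plain top-1 search would return only `deploy_script`; the edge is what surfaces `db_rotation`.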
I've stopped trying to achieve general "memory". I just ask the agent to thoroughly, but concisely, document each project. If it writes developer documentation and a development plan/roadmap, as though a person was going to have to get up to speed and start working on the project, it provides all the information the agent needs tomorrow or next week to pick up where we left off.
The agent is not my friend. I don't need it to remember my birthday or the nasty thing I said about React last week. I need it to document what anyone, agent or human, would need to know to get productive in a particular repo, with no previous knowledge of the project.
Good, concise developer and user documentation and a plan with checklists solve every problem people seem to think "memory" will solve: it tells the agent what tech stack to use (we hashed it out in planning), it tells it what commands it needs to run and test the app, it covers the static analysis tools in use (which formalizes code style, etc. in a way a vague comment I made a month ago cannot), and it is cheap. Markdown files are the native tongue of agents. No MCP, no skills, no API needed. Just read the file. It works for any agent, any model, and any human just getting started with the project.
Basically, I think memory makes agents dumber and less useful. I want it to focus on the task at hand.
One concern I have: now that CLAUDE.md files and skills tend to get version controlled within projects, I suspect they could sometimes get in the way.
There is already so much fatigue induced by these systems that willingly adding another one sounds crazy.
me: "Hi AI, can you debug this SQL Statement?"
ai: "Well, based on your passion for garden hoses and extensive research of refrigerators, I'm going to guess you really want to discuss that"
What works in production for me is typed memory with very different decay curves. Personality and relationships are essentially permanent. Preferences fade in months. Stated intent fades in weeks. Emotion and events fade in days. Reinforcement (repeated recall) keeps things alive regardless of type.
Cross-project co-mingling stops because project-specific stuff actually decays out of relevance while who the user is persists. There's also a filter on what even gets written, which decides whether information is globally or locally relevant and writes it to the matching scope (if at all). Most of the noise you're describing comes from systems that store everything they observe.
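A sketch of what type-conditional decay could look like, assuming a simple half-life model. The specific half-life values and the `retention` helper are hypothetical; the comment above only gives rough orders of magnitude (permanent, months, weeks, days).

```python
import math

# Hypothetical half-lives in days, one per memory type.
HALF_LIFE_DAYS = {
    "personality": math.inf,   # essentially permanent
    "relationship": math.inf,  # essentially permanent
    "preference": 90.0,        # fades in months
    "intent": 14.0,            # fades in weeks
    "emotion": 2.0,            # fades in days
    "event": 2.0,              # fades in days
}


def retention(memory_type: str, days_since_last_recall: float) -> float:
    """Fraction of strength left: 0.5 ** (age / half_life).

    Age is measured from the last recall, so repeated recall keeps any
    type alive regardless of its half-life.
    """
    half_life = HALF_LIFE_DAYS[memory_type]
    return 0.5 ** (days_since_last_recall / half_life)
```

Under these assumed numbers, a two-week-old emotion is below 1% strength while a two-week-old preference is still near 90%, and a personality trait never decays.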
Flat memory failing is real. Memory failing in general is a stronger claim than that.
I think the base truth is the code, which can be loaded into context at no greater cost than whatever "memory" system you're using, probably lower cost, actually. A few hints in documentation fill out the rest of the picture.
You can't realistically give an LLM memory, as current technology doesn't allow retraining the model on the fly. You can only give it more data to ingest into its context. Unless that data is directly relevant to the task at hand, it's probably detrimental. At best, it is just burning tokens for no benefit.
Using MD files for this is fine up to a point. If you keep adding information to your MD file, it will bloat and leave a huge amount of data to go through. It may also contain noise that gets picked up every time the file is read into context.
Decay of unwanted data is a very important factor in building good context for our agents. Maintaining an MD file is also an overhead, as you either have to ask the agent to auto-update it or do it manually.
The file also cannot handle context that changes over time. For example, initially I was working in MongoDB and have now moved to Postgres. Either you have to modify this in the MD file manually, or both statements will appear before the LLM.
An MD file keeps all data points equally weighted, which is not correct, and it cannot fetch data related to the data point being retrieved.
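To make the MongoDB-to-Postgres case concrete, here is a minimal sketch of superseding a fact instead of appending a second, contradictory one. The `Fact` and `FactStore` names and the key/value shape are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    key: str
    value: str
    superseded_by: Optional[int] = None  # index of the fact that replaced this one


class FactStore:
    def __init__(self):
        self.facts = []

    def update(self, key: str, value: str) -> int:
        """Store a new fact and mark any live fact with the same key as superseded."""
        new_id = len(self.facts)
        for fact in self.facts:
            if fact.key == key and fact.superseded_by is None:
                fact.superseded_by = new_id
        self.facts.append(Fact(key, value))
        return new_id

    def active(self) -> dict:
        """Only facts that have not been superseded reach the model."""
        return {f.key: f.value for f in self.facts if f.superseded_by is None}


store = FactStore()
store.update("database", "MongoDB")
store.update("database", "Postgres")
```

After the second update only the Postgres fact is active, while the old MongoDB fact stays in the store with a pointer to its replacement.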
If we humans just did exactly what we did yesterday, what progress?
It's baked into the immutable constants of the universe for us; entropy, signal attenuation over distances... information breaks down over time.
Because of this, all human social statistics trend towards zero with intentional conservatism. Progress or collapse is all the universe affords. It doesn't seem interested in conservatism at all.
I tend to think developing with agents should look a lot like managing a human (like, I use feature-branch development with PRs and review them, even on my own projects that have no other devs and don't need a paper trail for security audit purposes), so I theoretically can get down with an issue-based process, but thus far I haven't seen it done in a way that isn't just making busy work for agents.
You said it cuts token usage by 84%, but isn't that typical for any chunked RAG system?
And why did you specifically choose to test against the LoCoMo dataset when it has a lot of issues and is very easy to cheat on?
The main difference between a cache and this framework is that it prunes data based not only on recency but also on importance and category: failures fade fast, strategies persist longer, facts stay longer, assumptions fade faster, and so on.
The 84% is against storing everything forever. Where it beats RAG is in handling contradictions and in keeping the memory size near constant through active pruning of data.
I have also benchmarked it against the LongMemEval-S dataset; the results are in the repo.
A user's job and personality should be effectively permanent. Their stated intent for this week should fade in days. Their emotional state from a single message should be gone by tomorrow. Decay everything at one rate and you're back to LRU with the problems you're calling out.
The "biological" framing isn't really doing much work. Ebbinghaus is one curve and fine, but it's not where the leverage is. Type-conditional half-life is. Without that, this is a cache.
It's obviously the latter. A system that 'remembers everything perfectly' is probably not optimal in most senses. Mortality is a property of both life and artificial systems; forcing the same retention policy on new information and old information probably comes at the expense of lifespan or stability.
The other comment is that spatial memory is probably a better trigger for memory, so if you're not tracking where the coding session starts, the folders it visits, etc., then you're not really providing a good associative footpath for the assistant to retrieve what's important for any given project.
For failures and strategies it might still work, since the environment drifts on calendar time anyhow (new version upgrades, etc.). But for user preferences it does not.
I agree that spatial memory, tracking folder visits and session context as a retrieval signal, would be stronger. I will try to incorporate that!
Based on the feedback, I have made a few new enhancements:
1. Activity-aware decay: instead of a wall clock, decay now runs on the active days on which the user has been using the system.
2. Spatial / working-directory memory: when storing a memory we now associate the file path or active directory with it, and at retrieval time we boost memories associated with the working directory.
3. Session wrap-up boost: at the end of a session we count how many times a memory was recalled in that session and apply a recency boost.
4. Memory consolidation: periodically merge near-duplicate memories into one combined memory instead of accumulating near-identical facts.
5. Suppression link: when update memory is called, we now keep a "suppressed by" pointer from the old memory to the new one, to keep track of what replaced what.
6. Smart recall throttling: added an optional recall cooldown flag so recall is not triggered on every single turn. Useful when agents are doing multi-step tasks where context is already injected.
All the changes are available in the latest version:
pip install yourmemory
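A rough sketch of how the first two items might combine, assuming exponential decay over active days and a multiplicative boost for memories stored under the current working directory. This is not the package's API; `active_days_between`, `directory_boost`, `score`, and the 1.5 boost factor are hypothetical.

```python
import math
from datetime import date
from pathlib import PurePosixPath
from typing import Iterable


def active_days_between(active_days: Iterable[date], start: date, end: date) -> int:
    """Count only the days the user was actually active in (start, end]."""
    return sum(1 for d in active_days if start < d <= end)


def directory_boost(memory_dir: str, working_dir: str, boost: float = 1.5) -> float:
    """Boost a memory when the working directory is its directory or lies inside it."""
    mem = PurePosixPath(memory_dir)
    cwd = PurePosixPath(working_dir)
    return boost if cwd == mem or mem in cwd.parents else 1.0


def score(elapsed_active_days: int, stability_days: float,
          memory_dir: str, working_dir: str) -> float:
    """Decay over active days (not wall-clock days), then apply the directory boost."""
    decay = math.exp(-elapsed_active_days / stability_days)
    return decay * directory_boost(memory_dir, working_dir)


# Hypothetical usage: 30 calendar days pass, but the user was active on only 3.
days_used = {date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 20)}
elapsed = active_days_between(days_used, date(2024, 1, 1), date(2024, 1, 31))
```

In this example a month-long break costs the memory only three days of decay, which is the behavior the activity-aware change is after.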
Seems to maybe be useful but I’m not sure yet.
What I do now is preserve all my claude code conversations and set the context from there.
This allows me to curate memory and it’s been the best way so far.
pip install yourmemory yourmemory-setup
Thing is, this seems like it might be a Hard Problem of some sort. Everyone trying, no one making a clean breakthrough, I feel like it's some sort of smell. Either the desired function isn't well understood, or there's something missing, or it's in some weird complexity class, or ... something. My spidey senses tingle.
I wonder if others have the same feeling?