0 points · samueltates · 2y ago · 0 comments

So there are a few aspects to the persistent memory. Der_Einzige is correct, vector storage is one part: you can upload and embed documents, which then get indexed by LlamaIndex (just featured, one of my favourite AI tools, and actually a big driver behind me making this project).

There's another aspect that is custom: the summary system. It came from an initial idea I had using the API around the ChatGPT launch last year. It takes past convos, summarises them, and brings them into the current context. The issue is that even that summary list gets too big, so you end up summarising the summaries, ad infinitum.

So that was a version I had, which was fine, but it had what I'm calling lossy temporal compression: things further back got squished, and the 'detail' of a summary varied depending on whether it had 'filled up / got squished'. So I made a system with rolling windows of detail: when a window fills up, it gets summarised, and that summary goes into the next level of summary (I'm calling the levels epochs, but that's kinda confusing).

So each level of summary has a sort of 'open face' of unsummarised chunks (the latest unsummarised ones from each epoch), creating an exposed face of the latest summaries for what essentially becomes each time period. It's kinda hard to explain, I had to go into a sort of jazz trance to make it, but imagine a pyramid being built from the side, where the side stays still and the pyramid moves backwards.
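A minimal sketch of the rolling-window idea described above, not the actual Nova code. The names `EpochMemory`, `fake_summarise` and the window size of 3 are assumptions made for illustration, and `fake_summarise` just joins strings where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def fake_summarise(chunks: List[str]) -> str:
    """Hypothetical stand-in for an LLM summary call: joins the chunks."""
    return "[" + " + ".join(chunks) + "]"


@dataclass
class EpochMemory:
    window: int = 3  # chunks a level holds before it is rolled up (assumed value)
    summarise: Callable[[List[str]], str] = fake_summarise
    levels: List[List[str]] = field(default_factory=list)  # levels[0] is the most detailed

    def add(self, chunk: str, level: int = 0) -> None:
        # Grow the pyramid on demand.
        while len(self.levels) <= level:
            self.levels.append([])
        self.levels[level].append(chunk)
        # A full window is summarised as one unit and pushed up a level,
        # so summaries from different levels are never mixed together.
        if len(self.levels[level]) == self.window:
            full = self.levels[level]
            self.levels[level] = []
            self.add(self.summarise(full), level + 1)

    def exposed_face(self) -> List[List[str]]:
        """The chunks at each level that have not been summarised yet."""
        return [list(level) for level in self.levels]


memory = EpochMemory(window=3)
for i in range(1, 8):
    memory.add(f"c{i}")
print(memory.exposed_face())
# [['c7'], ['[c1 + c2 + c3]', '[c4 + c5 + c6]']]
```

After seven conversations, the newest one is still at full detail on level 0, while the six older ones sit on level 1 as two summaries waiting for a third before they roll up again.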

But on top of that, as the summaries are happening, they're also pulling out keywords, notes and metadata and bubbling that up to the top, so the memory is traversable via the 'time based' pointers (top level summaries) or via keywords and notes. That way you have a 'temporally biased' view (the highest detail, lowest level of summary is the latest), but also a flat structure that's searchable by topic.
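One way the keyword 'bubbling' could look, as a rough sketch under assumptions: the `Summary` record, `roll_up` and `build_index` are hypothetical names, and the keywords are supplied by hand here where the real system would extract them during summarisation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Summary:
    id: str
    level: int
    text: str
    keywords: Set[str] = field(default_factory=set)


def roll_up(id: str, level: int, text: str, children: List[Summary]) -> Summary:
    """Build a parent summary that inherits the union of its children's keywords."""
    keywords: Set[str] = set().union(*(child.keywords for child in children))
    return Summary(id, level, text, keywords)


def build_index(summaries: List[Summary]) -> Dict[str, List[str]]:
    """Flat keyword -> summary ids lookup, independent of the time-based levels."""
    index: Dict[str, List[str]] = defaultdict(list)
    for summary in summaries:
        for keyword in sorted(summary.keywords):
            index[keyword].append(summary.id)
    return dict(index)


a = Summary("l1-a", 1, "R&D report finished", {"r&d report"})
b = Summary("l1-b", 1, "conversation logging started", {"logging"})
top = roll_up("l2-a", 2, "report done, logging under way", [a, b])
index = build_index([a, b, top])
print(index["logging"])  # ['l1-b', 'l2-a']
```

The same memory can then be reached two ways: by walking down from `top` through the levels, or by looking a topic up directly in `index`.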

It is one of those OCD things where I could probably just be summarising the pyramid 'straight up', but I don't want summaries from one level mixed with another, and I don't want too much variability in how many summaries there are at each level (at least for level one).

But what this means is that the agent has in its context an overview (pulled from next part), so it's like 'hey Sam, did you do the thing, are we working on the R&D report today, how's your mum', plus pointers to the 'exposed face' of summaries, so the latest level 1, 2, 3. It can see everything from 'level 1 (direct summaries of convos): R&D report finally finished, here are details' up to 'level 6: September - March, Sam and Nova start on conversation logging system', and basically choose to 'open' those pointers, or use keywords.

All of this is designed to try and keep around 500 tokens in context, so it can sort of traverse through it (like you would skim through notes). For the traversal itself I need to finish my looping system, where it can 'flick through' the notes itself (that's another story). So right now, past a certain point, I just flick the summaries to GPT Index to query (which is almost like it calling in another bot as an assistant).
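A rough sketch of packing the overview plus the exposed-face pointers into a budget of around 500 tokens. Everything here is an assumption for illustration: `build_context` and `rough_tokens` are hypothetical names, and counting whitespace-separated words is only a crude proxy for a real tokenizer.

```python
from typing import List, Tuple


def rough_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def build_context(overview: str, face: List[Tuple[int, str]], budget: int = 500) -> str:
    """Overview first, then level pointers from most to least detailed, within budget."""
    lines = [overview]
    used = rough_tokens(overview)
    for level, summary in sorted(face):  # lowest level is the most recent and detailed
        line = f"level {level}: {summary}"
        cost = rough_tokens(line)
        if used + cost > budget:
            break  # the older, coarser levels are dropped first
        lines.append(line)
        used += cost
    return "\n".join(lines)


face = [
    (6, "September - March, Sam and Nova start on conversation logging system"),
    (1, "R&D report finally finished"),
]
print(build_context("did you finish the R&D report?", face))
```

With a tight budget only the overview and the level 1 pointer survive, which matches the 'temporally biased' view: recent detail is kept and the distant past is the first thing to go.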

Anyway, long story short, I was OCD about how you would manage summaries in context and this is what I came up with. I'm pretty happy with the results but really want to improve the recall and traversal. The goal is that Nova has the right info at the right time when you're talking, like you'd expect from a pretty organised person.

2 comments · 1 top-level

ssd532 · 2y ago · 1 in thread

Thanks for the detailed explanation. The product looks interesting. I'll give it a try.

samueltates (OP) · 2y ago

Thank you, I feel like I'm just scratching the surface with how this interacts with the vector DB mentioned below, and other tooling that just unlocks / connects the dots. So if you can plug in with me now, I'm committed to laying those stepping stones out towards what I can see as a pretty incredible set of capabilities for a personally embedded agent! Basically like the smartest journal / assistant you can have.