I need to dig into what they're doing here more with their approach, but I think using an LLM for both producing and consuming a knowledge graph is pretty nifty, which I wrote up about a year ago here, https://friend.computer/jekyll/update/2023/04/30/wikidata-ll... .
I will say figuring out how to actually add that conversation properly into a large knowledge graph is a bit tricky. ML does seem slightly better at producing an ontology than humans though (look how many times we've had to revise scientific names for creatures or book ordering)
The related issue that I think is being conflated in this thread is that even if your goal was to directly support graph queries, you could accomplish this with a vanilla database much easier than running a specialized graph db
Searching through their sources, it looks like the problem came from Neo4j's blog post misclassifying "knowledge augmentation" from a Microsoft research paper with "knowledge graph" (because of course they had to add "graph" to the title).
This approach is fine, and probably useful but its not a knowledge graph in the sense that its structure isn't encoding anything about why or how different entities are actually related. A concrete example in a knowledge graph you might have an entity "Joe" and a separate entity "Paris". Joe is currently located in Paris so would have a typed edge between the two entities of something like "LocatedAt".
I didn't dive into the code but what I inferred from the description and referenced literature, it is instead storing complete responses as "entities" and simply doing RAG style similarity searches to other nodes. It's a graph structured search index for sure but not a knowledge graph by the standard definitions.
The idea of actual entities and relationships defined like triples with some schema and appropriately resolved and linked can be useful for querying and building up the right context. It may even be time to start bringing back some ideas from the schema.org back the day to standardize across agents/assistants what entities and actions are represented in data fed to them.
https://aclanthology.org/2023.newsum-1.10/
Automatically created knowledge graphs using embeddings is a massively powerful technique and it should start to be exploited.
You can blame peer review for letting the definition of knowledge graph get watered down.
Also note that my coauthor. David, is the author of the first and still the best open source package for creating or working with semantic graphs.
https://github.com/neuml/txtai/blob/master/examples/38_Intro...
Gotta say txtai seems like a useful tool to throw in my toolbox!
At OpenAdapt (https://github.com/OpenAdaptAI/OpenAdapt) we are looking into using pm4py (https://github.com/pm4py) to extract a process graph from a recording of user actions.
I will look into this more closely. In the meantime, could the authors share their perspective on whether Memary could be useful here?
Perhaps it could make sense to add this to your effort?
There's a lot of literature around process mining, e.g.:
- https://en.wikipedia.org/wiki/Process_mining
- https://www.sciencedirect.com/science/article/pii/S266596382...
The current YouTube video has a query about the Dallas Mavericks and it’s not clear how it’s using any of its memory or special machinery to answer the query: https://www.youtube.com/watch?v=GnUU3_xK6bg
I have since found Kuzu Db, which looks foundationally miles ahead. Plus no jvm. But have not yet given it a shot for rough edges. At the time, it was easier just to stay in plain application code.
Hopefully the workload intended by this tool won't notice the bloat. But it would be nice to be able to dump huge loads of data into this knowledge graph as well, and let the GPT generate queries against it.