KAG – Knowledge Graph RAG Framework

(github.com)

230 points · taikon · 1y ago · 80 comments

isoprophlex · 1y ago · 21 in thread

Fancy, I think, but again no word on the actual work of turning a few bazillion CSV files and PDFs into a knowledge graph.

I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.

kergonath · 1y ago

> I see a lot of these KG tools pop up, but they never solve the first problem I have, which is actually constructing the KG itself.

I have heard good things about GraphRAG [1] (but what a stupid name). I did not have the time to try it properly, but it is supposed to build the knowledge graph itself somewhat transparently, using LLMs. Building the graph is a big stumbling block; at least vector stores are easy to understand and trivial to build.

It looks like KAG can do this from the summary on GitHub, but I could not really find how to do it in the documentation.

[1] https://microsoft.github.io/graphrag/

isoprophlex · 1y ago

Indeed, they seem to actually know/show how the sausage is made... but still, no fire-and-forget approach for any random dataset. Check out what you need to do if the default isn't working for you (scroll down to e.g. the entity_extraction settings). There is so much complexity to deal with there that I'd just roll my own extraction pipeline from the start, rather than learning someone else's complex setup (that you have to tweak for each new use case).

https://microsoft.github.io/graphrag/config/yaml/

TrueDuality · 1y ago

GraphRAG isn't quite a knowledge graph. It is a graph of document snippets with semantic relations but is not doing fact extraction nor can you do any reasoning over the structure itself.

This is a common issue I've seen from LLM projects that only kind-of understand what is going on here and try and separate their vector database w/ semantic edge information into something that has a formal name.

swyx · 1y ago

why stupid? it uses a Graph in RAG. graphrag. if anything it's too generic, and multiple people who have the same idea now cannot use the name bc microsoft made the most noise about it.

jeromechoo · 1y ago

There are two paths to KG generation today and both are problematic in their own ways: 1. Natural Language Processing (NLP) 2. LLM

NLP is fast but requires a model that is trained on an ontology that works with your data. Once you have one, it’s simply a matter of feeding the model your bazillion CSVs and PDFs.

LLMs are slow but way easier to start as ontologies can be generated on the fly. This is a double edged sword however as LLMs have a tendency to lose fidelity and consistency on edge naming.

I work in NLP, which is the most used in practice as it’s far more consistent and explainable in very large corpora. But the difficulty in starting a fresh ontology dead ends many projects.
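To make the edge-naming problem concrete, here is a minimal sketch of normalizing relation labels emitted by an LLM against a small fixed vocabulary. The `CANONICAL_RELATIONS` table and the helper names are hypothetical and not taken from any tool mentioned in this thread; a real ontology would be far larger.

```python
import re

# Hypothetical canonical vocabulary: each canonical edge name maps to
# surface variants an LLM might emit for the same relation.
CANONICAL_RELATIONS = {
    "WORKS_FOR": {"works for", "employed by", "is employee of", "works at"},
    "LOCATED_IN": {"located in", "based in", "is in", "headquartered in"},
}


def _clean(label: str) -> str:
    """Lowercase and collapse whitespace, underscores and hyphens to one space."""
    return re.sub(r"[\s_\-]+", " ", label.strip().lower())


# Reverse lookup built once: cleaned variant -> canonical edge name.
_LOOKUP = {}
for canonical, variants in CANONICAL_RELATIONS.items():
    _LOOKUP[_clean(canonical)] = canonical
    for variant in variants:
        _LOOKUP[_clean(variant)] = canonical


def normalize_relation(label: str):
    """Return the canonical edge name, or None if the label is unknown."""
    return _LOOKUP.get(_clean(label))


def normalize_triples(triples):
    """Split triples into (normalized, rejected) so unknown edges get reviewed."""
    normalized, rejected = [], []
    for subj, rel, obj in triples:
        canonical = normalize_relation(rel)
        if canonical is None:
            rejected.append((subj, rel, obj))
        else:
            normalized.append((subj, canonical, obj))
    return normalized, rejected


ok, bad = normalize_triples([
    ("Ada", "Employed-By", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Ada", "likes", "tea"),
])
```

Rejecting unknown labels instead of silently adding them is one way to keep the edge vocabulary from drifting, at the cost of manual review.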

roseway4 · 1y ago

You may want to take a look at Graphiti, which accepts plaintext or JSON input and automatically constructs a KG. While it’s primarily designed to enable temporal use cases (where data changes over time), it works just as well with static content.

https://github.com/getzep/graphiti

I’m one of the authors. Happy to answer any questions.

diggan · 1y ago

> Graphiti uses OpenAI for LLM inference and embedding. Ensure that an OPENAI_API_KEY is set in your environment. Support for Anthropic and Groq LLM inferences is available, too.

Don't have time to scan the source code myself, but are you using the OpenAI Python library, so the server URL can easily be changed? Didn't see it exposed by your library, so hoping it can at least be overridden with an env var, so we could use local LLMs instead.

ganeshkrishnan · 1y ago

>uses OpenAI for LLM inference and embedding

This becomes a cyclical hallucination problem. The LLM hallucinates and creates an incorrect graph, which in turn creates even more incorrect knowledge.

We are working on this issue of reducing hallucination in knowledge graphs, and using an LLM is not at all the right way.

dramebaaz · 1y ago

Excited to try it! Been looking for a temporally-aware way of creating a KG for my journal dataset

cratermoon · 1y ago

This has always been the Hard Problem. For one, constructing an ontology that is comprehensive, flexible, and stable is a huge effort. Then, taking the unstructured mess of documents and categorizing them is an entire industry in itself. Librarians have cataloging, a sub-specialty of library science devoted to this.

So yes, there's a huge pile of tools and software for working with knowledge graphs, but to date populating the graph is still the realm of human experts.

cyanydeez · 1y ago

When you boil it down, the current LLMs could work effectively if a prompt engineer could figure out a converging loop of a librarian tasked with generating a hypertext web ring crossed with a Wikipedia.

Perhaps one needs to manually create a starting point, then ask the LLM to propose links to various documents or follow an existing one.

A sufficiently loopable traversal should create a KG.
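A rough sketch of such a loop, assuming a hand-made seed and a link proposer that is stubbed out here. `grow_graph`, `fake_proposer` and the canned links are hypothetical; in practice the proposer would wrap an LLM call, and convergence is not guaranteed, which is why the round limit exists.

```python
def grow_graph(seed_edges, propose_links, max_rounds=10):
    """Repeatedly ask the proposer for links from every known node until
    a round adds nothing new (a fixed point) or max_rounds is reached."""
    edges = set(seed_edges)
    for _ in range(max_rounds):
        nodes = {node for edge in edges for node in edge}
        proposed = set()
        for node in sorted(nodes):
            proposed.update(propose_links(node))
        new_edges = proposed - edges
        if not new_edges:
            break  # converged: the proposer has nothing more to add
        edges |= new_edges
    return edges


# Stand-in for an LLM that proposes (source, target) links for a document.
CANNED_LINKS = {
    "intro": [("intro", "setup")],
    "setup": [("setup", "config")],
    "config": [("config", "intro")],
}


def fake_proposer(node):
    return CANNED_LINKS.get(node, [])


edges = grow_graph([("intro", "setup")], fake_proposer)
```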

jimmySixDOF · 1y ago

There is some automated named entity extraction and relationship building out of un/semi structured data as part of the neo4j onboarding now to go with all these GraphRAG efforts (& maybe honorable mention to WhyHow.ai too)

melvinmelih · 1y ago

> but they never solve the first problem I have, which is actually constructing the KG itself.

I’ve noticed this too and the ironic thing is that building the KG is the most critical part of making everything work.

dmezzetti · 1y ago

txtai automatically builds graphs using vector similarity as data is loaded. Another option is to use something like GLiNER to create entities on the fly, and then create relationships between those entities and/or documents. Or you can do both.

https://neuml.hashnode.dev/advanced-rag-with-graph-path-trav...
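A rough sketch of the first option, linking items whose embeddings are similar. This is not txtai's API: the `build_similarity_graph` function, the toy vectors and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(vectors, threshold=0.8):
    """vectors: dict of id -> list of floats.
    Returns an undirected adjacency dict: id -> {neighbor_id: score}."""
    graph = {node: {} for node in vectors}
    ids = list(vectors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                graph[a][b] = score
                graph[b][a] = score
    return graph


# Toy embeddings; real ones would come from an embedding model.
vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
graph = build_similarity_graph(vectors)
```

The pairwise loop is quadratic in the number of items, so anything beyond a small corpus would need an approximate nearest-neighbor index instead.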

mikestaub · 1y ago

https://github.com/HKUDS/LightRAG is pretty good

bkovacev · 1y ago

I have been building something like this for myself. Is there room for paid software here, and would you be willing to pay for something like that?

dartos · 1y ago

IMO there is only a B2B market for this kind of thing.

I’ve heard of a few very large companies using glean (https://www.glean.com/)

This is the route I’d take if I wanted to make a business around rag.

fermisea · 1y ago

We're trying to solve this problem at ergodic.ai, combining structured tables and pdfs into a single KG

elbi · 1y ago

Are you creating the KG first, or using an LLM to do so?

axpy906 · 1y ago

Came here to say this and glad I am not the only one. Building out an ontology seems like quite an expensive process. It would be hard to convince my stakeholders to do this.

lunatuna · 1y ago

There are several ontologies already well built out. Utilities and pharma both have them, for example. They are built by committees of vendors and users. It takes a while to get to grips with their approach and language. Often they are built to be adaptable.

About 15 years ago I had good success using CIM for Utilities to build a network graph for modelling distribution and transmission networks, adding sensor and event data for monitoring and analysis.

Anywhere there is a technology-focussed consortium of vendors and users building standards, you will likely find a prebuilt graph. When RDF was “hot”, many of these groups spun out some attempt to model their domain.

In summary, if you need one, look for one. Maybe there’s one waiting for you and you get to do less convincing and more doing.

rastierastie · 1y ago · 5 in thread

What do other HNers make out of this? Would you use this? Responsible for a legaltech startup here.

leobg · 1y ago

Fellow legal tech founder here. The first thing I look at in projects like this are the prompts:

https://github.com/OpenSPG/KAG/blob/master/kag/builder/promp...

All you’re doing here is “front loading” AI: Instead of running slow and expensive LLMs at query time, you run them at index time.

It’s a method for data augmentation or, in database lingo, index building. You use LLMs to add context to chunks that doesn’t exist on either the word level (searchable by BM25) or the semantic level (searchable by embeddings).

A simple version of this would be to ask an LLM:

“List all questions this chunk is answering.” [0]

But you can do the same thing for time frames, objects, styles, emotions — whatever you need a “handle” for to later retrieve via BM25 or semantic similarity.

I dreamed of doing that back in 2020, but it would’ve been prohibitively expensive. Because it requires passing your whole corpus through an LLM, possibly multiple times, once for each “angle”.

That being said, I recommend running any “Graph RAG” system you see here on HN over some 1% or so of your data. And then look inside the database. Look at all text chunks, original and synthetic, that are now in your index.

I’ve done this for a consulting client who absolutely wanted “Graph RAG”. I found the result to be an absolute mess. That is because these systems are built to cover a broad range of applications and are not adapted at all to your problem domain.

So I prefer working backwards:

What kinds of queries do I need to handle? What does the prompt to my query-time LLM need to look like? What context will the LLM need? How can I have this context for each of my chunks, and be able to search it by match or similarity? And now how can I make an LLM return exactly that kind of context, with as few hallucinations and as little filler as possible, for each of my chunks?

This gives you a very lean, very efficient index that can do everything you want.

[0] For a prompt, you’d add context and give the model “space to think”, especially when using a smaller model. Also, you’d instruct it to use a particular format, so you can parse out the part that you need. This “unfancy” approach lets you switch out models easily and compare them against each other without having to care about different APIs for “structured output”.
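A minimal sketch of the index-time augmentation described in this comment, assuming a plain-text tag format that is parsed with a regular expression rather than a vendor-specific structured-output API. The prompt wording, the `<questions>` tags and the `fake_llm` stub are hypothetical; any real model call that takes a prompt string and returns a completion string could replace the stub.

```python
import re

# Hypothetical prompt: gives the model room to think, then asks for a
# parseable section wrapped in <questions> tags.
PROMPT_TEMPLATE = (
    "You will be given a chunk of a document.\n"
    "First think briefly about what it covers, then list every question "
    "the chunk answers, one per line, inside <questions></questions> tags.\n\n"
    "Chunk:\n{chunk}\n"
)


def parse_questions(completion: str):
    """Extract the lines inside <questions>...</questions>, ignoring the rest."""
    match = re.search(r"<questions>(.*?)</questions>", completion, re.DOTALL)
    if match is None:
        return []
    lines = (line.strip(" -\t") for line in match.group(1).splitlines())
    return [line for line in lines if line]


def augment_chunk(chunk: str, llm):
    """Return an index entry holding the original text plus synthetic handles."""
    completion = llm(PROMPT_TEMPLATE.format(chunk=chunk))
    return {"text": chunk, "questions": parse_questions(completion)}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned completion."""
    return (
        "The chunk is about a notice period.\n"
        "<questions>\n"
        "- How long is the notice period?\n"
        "- Who may terminate the contract?\n"
        "</questions>"
    )


entry = augment_chunk("Either party may terminate with 30 days notice.", fake_llm)
```

Both `entry["text"]` and `entry["questions"]` would then be indexed, so the synthetic questions become searchable by BM25 or by embedding similarity. Because only a string goes in and a string comes out, swapping models means swapping the `llm` callable.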

TrueDuality · 1y ago

Prompts are a great place to look for these, but the part you linked to isn't very important for knowledge graph generation. It is doing an initial semantic breakdown into more manageable chunks. The entity and fact extraction that actually turns this into a knowledge graph is this one:

https://github.com/OpenSPG/KAG/blob/master/kag/builder/promp...

GraphRAG and a lot of the semantic indexes are simply vector databases with pre-computed similarity edges, which do not allow you to perform any reasoning over them (reasoning being the definition and intention of a knowledge graph).

This is probably worth looking at; it's the first open-source project I've seen that is actually using LLMs to generate knowledge graphs. It does look pretty primitive for that task, but it might be a useful reference for others going down this road.

intalentive · 1y ago

>I found the result to be an absolute mess. That is because these systems are built to cover a broad range of applications and are not adapted at all to your problem domain.

Same findings here, re: legal text. Basic hybrid search performs better. In this use case the user knows what to look for, so the queries are specific. The advantage of graph RAG is when you need to integrate disparate sources for a holistic overview.
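One common way to implement hybrid search is reciprocal rank fusion (RRF), which merges a keyword ranking and an embedding ranking using only rank positions. The comment does not say which fusion method was used, so RRF is an assumption here; the document IDs are made up, and `k=60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.
    Each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from two separate retrievers.
bm25_ranking = ["contract_12", "contract_07", "memo_03"]
embedding_ranking = ["contract_07", "brief_21", "contract_12"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
```

Documents found by both retrievers rise to the top, and no score calibration between BM25 and cosine similarity is needed.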

adeptima · 1y ago

This comment and the idea of working backwards deserve an article!

Just finished a call a few minutes ago, and we came to the conclusion that we'll do natural-language queries and BM25 scoring with Tantivy-based code first

https://github.com/quickwit-oss/tantivy

In the meantime we'll collect all the questions to ask the LLM, so we can be more deliberate at the hybrid search implementation phase

ankit219 · 1y ago

If you have to deal with domain-specific data, then this would not work as well. It will get you an incremental improvement, yes, though not enough to justify redoing your own pipeline. Based on what I see, it's just creating explicit relationships at index time instead of letting the model do it at runtime before generating an output, which is incrementally effective but depends on the type of data. You are likely better off with your current approach and developing robust evals.

If you want a transformational shift in terms of accuracy and reasoning, the answer is different. Often RAG accuracy suffers because the text is out of distribution and ICL does not work well. You get away with it if all your data is in the public domain in some form (ergo, the LLM was trained on it); otherwise you keep seeing the gaps with no way to bridge them. I published a paper about this and how to solve it efficiently, if interested. Here is a simplified blog post on the same: https://medium.com/@ankit_94177/expanding-knowledge-in-large...

Edit: Please reach out here or on email if you would like further details. I might have skipped too many things in the above comment.

zbyforgotp · 1y ago · 4 in thread

LLMs are not that different from humans: in both cases you have some limited working memory and you need to fit the most relevant context into it. This means that if you have a new knowledge base for llms it should be useful for humans too. There should be a lot of cross pollination between these tools.

But we need a theory on the differences too. Right now it is kind of random how we differentiate the tools. We need ergonomics for LLMs.

photonthug · 1y ago

> This means that if you have a new knowledge base for llms it should be useful for humans too. There should be a lot of cross pollination between these tools.

This is realistic but hence going to be unpopular unfortunately, because people expect magic / want zero effort.

andai · 1y ago

>ergonomics for LLMs

When I need to build something for an LLM to use, I ask the LLM to build it. That way, by definition, the LLM has a built in understanding of how the system should work, because the LLM itself invented it.

Similarly, when I was doing some experiments with a GPT-4 powered programmer, in the early days I had to omit most of the context (just have method stubs). During that time I noticed that most of the code written by GPT-4 was consistently the same. So I could omit its context because the LLM would already "know" (based on its mental model) what the code should be.

EagnaIonat · 1y ago

> the LLM has a built in understanding of how the system should work,

That's not how an LLM works. It doesn't understand your question, nor the answer. It can only give you a statistically significant sequence of words that should follow what you gave it.

matthewsinclair · 1y ago

> the LLM has a built in understanding of how the system should work, because the LLM itself invented it

Really? I’m not sure that the word “understanding” means the same thing to you as it does to me.

swyx · 1y ago · 4 in thread

advice to OP - that gif showing how you zoom in and star the repo is a giant turnoff. i closed my tab when i saw that.

OJFord · 1y ago

> Star our repository to stay up-to-date with exciting new features and improvements! Get instant notifications for new releases

That's not even correct, starring isn't going to do that. You'd need to smash that subscribe button and not forget the bell icon (metaphorically), not ~like~ star it.

Dowwie · 1y ago

If, on the other hand, it were a long, drawn-out animation of moving the mouse pointer to the button, hovering for a few seconds, and then slowly clicking while dragging the mouse away so that the button didn't select and they had to repeat the task again-- that would be art.

swyx · 1y ago

sounds agentic

alt187 · 1y ago

Agreed. Do you think potential users of your repo don't know how to star it?

dcreater · 1y ago · 4 in thread

Yet another RAG/knowledge graph implementation.

At this point, the onus is on the developer to prove its value through A/B comparisons versus traditional RAG. No person/team has the bandwidth to try out this (n + 1) solution.

ertdfgcvb · 1y ago

I enjoy the explosion of tools. Only time will tell which ones stand the test of time. This is my day job, so I never get tired of new tools, but I can see how non-industry folks can find it overwhelming.

trees101 · 1y ago

Can you expand on that? Where do big enterprise orgs' products fit in, e.g. Microsoft, Google? What are the leading providers as you see them? As an outsider it is bewildering. First I hear that llama_index is good, then I hear that it's overcomplicated slop. What sources or resources are reliable on this? How can we develop anything that will still stand in 12 months' time?

dcreater · 1y ago

But unfortunately it's like a game of musical chairs: we may get stuck with whoever is pushing their wares the hardest rather than the actual best solution.

In fact, I'm wondering if that's what happened in the early noughts, when we had the misfortune of Java, and still have the misfortune of JavaScript.

TrueDuality · 1y ago

This is actually the first project I've seen that is actually doing any kind of knowledge graph generation. Most are just precomputing similarity scores as edges between document snippets that act as their nodes. People have basically been calling their vector databases with an index a knowledge graph.

This is actually attempting fact extraction into an ontology so you can reason over this instead of reasoning in the LLM.
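To illustrate the distinction, here is a toy sketch of facts stored as typed triples, with a query that chains relations to derive something no single document states. The `TripleStore` class, the relation names and the example facts are hypothetical and not taken from KAG.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, relation, object) facts."""

    def __init__(self):
        self._by_relation = defaultdict(set)

    def add(self, subj, rel, obj):
        self._by_relation[rel].add((subj, obj))

    def objects(self, subj, rel):
        """All objects directly linked to subj by rel."""
        return {o for s, o in self._by_relation[rel] if s == subj}

    def transitive_objects(self, subj, rel):
        """Follow rel edges repeatedly, e.g. along a chain of parent companies."""
        seen, frontier = set(), [subj]
        while frontier:
            current = frontier.pop()
            for nxt in self.objects(current, rel):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen


kg = TripleStore()
kg.add("Acme GmbH", "subsidiary_of", "Acme Holding")
kg.add("Acme Holding", "subsidiary_of", "Globex Corp")
kg.add("Globex Corp", "headquartered_in", "Delaware")

# Derived fact: where is any (indirect) parent of Acme GmbH headquartered?
parents = kg.transitive_objects("Acme GmbH", "subsidiary_of")
jurisdictions = {
    place for parent in parents for place in kg.objects(parent, "headquartered_in")
}
```

A graph whose edges are only similarity scores between snippets cannot answer this kind of query, because the edges carry no relation type to chain over.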

djoldman · 1y ago · 2 in thread

"Whitepaper" is guarded behind this: https://survey.alipay.com/apps/zhiliao/n33nRj5OV

> The white paper is only available for professional developers from different industries. We need to collect your name, contact information, email address, company name, industry type, position and your download purpose to verify your identity...

That's new.

mdaniel · 1y ago

I've had just outstanding success with "view source" and grab the "on success" parameter out of the form. Some sites are bright enough to do real server-side work first, and some other sites will email the link, but I'd guess it's easily 75/25 for ones that include the link in the original page body, as does this one:

    ,after_submitting: 'https://spg.openkg.cn/en-US/download?token=0a735e9a-72ea-11ee-b962-0242ac120002'

https://mdn.alipayobjects.com/huamei_xgb3qj/afts/file/A*6gpq...

BOOSTERHIDROGEN · 1y ago

For industrial plant white papers, a common practice is to submit your company email address and name as part of the access process.

tessierashpool9 · 1y ago · 2 in thread

a quick look leaves me with the question:

what exactly is being tokenized? RDS, OWL, Neo4j, ...?

how is the knowledge graph serialized?

tessierashpool9 · 1y ago

isn't this a key question? anybody here knowledgeable and care to reply?

dartos · 1y ago

I worked at a small company experimenting with RAG.

We used neo4j as the graph database and used the LLM to generate parts of the spark queries.

slowmovintarget · 1y ago · 1 in thread

How does this compare to the Model Context Protocol?

https://modelcontextprotocol.io/introduction

febin · 1y ago

MCP is a protocol for tool usage, whereas KAG is for knowledge representation and information retrieval.

flimflamm · 1y ago · 1 in thread

Paper also here https://arxiv.org/pdf/2409.13731

iamnotempacc · 1y ago

'tis different. It's 112 vs 33 pages. And the content is not the same.

mentalgear · 1y ago

It has come to the point that we now need benchmarks for (Graph) RAG systems, same as we have for pure LLMs. However, vendors will certainly then optimize for the popular ones, so we need a good mix of public, private and dynamic eval datasets.

mentalgear · 1y ago

I like their description/approach for logical problem solving:

2.2.

"The engine includes three types of operators: planning, reasoning, and retrieval, which transform natural language problems into problem solving processes that combine language and notation.

In this process, each step can use different operators, such as exact match retrieval, text retrieval, numerical calculation or semantic reasoning, so as to realize the integration of four different problem solving processes: Retrieval, Knowledge Graph reasoning, language reasoning and numerical calculation."

Kerbiter · 1y ago

This is somehow the first time I've seen something like this pop up in my feed. Glad that someone is working on this (judging by the comments, this is not the only such project). I am rather far from the field, but to me this feels like a step in the right direction for advancing AI past the hyperadvanced-parrot stage that current "AI" is at (at least as I perceive it).

nextworddev · 1y ago

Constructing and maintaining a knowledge graph is one thing.

Retrieving one with low latency is another.

ritiksharma23 · 1y ago

How will this work at scale?
