
TrueDuality · 2y ago

Yeah, precisely. Knowledge graphs are simple to think about, but as soon as you look into them you realize all the complexity is in creating a meaningful ontology and loading data into it. I actually think LLMs can be massively useful for building up the ontology, but probably not for the creation of the ontology itself (far too ambiguous and large/conceptual a task for them right now).


kiminemism · 2y ago

How do we build an ontology using LLMs? Will the building blocks be like the different parts of a brain? P.S. I am assuming that "creation of ontology itself" means creation of AGI.

dbish · 2y ago

Ontologies just define what certain categories, words, and entity types mean. They are commonly used in NLP for data representation (“facts”/triples/etc.) in knowledge graphs and other places where defining an ontology helps provide structure.

This doesn’t have anything to do with AGI or brains. Ontologies are typically created or tuned by humans, and then models fit/match/resolve entities to the ontology.

TrueDuality (OP) · 2y ago

@dbish nailed it, but I can give you a slightly more concrete example, continuing the light example I started with: an ontology for knowing what city a person is currently in. We have two classes of entities, a person and a city. There is a single relationship type, "LocatedAt"; you can add and remove edges to indicate where a person is, and you can construct verification rules such as "a person can only be in one city at a time".

To have an LLM construct a knowledge graph of where someone is (and I know this example is incredibly privacy-invasive, but it's a simple, concrete example, not a representative one), imagine giving an LLM access to all of your text messages. You could give it a prompt along the lines of "identify who is being discussed in this series of messages; if someone indicates where they are physically located, report that as well" (you'd want to try harder than that; I'm keeping it simple).

You could get an output that says something like `{"John Adams": "Philadelphia, PA, US"}`. If either the left or right side is missing from the graph, create it. Then remove any existing LocatedAt edges for the left side and add one between these two entities. You have a simple knowledge graph.
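A minimal Python sketch of that update step, assuming the LLM output is a JSON object mapping person to city. The `KnowledgeGraph` class and `apply_extraction` method are hypothetical names for illustration, not any particular graph library, and the second city in the usage lines is made-up data:

```python
import json
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Tiny in-memory graph: Person and City nodes, one LocatedAt edge type."""

    people: set[str] = field(default_factory=set)
    cities: set[str] = field(default_factory=set)
    # person -> city; one key per person enforces
    # "a person can only be in one city at a time"
    located_at: dict[str, str] = field(default_factory=dict)

    def apply_extraction(self, llm_output: str) -> None:
        """Apply a JSON object of {person: city} pairs to the graph."""
        for person, city in json.loads(llm_output).items():
            # Create either side if it is missing.
            self.people.add(person)
            self.cities.add(city)
            # Overwriting the key removes the old LocatedAt edge
            # and adds the new one in a single step.
            self.located_at[person] = city


kg = KnowledgeGraph()
kg.apply_extraction('{"John Adams": "Philadelphia, PA, US"}')
kg.apply_extraction('{"John Adams": "Boston, MA, US"}')  # hypothetical later message
print(kg.located_at)  # {'John Adams': 'Boston, MA, US'}
```

Note that the second update silently throws away the Philadelphia edge, which is fine for "where is this person now" and nothing else.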

Seems easy enough, but try asking slightly harder questions... When did we know John Adams was in Philadelphia? Have they been there before? Where were they before Philadelphia? The ontology I just developed isn't capable of representing that data. You can of course solve these problems, and there are common ontological patterns for representing this.
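One such pattern, sketched in Python under assumptions: instead of a single mutable LocatedAt edge, keep an append-only list of timestamped sightings. `Sighting`, `LocationHistory`, and the dates below are hypothetical, chosen only to illustrate the idea:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Sighting:
    person: str
    city: str
    observed_at: datetime  # e.g. when the source message was sent


class LocationHistory:
    """Append-only record of sightings, kept sorted by time."""

    def __init__(self) -> None:
        self._sightings: list[Sighting] = []

    def record(self, person: str, city: str, observed_at: datetime) -> None:
        self._sightings.append(Sighting(person, city, observed_at))
        self._sightings.sort(key=lambda s: s.observed_at)

    def _history(self, person: str) -> list[Sighting]:
        return [s for s in self._sightings if s.person == person]

    def current_city(self, person: str) -> Optional[str]:
        history = self._history(person)
        return history[-1].city if history else None

    def first_known_in(self, person: str, city: str) -> Optional[datetime]:
        """Earliest time we have evidence of the person being in the city."""
        for s in self._history(person):
            if s.city == city:
                return s.observed_at
        return None

    def previous_city(self, person: str) -> Optional[str]:
        """Most recent city that differs from the current one."""
        history = self._history(person)
        if not history:
            return None
        current = history[-1].city
        for s in reversed(history):
            if s.city != current:
                return s.city
        return None


h = LocationHistory()
h.record("John Adams", "Boston, MA, US", datetime(2024, 1, 5))        # made-up date
h.record("John Adams", "Philadelphia, PA, US", datetime(2024, 2, 1))  # made-up date
print(h.current_city("John Adams"))   # Philadelphia, PA, US
print(h.previous_city("John Adams"))  # Boston, MA, US
print(h.first_known_in("John Adams", "Philadelphia, PA, US"))  # 2024-02-01 00:00:00
```

This answers the three questions above, but only because the schema was designed around them.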

The point is, you kind of need to know the kind of questions you want to ask about your data when you're building your ontology and you're always going to miss something. Usually you find out the unknown questions you want to ask of the data only after you've already built your system and started asking it questions. It's the follow-ups that kill you.

There has been a lot of work on totally unstructured ontologies as well, but you're moving the hard problem elsewhere, not solving it. Instead of having high-quality data you can't answer every question with, you have arbitrary associations that may mean the same thing, so any query you make is likely _missing_ relevant data and thus inaccurate.
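A toy Python illustration of that failure mode, with made-up triples and free-form predicate strings (none of this comes from a real dataset or library):

```python
# Free-form (subject, predicate, object) triples, as an unconstrained
# extraction step might emit them. All rows are hypothetical.
triples = [
    ("John Adams", "LocatedAt", "Philadelphia, PA, US"),
    ("Abigail Adams", "is currently in", "Boston, MA, US"),
    ("Thomas Jefferson", "staying at", "Philadelphia, PA, US"),
]


def people_in(city: str, predicate: str = "LocatedAt") -> list[str]:
    """Return subjects linked to the city by exactly this predicate."""
    return [s for s, p, o in triples if p == predicate and o == city]


# Misses Thomas Jefferson, whose edge uses a different phrase
# for the same relationship.
print(people_in("Philadelphia, PA, US"))  # ['John Adams']
```

Mapping "staying at" and "is currently in" back onto LocatedAt fixes the query, but at that point you are designing an ontology again.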

Huge headache to go down, but honestly I think it is a worthwhile one. Previously, if you changed your ontology to answer a new question, a human would have to go through and painstakingly update your data to the new system by hand. This is boring, tedious, easy-to-get-wrong-due-to-inattention kind of work. It's not complex, it's not hard, and it's very easy to double check, but it does require an understanding of language. LLMs are VERY capable of doing this kind of work, and likely more accurately.

lmeyerov · 2y ago

Yes! My position on KGs largely flipped post GPT-3. Before, KGs were mostly a niche thing given the cost vs. reward; now they're an everyone thing.

- The effort needed for KGs has higher general potential value because RAG is useful

- The KG ontology quality-at-scale problem is now solvable by LLMs automating index-time data extraction, ontology design, & integration

This is an area we're actively looking at: Self-determining ontologies for optimizing RAG, such as during evolving events, news, logs, emails, customer conversations. We have active projects across areas here already like for emergency response, cyber security, news mining, etc. If folks are interested here, we're definitely looking for design partners with challenging problems on it. (And, looking to hire a principal cybersecurity researcher/engineer on it, and later in our other areas too!)
