This doesn’t have anything to do with AGI or brains. Ontologies are typically created or tuned by humans, and then models fit/match/resolve entities to match the ontology.
Take having an LLM construct a knowledge graph of where someone is (I know this example is incredibly privacy invasive, but it's a simple, concrete example, not a representative one). Imagine giving an LLM access to all of your text messages. You can imagine giving it a prompt along the lines of "identify who is being discussed in this series of messages; if someone indicates where they are physically located, report that as well" (you'd want to try harder than that, I'm keeping it simple).
You could get an output that says something like `{"John Adams": "Philadelphia, PA, US"}`. If the entity on either the left or right side is missing from the graph, create it. Then remove any existing LocatedAt edges for the left side and add one between these two entities. You have a simple knowledge graph.
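A minimal sketch of that update rule, assuming an in-memory graph of (subject, predicate, object) triples. The `KnowledgeGraph` class and `apply_extraction` helper are names made up for illustration, and real systems would resolve entities to stable IDs rather than using raw strings:

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # Entity names (hypothetical simplification: no alias resolution).
    nodes: set = field(default_factory=set)
    # Edges stored as (subject, predicate, object) triples.
    edges: set = field(default_factory=set)

    def set_location(self, person: str, place: str) -> None:
        # Create either side if it is missing.
        self.nodes.update({person, place})
        # Remove any existing LocatedAt edges for the person...
        self.edges = {
            e for e in self.edges
            if not (e[0] == person and e[1] == "LocatedAt")
        }
        # ...then add the single current one.
        self.edges.add((person, "LocatedAt", place))


def apply_extraction(graph: KnowledgeGraph, extracted: dict) -> None:
    """Apply one LLM output of the form {"person": "place"} to the graph."""
    for person, place in extracted.items():
        graph.set_location(person, place)


kg = KnowledgeGraph()
apply_extraction(kg, {"John Adams": "Philadelphia, PA, US"})
apply_extraction(kg, {"John Adams": "Boston, MA, US"})
print(kg.edges)  # only the Boston edge remains; the Philadelphia edge is gone
```

Note that the second update throws away the fact that John Adams was ever in Philadelphia, even though the Philadelphia node itself is still there.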
Seems easy enough, but try to ask slightly harder questions... When did we know John Adams was in Philadelphia? Have they been there before? Where were they before Philadelphia? The ontology I just developed isn't capable of representing that data. You can of course solve these problems, and there are common ontological patterns for representing this kind of thing.
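One such pattern, sketched under my own assumptions rather than as the canonical fix: stop overwriting the edge and instead keep append-only, timestamped observations. The `LocationObservation` and `LocationHistory` names are hypothetical, and the dates below are invented sample data:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LocationObservation:
    person: str
    place: str
    observed_at: datetime  # e.g. timestamp of the message that placed them there


class LocationHistory:
    def __init__(self):
        self._observations = []

    def record(self, person, place, observed_at):
        # Append only; nothing is ever overwritten.
        self._observations.append(LocationObservation(person, place, observed_at))

    def timeline(self, person):
        mine = [o for o in self._observations if o.person == person]
        return sorted(mine, key=lambda o: o.observed_at)

    def current(self, person):
        tl = self.timeline(person)
        return tl[-1].place if tl else None

    def first_seen(self, person, place):
        # "When did we know they were in <place>?"
        for o in self.timeline(person):
            if o.place == place:
                return o.observed_at
        return None

    def previous_place(self, person):
        # "Where were they before the current place?"
        tl = self.timeline(person)
        if not tl:
            return None
        current_place = tl[-1].place
        for o in reversed(tl):
            if o.place != current_place:
                return o.place
        return None


history = LocationHistory()
history.record("John Adams", "Boston, MA, US", datetime(2024, 1, 5))
history.record("John Adams", "Philadelphia, PA, US", datetime(2024, 1, 10))
history.record("John Adams", "Philadelphia, PA, US", datetime(2024, 1, 12))

print(history.current("John Adams"))
print(history.first_seen("John Adams", "Philadelphia, PA, US"))
print(history.previous_place("John Adams"))
```

This answers the three questions above, but it still can't say how long a stay lasted or how confident the extraction was.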
The point is, you kind of need to know the kind of questions you want to ask about your data when you're building your ontology and you're always going to miss something. Usually you find out the unknown questions you want to ask of the data only after you've already built your system and started asking it questions. It's the follow-ups that kill you.
There has been a lot of work on totally unstructured ontologies as well, but you're moving the hard problem elsewhere, not solving it. Instead of having high-quality data you can't answer every question with, you have arbitrary associations, several of which may mean the same thing, so any query you make is likely _missing_ relevant data and is thus inaccurate.
It's a huge headache of a path to go down, but honestly I think it is a worthwhile one. Previously, if you changed your ontology to answer a new question, a human would have to go through and manually, painstakingly update your data to the new system. This is boring, tedious, easy-to-get-wrong-due-to-inattention kind of work. It's not complex, it's not hard, and it's very easy to double check, but it does require an understanding of language. LLMs are VERY capable of doing this kind of work, and likely more accurately.
- The effort needed for KGs has higher general potential value because RAG is useful
- The KG ontology quality-at-scale problem is now solvable by LLMs automating index-time data extraction, ontology design, & integration
This is an area we're actively looking at: Self-determining ontologies for optimizing RAG, such as during evolving events, news, logs, emails, customer conversations. We have active projects across areas here already like for emergency response, cyber security, news mining, etc. If folks are interested here, we're definitely looking for design partners with challenging problems on it. (And, looking to hire a principal cybersecurity researcher/engineer on it, and later in our other areas too!)