LLMs understand context, meaning, and genuine intention.
It's the same as using an LLM for programming: when you have a way to evaluate the output, it's fine; if not, you can't trust the output, as it could be completely hallucinated.
LLMs are likelihood completers and classifiers. OpenCyc brings some logic and rationale into the classifiers. Without a rationale, LLMs will continue hallucinating and spitting out nonsense.
You could look for contradictions between pages on the same subject in different languages, or different pages on related subjects.
You could synthesise new assertions based on what the current assertions imply, then render them as sentences and fact-check them.
You could use verified assertions to benchmark language parsing and comprehension for new models. Basically, unit tests for NLP.
You could produce a list of new assertions and implications introduced when a new edit to a page is made.
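To make the ideas above a bit more concrete, here is a minimal sketch, assuming assertions are stored as plain (subject, predicate, object) triples. The `Assertion` type, the `FUNCTIONAL_PREDICATES` set, and the single "isa is transitive" rule are all hypothetical stand-ins for illustration, not anything taken from OpenCyc itself.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical: predicates assumed to allow only one value per subject.
FUNCTIONAL_PREDICATES = {"birthYear"}


def find_contradictions(a, b, functional=FUNCTIONAL_PREDICATES):
    """Return pairs of assertions that disagree on a single-valued predicate."""
    index = {}
    for x in a:
        if x.predicate in functional:
            index.setdefault((x.subject, x.predicate), set()).add(x.obj)
    conflicts = []
    for y in b:
        if y.predicate in functional:
            for obj in index.get((y.subject, y.predicate), ()):
                if obj != y.obj:
                    conflicts.append((Assertion(y.subject, y.predicate, obj), y))
    return conflicts


def close_under_isa(assertions):
    """Forward-chain one toy rule (isa is transitive) until nothing new appears."""
    closed = set(assertions)
    changed = True
    while changed:
        changed = False
        isa = [(x.subject, x.obj) for x in closed if x.predicate == "isa"]
        for s, mid in isa:
            for mid2, o in isa:
                if mid == mid2:
                    new = Assertion(s, "isa", o)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


def edit_delta(before, after):
    """Assertions and implications that exist after an edit but not before it."""
    return close_under_isa(after) - close_under_isa(before)
```

For example, comparing an English and a German page that give different `birthYear` values for the same subject returns that pair as a conflict, and `edit_delta` on an edit that adds "Mammal isa Animal" to a page already saying "Dog isa Mammal" also reports the implied "Dog isa Animal". A real system would need a proper reasoner and far more rules than this one.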
I intend to eventually add it to my sophisticated RAG system, wdoc (https://wdoc.readthedocs.io/en/latest/).
I think it would outperform on any language task, as language that isn't formal requires interpretation, which LLMs excel at.
Later approaches to the same idea, coming out of the academic databases community, reinvented this particular wheel with far better PR and branded it the Semantic Web, with 'ontologies', RDF, OWL and their ilk.
While reasonable (pun intended) for small vertical domains, the approach has never made inroads into broader general intelligence. IMHO it does not deal well with ambiguities, contradictions, multi-level or multi-perspective modeling, or circular referential meaning in 'real world' reasoning, and it also tends to ignore agentive, transformative and temporal situatedness.
Its ideal seems to be a single-truth, never-changing model of the universe that is simply accepted by all.
OpenCyc specifically? No. But related semantic KBs using the Semantic Web / RDF stack? Yes. That's something I'm spending a fair bit of time on right now.
And given that there is an RDF encoding of a portion of the OpenCyc data out there[1], and that it may make it into some of my experiments eventually, I guess the answer is more like "Not exactly, but sort of, maybe-ish" or something.
[1]: https://sourceforge.net/projects/texai/files/open-cyc-rdf/
Sounds like it probably would have. If you combine that with entity recognition and automated graph construction from unstructured data, you might get something that is vaguely useful.
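As a rough illustration of what "automated graph construction from unstructured data" could look like at its very simplest, here is a toy sketch. It assumes a single hand-written regex (the hypothetical `ISA_PATTERN`) in place of a real entity recognition model, so it only catches sentences shaped like "Name is a thing".

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<Capitalised Name> is a/an <noun>" -> (Name, "isa", noun).
# A real pipeline would use a trained entity recognition model instead.
ISA_PATTERN = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) is an? ([a-z]+)")


def extract_triples(text):
    """Pull (subject, "isa", object) triples out of free text."""
    return [(m.group(1), "isa", m.group(2)) for m in ISA_PATTERN.finditer(text)]


def build_graph(triples):
    """Group triples into an adjacency map: subject -> set of (predicate, object)."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        graph[subject].add((predicate, obj))
    return dict(graph)


if __name__ == "__main__":
    sample = "Douglas Lenat is a researcher. Cyc is a project."
    print(build_graph(extract_triples(sample)))
```

The extracted triples could then be checked against, or merged into, an existing knowledge base, which is where the hard part (disambiguation and consistency) actually starts.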
Same for fiction, visual art, raising children, caring for old people, and so on, and so on.
As I write this it feels naive and reminds me of a thousand ill-thought-out A/B website tests etc. But still :D
You might be on to something here. The problem being there aren't billions of tokens' worth of CycL out there to train on, or are there?
https://2ro.co/post/768337188815536128
(EZ - a language for constraint logic programming)