Is anyone playing with the combination of generative AI and OpenCyc?

(2ro.co)

101 points · 2ro · 1y ago · 71 comments

33 comments · 12 top-level

cjbprime · 1y ago · 7 in thread

Haven't LLMs simply obsoleted OpenCyc? What could introducing OpenCyc add to LLMs, and why wouldn't allowing the LLM to look up Wikipedia articles accomplish the same thing?

cookiengineer · 1y ago

LLMs have just ignored the fundamental problem of reasoning: symbolic inference. They haven't "solved" it, they just don't give a damn about logical correctness.

wordpad25 · 1y ago

Logical correctness, as in formal logic, is a huge step down.

LLMs understand context, meaning, and genuine intention.

1 more reply

drdaeman · 1y ago

I’m not familiar with Cyc/OpenCyc, but it seems that it’s not just a knowledge base, but also does inference and reasoning - while LLMs don’t reason and will happily produce completely illogical statements.

rcxdude · 1y ago

Such systems tend to be equally good at producing nonsense: mainly because it's really hard to make a consistent set of 'facts', and once you have inconsistencies, creative enough logic can produce nonsense in any part of the system.

1 more reply

p1esk · 1y ago

Can you please give an example of a “completely illogical statement” produced by the o1 model? I suspect it would be easier to get an average human to produce an illogical statement.

2 more replies

tpm · 1y ago

LLMs don't know what is true (they have no way of knowing that), but they can babble about any topic. OpenCyc contains 'truth'. If they can be meaningfully combined, it could be good.

It's the same as using an LLM for programming: when you have a way to evaluate the output, it's fine; if not, you can't trust the output, as it could be completely hallucinated.

rurban · 1y ago

No, they are completely orthogonal.

LLMs are likelihood completers and classifiers. OpenCyc brings some logic and rationale into the classifiers. Without rationale, LLMs will continue hallucinating, spitting out nonsense.

JimDabell · 1y ago · 5 in thread

I think it would be interesting to use an LLM to distill Wikipedia into a set of assertions, then iterate through combinations of those assertions using OpenCyc.

You could look for contradictions between pages on the same subject in different languages, or different pages on related subjects.

You could synthesise new assertions based on what the current assertions imply, then render them as sentences and fact-check them.

You could use verified assertions to benchmark language parsing and comprehension for new models. Basically unit test NLP.

You could produce a list of new assertions and implications introduced when a new edit to a page is made.
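A minimal sketch of the contradiction check described above, in plain Python rather than OpenCyc. It assumes the LLM step has already produced (subject, predicate, value) assertions tagged with the page they came from. The `Assertion` type, the predicate names, and the list of single-valued predicates are all hypothetical, and the sample data is made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    value: str
    source: str  # hypothetical label for the page the assertion was distilled from


# Assumption: these predicates allow only one value per subject,
# so two different values for the same subject count as a contradiction.
SINGLE_VALUED = {"birthYear", "capitalOf"}


def find_contradictions(assertions):
    """Return (subject, predicate, {value: [sources]}) for conflicting assertions."""
    seen = defaultdict(lambda: defaultdict(list))
    for a in assertions:
        if a.predicate in SINGLE_VALUED:
            seen[(a.subject, a.predicate)][a.value].append(a.source)
    return [
        (subject, predicate, dict(values))
        for (subject, predicate), values in seen.items()
        if len(values) > 1
    ]


# Made-up sample: two language editions disagree on a birth year.
sample = [
    Assertion("Example_Person", "birthYear", "1900", "en"),
    Assertion("Example_Person", "birthYear", "1901", "de"),
    Assertion("Example_Person", "occupation", "writer", "en"),
    Assertion("Example_Person", "occupation", "painter", "de"),
]
conflicts = find_contradictions(sample)
```

Only the birth year is flagged here; the two occupations are not, because `occupation` is not declared single-valued. A real reasoner would derive that kind of constraint from the ontology instead of a hand-written set.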

mindcrime · 1y ago

Along with that, a portion of the content of Wikipedia is already available in structured assertion form, thanks to DBpedia[1] and Wikidata[2]. I don't know the exact percentage, but it's a starting point at the very least.

[1]: https://www.dbpedia.org/

[2]: https://www.wikidata.org/wiki/Wikidata:Main_Page

throw310822 · 1y ago

I've tried to use ChatGPT to produce Wikidata queries; it sounds like a great combination. Unfortunately, it's pretty hard to make it produce valid queries, and to find the Wikidata documentation to teach it.
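One way to narrow the "valid queries" problem is to let the model pick only the identifiers and fill them into a fixed query template. A rough sketch, assuming that split: `instance_query` is a hypothetical helper, not part of any Wikidata client library, and it only builds the SPARQL text without sending it anywhere.

```python
import re

# Wikidata property IDs look like P31, item IDs like Q146.
PROPERTY_ID = re.compile(r"P[1-9]\d*")
ITEM_ID = re.compile(r"Q[1-9]\d*")


def instance_query(property_id: str, item_id: str, limit: int = 10) -> str:
    """Build a SPARQL query for items whose `property_id` value is `item_id`.

    Rejects malformed identifiers (for example, a label the model made up)
    before they reach the query text.
    """
    if not PROPERTY_ID.fullmatch(property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    if not ITEM_ID.fullmatch(item_id):
        raise ValueError(f"not an item ID: {item_id!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:{property_id} wd:{item_id} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# P31 is "instance of" and Q146 is "house cat".
query = instance_query("P31", "Q146", limit=5)
```

This checks only that the identifiers are well formed, not that they mean what the model thinks they mean, so the results still need a sanity check.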

3 more replies

Ey7NFZ3P0nzAe · 1y ago

That's exactly one of my projects: to use that on my medical Anki flashcards, as well as on my many medical PDFs. I'm sure there's a good way to do RAG on it that would be sourced.

I intend to add it in the end to my sophisticated RAG system wdoc (https://wdoc.readthedocs.io/en/latest/)

kziemski · 1y ago

OpenCyc always had the failing that it never felt accessible; it was hidden away. GraphRAG for OpenCyc, maybe?

wordpad25 · 1y ago

I don't understand why you need OpenCyc for this; you could just chain LLMs for both.

I think it would outperform on any language task, as language that isn't formal requires interpretation, which LLMs excel at.

ornornor · 1y ago · 4 in thread

At the risk of sounding dumb: I don’t understand what the practical applications of OpenCyc are. I get LLMs, you can ask questions and they’ll answer, they can write an article, they can summarize documents… what are the practical applications of OpenCyc?

PeterStuer · 1y ago

OpenCyc is the remaining evolution of Cyc, which was based on the idea that symbolic AI (knowledge graphs/Semantic Networks) would lead to AGI through scaling the knowledge base. You formalize the world so you can use logic to reason about it.

Later approaches to the same idea, coming out of the academic databases community, reinvented this particular wheel with far better PR and branded it the Semantic Web, with 'Ontologies' and RDF, OWL and its ilk.

While reasonable (pun intended) for small vertical domains, the approach has never made inroads into broader general intelligence, as IMHO it does not deal well with ambiguities, contradictions, multi-level or multi-perspective modeling, and circular referential meaning in 'real world' reasoning, and it also tends to ignore agentive, transformative and temporal situatedness.

Its ideal seems to be a single-truth, never-changing model of the universe that is simply accepted by all.

mindcrime · 1y ago

All true in general, but allow me to add that there has been work on incorporating probabilistic / "soft" reasoning into the Semantic Web / RDF / OWL world. For some time now there has been PR-OWL (Probabilistic OWL)[1], and the recent work on RDF* (RDF-STAR)[2][3] emphasizes its application in terms of being able to - among other things - do stuff like adding weights (confidence scores, fuzzy probabilities, what-have-you) to RDF assertions. So the need to pursue these paths is understood, although I suppose one can argue that progress has been slow and painstaking.

[1]: https://www.pr-owl.org/

[2]: https://www.w3.org/2021/12/rdf-star.html

[3]: https://w3c.github.io/rdf-star/UCR/rdf-star-ucr.html

1 more reply

2ro · OP · 1y ago

Cyc apparently addresses this issue with what are termed "microtheories" - in one theory something can be so, and in a different theory it can be not so: https://cyc.com/archives/glossary/microtheory/
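A toy illustration of the idea, not Cyc's actual API: every assertion is stored in a named context, contexts can inherit from others, and the same fact can hold in one context and fail in another. The class, the context names, and the facts are all made up for the example.

```python
class ContextualKB:
    """Tiny knowledge base where every fact lives in a named context."""

    def __init__(self):
        self._facts = {}    # context name -> set of facts
        self._parents = {}  # context name -> contexts it inherits from

    def assert_in(self, context, fact):
        self._facts.setdefault(context, set()).add(fact)

    def inherit(self, child, parent):
        self._parents.setdefault(child, set()).add(parent)

    def holds(self, context, fact):
        """True if `fact` is asserted in `context` or any context it inherits from."""
        stack, visited = [context], set()
        while stack:
            current = stack.pop()
            if current in visited:
                continue  # guards against inheritance cycles
            visited.add(current)
            if fact in self._facts.get(current, ()):
                return True
            stack.extend(self._parents.get(current, ()))
        return False


kb = ContextualKB()
kb.assert_in("BaseContext", ("isa", "Dracula", "Character"))
kb.inherit("FictionContext", "BaseContext")
kb.assert_in("FictionContext", ("exists", "Vampire"))

in_fiction = kb.holds("FictionContext", ("exists", "Vampire"))
in_real_world = kb.holds("RealWorldContext", ("exists", "Vampire"))
```

The contradiction never arises because the two contexts are never merged; a query always names the context it is asking about.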

kurokikaze · 1y ago

Formalized knowledge representation, basically.

mindcrime · 1y ago · 3 in thread

> Is anyone playing with the combination of generative AI and OpenCyc?

OpenCyc specifically? No. But related semantic KBs using the Semantic Web / RDF stack? Yes. That's something I'm spending a fair bit of time on right now.

And given that there is an RDF encoding of a portion of the OpenCyc data out there[1], and that may make it into some of my experiments eventually, I guess the answer is more like "Not exactly, but sort of, maybe'ish" or something.

[1]: https://sourceforge.net/projects/texai/files/open-cyc-rdf/

jillesvangurp · 1y ago

Microsoft published quite a bit on the notion of graph RAG, which is about enhancing regular RAG (retrieval-augmented generation) with graph databases. The idea is that instead of just pulling information semantically related to the query, it also pulls in information about things connected to those things. This gives the LLM more contextual information to work with.

Sounds like it would probably have. If you combine that with entity recognition and automated graph construction from unstructured data, you might get something that is vaguely useful.
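A rough sketch of the "things connected to those things" step, assuming the usual retrieval has already produced a list of seed entities and that the graph is a plain adjacency dict. This is not Microsoft's GraphRAG implementation; `expand_with_neighbors` and the toy graph are made up for illustration.

```python
def expand_with_neighbors(seeds, graph, hops=1):
    """Return the seed entities plus everything reachable within `hops` edges.

    `graph` maps an entity name to an iterable of directly connected entities.
    """
    selected = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        # Collect unseen neighbours of the current frontier.
        frontier = {
            neighbor
            for node in frontier
            for neighbor in graph.get(node, ())
        } - selected
        if not frontier:
            break
        selected |= frontier
    return sorted(selected)


# Toy graph; the edges are illustrative only.
toy_graph = {
    "OpenCyc": ["Cyc", "CycL"],
    "Cyc": ["Cycorp"],
}
one_hop = expand_with_neighbors(["OpenCyc"], toy_graph, hops=1)
two_hops = expand_with_neighbors(["OpenCyc"], toy_graph, hops=2)
```

Text attached to the returned entities would then be added to the prompt alongside the normal retrieval results. Keeping `hops` small matters, since each extra hop can pull in a lot of loosely related context.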

kziemski · 1y ago

Combine it with Prolog to prove reasoning.

1shooner · 1y ago

> If you combine that with entity recognition and automated graph construction from unstructured data, you might get something that is vaguely useful.

I think that's included in the GraphRAG project released by MS.

pcblues · 1y ago · 2 in thread

I'll be the antiquated person here. Writing/speaking well has brought people to knowledge because the authors are uniquely positioned to use genius, humour, gentleness and generosity to bring an inquisitive but ignorant person into a new area of knowledge. When the value of that can be quantified, then we can compare AI "generation" of "efficiently written, useful knowledge" with what we had/have.

Same for fiction, visual art, raising children, caring for old people, and so on, and so on.

willvarfar · 1y ago

With big enough cohorts we can A/B test by quantifying outcomes? Treatment group A has AI-generated texts and treatment group B has the originals. We can have a questionnaire for how the group felt about the material etc., but we can also perhaps measure life outcomes over a longer period, e.g. performance at school or in the marketplace?

As I write this it feels naive and reminds me of a thousand ill-thought-out A/B website tests etc. But still :D

bryanrasmussen · 1y ago

exactly how long are you planning on running these tests for - sounds like minimum 20 years?

1 more reply

dannyobrien · 1y ago

Unsure specifically, but there's a long-standing movement to combine GOFAI symbolic approaches and modern neurally-influenced systems. https://en.wikipedia.org/wiki/Neuro-symbolic_AI

K0balt · 1y ago

Hmm… maybe we could train/tune a model on symbolic logic similar to or even using CycL instead of Python, and then when we have it “write code” it would be solving the problem we want it to think about, using symbolic logic?

You might be on to something here. The problem being there aren’t billions of tokens' worth of CycL out there to train on, or are there?

2ro · OP · 1y ago

if anyone is interested:

https://2ro.co/post/768337188815536128

(EZ - a language for constraint logic programming)

amelius · 1y ago

How would you phrase that question in OpenCyc?

thom · 1y ago

Probably, but the bitter lesson still applies.

brokensegue · 1y ago

Sometimes I think projects like Cyc are like the 3n+1 problem for AI. It's so alluring.

transfire · 1y ago

Isn’t Cycorp?
