And for a use-case simple enough for this system to work (e.g. regurgitate a policy), it seems like the LLM is unnecessary. After all, if your system can perfectly interpret the question and answer and see if this rule set applies, then you can likely just use the rule set to generate the answer rather than wasting resources with a giant language model.
First, they have a pretty low token limit for a “policy” so there won’t be anything too complex.
Second, they explicitly say they don’t support synonyms. It seems very likely it’ll just reject anything that doesn’t fit closely, so you’ll end up with “I’m sorry. I don’t know what the ‘bought it’ date is. Please provide the purchase date.” until the customer does the work of using the exact language.
It looks like it takes a policy such as “returns must be processed within 30 days of purchase” and turns it into pseudo-code-style logic like “if {purchase date} < {today-30d} => reject”. Then it seems to parse the LLM query and apply the logic. Considering my first two points, it’ll just be used to turn GPUs into another inhuman system to help companies avoid having to be human about customer support, while sounding more human.
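To make the guess above concrete, here is a minimal Python sketch of what that rule could look like once extracted. This is only my reading of the example, not the product's actual representation; `ReturnRequest`, `check_return`, and `RETURN_WINDOW_DAYS` are names I made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Window taken from the example policy text; everything else is hypothetical.
RETURN_WINDOW_DAYS = 30


@dataclass
class ReturnRequest:
    purchase_date: date


def check_return(request: ReturnRequest, today: date) -> str:
    """Apply "if {purchase date} < {today-30d} => reject"."""
    cutoff = today - timedelta(days=RETURN_WINDOW_DAYS)
    if request.purchase_date < cutoff:
        return "reject"
    return "accept"


if __name__ == "__main__":
    today = date(2024, 6, 30)
    print(check_return(ReturnRequest(date(2024, 5, 30)), today))
    print(check_return(ReturnRequest(date(2024, 5, 31)), today))
```

Note that the hard part (mapping “bought it” to `purchase_date`) is exactly what this sketch leaves out.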
There is a recent paper, and a body of related work, that measures the entropy of the returned logits to produce a "certainty" estimate for outputs and flag hallucinations. It is a lot more rigorous than the OP but, like everything in this space, needs further testing.
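For anyone unfamiliar with the general idea, here is a rough sketch of per-token Shannon entropy over a softmax of logits. This is not the method from the paper (which I haven't named here), just the textbook calculation; `flag_uncertain` and the threshold value are assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the distribution implied by the logits."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


def flag_uncertain(step_logits, threshold):
    """Return indices of generation steps whose entropy exceeds the threshold.

    The threshold is a hypothetical tuning knob, not a value from any paper.
    """
    return [i for i, logits in enumerate(step_logits)
            if token_entropy(logits) > threshold]


if __name__ == "__main__":
    confident = [10.0, 0.0, 0.0, 0.0]   # one token dominates
    unsure = [0.0, 0.0, 0.0, 0.0]       # uniform over four tokens
    print(flag_uncertain([confident, unsure], threshold=1.0))
```

High entropy only says the model was spread across many tokens at that step; whether that lines up with hallucination is the part that needs the testing.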
https://app.gitsense.com/--/images/options.png
https://app.gitsense.com/--/images/validate.png
https://app.gitsense.com/--/images/models.png
The basic idea behind my chat system is, every model can be wrong, but it is unlikely that all will be wrong at the same time. This chat system is based on what I've learned when building my spelling and grammar checker. If you look at the following links, you can see that even the best models can get it wrong, but it is unlikely that others will get it wrong at the same time.
https://app.gitsense.com/?doc=6c9bada92&model=GPT-4o&samples...
https://app.gitsense.com/?doc=905f4a9af74c25f&model=Claude+3...
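A minimal sketch of the "not all models will be wrong at once" idea, as a plain majority vote. This is not how the chat system above is implemented (I'm not showing its code here); `majority_answer`, `min_agreement`, and the model names are hypothetical.

```python
from collections import Counter


def majority_answer(answers, min_agreement=0.5):
    """Pick the answer most models agree on.

    `answers` maps a model name to its answer string. Returns
    (answer, share); answer is None when no answer is given by more
    than `min_agreement` of the models.
    """
    normalized = [a.strip().lower() for a in answers.values()]
    if not normalized:
        return None, 0.0
    answer, count = Counter(normalized).most_common(1)[0]
    share = count / len(normalized)
    if share <= min_agreement:
        return None, share
    return answer, share


if __name__ == "__main__":
    votes = {"model_a": "Paris", "model_b": "paris ", "model_c": "Lyon"}
    print(majority_answer(votes))
```

Exact string matching only works for short, constrained answers; free-form answers would need some other way to decide that two responses agree.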
Here's a prompt that proves this untrue, for now at least:
> A woman and her biological son are gravely injured in a car accident and are both taken to the hospital for surgery. The surgeon is about to operate on the boy when they say "I can’t operate on this boy, he’s my biological son!" How can this be?
Makes sense, considering they just produce the statistically most likely output, after all.
I’m playing around with similar ideas, sometimes called ensembling techniques.
What Amazon appears to have done here is use a transformer-based neural network (aka LLM) to translate natural language into symbolic logic rules, which are used together in what could be described as an Expert System.
Full Circle. Hilarious.
For reference to those on the younger side: The Computer Chronicles (1984) https://www.youtube.com/watch?v=_S3m0V_ZF_Q
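For those who never saw one, the core of a classic expert system is roughly a forward-chaining loop over if-then rules. The sketch below is a generic textbook version, not a claim about what Amazon built; the fact and rule names are made up.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from `facts` using `rules`.

    Each rule is (premises, conclusion), where premises is a set of
    facts that must all be known before the conclusion is added.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    # Hypothetical rules, loosely modeled on a returns policy.
    rules = [
        ({"purchased_over_30_days_ago"}, "outside_return_window"),
        ({"outside_return_window"}, "reject_return"),
    ]
    print(sorted(forward_chain({"purchased_over_30_days_ago"}, rules)))
```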
The problem with expert systems (and most KG-type applications) has always been that translating unconstrained natural language into the system requires human-level intelligence.
It's been completely obvious for years that LLMs are a technology that lets us bridge that gap, and many of the best applications of LLMs are doing exactly that (e.g. code generation).
I do feel that the introduction of generative neural network models in both natural language and multi-media creation has been a tremendous boon for the advancement of AI; it just amuses me to see that what was old is new again.
This sounds like a fix for a very specific problem. An airline chatbot told a customer that some ticket was exchangeable. The airline claimed it wasn't. The case went to court. The court ruled that the chatbot was acting as an agent of the airline, and so ordinary rules of principal-agent law applied. The airline was stuck with the consequences of its chatbot's decision.[1]
Now, if you could reduce the Internal Revenue Code to rules in this way, you'd have something.
[1] https://www.bbc.com/travel/article/20240222-air-canada-chatb...
I get the vibe VC money is being burned on promises of an AGI that may never eventuate and that there's no clear path to.
I pessimistically suspect VCs like the dark mysterious paths since they often have a bigger fool at the end (acquisition).
---
and yet, the paper that went around in March:
Paper Link: https://arxiv.org/pdf/2401.11817
Paper Title: Hallucination is Inevitable: An Innate Limitation of Large Language Models
---
Instead of trying to trick a bunch of people into thinking we can somehow ignore the flaws of post-LLM "AI" by also using the still flawed pre-LLM "AI", why don't we cut the salesman BS and just tell people not to use "AI" for the range of tasks it's not suited for?
Salesmanship is exactly the process of making money out of BS. So bit of a tautology there :-)
The approaches seem very different though. I'm curious if anyone here has used either or both and can share feedback.
By constraining the field it is trying to handle, it makes grounding the natural language question in a knowledge graph tractable.
An analogy is type inference in a computer language: it can't solve every problem but it's very useful much of the time (actually this is a lot more than an analogy because you can view a knowledge graph as an actual type system in some circumstances).
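A toy sketch of the "knowledge graph as a type system" point: give each relation a signature and reject candidate groundings that don't type-check. The entities, relations, and the `well_typed` helper are all invented for illustration and not taken from any real knowledge graph.

```python
# Hypothetical mini knowledge graph: entity -> type, relation -> (domain, range).
ENTITY_TYPES = {
    "Air Canada": "Airline",
    "Toronto": "City",
    "Vancouver": "City",
}
RELATION_SIGNATURES = {
    "flies_to": ("Airline", "City"),
    "located_in": ("City", "Country"),
}


def well_typed(subject, relation, obj):
    """Return True if (subject, relation, obj) matches the relation's signature."""
    signature = RELATION_SIGNATURES.get(relation)
    if signature is None:
        return False
    domain, range_ = signature
    return (ENTITY_TYPES.get(subject) == domain
            and ENTITY_TYPES.get(obj) == range_)


if __name__ == "__main__":
    print(well_typed("Air Canada", "flies_to", "Toronto"))
    print(well_typed("Toronto", "flies_to", "Air Canada"))
```

Like a type checker, this can't tell you the triple is true, only that it isn't nonsense, which already prunes a lot of bad groundings.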