I can only hope this data is being incorporated in some way that makes hallucinations less likely.
These models will interject information from their training data whether or not it is relevant. This is simply due to the nature of how these models work.
Anyone arguing that it doesn't happen that often is missing the key problem. Sure, it may be right most of the time, but all that does is build a false sense of security, and eventually you stop double-checking or clicking through to a source, whether it is a search result, data manipulation, or anything else.
This is made infinitely worse when these summaries are one-and-done: a single user sees the output and no one else ever sees it to fact-check it. It isn't like a wrong article, where everyone is reading the same text, someone can comment that something is wrong, the article gets updated, and so on. That feedback loop is non-existent with these models.
Same problem existed before AI summaries.
"Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.
In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know."
– Michael Crichton (1942-2008)
The key word is "real-time". LLMs can't be trained in real time, so it's obviously going to call an API that pulls up and reads from AP News, just like their search engine.
The on-device model it uses is also literally 1% the size of large models like Gemini.
I would expect that number to go down from 1.3% to below 1% over the course of the year.
There's always a chance that what you're reading is wrong, whether due to purposeful deception, negligence, or accident.
Realistically, hardly anything is 100% accurate besides math.
I work with investigative reporters on stories that take many months to produce. Every time we receive a leak, there is an extensive process of proving public interest before we can even start looking at the material. Once we can see it, we have to be extremely careful with everything we note down to make sure our work isn't seen as prejudiced if legal discovery happens. We're constantly going back and forth with our editorial legal team to make sure what we're saying is fair and accurate. And in the end, the people we're reporting on are given a chance to refute any of the facts we're about to present. Any mistake can result in legal action that can ruin reporters' lives and shut down companies.
Now, imagine I went to a reporter who has spent 6 months working on a story about, for example, how a high-profile celebrity sexually assaulted multiple women, how the royal family hides their wealth and are exempt from laws, or how multinational corporations use legal loopholes to avoid paying taxes, and said, "oh, 1% of people reading this will likely be given some totally made-up details".
Given that stories often have more than a million impressions, this would leave tens of thousands of people with potentially libellous "hallucinations".
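A quick back-of-envelope check of that figure, assuming exactly one million impressions and the 1% and 1.3% rates mentioned in this thread:

```latex
10^{6} \times 0.01 = 10{,}000 \qquad\qquad 10^{6} \times 0.013 = 13{,}000
```

So one million impressions already means roughly ten thousand affected readers, and "tens of thousands" corresponds to stories with a few million impressions.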
It simply should not be allowed.
LLMs have their place, for sure, but presenting the news is not it.
I am quite certain my personal hallucination rate is more than 1.3%. Obviously we want our machines to be better than us, but my doctor once said folic acid is not a vitamin.
I'm much more excited about the eventual emergence of underground homebrew models without any guardrails...
Not if AI gatekeepers and interest groups have anything to say about it. AI without guardrails could be classified as a "weapon" and made illegal, so that we are only allowed to use models that are produced by regulated entities and meet certain "safety standards" (like how medical software has to be approved by the FDA).
Edit: oh, I guess "underground" could be interpreted to mean that these models are still produced and distributed (but secretly, illegally, etc.).
We still argued, but we did it from a place of passion, not commission.
Here's some simple example code in Go for RAG with 5000 arXiv paper abstracts: https://github.com/philippgille/chromem-go/tree/v0.7.0/examp... (full disclosure: it's using a simple vector DB I wrote)