We all know now that ChatGPT is just autocomplete on steroids. It produces plausibly convincing patterns of speech.
But from the way it's built and trained, it's not like there's even any kind of factual confidence level you could threshold, or anything. The concept of factuality doesn't exist in the model at all.
So, is any progress being made towards internet-scale ML "fact engines" that also have the flexibility and linguistic expressiveness of ChatGPT? Or are these just two totally different paths that nobody knows how to marry?
Because I know there's plenty of work done with knowledge graphs et al., but those are very brittle things that generally need plenty of human curation and verification, and can't provide any of the (good) "fuzzy thinking" that ChatGPT can. They can't summarize essays or write poems.
Yes. It's a very active area of research. For example:
Discovering Latent Knowledge in Language Models Without Supervision (https://arxiv.org/abs/2212.03827) shows an unsupervised approach for probing an LLM to discover things it thinks are facts.
Locating and Editing Factual Associations in GPT (https://arxiv.org/pdf/2202.05262.pdf) shows an approach to editing the facts stored in an LLM.
Language Models as Knowledge Bases? (https://aclanthology.org/D19-1250.pdf) is some slightly older work exploring how well LLMs themselves store factual information.
Yann LeCun has posted a lot about this recently, but basically his view is that LLMs are a "useful offramp on the road to AGI".
In fact many propose that when you train an LLM, in order to be able to predict the next word with enough accuracy, it must internally build a world model.
Yann LeCun is very salty about ChatGPT; I wouldn't take his word seriously.
At its core using an LM alone to solve factual problems seems silly: It's not unlike asking Dall-E to draw DOT compliant road signs.
I've gone on at length about how unfortunate it would be if LMs start to get a bad rap because they're being shoehorned into being "Ask Jeeves 2.0" when they could be so much more.
I love that. That's going to be my new explanation for people around ChatGPT.
For some reason it seems so much more obvious when Dall-E does something close but still totally wrong (e.g. 3 or 6 fingers, 3 arms, etc.), but it's not immediately obvious with text. But it's still the same underlying principles.
Then there are efforts that look like this one: https://news.ycombinator.com/item?id=34821414 They go probing for specific capabilities of Transformers to figure out which cell fires under some specific stimulus. But think a little bit more about what people might want from explainability and you quickly find that something like this is insufficient.
There may be a tradeoff we're looking at where explainability (for some definition of it) will have to be exchanged for performance (under some set of tasks). You can build more interpretable models these days, but you usually pay for it in terms of how well you do on benchmarks.
Specific example: "why do we have first-person subjective experiences? List current theories. For every theory, assign a truthiness float 0..1, where 0 means you're sure it is wrong, and 1 means you're absolutely sure it is true"
From experimenting with this, it will shift the output, sometimes drastically so, as the model now has to reason about its own certainty; it tends to make significantly less shit up (for example, the non-truth-marked version of the output for the query above also listed panpsychism, whereas the truth-marked version listed only scientific hypotheses).
So the model _can_ reason about its certainty and truth-value; and I strongly suspect it was just not rewarded during RLHF for omitting things it knew to be false (basically, percolating the social lies people tell each other), which seems to show up in coding as well.
Edit: see https://twitter.com/sdrinf/status/1629084909422931969 for results
I promise you most people do not know this.
This is an example of a whole range of beliefs about LLMs that are very common (even in the field itself), because they were obviously true for small models, but that might not necessarily hold for larger models. There's a lot that we don't know about LLMs, but we do know that they exhibit emergent behaviors as they scale. Smaller models don't really have world models, just language models, but these larger models have started developing clear world models once given the capacity and data to do so.
As for the existence of a concept of factuality, I found this paper[1] very interesting. It details an unsupervised method to identify which internal activations of the model correspond to factual statements, regardless of what the model ends up saying. Looking at those internal activations rather than just the model's output even reduces the model's susceptibility to prompts that lead it towards saying the wrong answer.
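To make "unsupervised" a bit more concrete: assuming the paper meant here is the "Discovering Latent Knowledge" one linked earlier in the thread, the trick as I understand it is to take a statement and its negation, run a small probe over the model's hidden states for each, and train the probe so the two probabilities are consistent (they should sum to about 1) and confident (not both stuck at 0.5). Below is a minimal sketch of that per-pair loss only. The function name and the plain-float inputs are mine for illustration; a real setup would compute the probabilities from activations and optimise over many pairs.

```python
def contrast_pair_loss(p_statement: float, p_negation: float) -> float:
    """Loss for one (statement, negation) pair of probe outputs.

    p_statement: probe's probability that the statement is true
    p_negation:  probe's probability that the negated statement is true
    Both are assumed to lie in [0, 1].
    """
    # Consistency: if the statement is true with probability p,
    # its negation should be true with probability 1 - p.
    consistency = (p_statement - (1.0 - p_negation)) ** 2
    # Confidence: penalise the degenerate answer p = 0.5 for both,
    # which is perfectly consistent but says nothing.
    confidence = min(p_statement, p_negation) ** 2
    return consistency + confidence


if __name__ == "__main__":
    print(contrast_pair_loss(1.0, 0.0))  # confident and consistent
    print(contrast_pair_loss(0.5, 0.5))  # consistent but uninformative
    print(contrast_pair_loss(0.9, 0.9))  # contradicts itself
```

Note that no labels appear anywhere in the loss, which is the sense in which the method is unsupervised.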
I wouldn't hold my breath. The whole idea of statistical language modelling (much more ancient than Transformer-trained large language models, btw) is to represent structure without having to represent meaning, because we have no idea how to represent meaning. Or, seen another way, we know how to represent structure, but not how to represent meaning, so let's focus on structure and cross our fingers that meaning will naturally sort of emerge, when it feels like it.
So far, we got structure down pat (it's been a few years now, or quite a few, depending on how you see it) but meaning is nowhere to be seen.
Nevertheless, this is an interesting scientific result: one can have smooth, grammatically correct linguistic structure without meaning. Progress has been achieved (and no, this is not sarcasm).
I'm not super familiar with ChatGPT internals, but there are plenty of ways to tack uncertainty estimates onto the predictions of typical "large scale ML models" without touching Bayesian stuff (which only works for small-scale academic problems). You can do simple parametric posterior estimation or, if all you have is infinite compute and you don't even want to bother with anything "mathy", bootstrapping is the "scalable / easy" solution.
You could cross-check this stuff too with yet more models.
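For anyone who hasn't seen bootstrapping, here is a minimal sketch of the percentile version applied to a plain statistic. It is a simplification: for a model you would retrain (or at least re-evaluate) on each resample, which is where the "infinite compute" comes in. The function name, the defaults, and the example data are all made up for illustration.

```python
import random
import statistics


def bootstrap_interval(samples, stat=statistics.mean,
                       n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` over `samples`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(samples)
    # Resample with replacement, recompute the statistic each time.
    estimates = sorted(
        stat([rng.choice(samples) for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval off the sorted estimates.
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


if __name__ == "__main__":
    # Hypothetical per-question correctness of some model (1 = right).
    correct = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
    print(bootstrap_interval(correct))
```

The width of the interval is the uncertainty estimate; nothing in it depends on the model being Bayesian.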
Whether or not the information that comes back from those searches is reliable is a whole other question.
I would love to learn what the latest research is into "factual correctness" detection. Presumably there are teams out there trying to solve that one?
Personally, I use ChatGPT (the paid version) and Copilot every day and find them awesome enhancers.
The rich keep getting richer.
Monetize it!
Evil answer: Partner with an advertiser and sell https://api.opencagedata.com/geocode/v1/json as an ad space. This may be the first opportunity for an application/json-encoded advertisement.
Nice answer: Partner with an actual phone lookup platform and respond with a 301 Moved Permanently at the endpoint.
It’s an unprincipled hack, a bizarre dependency to add to your project, it probably feels like admitting defeat to the all powerful AI… but it does 90%-solve the problem.
https://www.robertxiao.ca/hacking/locationsmart/ is an example of one provider's public demo (requiring only a phone number) being used to provide non-consensual location data.
I think white pages are still a thing, no?
For the young 'uns - the white pages were part of the physical phone book in every city. You got a new phone book delivered to your doorstep each year for free. Yellow pages listed the phone numbers of every business, white pages listed the phone numbers of the residents.
The crazy part is: almost everyone added their numbers voluntarily to the white pages, because you wanted people to be able to easily find and reach you.
https://www.whitepages.com/reverse-phone
> Whitepages free reverse phone lookup service allows you to enter a phone number and quickly find out who called you. Find the phone owner's full name, address, and more.
[snip]
> Anyone can do a reverse lookup to identify cell phone, residential, and business numbers for free.
That, or you could get a normal white pages and process it using some sort of data processing tool... nah, that's science fiction.
uh, you mean stalker / scammer platform? This would be a major privacy violation.
So "7 Carmine St, New York, NY 10014" will return "(40.7305290, -74.0020706)" and vice versa.
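As a rough sketch of what those two directions look like as requests: the endpoint is the one mentioned elsewhere in this thread, and `q` and `key` are the query parameters as I understand the public API, so check the official docs before relying on this. The helper names and the placeholder key are made up, and the code only builds the URLs, it doesn't call anything.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.opencagedata.com/geocode/v1/json"


def forward_url(address: str, api_key: str) -> str:
    """URL for address -> coordinates (forward geocoding)."""
    return f"{BASE_URL}?{urlencode({'q': address, 'key': api_key})}"


def reverse_url(lat: float, lng: float, api_key: str) -> str:
    """URL for coordinates -> address (reverse geocoding)."""
    # Assumption: a reverse query is just "lat,lng" in the same q parameter.
    return f"{BASE_URL}?{urlencode({'q': f'{lat},{lng}', 'key': api_key})}"


if __name__ == "__main__":
    key = "YOUR-API-KEY"  # placeholder, not a real key
    print(forward_url("7 Carmine St, New York, NY 10014", key))
    print(reverse_url(40.730529, -74.0020706, key))
```

Either way the input is a place or a coordinate. A phone number is neither.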
There are YouTube tutorials claiming you can do phone lookups using their service. What these YouTube tutorials really do is use some other library to determine the country name from the phone number. Then they call the OpenCage geolocation API with the country name as the address input.
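A toy sketch of why that trick looks like it works. The prefix table below is a tiny hand-written stand-in for whatever library the tutorials use, and the function name is made up; the point is only that the "address" handed to the geocoder is a country name, so every number in that country yields the same query.

```python
from typing import Optional

# Hypothetical, deliberately tiny table of international calling codes.
CALLING_CODES = {
    "33": "France",
    "44": "United Kingdom",
    "49": "Germany",
}


def country_from_number(number: str) -> Optional[str]:
    """Return the country name for a number in +<code><rest> form."""
    digits = number.lstrip("+")
    # Calling codes are 1 to 3 digits long; try the longest prefix first.
    for length in (3, 2, 1):
        name = CALLING_CODES.get(digits[:length])
        if name is not None:
            return name
    return None


if __name__ == "__main__":
    # Two unrelated (made-up) German numbers produce the same query string.
    for number in ("+4930123456", "+4915112345678"):
        print(number, "->", country_from_number(number))
```

Geocoding "Germany" does return a perfectly valid lat/lon pair, which is why the call looks successful even though it says nothing about where the phone is.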
I am hoping that in a year from now people will be more skeptical of what they hear from conversational AI. But perhaps that is optimistic of me.
It’s worse than that. It’s wrong, you cannot correct it and it makes up supporting citations on the fly. Very few humans behave like that.
https://www.economist.com/science-and-technology/2023/02/22/...
But in the past, HN users "corroborated" that Apple is spying on them etc. Fabrication is alive and well among us.
Providing detailed information on the usage of a service that has never existed is a brand new kind of incorrect that is carelessly causing the rest of us grief.
I trust Alexa & Siri completely though.
ChatGPT can only bullshit
Could someone push a "wrong" opinion heavily online to sway the opinion of AI?
I can only imagine a bot that learned from 4chan.
Dreams can come true…
Even if it’s wrong, dangerous, misleading, fundamentally flawed as a concept whatever. Big tech and money will find ways to keep putting it in front of us.
Even worse because it has no clue when it might be completely wrong and yet it will be confident in its answer.
Dozens of people are signing up to our site every day, then getting frustrated when "it doesn't work".
Please do NOT trust the nonsense ChatGPT spits out.
If this was a new market opportunity, just publishing a falsehood would do the same job.
The malady is that LLMs cannot take ad hoc operational corrections for these kinds of errors at scale.
For example, I asked ChatGPT to show me how to use an AWS SDK "waiter" to wait on a notification on an SNS topic. It showed me code that looked right, but it was confusing functions in the SQS library for ones that would do the same thing with SNS (and SNS doesn't support what I wanted).
For example - https://platform.openai.com/playground/p/default-translate-c...
The codex models are intended for doing work with code rather than language and may give better results in that context. https://help.openai.com/en/articles/6195637-getting-started-...
It may be more helpful to look for better answers on Amazon's help pages for SNS and AWS SDK.
It could just as well state that humans have 3 legs depending on its training set and/or time of day. In fact it has said similar BS.
I think that’s a bit pedantic and not very helpful… I’m not typing this comment, my brain is just sending signals to my hands which causes them to input data into a device that displays pixels that look like a comment.
Well, if you're just fed a corpus, with no real-time first-person stream of experience that you control, no feedback mechanism, no higher level facilities, and you're not a member of a species with a proven track record of state-of-the-art in nature semantic understanding, then maybe...
And even that, they do badly.
I mean, you could say that about a person too, as you don't know how much of what they are saying is bullshit.
For one, you are technically correct about ChatGPT not recommending. It cannot perform such action. On the other hand, from the POV of the questioner, it's hard not to feel being recommended something when you ask "What do you recommend" and it says "I recommend that...". You are, for some intents and purposes, being recommended something at that point.
Now, humans could very well also be statistical inference machines. But they have way more tricks up their semantic-level understanding sleeves than ChatGPT circa 2023.
Honestly it looks more like OpenCage is trying to rehash the same issue for more clicks by spinning it off the hugely popular ChatGPT keywords. Wouldn't be too surprised if they created the original python utilities themselves just to get some publicity by denouncing them.
1. https://blog.opencagedata.com/post/we-can-not-convert-a-phon...
We do have python tutorials and SDKs showing how to use our service for ... geocoding, the actual service we provide.
I wrote the post mainly to have a page I can point people to when they ask why "it isn't working". Rather than take the user through a tour of past posts I need something simple they will hopefully read. But fair point, I can add a link to last year's post about the erroneous YouTube tutorials as well.
What I think you can't appreciate is the difference of scale. A faulty YouTube video drives a few users. In the last weeks ChatGPT is sending us several orders of magnitude more frustrated sign-ups.
It may be an old problem, but I guess users are more used to a random YouTube video with wrong information. But the computer is always right so ChatGPT is always right, so users may be more annoyed to discover that the recommendation is wrong and blame them instead of ChatGPT.
This is a service that OpenCage provides, and for whatever reason OpenCage happens to be one of the popular services for this use case. (Maybe it’s because you get the text description of location back right away without having to do a round trip through a heavyweight on-screen map, maybe their free tier allows more requests than most, maybe their api is easier to use, maybe they are lucky or skilled with SEO and their tutorial happens to be the first result for some common phrases, who knows.)
So there’s this process that starts with a search for “convert phone location to address”, often involves the OpenCage api, and ends with a happy developer getting the information they wanted. Various algorithms pick up on the existence and repeated traversal of this happy path.
In another part of the internet, code tutorial content farms notice a demand for determining an incoming call’s location from the number that’s calling. They search for things like “convert phone number to location” and “convert phone number to address”. Some of these searches end up falling into the nearby well-trodden path of “convert phone location to address” and the content farmer is presented with the OpenCage api. They mess around with the api for a bit and find they can start from a phone number and get a successful api call that returns a lat/lon pair. A successful api call that returns legitimate-looking lat/lon data is all they need to make a video, they make it and post it. Higher-quality, more scrupulous code tutorials attempt to answer this same demand but find it’s not possible, so those tutorials don’t get made, leaving the less scrupulous ones that stop with a successful-looking api call to flourish in this space. The tutorial is doing well, so the content farms endlessly recycle it into blogspam.
As a result, OpenCage starts getting weird usage patterns, tracks them down, finds the source is these tutorials, and makes a post about it.
Some time later, ChatGPT is released. People are astounded with its ability to write code and start using it for this purpose. Naturally, some of those people have the same demand as the previous generation of devs who stumbled onto the unscrupulous code tutorials. Because of the blogspam, ChatGPT’s training data includes many variations on the tutorial, and just as naturally it ends up reproducing that tutorial when asked - except ChatGPT’s magic kicks in and instead of including (what its embeddings see as) some weird unrelated area-code-to-string nonsense from the tutorial, it just bullshits some plausible-sounding data plumbing code instead. Unfortunately, because the tutorial never worked in the first place, that weird hacky irrelevant bit that ChatGPT ignored happened to be the secret sauce that makes the whole thing superficially appear to work.
As a result, OpenCage starts getting weird usage patterns, tracks them down, finds the source is ChatGPT, and makes a post about it.
In deference to Hacker News’ policy of keeping comments pleasant, I will elide the analysis of the process that leads to comments accusing OpenCage of nefariously engineering the whole thing for attention.
And it further implies that these people don't immediately follow that thought with: "That's surely impossible, since it would be a privacy nightmare if literally everyone in the world could track everyone else in the world's every move".
Or perhaps with this alternative thought, which would lead to the same conclusion: "let's not worry about privacy, how would this even work? Does every phone company in the world pro-actively send every customer's location data to OpenCage, just in case someone queries it? Or does OpenCage wait until it gets a query, and only then query the cell phone company 'just-in-time'? Both of these sound like a lot of work for each phone company to support ... what's the incentive?"
Honestly, I'm a bit surprised that the OpenCage blog post is so calm about this, instead of just yelling incoherently "why WHY why would anyone think like this?!?"
But yeah, don't trust ChatGPT for anything. Just earlier today I tried my darnedest to convince it that 2 pounds of feathers doesn't weigh the same as 1 pound of bricks, and it just would not listen, presumably because it just regurgitated stuff related to the common "1 pound of feathers and 1 pound of bricks" question.
By the way, the last paragraph has some typos:
> I wrote this post to have a place to send our new ChatGPT users when they ask why it isn’t work, but hopefully also it serves as a warning to othrs - you absolutely can not trust the output of ChatGPT to be truthful,
> does 2 pounds of feathers weigh more than 1 pound of bricks?
> No, two pounds of feathers weigh the same as one pound of bricks. The weight of an object is determined by its mass, and one pound of any object weighs the same as one pound of any other object. However, feathers are less dense than bricks, so two pounds of feathers will take up more space or have a larger volume than one pound of bricks.
Huh. It's interesting that some of these "wrong paths" are quite so repeatable. Wonder if that means there's also a way to band-aid them when found, the way a human might be corrected for a misconception.
> does 2 pounds of bricks weigh more than 1 pound of bricks?
> No, 2 pounds of bricks do not weigh less or more than 1 pound of bricks. 2 pounds of bricks and 1 pound of bricks refer to the same unit of weight, which is a pound. Therefore, they weigh the same, which is one pound. The difference between them is only the quantity, not the weight.
> It's similar to saying that two one-dollar bills do not have more value than one five-dollar bill. Even though you have two bills in one case and one bill in the other case, the total value is the same.
Agreed. But then it begs the question: what purpose does ChatGPT serve (other than for entertainment purposes or cheating on your HS/college exam)? If you have to verify its information by other means, then you're not really saving much effort.
Give it some structured data and ask it to summarize it (e.g. hourly weather data and it gives a better summarization than a template based one).
Give it HN titles and the categories and it does a passable zero shot tagging of them ( https://news.ycombinator.com/item?id=34156626 ).
I'm toying around with making a "guided bedtime story generator". A friend of mine uses it to create "day in the life of a dinosaur" stories for a child (a different story each day!)
The key is to play to its strengths rather than testing its bounds and complaining that they break in weird ways when they will inevitably break in weird ways.
Just like any piece of code we write. We have to test, debug, verify and it still might have errors after that. And in scientific papers the conclusions are often contradicted by other papers.
The correct way to use it is to set up a verification mechanism. Fact checking, code tests, even ensembling predictions to see if they are consistent might help. In some cases we can set up a game and use the game winner as indication of correctness (like AlphaGo).
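A minimal sketch of the "ensembling to see if they are consistent" idea: ask the same question several times and only accept an answer when most of the samples agree. The `ask` callable is a stand-in for whatever model call you have, and the function name and thresholds are made up for illustration. Agreement is a weak signal, not proof: a model can be repeatably wrong.

```python
from collections import Counter
from typing import Callable, Optional


def consistent_answer(ask: Callable[[str], str], question: str,
                      n: int = 5, min_agreement: float = 0.8) -> Optional[str]:
    """Sample `n` answers; return the majority answer if enough agree."""
    # Light normalisation so "Paris" and "paris " count as the same answer.
    answers = [ask(question).strip().lower() for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    # Below the agreement threshold, refuse to answer rather than guess.
    return answer if count / n >= min_agreement else None


if __name__ == "__main__":
    # Canned replies standing in for repeated calls to a model.
    replies = iter(["Paris", "paris ", "Paris", "Paris", "Lyon"])
    print(consistent_answer(lambda q: next(replies), "Capital of France?"))
```

For code, the analogous check is running the generated code against tests instead of comparing strings.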
But sometimes only running a real life experiment will suffice. That's why human scientists need experiments - because humans are just like LLMs, but with external verification as part of a game (of life).
I'm curious: why not? It seems like a lot of people would be interested in this if you could figure out how to provide it.
If a phone number is for a mobile phone then looking up the location doesn't make sense at all: mobile phones are mobile.
I guess you could try and crawl an index of business phone numbers and associate those with the listed address for businesses, but that's a completely different business from running a geocoder.
You could provide a bit of geographical information about the first three digits of a US phone number. I imagine that's not what users are actually looking for though.
I expect there are also patterns in other countries?
If you are a mobile network operator.
Or, you can convince people to install something on their phone that sends you their location along with their phone number.
you mean like scammers and stalkers? (ok, and probably Meta)
It correctly stated that they are, but when it went on to prove that this was the case, it generated a table of the frequencies of the individual letters in these two words, and the table looked like this.
Letter | Frequency in “pannekake” | Frequency in “kannepake”
-------|--------------------------|-------------------------
a      | 2                        | 2
e      | 2                        | 2
k      | 2                        | 2
n      | 2                        | 2
p      | 2                        | 2
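For comparison, the same check done mechanically is a few lines. This is just a sketch with the standard library's `Counter`; the helper names are mine.

```python
from collections import Counter


def letter_counts(word: str) -> Counter:
    """Count how often each letter occurs in the word."""
    return Counter(word.lower())


def are_anagrams(a: str, b: str) -> bool:
    """Two words are anagrams if their letter counts match exactly."""
    return letter_counts(a) == letter_counts(b)


if __name__ == "__main__":
    print(are_anagrams("pannekake", "kannepake"))
    # Print the real per-letter counts to compare against the table.
    for letter, count in sorted(letter_counts("pannekake").items()):
        print(letter, count)
```

Running it prints the actual counts, row by row, next to what the table claims.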
This reminded me that yes indeed, AI just isn’t quite there yet. It got it right, but then it didn’t. It hallucinated the frequency count of the letter “p”, which occurs only once, not twice in each of those words.
The OP's example is unwanted demand, but it clearly shows that ChatGPT can funnel potential customers towards a product or service.
Or it marks the beginning of the next "AI Winter."
> but it clearly shows that ChatGPT can funnel potential customers towards a product or service.
And the next logical step is "chatgpt keywords advertising." Which is right back where we started.
But the real API doesn't give the results that the user asked ChatGPT for.
that is amusingly alarming
It’s so common to want to know where an incoming call comes from that it’s built into iOS. It has nothing to do with stalking, just with guessing whether whoever is calling you is a scammer or a company trying to sell you stuff.
Bonus points for using ChatGPT to implement this end-to-end.
Screenshots https://imgur.com/a/sNR87c7 You can see the OpenCage logo on the bottom right of the images. We wrote a separate blog post about that about a year ago, we felt today's blog post would be too long if we added those screenshots, too.
At the top you should have a diagram like this:
Lat, lon <- opencage -> address
With a few examples underneath.
Your actual homepage is indeed much better.
But also last night I tried for 30 minutes to get it to write me some fairly simple HTML parsing code. The tricky part was that I couldn't use DOMParser since it was running on Cloudflare Workers, and it could never produce a working implementation using HTMLRewriter or regex, no matter how many examples I gave it.
Anyways, I wrote a solution using HTMLRewriter in 10 minutes...
I'm waiting for people to start calling me to ask questions about something ChatGPT said, and I'll tell them it's wrong. Then they'll start arguing with me and saying if ChatGPT said it, it must be right, and I must be wrong. And then I'll need to waste time proving that this idiotic chat bot that is spewing out garbage is, in fact, spewing out garbage.
It's unconscionable. If there was no robot in the loop here, and it was people mis-transcribing youtube to compile e.g. Google search optimisation we'd call it fraud.
A better name for now would be PlausibleGPT.
They have to get an API key from you. What about a large warning at the start of that process telling them that this isn't a service you provide?
ChatGPT as business line lead generator—is there anything it can’t do?
One of the simplest AIs is a recommender. We put guardrails on using its predictions inside ecommerce apps by limiting what it learns from (purchases for instance) and limiting what it is used to predict (purchases). When Facebook uses a recommender, it learns from time-on-site (a value to FB but not necessarily to the user, and a complex behavior that can be comprised of many non-beneficial sub-behaviors) and uses it to recommend things that lead to more time-on-site. This application is dangerously devoid of guardrails, as so much recent evidence has shown.
Now we have a text-generating AI that has been trained on a great swath of human knowledge. That means the teachings of Gandhi as well as Hitler, etc. What do you expect it to "know" as truth? Generative AI that is used to generate thoughts from this training corpus MUST have contradictory and downright evil ideas since it has no way to judge between the ideas it learns from.
Generative AI in this form can be nothing but psychopathic until guardrails can be devised to limit its psychopathic responses OR the corpus it learns from can be labeled in a way to flag what is "bad", if we can even agree on what that means.
Psychopaths can be useful if they are knowledgeable but beware, you are talking to a psychopath in ChatGPT.