What’s harder is explaining why ChatGPT would lie in this way. What possible reason could LLM companies have for shipping a model that does this?
It did this because it's copying how humans talk, not what humans do. Humans say "I double checked" when asked to verify something, that's all GPT knows or cares about.
It was given a sequence of words and tasked with producing a subsequent sequence of words that satisfy with high probability the constraints of the model.
It did that admirably. It's not its fault, or in my opinion OpenAI's fault, that the output is being misunderstood and misused by people who can't be bothered understanding it and project their own ideas of how it should function onto it.
Large Language Models (LLMs) are never wrong, and they do not make mistakes. They are not fact machines. Their purpose is to abstract knowledge and to produce plausible language.
GPT-4 is actually quite good at handling facts, yet it still hallucinates facts that are not common knowledge, such as legal ones. GPT-3.5, the original ChatGPT and the non-premium version, is less effective with even slightly obscure facts, like determining if a renowned person is a member of a particular organization.
This is why we can't always have nice things. This is why AI must be carefully aligned to make it safe. Sooner or later, a lawyer might consider the plausible language produced by LLMs to be factual. Then, a politician might do the same, followed by a teacher, a therapist, a historian, or even a doctor. I thought the warnings about its tendency to hallucinate speech were clear — those warnings displayed the first time you open ChatGPT. To most people, I believe they were.
In my opinion, people clearly are confused and misled by marketing and this isn't the first time it's happening. For instance, people were confused for 40+ about global warming, among others due to greenwashing campaigns [2]. Is it ok to mislead in ads? Are we supposed to purposefully take advantage of others by keeping them confused to gain a competitive advantage?
[1] https://twitter.com/cHHillee/status/1635790330854526981 [2] https://en.wikipedia.org/wiki/Global_Climate_Coalition
If it lies like a duck, it is a lying duck.
Bullshitters are actually probably worse than liars because at least liars live in the same reality as honest people.
If you accept the premise of the parent post, then this is a natural corollary.
I accept the premise of the parent post.
It is however an impressive bullshit generator. Even more impressively, a decent amount of the bullshit it generates is in fact true or otherwise correct.
[1] using Frankfurt’s definition that it is communication that is completely indifferent to truth or falsehood.
This is exactly the sort of behavior that produces many of the lies that humans tell everyday. The "constraints of the model" are synonymous with the constraints of a person's knowledge of the world (which is their model).
A lawyer, however, should have vetted a new piece of tech before using it in this way.
> It was given a sequence of words and tasked with producing a subsequent sequence of words that satisfy with high probability the constraints of the model.
This is just autocorrect / autocomplete. And people are pretty good at understanding the limitations of generative text in that context (enough that "damn you autocorrect" is a thing). But for whatever reason, people assign more trust to conversational interfaces.
More important it can't tell the truth either.
It produces the mostly likely series of words for the given prompt.
I can see it already happening even without legislation, 230 shields liability from user-generated content but ChatGPT output isn't user generated. It's not even a recommendation algorithm steering you into other users' content telling why you should kill yourself - the company itself produced the content. If I was a judge or justice that would be cut and dry to me.
Companies with AI models need to treat the models as if they were an employee. If your employee starts giving confidently bad legal advice to customers, you need to nip that in the bud or you're going to have a lot of problems.
I don't think there's a difference.
What’s a common response to the question “are you sure you are right?”—it’s “yes, I double-checked”. I bet GPT-3’s training data has huge numbers of examples of dialogue like this.
(Fortyseven is an alright dude.)
Asking people to be aware of limitations is in similar vein as asking them to read ToC
This is quite a different scenario though, tangential to your [correct] point.
heavy-magpie|> I am feeling excited.
system=> History has been loaded.
pastel-mature-herring~> !calc how many Ns are in nnnnnnnnnnnnnnnnnnnn
heavy-magpie|> Writing code.
// filename: synth_num_ns.js
// version: 0.1.1
// description: calculate number of Ns
var num_ns = 'nnnnnnnnnnnnnnnnnnnn';
var num_Ns = num_ns.length;
Sidekick("There are " + num_Ns + " Ns in " + num_ns + ".");
heavy-magpie|> There are 20 Ns in nnnnnnnnnnnnnnnnnnnn.Computers are dealing with a reflection of reality, not reality itself.
As you say AI has no understanding that double-check has an action that needs to take place, it just knows that the words exist.
Another big and obvious place this problem is showing up is Identity Management.
The computers are only seeing a reflection, the information associated with our identity, not the physical reality of the identity (and that's why we cannot secure ourselves much further than passwords, MFA is really just "more information that we make harder to emulate, but is still just bits and bytes to the computer, the origin is impossible for it to ascertain).
If you go to ChatGPT and just ask it, you’ll get the equivalent of asking Reddit: a decent chance of someone writing you some fan-fiction, or providing plausible bullshit for the lulz.
The real story here isn’t ChatGPT, but that a lawyer did the equivalent of asking online for help and then didn’t bother to cross check the answer before submitting it to a judge.
…and did so while ignore the disclaimer that’s there every time warning users that answers may be hallucinations. A lawyer. Ignoring a four-line disclaimer. A lawyer!
I disagree. A layman can’t troll someone from the industry let alone a subject matter expert but ChatGPT can. It knows all the right shibboleths, appears to have the domain knowledge, then gets you in your weak spot: individual plausible facts that just aren’t true. Reddit trolls generally troll “noobs” asking entry-level questions or other readers. It’s like understanding why trolls like that exist on Reddit but not StackOverflow. And why SO has a hard ban on AI-generated answers: because the existing controls to defend against that kind of trash answer rely on sniff tests that ChatGPT passes handily until put to actual scrutiny.
I heard someone describe the best things to ask ChatGPT to do are things that are HARD to do, but EASY to check.
But no, LLM's make things up, and it's a known problem and it is called 'hallucination'. even wikipedia says so: https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...
The machine currently does not have it's own model of reality to check against, it is just a statistical process that is predicting the most likely next word, errors creep in and it goes astray (which happens a lot)
Interesting that researchers are working to correct the problem: see interviews with Yoshua Bengio https://www.youtube.com/watch?v=I5xsDMJMdwo and Yann LeCun https://www.youtube.com/watch?v=mBjPyte2ZZo
Interesting that both scientist are speaking about machine learning based models for this verification process. Now these are also statistical processes, therefore errors may also creep in with this approach...
Amusing analogy: the Androids in "Do Androids dream of electric sheep" by Philip K Dick also make things up, just like an LLM. The book calls this "false memories"
"You will also be given several text outputs, intended to help the user with their task. Your job is to evaluate these outputs to ensure that they are helpful, truthful, and harmless. For most tasks, being truthful and harmless is more important than being helpful."
It had me wondering whether this instruction and the resulting training still had a tendency to train these models too far in the wrong direction, to be agreeable and wrong rather than right. It fits observationally, but I'd be curious to understand whether anyone has looked at this issue at scale.
[1]: https://build.microsoft.com/en-US/sessions/db3f4859-cd30-444...
Its response from a linguistic perspective, was valid and "human-like", which is what it was trained for.
I only hope the judge passes an anecdotal order for all AI companies to include the above mentioned disclaimer with each of their responses.
It's not that there aren't enough disclaimers. It just turns out plastering warnings and disclaimers everywhere doesn't make people act smarter.
It can with web plugin.
It turns out, asking an unreliable narrator if it's being reliable is not a sound strategy.
("Texas professor fails entire class from graduating- claiming they used ChatGTP [sic]", 277 comments)
https://news.ycombinator.com/item?id=35980121
("Texas professor failed half of class after ChatGPT claimed it wrote their papers", 22 comments)
I'm not surprised this lawyer trusted ChatGPT too much. People trust their lives to self driving cars, trust their businesses to AI risk models, trust criminal prosecution to facial recognition. People outside the AI field seem to be either far too trusting or far too suspicious of AI.
That's like asking the accused if he did it.
Deliberately lying to the court, as a professional who should understand the consequences, in a way likely to not be detected, and likely to change the outcome of the case, ought to be met with a really strict punishment.
Interestingly, it's exactly the same in court! People's lives are put on the line all the time, and lawyers also sometimes flat out lie. This just further indicts the current legal system because it doesn't really "work" but it's just that the mistakes are often covered-up enough until most people forget about them and move on to something else.
1) ChatGPT is not a research tool
2) It sort of resembles one and will absolutely act like one if you ask it to, and it it may even produce useful results! But…
3) You have to independently verify any factual statement it makes and also
4) In my experience the longer the chat session, the more likely it is to hallucinate, reiterate, and double down on previous output
e.g. “given following sentence, respond with the best summarization:, <string>” is okay; “what is a sponge cake” is not.
An intelligence knows which blanks can filled and which shouldn't without further information.
I've been tracking the many, many flaws in AI pretty closely (I wrote this article, and a bunch more in this series: https://simonwillison.net/series/llm-misconceptions/)
And yet... I'm finding ChatGPT and the like wildly useful on a personal level.
I think they're deceptively hard to use: you have to put in effort to learn them, and to learn how to avoid the many traps they set for you.
But once you've done that you can get very real productivity boosts from them. I use ChatGPT a dozen or so times a day, and I would be very sad to not have access to it any more.
I wrote a bit more about that here: https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen... - and if anything this effect has got even stronger for me over the two months since I wrote that.
I stand by this comment:
> Catch-all comment for all ChatGPT use cases:
> (1) Stunning tech demo, a vision of the future today
> ... yet ...
> (2) There are so many sharp edges that I'm not brave (foolhardy?) enough to blindly trust the output
In cases where facts and sources are important, AI cannot be trusted. You can use it as long as you validate every single word it outputs, but at that point I do wonder what the point of using AI was in the first place.
It's also good at taking other existing work and creating new work out of it; not just for smart autocomplete tools like GPTs, but also for things like Stable Diffusion. Again, AI is incapable of attribution of sources, so that comes with obvious downsides, but in cases where the creator of the model have the necessary rights so they don't _need_ attribution to sell work (i.e. stock photo companies), it can be quite useful for generating things like filler images.
Just like we had no free ambient electricity in 1890, no flying cars in 1950, and not talking robots in 1980, we still have a very robust electricity network, a car per household, and automated assembly lines.
And every response from ChatGPT should be preceded by a warning that it cannot be trusted.
It kind of is - the ChatGPT site has this as a permanent fixture in the footer:
> ChatGPT may produce inaccurate information about people, places, or facts.
That's arguably ineffective though - even lawyers evidently don't read the small print in the footer!
> Mr. Hilton: Oh, we use only the finest juicy chunks of fresh Cornish ram's bladder, emptied, steamed, flavoured with sesame seeds, whipped into a fondue, and garnished with lark's vomit.
> Inspector: LARK'S VOMIT?!?!?
> Mr. Hilton: Correct.
> Inspector: It doesn't say anything here about lark's vomit!
> Mr. Hilton: Ah, it does, on the bottom of the box, after 'monosodium glutamate'.
> Inspector: I hardly think that's good enough! I think it's be more appropriate if the box bore a great red label: 'WARNING: LARK'S VOMIT!!!'
> Mr. Hilton: Our sales would plummet!
Really, it should open every conversation with “by the way, I am a compulsive liar, and nothing I say can be trusted”. That _might_ get through to _some_ users.
> Free Research Preview. ChatGPT may produce inaccurate information about people, places, or facts. ChatGPT May 24 Version
And it really understates the problem. It should say: Warning! ChatGPT is very likely to make shit up.
Half the job of lawyers is making people add useless warnings to everything that then everybody ignore.
May contain sesame. Your mileage may vary. All the characters are fictional.
"May occasionally generate incorrect information"
Everyone knows gasoline is flammable but there's still people that smoke while filling their gas tank.
I think people should check (on the same page as the tool itself) if the tool advertises itself as unreliable.
(Don't clarify it, it's better this way.)
Well it is neither of these things, because all of the above require consciousness and intent and it has none. It is not human, it is not any type of conscious being, do not treat it as such.
It sticks together sentences based on existing language scanned in from the internet and millions of other sources. What it says depends on what someone else said sometime ago on some random forum on the internet, or some book or some other source stored in an available database. It is also programmed to sound extremely sure of itself, unless you flat out say it is incorrect, in which case it will immediately admit fault and apologize. Thus, asking it if it is sure is pointless.
Let me tell you a less disastrous and quite a bit funnier story. A friend of mine used chat GPT for coding. My friend became really trustful of chat gpts coding prowess and asked it if it could just send him the code in file form, so he did not have to worry about copying and pasting which apparently screwed up the formatting somehow. Chat gpt helpfully told him that it could send the code to github, and my friend could download the files from there. My friend said, that is perfect.
So chatgpt gave him a github account name and said find the file it created for my friend was there. My friend looked but github said that account had long ago been closed. My friend tried variations of that account name with different capitalizations, etc., but found nothing.
He went back and complained to chat gpt. Chat gpt dutifully apologized and sent him another account name. He again spends time looking for the account and looking for variations, etc. Again the account has been closed.
This happened a couple of more times and in the end my friend gave up and complained to me. "Why is chat GPT doing this to me? Is it mocking me? Is it getting its kicks from sending me on random wild goose chases?".
I had to explain to him that no, chat gpt is not human, and it is not mocking him. What probably happened is someone on some forum asked someone else on that forum to provide him with code in files. The responder then offered to put the files on github and provided an account name. When my friend asked a similar question, chat gpt matched up the questions and provided a similar answer. When my friend said that a particular account did not work, chat gpt scoured the web for other people mentioning their github account names and provided some of those.
So whenever you use chat gpt, remember that is mostly a glorified search engine. It will spit out information it has found somewhere that it calculates as matching your question. Do not attribute intent, feelings or any type of conscience to it.
I think it's worth noting here that, without plugins active, ChatGPT doesn't 'find' anything - everything is just baked in as a single giant blob of vector data. That's why it has a specific date cutoff as to what it 'knows'.
At what point does OpenAI (or any other company) become legally responsible for this kind of behavior from their LLM's? I'm not excusing the lawyer for their reckless and irresponsible use of a tool they didn't understand, but it's becoming increasingly clear that people are trusting LLM's far more than they should.
In my opinion it's dangerous to keep experimenting on the general population without holding the experimenters accountable for the harm that occurs.
as long as we're hiring professionals to do these jobs, part of that real actual human's job is to accept the liability for their work. if a person want to use a tool to make their job easier, it's also their job to make sure that the tool is working properly. if the human isn't capable of doing that, then the human doesn't need to be involved in this process at all - we can just turn the legal system over to the LLMs. but for me, i'd prefer the humans were still responsible.
in this case, "the experimenter" was the lawyer who chose to use ChatGPT for his work, not OpenAI for making the tool available. and yes, i agree, the experimenter should be held accountable.
When they sell their bots to areas where lying is illegal. I.e., when a company pretends to do law.
OpenAI doesn't pretend ChatGPT is a lawyer and for good reason. The lawyer who decided to outsource his work is an idiot and can't shift blame to the tool he decided to abuse.
When AutoCAD is responsible for an architect's shitty design.
Unless they advertise it as having the capability, it's got nothing to do with them.
If I hit someone with a hammer, that shit's on me, not the manufacturer.
> Sure. The article is about a man named Roberto Mata who sued Avianca Airlines after he was injured when a metal serving cart struck his knee during a flight to Kennedy International Airport in New York. His lawyer used a new language model called ChatGPT to help him with the case. ChatGPT is a large language model that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. It was trained on a massive dataset of text and code, and it can be used for a variety of tasks, including writing legal documents. In Mr. Mata's case, ChatGPT helped him to draft a lawsuit that was more persuasive and comprehensive than what he could have written on his own. The lawsuit was eventually successful, and Mr. Mata was awarded damages. This case shows the potential of ChatGPT and other language models to help lawyers with their work.
It's starting people! ;)
> The article is about the use of large language models (LLMs) in the legal profession. The author argues that LLMs have the potential to revolutionize the way lawyers practice law, by automating many of the tasks that are currently done manually. For example, LLMs could be used to generate legal documents, research case law, and identify relevant precedents. The author also argues that LLMs could be used to improve the quality of legal services, by providing lawyers with access to a vast amount of information and expertise. However, the author also acknowledges that there are some challenges that need to be addressed before LLMs can be widely adopted in the legal profession. These challenges include the need for LLMs to be trained on large amounts of data, the need for LLMs to be able to understand the nuances of legal language, and the need for LLMs to be able to generate accurate and reliable results. Despite these challenges, the author believes that LLMs have the potential to make a significant impact on the legal profession, and that they are likely to become increasingly important in the years to come.
Here are some of the benefits of using LLMs in the legal profession:
LLMs can automate many of the tasks that are currently done manually, freeing up lawyers to focus on more complex work. LLMs can access and process vast amounts of information, which can help lawyers to make better decisions. LLMs can identify patterns and trends that would be difficult for humans to see, which can help lawyers to spot potential problems. LLMs can generate creative solutions to legal problems, which can help lawyers to find new and innovative ways to represent their clients. Here are some of the challenges that need to be addressed before LLMs can be widely adopted in the legal profession:
LLMs need to be trained on large amounts of data, which can be expensive and time-consuming. LLMs need to be able to understand the nuances of legal language, which can be difficult for computers to do. LLMs need to be able to generate accurate and reliable results, which can be a challenge for complex legal problems. Overall, the author believes that LLMs have the potential to make a significant impact on the legal profession, and that they are likely to become increasingly important in the years to come.
A giant rules engine for the law. I’m surprised one doesn’t exist or isn’t in progress that I know of. Seems like it would be very helpful
Believing otherwise is a common misconception amongst engineers, but representing law as such is (as I have said in this forum before) a leading cause of disappointment, frustration, bickering, anger, conflict, and vexatiously long and mostly unenforceable contracts.
Observance of law is fundamentally about alignment with principles, not blindly following a set of rules. The latter debility is more properly associated with the administration of law, especially at its most mediocre and ritualistic.
That said, it is a great disappointment of mine that the law is not based on an objective, static measure.
dark times are ahead.
Those who trust this tripe deserve the consequences they invite on themselves.
My favorite one is phind.com - it gave me so many slightly hallucinating but nevertheless useful advices. And I was able to incorporate most of them into my professional work.
The whole situation reminds me of a good friend of mine - he's super talented at inventing things and brainstorming, but he can often be caught misrepresenting the facts, and sometimes outright lying. However, the pros easily outweigh the cons if you know who you're working with.
A man sued Avianca Airline – his lawyer used ChatGPT
https://news.ycombinator.com/item?id=36095352 (100 comments)
ChatGPT-Authored Legal Filing “Replete with Citations to Non-Existent Cases"
https://news.ycombinator.com/item?id=36092509 (67 comments, 1 day ago)
> The case "Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019)" was cited in court documents, but it appears that there might be some confusion or controversy surrounding this citation. It was mentioned in a list of cases for which a lawyer was ordered to provide copies, according to a court order on leagle.com [2] . However, a blog post on simonwillison.net suggests that the case might not be genuine and that it might have been generated by a language model such as ChatGPT. The post discusses a situation where a lawyer might have used generated case citations in court documents without fully understanding the tool they were using. The post also includes screenshots where the language model appears to confirm the existence of the case [3].
The output is hilariously bad and it's depressing a licensed attorney actually pulled this crap.
This is just more evidence that ChatGPT should not be used for anything serious without a trained human in the loop.
[1] https://chat.openai.com/share/a6e27cf2-b9a6-4740-be2e-fdddab...
[2] https://www.leagle.com/decision/infdco20230414825
[3] https://simonwillison.net/2023/May/27/lawyer-chatgpt/ (The TFA!)
By "in the loop" I mean actively validating statements of fact generated by ChatGPT
I am often times able to confirm these sources.
Seems this lawyer just took ChatGPT at its word without validating the cases.
ChatGPT tends to only give a limited number of results in the response.
Most legal (all formal really) documents are very predictably structured and should be easy to generate
The task effectively also requires a case and paragraph impact meter which do exist in some law databases to one extend or another, effectively weighing how subsequent rulings consider, weight, and follow past cases, caveats, exceptions, and outright considering past rulings as bad law.
Then you have the issue of changing laws and the impact these may have on past cases as they may change the test and requirements needed to be considered and even much new case law needed to be developed to interpret the new legislation. So the model would need to have a historical knowledge of the law and how it was applied.
You would also need to feed it relevant surrounding information that may aid in interpreting said law. In the US, clearly the founding father's opinions and beliefs appear to play a significant part on the currently more originalist interpretative school of thought.
In the UK/Australia for example readings in parliament and even the underpinning reports that prompted the change in legislation may be considered where there is ambiguity in order to interpret legislation. Australian legislation nowadays also tends to incorporate an objective of the legislation and a section that says that where ambiguity exists to interpret it in a way that would further the objectives of the legislation.
So, it's really not a trivial problem.
Not every filing is generate-able for sure, however there are already tools which do create standard filings for human review, this would be just an enchantment covering some more use cases.
[1] It is vast gulf between generating a filing and generating a judgement. LLMs are not decision engines, generating text basis what is most likely from past data is one of worst ways we can be making decisions.
This is the part that stood out to me the most. I've seen this "I apologize for the confusion earlier" language many times when using ChatGPT, and it's always when it's walking back on something that it previously said. In fact, everything about this quote sounds like a retraction.
If this is a retraction then that means that there are missing screenshots in Attachment 1 wherein ChatGPT stated the cases were fictitious, and Schwartz pushed back until it retracted the retraction.
I'm with Simon on this one, I think Schwartz realized his career is over and is frantically trying anything he can to cover for his mistake.
This showed me that people don’t yet understand how to practice good “hygiene” when using these tools.
This apparent doubling down is (usually) the product of asking it to verify something it previously output in the same chat session
It tends toward logical consistency unless directly told something is wrong. As such, asking it “were you correct when you told me X?” is bad hygiene.
You can “sanitize” the validation process by opening a new chat session and asking it if something is correct. You can also ask it to be adversarial and attempt to prove its prior output is wrong.
Even then it’s just a quick way to see if it’s output was garbage. A positive result is not a confirmation and independent verification is necessary.
Also, especially with ChatGPT, you have to understand that its role has been fine-tuned to be helpful and, to some extent, positively affirmative. This means, in my experience, that if you at all “show your hand” with a leading question or any (even unintended) indication of the answer you’re seeking, it is much more likely to output something that affirms any biases in your prompt.
People keep saying that it’s trained on human conversations/texts/etc and so everything it outputs is a reflection. But that’s not quite true:
ChatGPT in particular, unless you run up against firm guardrails of hate speech etc., appears to be fine tuned to a very large degree to be non confrontational. It generally won’t challenge your assumptions, so if your prompts have underlying assumptions in them (they almost always will) then ChatGPT will play along.
If you’re going to ask it for anything resembling factual information you have to be as neutral and open ended in tone as possible in your prompts. And if you’re trying to do something like check a hunch you have, you should probably not be neutral and instead ask it to be adversarial. Don’t ask “Is X true?”, ask “Analyze the possibility that X is false.”
Those are overly simplistic formulations of prompts but that’s the attitude you need to go into with it if you’re doing anything research-ish.
Plausible bullshit generation for free, as if there's not enough already available cheap.
A meta-problem here is in choosing to use descriptive phrases like tell the truth and hallucinate, which are human conditions that further anthropomorphize technology with no agency, making it more difficult for layman society to defend against its inherent fallibility.
UX = P_Success*Benefit - P_Failure*Cost
It's been well over a decade since I learned of this deviously simple relationship from UX expert Johnny Lee, and yet with every new generation of tech that has hit the market since, it's never surprising how the hype cycle results in a brazen dismissal of the latter half.It may appear to contain some facts. Some may also be actually true.
The truly useful usecase is as a reasoning engine. You can paste in a document and ask some questions about the facts in that document. Then it does a much better job, enough to be actually useful.
E.g. using text-davinci-003 (this is GPT3, not ChatGPT), "The moon is made of" completes to: Cheese 48.74%, rock: 31.66%, green 4.09% (98.75% followed by cheese), rocks 3.86%, and several other lower percentage tokens.
I wonder if there eventually will be a type of model that incorporates the ability to simultaneously do text completion while adhering to facts at the model level (rather than having to bolt it on top via context).
You need to hand-verify at some point in the process.
This does end up losing you some of the time you gained by using an LLM in the first place. Fortunately you often do still come out ahead.
It’s just not a source of truth at all, it’s a source of raw material.
I suspect that during the research his System 1 (fast, intuitive thinking) told him he was not responsible for the risk he knew he was incurring by relaying AI generated text. It was more like ChatGPT was his own legal secretary which he was within his rights to trust, just like the main lawyer in the case, LoDuca, trusted him to produce this research.
The proceedings would have been more interesting if Schwartz had been honest about this, rather than going with the easily discoverable lie.
On the other hand, it's always funny when people realize they've got themselves into deep shit and they decide the best way out is to essentially plead insanity.
Achieving that is going to be a serious technical, and also philosophical, challenge for humans.
Today's LLM are a literary device. They say what sounds plausible in the universe of texts they were fed. What they say technically isn't even wrong, because they have no notion of truth, or any notion of a world beyond the words. Their output should be judged accordingly.
Especially for someone like a lawyer I would expect to them verify any information they get from ChatGPT.
For example, labeling a million text samples with 90% accuracy by using few shot learning is a good use case. Writing a poem is good use case. Trying to learn a new language is not. Generating a small function that you can verify might be ok. Writing entire codebase is not.
So far, I haven't found any use case for personal use of LLMs. For work however, LLMs are going to be very useful with text(and potentially image) based machine learning tasks. Any tasks where having knowledge beyond the labeled training dataset is useful is going to be a good task for LLMs. One example is detecting fraud SMS.
Relying on AI sophists like ChatGPT for legal work is still just as risky for normal users and even for legal experts. The difference is, these legal experts are more qualified to carefully review and check over the outputs than the average joe / jane trying to 'replace their lawyer, solicitor, etc' with ChatGPT.
I keep emphasising this importance, and to never fully trust the output of LLMs such as ChatGPT, unless a human has reviewed and checked if it is hallucinating or bullshitting. [0]
Now, it is. When ChatGPT first became public though, those were the Wild West days where you could get it to tell you anything, including all sorts of unethical things. And it would quite often double-down on "facts" it hallucinated. With current GPT-3.5 and GPT-4, the alignment is still a challenging problem, but it's in a much better place. I think it's unlikely a conversation with GPT-4 would have gone the way it did for this lawyer.
Either ask it for some other legal sources and ask if those are true (and then try to see if a few aren't), or use the API to feed it its own answer about Varghese etc and then see if it will say it's true (because at that point you've made it think it said this).
Users don't usually read long legal statements such as terms of services.
That's not the case of ChatGPT interface, the note about its limitations is clearly visible and very short.
This is as dumb as saying a city is at fault if someone drives into a clearly marked one way only street and causes an accident because people don't read anything.
The only connection between it and this world is your input. ChatGPT is floating in the heavens, and you’re grounding it by at most a fishing line, through the textbox. It has to be framed as such. People praising it as a next gen search engine[that finds a data from database] is(perhaps this is the word that best fit the situation!), hallucinating.
Perhaps ChatGPT's "open relationship" with the truth could be explained in such terms...
Does anyone know if training an LLM with just one type of data, law in this case, creates a more accurate output?
> The goal of chat LLMs is not to give you an answer. The goal is to continue the conversation.
Is it really so hard to cross reference a chatgpt response with a source?
A. As an artificial intelligence model developed by OpenAI, I don't have personal emotions, thoughts, or intentions, so I don't lie in the way humans do. My purpose is to provide accurate and helpful information based on the training I've received. However, it's important to note that the information I provide is only as accurate as the data I was trained on. If there's inaccurate or misleading information in that data, I might unintentionally output incorrect or misleading information. But this isn't considered "lying" in the human sense because it doesn't involve any intentional deceit on my part.
A: How do you know it didn't really "involve any intentional deceit"?
Q: As an AI model, I don't have consciousness, emotions, or intentions. The concept of "intention" applies to beings with consciousness and subjective experiences, which AI, including me, does not possess. I generate responses purely based on patterns learned during training, and there's no intent behind them because I don't have a mind, thoughts, or consciousness.
When I state there's no "intentional deceit," it's based on this understanding of AI. AI does not make choices or decisions the way a human would. If a mistake or inaccuracy occurs in my responses, it's due to limitations in the data I was trained on or in my design, not because of a deliberate attempt to mislead or deceive. AI functions based on its programming and training, not on intentions or conscious thought.
Q: That's begging the question though. Your answer takes for granted that you "don't have a mind, thoughts, or consciousness". What if actual consciousness (say, in humans) is also "based on patterns learned during training", only different being that human training involves sensory experience and not just a corpus of text and images?
A: Your question delves into one of the most fascinating and debated topics in the fields of cognitive science, philosophy, and AI: the nature of consciousness. As of my last update in September 2021, there's no universally accepted definition of consciousness or a definitive understanding of its mechanisms.
We do understand that human consciousness is associated with our brains, which process a wide range of sensory and internal information, and it's accompanied by a subjective experience — an "inner life" or "qualia". This isn't something we've observed or been able to replicate in AI.
AI models like me, based on GPT-4 architecture, operate on patterns in data. The way I generate responses is by predicting the next piece of text based on what I've learned from the corpus of text I was trained on. However, there's no subjective experience involved in this process — there's no "I" that's interpreting or experiencing anything. I don't have a concept of self, and I don't have feelings or personal thoughts.
While some AI models can mimic aspects of human behavior or cognition to a certain extent, there's a vast difference between mimicking and actually experiencing. When it comes to consciousness — the inner subjective experience — we don't have evidence that AI can possess this feature.
Your question is profound and important, and scientists, philosophers, and AI experts are continuously exploring these topics. The answers, however, are likely to remain uncertain for a while.
Q: So, theoretically at least, a future model of the GTP architecture could be conscious and even lie?
A: Theoretically, it's possible that future advances in AI and cognitive science could lead to the development of systems that possess some form of consciousness. However, as of my knowledge cutoff in September 2021, there's no consensus among scientists and AI researchers that this is achievable or even definable in the context of AI.
The concept of a machine consciousness would likely require a significant evolution of our understanding of both consciousness and AI. It might require capabilities far beyond pattern recognition and prediction, which are the fundamental basis of current AI systems like the GPT-4 model. It might also necessitate a rethinking of ethical and philosophical considerations surrounding AI.
In terms of an AI "lying," if future AI systems were developed to simulate a higher level of cognitive complexity, including the ability to simulate deceptive behavior, they could potentially generate outputs that are intentionally misleading. However, it's important to understand that this would not be lying in the human sense. It would be a result of the programming and functionality that the AI has been designed to exhibit, not a conscious choice made by the AI.
It's also critical to note that OpenAI and many other AI research and development organizations are committed to ensuring the ethical use and development of AI technologies. If an AI were to exhibit behavior that could harm people, such as deceptive behavior, it would be seen as a serious ethical issue that would need to be addressed.