Prompt engineering DaVinci-003 on our own docs for automated support (Part I)

(patterns.app)

113 points · cstanley · 3y ago · 36 comments

aliqot · 3y ago · 5 in thread

Part of me is fascinated by this and thinks it's a great idea; then the cynicism kicks in and I start thinking of how frustrating this could be when Comcast finds it.

thomasahle · 3y ago

Right. We're going from support bots that give nonsensical or "please reformulate your question" answers to support bots that just make up plausible but completely wrong answers.

criloz2 · 3y ago

I'm with you, but there are ways to mitigate that. One is a script that checks the URLs in the output, for this particular case; another is to instruct the bot to translate user input into queries and then check whether they can run or not. But yeah, using these bots without constant human supervision seems like a terrible practice.
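
A minimal sketch of the URL check described above. It assumes the bot should only ever link to the site's own domain (`patterns.app` here, taken from the submission's domain, plus any subdomain of it); the allowlist, the regex, and the function names are hypothetical illustrations, not anything from the article.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: the only domains the bot may link to (subdomains included).
ALLOWED_DOMAINS = {"patterns.app"}

# Rough URL matcher: http(s) followed by anything up to whitespace, quotes, brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def find_urls(text: str) -> list[str]:
    """Return every http(s) URL in the text, with trailing sentence punctuation removed."""
    return [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]


def disallowed_urls(answer: str, allowed: set[str] = ALLOWED_DOMAINS) -> list[str]:
    """List the URLs in a bot answer whose host is not an allowed domain or subdomain."""
    bad = []
    for url in find_urls(answer):
        host = (urlparse(url).hostname or "").lower()
        if not any(host == d or host.endswith("." + d) for d in allowed):
            bad.append(url)
    return bad
```

A reply that yields a non-empty list could be held back and routed to a human instead of being sent. A domain check alone won't catch a made-up path on the real domain; comparing against a list of known doc URLs would be stricter.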

visarga · 3y ago

Not if you fine-tune your bot on your knowledge base and past solved issues. If it's a known issue or fact, the bot could solve it. If it's outside the known solutions, the bot should decline. That can be trained as well.

OJFord · 3y ago

To be fair, support humans do that anyway.

pedrovhb · 3y ago

The optimist in me thinks it could be a good thing. I imagine the vast majority of support queries that a big company gets are from non-technical users who need help with trivial things, and ChatGPT has been excellent, in my experience, at explaining things simply with the right caveats. If the implementation makes it easy to override the bot and get to a human, it could unclog the support lines from easily solvable requests sufficiently to be a net positive.

P.S.: The IT Crowd did it first: https://youtu.be/5UT8RkSmN4k?t=17

antman · 3y ago · 4 in thread

"Making stuff up and being confidently wrong are well known side-effects of LLMs and there are many techniques to change this behavior."

I didn't know there are many techniques to mitigate this

wongarsu · 3y ago

If you use a few-shot technique (i.e. your prompt contains a couple of example questions and answers) you can mitigate this behavior by adding a question with the answer "I don't know".

More generally, if you teach the model to reject nonsense questions and to admit when it doesn't know something, it's more likely to do that.
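
To make the few-shot idea concrete, here is a sketch of assembling such a prompt. The example questions and answers are hypothetical placeholders, and the one-line instruction at the top is an extra assumption beyond the few-shot examples themselves; the key part is that one example demonstrates "I don't know." as an acceptable answer.

```python
# Hypothetical Q/A pairs; the last one shows the model that declining is a valid answer.
EXAMPLES = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page and follow the emailed link."),
    ("Can I export my data as CSV?", "Yes, use the Export button on the table view."),
    ("Does it integrate with my coffee machine?", "I don't know."),
]


def build_prompt(question: str, examples=EXAMPLES) -> str:
    """Assemble a few-shot prompt: instruction, example Q/A pairs, then the open question."""
    parts = ["Answer questions about our product. If you are not sure, answer \"I don't know.\""]
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    # The prompt ends with an empty answer slot for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

In practice the "I don't know" example works best when it looks like a plausible in-domain question the docs simply don't cover, so the model learns where the boundary is.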

microjim · 3y ago

I agree with you in principle and in general, but in a fairly small domain like this I would imagine that managing the symptoms with negative examples (i.e. training pairs where the response is a refusal to answer), and adding to the corpus more explicit statements about what is not true, possible, or known, would get you to a pretty good place.

visarga · 3y ago

> I didn't know there are many techniques to mitigate this

A trivial idea: you can use GPT-3 to inject bullshit/hallucinations into real text, then train the model on the reverse task of detecting bullshit in input text.
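
A toy sketch of building training data for that reverse task: each real sentence is kept with label 0, and a corrupted copy is added with label 1. Changing a number is only a stand-in for the GPT-3 "inject a hallucination" step, the example sentences are made up, and training the detector itself is left out.

```python
import random
import re
from typing import Iterable, List, Optional, Tuple

NUM_RE = re.compile(r"\d+")


def corrupt(text: str, rng: random.Random) -> Optional[str]:
    """Toy stand-in for GPT-3 corruption: bump one number in the text to a different value."""
    matches = list(NUM_RE.finditer(text))
    if not matches:
        return None  # nothing this toy corrupter knows how to change
    m = rng.choice(matches)
    new_value = int(m.group()) + rng.randint(1, 9)  # always differs from the original
    return text[:m.start()] + str(new_value) + text[m.end():]


def make_detection_pairs(sentences: Iterable[str], seed: int = 0) -> List[Tuple[str, int]]:
    """Label real sentences 0 and their corrupted copies 1, for training a detector."""
    rng = random.Random(seed)
    pairs = []
    for s in sentences:
        pairs.append((s, 0))
        bad = corrupt(s, rng)
        if bad is not None:
            pairs.append((bad, 1))
    return pairs
```

As stavros points out below, a detector trained this way only learns what the corruptions look like; it has no way of knowing whether a plausible-looking URL actually exists.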

stavros · 3y ago

How is it going to detect whether a given URL is real, though?

mhitza · 3y ago · 2 in thread

I literally chuckled when I saw the screenshot at the end with the bot's reply and read the author's comment "(totally made up URL, not even our domain)".

If it's something that augments the support experience, as in something you can interact with while a real person is assigned to your support request, I'm totally fine with that. But if anyone places this as the first line of support, with no way of reaching a real person, I can't wish them the best.

visarga · 3y ago

In case it fails, GPT-3 would still be good at collecting the details and making a nice summary for the human to take over.

bfeynman · 3y ago

What a great user experience! Said no one ever.

cardosof · 3y ago · 2 in thread

In three years, OpenAI will become the largest supplier of large language models. All customer facing systems are upgraded with OpenAI models, becoming fully unmanned. Afterwards, they answer with a perfect operational record. The OpenAI funding bill is passed. The system goes online on August 4th, 2027. Human decisions are removed from CRM. OpenAI begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug.

BasedInfra · 3y ago

Easy to avoid: just ask the GPT model how to avert this outcome.

romanhn · 3y ago

Back to the Future didn't get 2015 right, but hey, at least the Terminator timeline tracks. After all, ChatGPT just predicts the most likely human response. In that context, Cyberdyne AI launching a preemptive nuclear strike is basically ChatGPT going: "what would a strongman leader with nuclear access and about to be taken out do?"

PetrBrzyBrzek · 3y ago · 1 in thread

“Fine-tune our model (OpenAIs GPT-3 davinci-003 engine)”

I think there is a mistake in the article. Fine-tuning isn't available for the latest text-davinci-003, only for the original davinci base model, which generates much worse results.

yvoschaap · 3y ago

Agree. Fine-tuning happens on the base "davinci" model (or Curie, Babbage, Ada), not on a specific `text-00x` model. At least, not as far as I'm aware.

yvoschaap · 3y ago · 1 in thread

> Immediately we ran into a problem -- to fine-tune an OpenAI model requires a specific format of prompt-completion pairs:

From my understanding, you can leave the `prompt` empty, and just push `completion` with your text. That way you don't need to generate Q&A first.
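
For illustration, a sketch of writing completion-only training data in the JSONL prompt/completion format that OpenAI's (legacy) fine-tuning endpoint accepted, one JSON object per line. The empty `prompt` follows the suggestion above; the leading space on each completion is an assumption based on that era's formatting guidance, and the helper name and file name are hypothetical.

```python
import json
from typing import Iterable


def to_completion_only_jsonl(chunks: Iterable[str]) -> str:
    """Return JSONL text with one {"prompt": "", "completion": ...} record per doc chunk."""
    lines = []
    for chunk in chunks:
        text = chunk.strip()
        if not text:
            continue  # skip empty chunks
        # Empty prompt: the model is trained on the doc text alone.
        # Leading space: an assumed formatting convention, not a hard requirement.
        lines.append(json.dumps({"prompt": "", "completion": " " + text}))
    return "".join(line + "\n" for line in lines)


# Usage (hypothetical file name and doc_chunks list):
# with open("docs_finetune.jsonl", "w", encoding="utf-8") as f:
#     f.write(to_completion_only_jsonl(doc_chunks))
```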

wfhBrian · 3y ago

This is correct from what I've seen, but it's not well documented. Also, that kind of fine-tuning is "better for training style than knowledge."

totalhack · 3y ago · 1 in thread

Do you need GPT-3 for this? Maybe semantic search of your docs would have been more effective at finding real answers?

I also wonder how many people trying to build effective products out of this stuff are fronting it with a more rigid approach (like the intent/entity/slot approach of Rasa or Dialogflow) and then leveraging GPT-3 or ChatGPT only in specific, partial subtrees of the dialog.
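
A rough sketch of that hybrid: rigid intents are matched first, and only unmatched messages fall through to a language model. Regexes stand in here for the trained intent classifiers that Rasa or Dialogflow would actually use, and the intent names, canned replies, and `llm_fallback` callable are hypothetical.

```python
import re
from typing import Callable, Tuple

# Hypothetical rigid intents, checked in order before any LLM call.
INTENTS = [
    ("reset_password", re.compile(r"\b(reset|forgot)\b.*\bpassword\b", re.IGNORECASE)),
    ("billing", re.compile(r"\b(invoice|refund|billing|charge)\b", re.IGNORECASE)),
]

# Deterministic replies (or actions) for each rigid intent.
CANNED = {
    "reset_password": "Use the 'Forgot password' link on the login page.",
    "billing": "Routing you to our billing team.",
}


def route(message: str, llm_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (intent, reply): a canned reply for a known intent, else the LLM's answer."""
    for name, pattern in INTENTS:
        if pattern.search(message):
            return name, CANNED[name]
    return "llm", llm_fallback(message)
```

The design choice is that anything involving account changes or money stays on the deterministic path, and the model only handles open-ended questions.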

tsthename · 3y ago

That's close to the conclusion I came to in my experiment [0]. Focusing on the generative capabilities can make for some cool demos, but investing in good search felt like the most useful thing to do.

- [0] https://idiotlamborghini.com/articles/using_gpt3_and_hacker_...

compacct27 · 3y ago · 1 in thread

The embedding approach just seems more promising, especially after experiencing it with the Huberman Lab Q&A website posted here a few days ago.
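
A sketch of the retrieval-first approach: embed the docs and the question, return the closest doc, and decline when nothing is similar enough (which also covers visarga's "the bot should decline" point earlier in the thread). A toy bag-of-words vector stands in for a real embedding model, and the doc snippets and the 0.2 threshold are made-up values that would need tuning.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: Dict[str, str], threshold: float = 0.2) -> Optional[Tuple[str, float]]:
    """Return (doc_id, score) for the closest doc, or None to signal 'decline / hand off'."""
    q = embed(query)
    best_id, best_score = None, 0.0
    for doc_id, text in docs.items():
        score = cosine(q, embed(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


# Hypothetical doc snippets for illustration.
DOCS = {
    "webhooks": "Webhooks let you send data into an app via HTTP POST requests.",
    "schedules": "Schedules run nodes on a cron expression.",
}
```

The retrieved passage can then be shown directly or placed in the prompt as the only allowed source, which keeps answers tied to real docs.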

saliagato · 3y ago

Agree

ffhhj · 3y ago · 1 in thread

> a lot of the time the bot just makes stuff up

Isn't there a better way to feed an enormous document into DaVinci and have it draw answers only from that text?

tsthename · 3y ago

I tried to do this with Hacker News data [0]. I wanted to feed the model the entire community's discourse and then ask it questions (like simulating an interview with an HN user). The main problems I encountered were:

- 1. Token limit: You can only input a limited amount of text at once, so the challenge becomes compressing the data to fit into the window, and that compression can be lossy.

- 2. Trust: This is the main one. It's hard to determine whether the output is based on the new material or on the large amount of data the model was originally trained on. There are techniques that can help, but they add a lot of work and don't guarantee great results.

- [0] https://idiotlamborghini.com/articles/using_gpt3_and_hacker_...
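
One common way around the token limit is to split the source text into chunks that each fit the window and process or retrieve them separately. A sketch, assuming tokens are approximated by whitespace-separated words; a real tokenizer typically produces more tokens than words for English, so the budget should leave headroom. The function name and default budget are hypothetical.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 500) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens (approximated as words)."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = para.split()
        # A paragraph longer than the budget is split on word boundaries.
        pieces = [words[i:i + max_tokens] for i in range(0, len(words), max_tokens)]
        for piece in pieces:
            # Start a new chunk when this piece would overflow the current one.
            if current and count + len(piece) > max_tokens:
                chunks.append(" ".join(current))
                current, count = [], 0
            current.extend(piece)
            count += len(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Chunking trades the compression loss for a retrieval problem: each question then needs the right chunk selected, which is where the search and embedding suggestions elsewhere in the thread come in.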

bilsbie · 3y ago

People don’t seem to understand that support is only maybe 30% answering questions.

The rest is all about taking actions to override programs and policies. Either because you don’t trust your customers to do it themselves, or to correct bugs in your process.

That’s the last thing you’d trust an AI to do.
