https://news.ycombinator.com/item?id=40751756
In that thread, multiple people posted wrong answers from GPT-4o, assumed they were correct, and praised the AI.
This matches my experience that anything that deviates from an encyclopedia lookup or web search is very likely to be wrong.
I’ve started using the chat feature in GitHub Copilot in IntelliJ. I wanted it to add some logging to my code for me, since it was a tedious task. I started off with a few relevant files and an explanation of what I wanted. Naturally it didn’t get it right on the first try; I don’t think any human would either. But I could continue the conversation, explaining what I thought was wrong and how I wanted it to actually be. I even realised that I didn’t know exactly what I wanted until I had seen some of the suggestions.
Once I was happy with the result I added another file to the chat and asked it to do the same with that file. I had a handful of files that were structured very similarly and all needed the same kind of logging. It did a great job and I could use the response without further editing. I tried to add more files but realised that the replies got slower and slower, so instead I reverted the conversation to the state where I had initially been happy with the results and asked it to do the same thing, this time to a different file.
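To give a concrete picture of the kind of change I mean: a sketch, not my actual code. Assume the files were Java classes structured alike, and the logging was roughly entry/exit logging with java.util.logging. The class and method names here are invented for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example of Copilot-added entry/exit logging.
// OrderService and totalQuantity are made-up names; the real files
// were simply a handful of classes structured the same way.
public class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    public int totalQuantity(int[] quantities) {
        // Log the call with its argument before doing the work...
        LOG.log(Level.FINE, "totalQuantity called with {0} entries", quantities.length);
        int total = 0;
        for (int q : quantities) {
            total += q;
        }
        // ...and log the result on the way out.
        LOG.log(Level.FINE, "totalQuantity returning {0}", total);
        return total;
    }

    public static void main(String[] args) {
        OrderService service = new OrderService();
        System.out.println(service.totalQuantity(new int[] {1, 2, 3}));
    }
}
```

Because each file followed the same pattern, the model could repeat this mechanical transformation reliably once it had seen one file done right.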
I find that it takes some practice to get the best results from LLMs. A great place to start is OpenAI’s prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering
When using something like GPT-4 for development, I try to think of it as a junior developer or a grad student. With a search engine you need to include the right keywords to get the best results; with an LLM you need to set the right mood by writing a good prompt and holding a conversation before getting to the point. I also find that GPT-4 is fairly good at answering factual questions, but it’s much more useful and powerful when used to create things or to discuss an approach.