https://news.ycombinator.com/item?id=40751756
In that thread, multiple people posted wrong answers from GPT-4o, assumed they were correct, and praised the AI.
This matches my experience that anything that deviates from an encyclopedia lookup or web search is very likely to be wrong.
I’ve started using the chat feature in GitHub Copilot in IntelliJ. I wanted it to add some logging to my code for me, since it was a tedious task. I started off with a few relevant files and an explanation of what I wanted. Naturally it didn’t get it right on the first try; I don’t think any human would either. But I could continue the conversation, explaining what I thought was wrong and how I wanted it to actually be. I even realised that I didn’t know exactly what I wanted until I had seen some of the suggestions.
Once I was happy with the result I added another file to the chat and asked it to do the same with that file. I had a handful of files that were structured very similarly and all needed the same kind of logging. It did a great job and I could use the response without further editing. I tried to add more files but realised that the replies got slower and slower, so instead I reverted the conversation to the state where I had initially been happy with the results and asked it to do the same thing, this time to a different file.
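To give a concrete picture of the kind of change I mean: a sketch, not my actual code. Assume the files were Java classes structured alike, and the logging was roughly entry/exit logging with java.util.logging. The class and method names here are invented for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example of Copilot-added entry/exit logging.
// OrderService and totalQuantity are made-up names; the real files
// were simply a handful of classes structured the same way.
public class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    public int totalQuantity(int[] quantities) {
        // Log the call with its argument before doing the work...
        LOG.log(Level.FINE, "totalQuantity called with {0} entries", quantities.length);
        int total = 0;
        for (int q : quantities) {
            total += q;
        }
        // ...and log the result on the way out.
        LOG.log(Level.FINE, "totalQuantity returning {0}", total);
        return total;
    }

    public static void main(String[] args) {
        OrderService service = new OrderService();
        System.out.println(service.totalQuantity(new int[] {1, 2, 3}));
    }
}
```

Because each file followed the same pattern, the model could repeat this mechanical transformation reliably once it had seen one file done right.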
I find that it takes some practice to get the best results from LLMs. A great place to start is OpenAI’s prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering
When using something like GPT-4 for development, I try to think of it as a junior developer or a grad student. With a search engine you need to include the right keywords to get the best results; with an LLM you need to set the right mood by writing a good prompt and holding a conversation before getting to the point. I also find that GPT-4 is fairly good at answering factual questions, but it’s much more useful and powerful when used to create things or to discuss an approach.