superfrank · 2y ago

Not knocking the approach, but how do you do quality control on the posts? Are you just spot checking? How often have you found bad data?

I've thought about doing something similar (using ChatGPT to structure and categorize unstructured data) for a different project in a completely different space and I'm worried about ChatGPT hallucinating things, especially when it comes to numbers.

4 comments · 2 top-level

jonnycoder · 2y ago

The quality control is a good question, and one that can probably be addressed using evaluation as taught by some of the deeplearning.ai short courses (1).

I made an interactive resume AI bot for my personal website. In one instance I asked it "tell me about your Intel experience" and it listed C++ as one of the languages I used there, which is untrue: I had done C++ at a different company.

1. https://www.deeplearning.ai/short-courses/

superfrank (OP) · 2y ago

Can you give more details? No offense, but I'm not going to sign up for a random site to watch a video of unknown quality and length.

jonnycoder · 2y ago

I posted the short courses just as an answer to how to address quality control. I'm not selling anything, and those courses are free anyway. deeplearning.ai was cofounded by Andrew Ng, who is probably best known for his work teaching machine learning through deeplearning.ai, Coursera, Stanford, etc. He has taught and influenced millions.

https://en.wikipedia.org/wiki/Andrew_Ng

Regarding "evaluation", I think this is what those short courses will cover:

Self-Evaluation with the LLM: The idea is to use the language model to generate an answer and then use the same or a different model to evaluate that answer. The evaluation could involve asking the model to rate the answer's accuracy, coherence, relevance, or any other desired metric. This self-evaluation process can be automated and scaled, although it's important to be aware of the limitations, as the model might inherit biases or blind spots from its training data.

LangChain for Structured Evaluation: LangChain can be used to structure this self-evaluation process. It can orchestrate the flow where the LLM first generates an answer and then follows a series of steps to evaluate it. This might include breaking down the evaluation into specific questions or tasks that the LLM must perform to assess its initial response.
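
To make the two ideas above a bit more concrete, here is a minimal sketch in plain Python. It does not use LangChain or any real model API: `generator` and `evaluator` are hypothetical callables that take a prompt and return text, and the questions in `CHECKS` are made-up examples, so treat this as an illustration of the generate-then-evaluate flow rather than what the courses actually teach.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns the model's reply.
LLM = Callable[[str], str]

# Hypothetical evaluation questions; each one should be answerable with YES or NO.
CHECKS: List[str] = [
    "Is every claim in the answer supported by the source text?",
    "Does every number in the answer appear in the source text?",
]


@dataclass
class Evaluation:
    answer: str
    results: Dict[str, bool]  # check question -> did it pass?

    @property
    def passed(self) -> bool:
        # The answer is accepted only if every individual check passed.
        return all(self.results.values())


def generate_and_evaluate(
    source: str, question: str, generator: LLM, evaluator: LLM
) -> Evaluation:
    # Step 1: generate an answer that is supposed to rely only on the source.
    answer = generator(
        f"Source:\n{source}\n\nQuestion: {question}\nAnswer using only the source."
    )
    # Step 2: ask the evaluator one narrow question at a time about that answer.
    results: Dict[str, bool] = {}
    for check in CHECKS:
        verdict = evaluator(
            f"Source:\n{source}\n\nAnswer:\n{answer}\n\n{check}\nReply with YES or NO only."
        )
        results[check] = verdict.strip().upper().startswith("YES")
    return Evaluation(answer=answer, results=results)
```

Anything where `passed` is false could go to a manual review queue instead of being published. The evaluator can be wrong too, so this narrows down what needs spot checking rather than replacing it.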

bernawil · 2y ago

Well, to be fair, the original "Who is hiring" post doesn't do much quality control, and the other apps don't either. Honestly, this whole thing came just out of my frustration with using one of those, filtering for Remote, reading the text and finding out the job wasn't remote at all.

As for quality control, there's a categorization step that returns some tags. Posts that don't match any tag are rejected, which works as a kind of relevance filter.
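
Roughly, the tag check described above could look like the sketch below. The tag names, the `tags` key and both function names are made up for illustration; the actual categories and data shape aren't given in this thread.

```python
from typing import Iterable, List, Set

# Hypothetical tag vocabulary; the real categories are not given in the thread.
KNOWN_TAGS: Set[str] = {"remote", "onsite", "backend", "frontend"}


def accept_post(tags: Iterable[str], known_tags: Set[str] = KNOWN_TAGS) -> bool:
    """Return True if at least one returned tag is in the known vocabulary."""
    normalized = {t.strip().lower() for t in tags}
    return bool(normalized & known_tags)


def filter_posts(posts: List[dict]) -> List[dict]:
    """Keep only posts whose (assumed) 'tags' list matches a known tag."""
    return [p for p in posts if accept_post(p.get("tags", []))]
```

A post tagged only with something outside the vocabulary, or with no tags at all, gets dropped. This only catches irrelevant posts; it won't catch a post that was given a plausible but wrong tag.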
