undefined | Better HN

story

0 pointssimonw3y ago0 comments

This is genuinely the cutting edge of how you do interesting things with language models like GPT-3 at the moment.

Training these models with extra data turns out to be incredibly expensive and relatively ineffective.

Instead, the most interesting research is all around tricks like this - figuring out ways to round-trip to the language model, then query other sources of data for the information that it needs, then sending more prompts to the language model again.

I wrote a tutorial about a pattern for doing that a couple of weeks ago, but this SQL trick is a lot more sophisticated than what I've done so far: https://simonwillison.net/2023/Jan/13/semantic-search-answer...

0 comments

andyreagan3y ago

> Training these models with extra data turns out to be incredibly expensive and relatively ineffective.

I can see that it's expensive, but have you tried it for effectiveness?

BTW, your approach is very cool here.

simonwOP3y ago

I've only done two experiments with it myself - training a tagging model on my blog's content and using that to suggest tags for untagged entries - and I found the results very unimpressive fur both a cheaper and the most expensive model.

I've seen a few other people suggest that time tuning GPT is unlikely to give better results than just feeding the regular model a few examples in a regular prompt.

I've yet to see anyone talking about a GPT3 fine tuning project that went really for them. Maybe I haven't looked in the right places.

j / k navigate · click thread line to collapse