But Q&A websites do contain information that might not be in other sources, so there would be some loss.
These LLMs could not exist without them, but now they're expected to compete?
If all of the Q&A platforms die off, how are LLM training datasets going to get new information?
This whole AI boom is typical corporate shortsightedness imo. Kill the future in order to have a great next quarter.
I hope I'm wrong. If I am right, then I hope we figure this out before AI has bulldozed everything into dust.
You just take arbitrary data and ask the LLM to put it in Q&A format and generate the synthetic training data. Unless you are suggesting Quora is the source of new information, which I don't agree with.
Quora does not care about the user experience. Their obsession with pay-walling killed the site for me across a decade. They literally could not get me to sign up and boy did they try (I really needed an answer once too!). My soul really remembers hostile sites.
> These LLMs could not exist without them, but now they're expected to compete?
Yea, those damn tractor makers - they ate the food that the hand farmers used to make! How are hand farmers expected to compete with tractors now, when it's so much more efficient and can do 100x the work!?
This comes from a reaction to the previous model of forums where it was smaller bits of data spread across multiple comments or posts. I recall going through forums in the days before Stack Overflow, trying to find out how to solve a problem. https://xkcd.com/979/ was very real.
Stack Overflow (and its siblings) was an attempt to change this to a "one spot that has all the information".
That model works, but it is a high-maintenance approach. It tries to turn a back-and-forth that can only be understood by reading the entire conversation into something that more closely resembles a Wikipedia page (one that hides all of the work of Talk:Something). The key thing is that it takes a lot of work to maintain that Q&A format.
And yet, users often don't know what they want. They want that forum model with interaction and step by step hand holding by someone who knows the answer. Stack Overflow was intentionally designed to make that approach difficult in an attempt to make the Q&A the easier solution on the site.
ChatGPT provides the users who want the step by step hand holding an infinitely patient thing behind the screen that doesn't embarrass them in public and is confident that it knows the answer to their problem.
Stack Overflow and Quora and other Q&A forums are the abomination. People want Perlmonks https://www.perlmonks.org/?node_id=11164039 and /r/JavaHelp, where it's interacting with another person in small steps rather than Q&A.
---
The future of "well, what if people stop using the sites that generate the information used to train the models that people are using to get information?" ... that becomes an interesting problem.
I am reminded of Accelerando ( https://www.antipope.org/charlie/blog-static/fiction/acceler... ) and the digital civilizations being various forms of scams and the currency is things that can think new ideas.
The currency is new material that is to be sold. The information gets locked behind some measures to try to make scraping impractical and then sold off wholesale. Humans still talk and answer questions. There are new posts on Reddit about how to solve problems even while ChatGPT is out there. And Reddit is presumably trying to make harvesting the content within its walls something that others have to pay for to get at for training.
This sounds like the seed for a business model to pitch in the next upcoming hype cycle: "short-term hiring of experts with quarter-hourly billing increments enabled by our web- or app-based user interface". :-)
If I want someone to walk me through making muffins from scratch, is there a human on the other end of that line (competing with $1/day rates for ChatGPT Pro; it's cheaper than that, but that's the comparison), and are they better than what ChatGPT can do?
It would have to be... quite a bit more than what the LLM would be priced at. The minimum it could reasonably be (without any other things) would get close to $4/15m... and that's minimum wage.
I really don't think that humans are competitive on that timescale or rate.
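For anyone who wants to poke at that comparison, here is a rough sketch of the arithmetic. The $4 per 15 minutes and $1/day figures are just the ones quoted above, and the function names are made up for illustration; none of this is real pricing data.

```python
# Figures quoted in the comment above; assumptions, not measured prices.
HUMAN_RATE_PER_QUARTER_HOUR = 4.00  # dollars per 15-minute billing increment
LLM_RATE_PER_DAY = 1.00             # dollars per day, the comparison rate used above


def human_hourly_rate(rate_per_quarter_hour: float) -> float:
    """Convert a quarter-hourly rate into an hourly rate."""
    return rate_per_quarter_hour * 4


def llm_days_for_price(human_cost: float, llm_rate_per_day: float) -> float:
    """How many days of the LLM subscription the same money would buy."""
    return human_cost / llm_rate_per_day


if __name__ == "__main__":
    hourly = human_hourly_rate(HUMAN_RATE_PER_QUARTER_HOUR)
    days = llm_days_for_price(HUMAN_RATE_PER_QUARTER_HOUR, LLM_RATE_PER_DAY)
    print(f"Human expert: ${hourly:.2f}/hour")
    print(f"One 15-minute session buys {days:.0f} days of the LLM")
```

So a single quarter-hour with a person at that floor rate costs about as much as four days of the LLM at the comparison price.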
It would probably be better to hire people at some higher rate to write content for your private model. Brandon Sanderson is considered one of the faster writers (in the fantasy genre) and averages about 2500 words / day ( https://famouswritingroutines.com/collections/daily-word-cou... ) - and while he makes a lot more than most authors, let's go with a more typical $75,000 USD / year. At 250 working days per year, that's $300 / day, which works out to $0.12 per word. ... That puts a person in the intermediate to experienced price-per-word range https://uxwritinghub.com/writers-salary/
Not that I'm suggesting that's the way to do it, but something for LLMs to consider - hire experts to write content for their LLM. $125 per 1000 word blog post.
298 words. I'd like my $37.25 please. Not that I'm asking you for that, but rather that's what my words as training material would be worth.
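The per-word math can be checked with a quick sketch. The salary, working days, and words per day are the numbers from the comment; the function names are hypothetical. Note that $300 / 2500 words is exactly $0.12 per word, while the $125 per 1000 words and $37.25 for 298 words figures correspond to a slightly rounded-up $0.125 per word.

```python
# Inputs taken from the comment above; treat them as assumptions.
ANNUAL_SALARY = 75_000   # USD per year
WORKING_DAYS = 250       # working days per year
WORDS_PER_DAY = 2_500    # average daily output


def rate_per_word(annual_salary: float, working_days: int, words_per_day: int) -> float:
    """Dollars per word for a salaried writer with a steady daily output."""
    daily_pay = annual_salary / working_days
    return daily_pay / words_per_day


def cost_of(words: int, per_word: float) -> float:
    """Cost of a piece of text, rounded to whole cents."""
    return round(words * per_word, 2)


if __name__ == "__main__":
    rate = rate_per_word(ANNUAL_SALARY, WORKING_DAYS, WORDS_PER_DAY)
    print(f"Rate: ${rate:.3f} per word")
    for per_word in (rate, 0.125):
        print(
            f"At ${per_word:.3f}/word: 1000 words = ${cost_of(1000, per_word):.2f}, "
            f"298 words = ${cost_of(298, per_word):.2f}"
        )
```

Either way the order of magnitude is the same: a bit over a hundred dollars per 1000-word post.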