Better HN

story

0 points · s17n · 1y ago · 0 comments

> This is obviously extremely silly, because that's exactly how OpenAI got all of its training data in the first place - by scraping other people's data off the internet.

OpenAI has also invested heavily in human annotation and RLHF. If all DeepSeek wanted was a proxy for scraped training data, they'd probably just scrape it themselves. Using existing RLHF'd models as a replacement for expensive humans in the training loop is the real game changer for anyone trying to replicate these results.


KennyBlanken · 1y ago

"We spent a lot of labor processing everything we stole" is...not how that works.

That's like the mafia complaining that they worked so hard to steal those barrels of beer that someone made off with in the middle of the night, and really that's not fair, and won't someone do something about it?

s17n (OP) · 1y ago

Oh, I don't really care about IP theft and agree that it's funny that OpenAI is complaining. But I don't think it's true that DeepSeek is doing this just because they are too lazy to scrape the internet themselves; it's all about the human labor that they would otherwise have to pay for.

KennyBlanken · 1y ago

That's assuming what a known prolific liar has said is true...

The most famous example would be him contacting ScarJo's agent to hire her to provide her voice for their text-to-speech bot, being told to go pound sand, doing it anyway, and then lying about it (which they got away with until her agent released a statement saying they'd approached her and she told them to fuck off).

Ukv · 1y ago

> and doing it anyway, and then lying about

To my understanding, this is not true. The "Sky" voice was based on a real voice actor they had hired months before contacting Johansson, with the casting call not mentioning anything about sounding like Johansson. [0]

I think it's plausible that they noticed some similarity and that's what prompted them to later reach out to see if they could get Johansson herself, but it's not Johansson's voice and does not appear to be someone hired to sound like her.

[0]: https://archive.is/BNFvh
