petercooper 3y ago

I think it might be even less problematic with something like Whisper than with DALL-E/SD? Merely consuming data to train a system or create an index is not usually contrary to the law (otherwise Google wouldn't exist). It's the publication of copyrighted content that's thorny (and that's something you can begin to achieve with results from visual models that include the Getty Images logo, etc.)

I think it'd be a lot harder to make a case that an accurate audio-to-text transcription violates the copyright of any of the training material in the way a generated image could.


jefftk 3y ago

They're not just training a system but also publishing the trained system.
