Aye, this should be obvious even to non-technical folks. Much has been written about how LLMs regurgitate the data they were trained on. So if you're looking for data to train on, you can certainly extract it from them.
Plus, of course, for people within the tech bubble, there are plenty of research results on the value of synthetically augmented and expanded training data, which push the impact well beyond just regurgitating source data.
Most of all, this whole episode is a failure to report on what to expect next and to project running costs.