
0 points · zmmmmm · 1y ago

They give you the code and they give you the model it runs, and you can customise and redistribute both. It's all open source in that respect.

What people are complaining about (totally unreasonably, in my view) is that Meta is obviously not "open sourcing" all the training data, so nobody can retrain the model from scratch themselves. To me this argument is just silly. The whole point of these models is that they distil pretraining on massive data sets you wouldn't otherwise have access to. If you insist on them releasing the data set, they will have to cut it down to 0.1% of the size, and you will end up with only what you already had access to in the first place.


2 comments · 2 top-level

Flimm · 1y ago

That's not the only thing that people are complaining about. Even the code is not open source, despite being called open source.

frabcus · 1y ago

They could release the code that gathers and curates the data, giving a reproducible system for getting the pre-training data. And presumably they own the post-training RLHF stuff, so they could open that.

Without those you're locked in to them in terms of licensing of future versions.
