Better HN

0 points · two_in_one · 2y ago · 0 comments

The next step will be to ask for GPU time, because even with the data, the model code and the training framework, you may not have the resources to train. The equivalent would be someone giving you the code but no access to the mainframe required to compile it. Would that make it not open source? There are other variations: the original compiler was lost, or current compilers aren't backward compatible. Does that make old open-source code closed now?

In other words, there should be a reasonable line for when a model is called open source. In the extreme view, it's when the model, the training framework, and the data are all available for free. That would mean an open-source model could be trained only on public-domain data, which makes the class of open-source models very, very limited.

A more realistic line is to make the code and the weights available, so that with some common knowledge a new model can be trained, or the old one fine-tuned, on whatever data is available. An important note: the weights cannot be reproduced exactly even if the original training data is available. Retraining will always produce a new model with (slightly) different responses.


1 comment · 1 top-level

two_in_one (OP) · 2y ago

Downvoted, hmm... I'll add a bit more, then. Sometimes it's even good that a model cannot be easily reproduced. The original developers usually have some skills and a sense of responsibility, while 'hackers' don't. It's easy to introduce bias into the data, for example by removing selected criminal records, and then publish the model under a similar name. That would be confusing; some people may mistake the fake one for the real one.

PS: If I ever make my models open, I can't open the data anyway. The license on the images explicitly prohibits publishing them.
