> 3. Use Restrictions
> You must not use any of the AlphaFold 3 Assets:
> 1. for the restricted uses set forth in the AlphaFold 3 Model Parameters Prohibited Use Policy; or
> 2. in violation of applicable laws and regulations.
The AlphaFold 3 Model Parameters Prohibited Use Policy states:
> You must not access or use nor allow others to access or use the AlphaFold 3 Assets:
> On behalf of a commercial organization or in connection with any commercial activities, including research on behalf of commercial organizations.
This wasn't a relevant question for AlphaFold 2, as its weights were released under a CC BY 4.0 license.
These model weights (and many other ML weights) are clearly very useful in commercial settings, but Google thinks it can scare people into not using them with the wording of its license. Are they right?
In the old days of genomics there were massive patent wars. First, the Human Genome Project itself. Craig Venter got massive private funding to sequence the human genome with the understanding that he'd patent all the genes. So there was a space race of sorts in which the public sector, led by Francis Collins (later head of the NIH), sought to beat him. It came out a tie (or that's what they called it). Bill Clinton brought them both on stage and said, in effect, "Great job! Also, genes aren't patentable!"
Then a whole stink arose around Myriad Genetics, which patented the BRCA genes and a test for them. BRCA is a big-time gene as far as cancer goes; see Angelina Jolie. Then in 2013 the Supreme Court ruled that naturally occurring genes cannot be patented.
So what is AlphaFold 3? Is it a ground truth of which protein interacts with what? In that case it seems not patentable. Or is it a method, or algorithm, for estimating protein interactions? That's more of a grey area. I don't know. If Google wanted to monetize it properly, they'd probably keep it as an internal black project and cook up pharma collabs and such. But they've made it public(ish).
Still, there's a long way to go, or at least some more steps. If we say Protein A interacts with Protein B, we then have to ask whether they're expressed in the same cell, which itself is not enough! Most bio measurements are made on big batches of millions of cells, but an interaction needs both proteins in the same cell at the same time. So if our batch is a million cells with protein A and a million other cells with protein B, it looks like both are "on" in our batch of 2 million cells. But the truth is more nuanced: possibly no single cell contains both. And even then there are other considerations, such as post-translational modifications and which cellular compartment these proteins reside in.
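
To make the batch point concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the function names are hypothetical, the populations are scaled down from a million cells to a thousand each, and it doesn't model any real assay.

```python
# Toy illustration: a pooled (bulk) readout can show both proteins as "on"
# even when no individual cell expresses both.
# Each cell is a (has_A, has_B) pair of booleans. All numbers are invented.

def make_batch(n_a_only, n_b_only):
    """Build a batch where some cells express only A and the rest only B."""
    return [(True, False)] * n_a_only + [(False, True)] * n_b_only

def bulk_signal(cells):
    """Fraction of the pooled batch expressing A, and fraction expressing B."""
    n = len(cells)
    frac_a = sum(a for a, _ in cells) / n
    frac_b = sum(b for _, b in cells) / n
    return frac_a, frac_b

def coexpressing_fraction(cells):
    """Fraction of individual cells that express both A and B."""
    return sum(a and b for a, b in cells) / len(cells)

cells = make_batch(1000, 1000)       # scaled down from a million each
print(bulk_signal(cells))            # (0.5, 0.5): both look "on" in bulk
print(coexpressing_fraction(cells))  # 0.0: no cell actually has both
```

The bulk view and the per-cell view disagree here by construction; a real batch would sit somewhere in between, which is why the same-cell, same-time question needs its own measurement.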
Yes, they can come after the leaker. If they can identify the leaker, which they probably can't. But even crucifying the leaker won't put the genie back into the bottle.
Ultimately, the working parts of a given model are largely unknowable to even the smartest humans once you get past the bare basics. We know the shape of the model, the number of layers, and what the inputs and outputs correspond to, but not really anything else. It's the product of a machine starting from random values and adjusting them over and over until something works; the best model produced is then selected for production.
At a high level, that's not altogether different from generating an image or a piece of text using a model. You introduce a random factor and a number of steps, and the machine uses this unknowable model to produce something a person can understand.
I do think the law should be updated to grant some protections to people who produce models, because losing all protection would mean the death of open model releases, and then we'd be staring even more seriously down the barrel of corpos controlling the entirety of the technology, more so than we are now. At least open models provide some semblance of control for end users.