Some context: been waiting for this to come out for a while! The main innovation is leveraging RoseTTAFold (a protein structure prediction neural net) to generate protein backbones by diffusion in 3D space! From those backbones, we can generate sequences that would fold into the designed structures via sequence design algorithms (check out ProteinMPNN and Rosetta FastDesign).
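To make the two-stage idea concrete, here's a toy sketch of the backbone-generation step: start from pure 3D noise and iteratively denoise toward C-alpha coordinates. The `toy_denoiser` below is a made-up stand-in for the actual learned network (RoseTTAFold in the paper); this is just the shape of a denoising loop, not their method.

```python
# Toy sketch of backbone generation by denoising diffusion.
# NOT the real RFdiffusion code: toy_denoiser is a placeholder
# for the learned RoseTTAFold-based denoiser.
import numpy as np

rng = np.random.default_rng(0)
n_residues, n_steps = 50, 100

def toy_denoiser(coords, t):
    # Placeholder for the network: nudge each residue toward its
    # chain neighbors (a crude "compact backbone" prior).
    smoothed = coords.copy()
    smoothed[1:-1] = (coords[:-2] + coords[2:]) / 2
    return smoothed

coords = rng.normal(size=(n_residues, 3)) * 10  # start from pure noise
for t in range(n_steps, 0, -1):
    pred = toy_denoiser(coords, t)
    noise_scale = t / n_steps                   # anneal the noise to zero
    coords = pred + rng.normal(size=coords.shape) * 0.1 * noise_scale

print(coords.shape)  # (50, 3): one C-alpha position per residue
```

The second stage (ProteinMPNN / FastDesign) would then take coordinates like these and search for a sequence predicted to fold into them.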
In terms of applications: this is super relevant for our ability to create strongly binding protein binders (e.g., timely creation of proteins that bind to virus spike proteins), and for designing enzymes from scratch!
Prior methods suffered from much lower success rates for generating “good” backbone structures. Extremely exciting!! If you want to learn more, check out the Baker group at UW!
Folding takes into account many variables, and a big chunk of current experimental structure determination is concerned with controlling/adjusting these variables.
So this dreaming-up gives you a potentially quicker route to what a folded protein might look like, but it won't guarantee that we actually know how to produce it in the real world.
Disclaimer: someone correct me if I’m wrong. I might be rusty on the latest developments, as I’ve left the field after my PhD.
The very largest plain transformer models trained on protein sequences (analogous to plain text) are about 15B parameters (I'm thinking of Meta AI's ESM-2 [1]). These can do for protein sequences what LLMs do for text: they can "fill in the blank" to design variations, generate new proteins that look like their training data, and tell you how likely it is that a given sequence exists.
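The "how likely is this sequence" part works roughly like pseudo-log-likelihood scoring with a masked language model: mask each position, ask the model for the probability of the true residue, and sum the log-probabilities. A toy illustration, with random logits standing in for a real model like ESM-2 (the function names here are made up):

```python
# Toy pseudo-log-likelihood scoring with a masked language model.
# fake_model_logits is a stand-in for a real forward pass (e.g. ESM-2).
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(1)

def fake_model_logits(masked_seq, pos):
    # Placeholder: a real model would return logits over the 20 residues
    # for the masked position, conditioned on the rest of the sequence.
    return rng.normal(size=len(AMINO_ACIDS))

def pseudo_log_likelihood(seq):
    total = 0.0
    for i, residue in enumerate(seq):
        logits = fake_model_logits(seq[:i] + "<mask>" + seq[i + 1:], i)
        log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
        total += log_probs[AMINO_ACIDS.index(residue)]
    return total

score = pseudo_log_likelihood("MKTAYIAKQR")
print(score)  # more negative = less plausible under the (fake) model
```

With a real model, scores like this correlate with how "natural" a sequence looks, which is what makes these models useful for ranking design candidates.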
Some cool variations of transformers have applications for protein design, like the now-famous SE(3)-equivariant transformer used in the structure prediction module of AlphaFold [2], which also appears in the research paper [3] accompanying TFA, as well as variations such as the message-passing model ProteinMPNN [4], which builds on a neighbor-graph-structured transformer [5].
1. https://github.com/facebookresearch/esm
2. https://github.com/deepmind/alphafold
3. https://www.biorxiv.org/content/10.1101/2022.12.09.519842v2
4. https://github.com/dauparas/ProteinMPNN
5. https://github.com/jingraham/neurips19-graph-protein-design
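A quick sketch of the graph those ProteinMPNN-style models operate on: each residue is a node connected to its k nearest spatial neighbors, and messages are passed along those edges. The coordinates and features below are random; this shows the data structure and one aggregation round, not the actual ProteinMPNN architecture.

```python
# Sketch of a k-nearest-neighbor residue graph plus one round of
# message passing. Toy data; not the real ProteinMPNN layers.
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 5
coords = rng.normal(size=(n, 3))       # stand-in residue coordinates

# Pairwise distances, then each residue's k nearest neighbors.
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)            # exclude self-edges
neighbors = np.argsort(d, axis=1)[:, :k]   # (n, k) neighbor indices

# One round of mean-aggregation message passing over node features.
features = rng.normal(size=(n, 8))
messages = features[neighbors].mean(axis=1)  # aggregate neighbor features
features = features + messages               # residual update

print(neighbors.shape, features.shape)  # (30, 5) (30, 8)
```

Operating on a sparse neighbor graph instead of full attention is part of why these models stay cheap even for large structures.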
Before AlphaFold changed this field, creating your own protein design was considered an insane task (not impossible; the Baker lab and others have done it a few times). But these tools (we now have multiple) allow you to create new proteins from scratch that can do exactly what you want (caveats galore). New enzymes that catalyze reactions never found in nature, for example.
Before this, all we could do was take proteins that already exist in nature and modify them. So you can imagine how new this world is.
This series of talks by Nazim Bouatta is exceptional; it helped me appreciate and make sense of these models. It's incredible how you can engineer neural nets to learn from far less data when you incorporate the right inductive biases: https://youtube.com/playlist?list=PL0NRmB0fnLJQPDZh-6utVnRpF...
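One of those inductive biases can be checked numerically: a function built from pairwise distances is invariant to rotating and translating its input, so a network built that way never has to learn that symmetry from data. This is a toy demonstration of the principle, not an actual SE(3)-equivariant transformer layer.

```python
# Numerically verify rotation/translation invariance of a
# distance-based feature (the symmetry baked into SE(3) models).
import numpy as np

rng = np.random.default_rng(3)
coords = rng.normal(size=(10, 3))

def distance_feature(x):
    # Any symmetric function of pairwise distances would do here.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d.sum()

# Random orthogonal matrix via QR, plus a random translation.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = coords @ q.T + rng.normal(size=3)

print(np.isclose(distance_feature(coords), distance_feature(moved)))  # True
```

Because the symmetry holds exactly by construction, the model spends its capacity on the actual physics rather than on relearning geometry, which is a big part of why these models get away with far less training data.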