The Unreasonable Redundancy of Nature's Protein Folds

(research.ligo.bio)

163 points · ray__ · 24d ago · 61 comments


43 comments · 13 top-level

jyounker · 24d ago · 8 in thread

None of this seems particularly surprising to someone with an undergraduate-level knowledge of biochemistry. Thirty years ago the professor in my Proteins class made a few relevant points in his lectures:

1) Only a handful of amino acids in an enzyme's structure were highly conserved. (Out of hundreds, generally fewer than ten.)

2) Those were generally in the reaction center.

3) Almost all single amino acid substitutions had no measurable effect on protein structure and function.

4) Across species the "same" protein can diverge in sequence by up to 40%, while keeping the same structure. Sometimes this goes as far as 80%.
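For readers outside the field, one reasonable reading of "diverge by 40%" is about 60% sequence identity over an alignment. A minimal sketch of that calculation, assuming the two sequences are already aligned with '-' marking gaps (real comparisons would align them first with a dedicated tool) and using one common convention, identity over gap-free columns; other conventions divide by the shorter sequence length instead. The function names and toy sequences are hypothetical, not real orthologs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Divergence as the complement of percent identity."""
    return 100.0 - percent_identity(aligned_a, aligned_b)


if __name__ == "__main__":
    # Hypothetical toy fragments differing at 3 of 10 positions.
    a = "MKTAYIAKQR"
    b = "MKSAYLAKER"
    print(f"identity {percent_identity(a, b):.0f}%, divergence {percent_divergence(a, b):.0f}%")
    # prints "identity 70%, divergence 30%"
```

Divergence of this kind says nothing about which positions changed; the point in the list above is that a shared structure can survive large values of it.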

Given these basic facts, the findings in the paper aren't really surprising to anyone who studies proteins.

[Note: As with everything in biology, you can find counterexamples. The histone proteins involved in DNA packing have an incredibly conserved sequence.]

DrScientist · 23d ago

You are missing the point - sure, a particular enzyme's function is resilient to large levels of substitution because:

1. The number of residues actively involved in catalysis might be small, and

2. Most other residues can safely be replaced: with something similar if they are part of the structure, or with almost anything if the side chain points out at the surface.

However, the point the article is making is that for different functions the same basic folds seem to be used again and again.

Is that because the stable protein fold structural space is actually small (due to the limited secondary structure patterns used, etc.), or is that because evolution hasn't had time to search the enormous available structural space?

i.e. is it a sampling problem or an intrinsic property of protein space?

The fact that some of the ML approaches mentioned can now design completely novel folds suggests it is at least partially a sampling problem.

This to me isn't surprising - the idea that evolution is somehow complete and all possible solutions have already been explored seems to me to be unlikely - a lot of evolution happens via gene duplication and then gradual functional drift - which would favour reuse of existing folds over the generation of completely new ones.

jyounker · 22d ago

I have a 30 year old book on protein structure on my shelf. One of the primary themes is the recurrence of the same structural motifs in proteins. The fact that biological proteins use the same patterns for different functions isn't new information.

The result also fits in with the rest of biochemistry. While there is a vast variety of interesting chemicals in living things, and they do all sorts of amazing stuff, there are really only a handful of classes of chemicals.

The variety of classes of chemicals that can exist dwarfs what gets used in biochemistry. Why would we expect structure to be different?

We're in agreement though, that it would be interesting to understand what the constraints are.


resiros · 23d ago

> However, the point the article is making is that for different functions the same basic folds seem to be used again and again.

That's a basic fact in bio. Check the Rossmann fold page, for example (https://en.wikipedia.org/wiki/Rossmann_fold); it's a template used for many functions.


Windchaser · 23d ago

It seems just obvious that it's at least a sampling problem. Assuming an average protein length of 400 amino acids and 20 possible amino acids, that's 20^400, or about 10^520, possible sequences, which is a mind-bogglingly large number.
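The arithmetic behind that estimate: with 20 possible residues at each of 400 positions there are 20^400 sequences, and log10(20^400) = 400 × log10(20) ≈ 520.4. A short check, taking the comment's average length of 400 as an assumption; the function name is illustrative only.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """Return log10 of alphabet_size ** length, the number of possible sequences."""
    # log10(a ** n) == n * log10(a), so the huge number never has to be built.
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    exponent = log10_sequence_space(400)
    print(f"20^400 is about 10^{exponent:.0f}")  # prints "20^400 is about 10^520"
    # Cross-check with exact integer arithmetic: 20**400 has floor(520.41...) + 1 digits.
    print(len(str(20 ** 400)))  # prints 521
```

Python integers are arbitrary precision, so the exact count is available as well; the log form is just easier to read. The same function with length=4 gives about 5.2, i.e. 20^4 = 160,000 four-residue combinations.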

We haven't even begun to explore the biological universe.


HarHarVeryFunny · 23d ago

So what are the lessons here?

- that structure is as/more important than sequence?

- that "reaction centers" are what matter, and the rest is just "protection"?

What do you mean by "reaction center" - surely not physically central within the folded structure (isn't it the surface shape that determines reactivity)?

flobosg · 23d ago

> that structure is as/more important than sequence?

Structure is determined by sequence, so they are equally important. Structure is more conserved than sequence, mainly due to the physicochemical constraints that govern protein folding.

> that "reaction centers" are what matter, and the rest is just "protection"?

Sometimes not even protection. Many enzymes can have plenty of their sequence/structure removed and still be functional. Natural proteins carry lots of evolutionary cruft.

> What do you mean by "reaction center" - surely not physically central within the folded structure

I think they borrowed the term from photosystems/photosynthesis. But, to be more precise, what they actually meant is the active site of an enzyme; the location where the catalyzed reaction takes place.

> (isn't it the surface shape that determines reactivity) ?

Shape is not enough, the chemical nature of the amino acid residues involved is also important. A single mutation in a key catalytic residue will shut down the enzyme even if the shape stays the same.

jyounker · 22d ago

> What do you mean by "reaction center"

An enzymatic reaction center is also known as an "active site". It's the location within an enzyme's 3D structure where catalysis happens.

jyounker · 22d ago

> So what are the lessons here?

The only lesson is that, to a biochemist, the result is not surprising.

hirenj · 24d ago · 6 in thread

This approach is pretty much like the TED approach from a few years back. As far as I remember there wasn’t a ridiculous amount of fold diversity there either. It turns out evolution isn’t averse to a bit of liberal protein plagiarism.

https://www.science.org/doi/10.1126/science.adq4946

flobosg · 24d ago

> Natural selection has no analogy with any aspect of human behavior. However, if one wanted to play with a comparison, one would have to say natural selection does not work as an engineer works. It works like a tinkerer - a tinkerer who does not know exactly what he is going to produce but uses whatever he finds around him whether it be pieces of string, fragments of wood, or old cardboards; in short it works like a tinkerer who uses everything at his disposal to produce some kind of workable object.

―François Jacob, “Evolution and Tinkering” (https://web.mit.edu/~tkonkle/www/BrainEvolution/Meeting9/Jac...)

canadiantim · 23d ago

Tinker tailor fold or die?

gilleain · 24d ago

They found "several thousand" novel folds? I had remembered that there were around 1000:

https://pmc.ncbi.nlm.nih.gov/articles/PMC7072414/

Oh ok, I misremembered:

"This review has focused only on small fragments of fold space with examples given for folds generated from a single secondary structure string consisting of around ten SSEs. Even in this small corner, the number of possible folds, under the current constraints, is of the order of 1000"

hirenj · 24d ago

I think there was a Twitter/Bluesky thread on the results from adding all the predicted folds from metagenomics too, and not ending up with many new clusters. If this continues to hold true as we keep looking at stuff, I will be relieved that at least natural protein folds and domains have a limited (tractable) solution space. All we need to do now is annotate the variation of these couple of thousand fold variants. Challenging, but at least a bounded problem.

jeejay1 · 24d ago

What does plagiarism even mean in the context of proteins? That one protein steals a fold of another protein without giving it proper credit?

gilleain · 24d ago

I understood it as metaphor - just that evolutionarily distant sequences can adopt the same (or very similar) folds because there are only a limited number of stable, accessible folds that are possible.


Schlagbohrer · 24d ago · 6 in thread

Can we please retire the headline trend of "The Unreasonable ___ of ____ "

HarHarVeryFunny · 23d ago

I think it's a useful meme, as long as applied appropriately - where it truthfully promises some sort of surprise and potential insight.

It seems to have originated with Eugene Wigner's 1960 "The Unreasonable Effectiveness of Mathematics in the Natural Sciences".

bl0rg · 24d ago

At some point someone will analyze this pattern and post an article named "The Unreasonable effectiveness of the 'The Unreasonable X of Y' template".

tux3 · 24d ago

Everything old is new again! We've had "Go To Statement Considered Harmful" Considered Harmful [1].

Now it's the Unreasonable Effectiveness of "The Unreasonable Effectiveness of X".

It seems like "X is All You Need" is All You Need.

[1]: https://web.archive.org/web/20090320002214/http://www.ecn.pu...


pfdietz · 23d ago

The Unreasonable Annoyance of Cliches

But then, this thread is all about proteins incorporating structural cliches, isn't it?

theideaofcoffee · 23d ago

This and "How I learned to stop worrying and love ___". I can't identify what grinds my gears so much about it, perhaps it's the laziness.

ramraj07 · 23d ago

Competing with "x is all you need"

spwa4 · 24d ago · 4 in thread

This is just repeating the fact that the proteins life actually uses are a very small part of the total possible ones. First, there's no real length limit, but almost all of life's proteins are limited to a few thousand amino acids. Most barely get past a hundred.

(note: there are bigger proteins, including ones so big you can see them with the naked eye (e.g. a hair), but they consist of multiple repeats of the same small building block. There are many such building blocks. And the very few exceptions to that are "not really" part of eukaryotic cells, but of cell organelles that have their own DNA)

But even if you just take the first 4 amino acids, there are 20^4 = 160,000 possible combinations. Life uses fewer than 1000 of those.

In other words: DNA and evolution, even with billions of years to think about it, is really a bit of a beginner when it comes to protein design. Or at least, it is pretty obvious that it's possible to do A LOT better than natural selection.

gilleain · 24d ago

This is about folds, not amino acids - even if you used a larger alphabet of residues, I somehow doubt that you would get many more folds.

Thinking more about the question of protein _length_ - I'm also not convinced that longer proteins (more than say 750aa) would produce more novel folds. Larger proteins tend to be multi-domain; that is, a longer chain will fold into multiple compact domains, each one a separate fold.

I suppose there could be 'megafolds' out there in fold space, beyond 1000aa - like a 12-bladed beta propeller, or a beta-helix with alpha helices on the outside or some other wacky thing. Whether that would substantially increase the numbers of total folds, I doubt, but that is of course a guess.

(ref - https://pmc.ncbi.nlm.nih.gov/articles/PMC10251718/ for protein lengths)

spwa4 · 24d ago

The amino acid sequence defines the folds.

And really? Just any random sequence gets you a new fold. I mean, it won't be very useful if you pick a random one, but it'll work and be a new one.

I think this is just an artifact of natural selection basing new proteins on existing ones, not an actual useful ("rational" if you can call natural selection rational) selection limit. I don't think that if you designed proteins from first principles you'd see this limitation in your results.


suncemoje · 23d ago

> DNA and evolution, even with billions of years to think about it, is really a bit of a beginner when it comes to protein design.

I like how you say evolution is able to think when in reality it's just a mysterious function of variation, selection, and time.

IAmBroom · 23d ago

I find it completely daunting to speak of evolution's processes without some anthropomorphism sneaking in, despite being a hardcore atheist.

It's all so complex, and our verbs that more literally describe the billions of nanosecond operations going on in the cells feel inadequate. "When a protein molecule in an appropriate folded shape and orientation happens to be bounced by kinetic energy into the attractive region of a corresponding protease..." versus "The protease grabs the protein and cuts it into..."

resiros · 24d ago · 3 in thread

Evolution discovered a bunch of structural patterns at different layers (fragments, folds..) that are energetically favorable, versatile, easily foldable, robust to mutations and then kept reusing them. As a result it sampled more and more in these parts of the space. That's why the fold space is uneven.

Are there any folds and patterns that evolution has not discovered that are also useful? I think the Baker group created a bunch of new folds. I'm not sure if they are as useful as the ones discovered by Evolution. After all, Evolution had more compute power than us.

noduerme · 24d ago

Evolution takes surprisingly little time to home in on solutions which are durable enough to handle local conditions. It's not demonstrably good at preparing its offspring for anything that would be useful outside the local environment. It also has a way of forgetting anything before the most recent data set (or global reset).

Our compute capacity isn't deployed to brute force Monte Carlo sims (mostly). So it's apples and oranges.

rustyhancock · 23d ago

And it seems very few proteins appear to be significant problems.

The most famous is the prion protein, which can misfold in ways that cause a variety of transmissible diseases: mad cow disease, chronic wasting disease and scrapie, and in humans CJD, vCJD, fatal familial insomnia, kuru and GSS.

Perhaps it's because misfolded prion protein can convert other copies, but why does it always involve that same protein? It has always baffled me: why aren't more proteins susceptible to becoming prions?

There are others we call "prionoid" because they show shades of the catastrophic misfolding the prion protein can undergo.

alexpotato · 23d ago

This reminds me of the fact that certain fundamental proteins get created even if the DNA for them has errors.

The thinking is that evolution created error correction for the critical proteins to account for mutations.

Fascinating stuff.

photochemsyn · 23d ago · 1 in thread

This does reveal the weakness of AlphaFold approaches for answering questions like “what is possible in the protein folding space if you use the 20 canonical amino acids” since the data used to train AlphaFold is limited to existing experimentally determined protein structures.

We don’t even know if this is like body plans (four legs for mammals, why not six?) i.e. is this about physical limitations of the folding space (did evolution explore most of the space and hold onto the most useful folds, or are the common set of folds one of those accident-of-history results?). Then there’s the issue that folding takes place as the protein chain exits the ribosomal tunnel so that’s a whole other constraint on what kinds of folds might be selected. For that matter, why not other genetically determined complex amino acids instead of just the canonical set?

Also, a common evolutionary process in eukaryotes is duplication of protein sequences and shuffling of code blocks which might represent folding domains, which might tend to lock in the existing collection of folds rather than generating novel folds. That’s not so clear.

This weakness of AlphaFold has some modern practical relevance since non-canonical amino acids and modified proteins are increasingly used medically, and their structures mostly seem to be determined using the direct experimental methods, eg:

https://pmc.ncbi.nlm.nih.gov/articles/PMC10296201/

“Non-Canonical Amino Acids as Building Blocks for Peptidomimetics: Structure, Function, and Applications” (2023)

flobosg · 23d ago

> since the data used to train AlphaFold is limited to existing experimentally determined protein structures

Protein sequences, but the point still stands.

dekhn · 23d ago · 1 in thread

I worked with a foodie who was also a protein scientist (https://scienceandfooducla.wordpress.com/2016/02/23/kent-kir...) and he once pointed out: nearly everything you need to know about protein folding, you can learn from an egg.

nickpsecurity · 22d ago

How so?

novia · 24d ago · 1 in thread

gosh the scrolling on that site was so jumpy!

omnifischer · 24d ago

Agree... There should be some penalty to sites that want to show off their reports only to people with high end devices...

h_a_n_k · 24d ago

cool post! it's funny how many things in this world are naturally graphs. i think it's neat how, especially in biology, a lot of high-dimensional objects, like protein sequences, converge onto lower-dimensional representations, like protein structures.

i did neuroscience for grad school, and i was always amazed by how often complex neural activity could be well represented by lower dimensional representations--clean manifolds, attractor dynamics, etc. i think, in general, biology (evolution) doesn't penalize redundancy too hard (hence things like genetic drift, neutral theory of evolution, etc.).

anyway, super cool stuff. agree with you that probs more useful to explore the search space via 'less natural' structures, given how forgiving evolution is to redundancy. probs where the most information can be found

dekhn · 23d ago

Proteins are truly amazing. I've studied them for decades and they still manage to surprise; for example, I worked with protein structure prediction for decades and assumed that structure was necessary for function, but some proteins remain mostly unfolded and still carry out complex mechanistic tasks.

flobosg · 24d ago

My PhD thesis addressed a similar question. I did a survey of sub-domain sized fragments shared between different protein folds. It turns out that there are plenty, even among folds considered evolutionarily distant.

ifh-hn · 24d ago

No real clue what this stuff is about, way over my head, but kudos on an article where it's all there on the page instead of needing scripts to pull text and images from different places!

throwaway81523 · 24d ago

This crashed my browser. Use reader mode.
