Moderna mRNA sequence released to GitHub [pdf]

(github.com)

1248 points · aty268 · 5y ago · 367 comments

209 comments · 56 top-level

drtz5y ago· 53 in thread

I'm sure it's more complex than I grasp as a layperson, but I'm utterly amazed at how simple this _appears_. I get the feeling that this is something I have a better chance of understanding than the average SaaS Terms and Conditions.

I expected to have to scroll through pages upon pages of indecipherable text. Instead it's no bigger than a large paragraph of text, and I can easily fit it on my screen.

azernik5y ago

The protein they're trying to manufacture is indeed quite simple - AFAIU both BioNTech and Moderna put together their sequences in a weekend. (Though there was a more involved process of winnowing down the sequences for the most effective ones.)

The technically challenging parts are:

- delivery mechanism: you need to take a very unstable molecule, protect it from the environment - both external, and when inside the patient - and insert it into a human cell. (This is called the "platform", and is usually developed independently from the specific payload.)

- manufacturing: both producing the mRNA itself at a large scale, and inserting it into the delivery mechanism, at a large scale and in low-temperature conditions

- testing: the newly-developed payload and the existing platform were integrated at small scales within weeks, but testing the thing for safety and efficacy took months

EDIT: As schoen pointed out, this was not actually released by Moderna, but reverse engineered by third-party researchers. Original text was: "Hence they feel safe releasing this. Their moat is not the gene sequence, their moat is everything else."

dnautics5y ago

sequence is actually released by Moderna in their patent:

https://www.modernatx.com/sites/default/files/US10702600.pdf

though they do present multiple sequences, so I guess you'd have to go to the FDA application to figure out exactly which one got used.

dylan6045y ago

> put together their sequences in a weekend

"Meh, I could do that over a weekend" never sounded so scary, or so impressive, at the same time. That weekend just so happened to stand on the shoulders of prior decades of research, though.

I guess this is big pharma's version of `apt-get install`.

wespiser_20185y ago

From what I've gathered, the rate-limiting step for production so far is creating the lipid vesicles and getting the RNA inside of them. Only a few companies have a process for this, and the supply chain for the precursors is limited as well.

outworlder5y ago

> delivery mechanism: you need to take a very unstable molecule, protect it from the environment - both external, and when inside the patient - and insert it into a human cell. (This is called the "platform", and is usually developed independently from the specific payload.)

Of note, the immune system is pretty good at destroying foreign mRNA so you also need to evade it.

This article is pretty good: https://berthub.eu/articles/posts/reverse-engineering-source...

mschuster915y ago

> - delivery mechanism: you need to take a very unstable molecule, protect it from the environment - both external, and when inside the patient - and insert it into a human cell. (This is called the "platform", and is usually developed independently from the specific payload.)

The most amazing thing is that now that the platform is proven safe in tens of millions of people, it should be very easy and fast to get approval for other payloads. BioNTech for example wants to go after cancers - a platform that can deliver payloads targeted to an individual's cancer is nothing short of a game changer in cancer treatment because the current standard of blasting the patient's body with a lot of highly toxic chemicals is archaic compared to letting the body's immune system do the cleanup.

schoen5y ago

> Hence they feel safe releasing this. Their moat is not the gene sequence, their moat is everything else.

One or more of the vaccine developers may have released such details, but this particular file is a reverse engineering effort by unaffiliated scientists based on analyzing the dregs of used vaccine vials (!).

Edit: See https://news.ycombinator.com/item?id=26628594 for more substantive discussion about this.

MuffinFlavored5y ago

> but testing the thing for safety and efficacy took months

What kind of tweaks were made from "the version they threw together in a weekend" to "the version that is in production now"? What's a typical "mRNA" feedback iteration loop like?

amelius5y ago

Would it be possible to use the same delivery mechanism for other mRNA sequences?

The_rationalist5y ago

Sounds like a problem you solve once and for all, for any vaccine. And also one that was already solved decades ago (e.g. viral vectors).

> - testing: the newly-developed payload and the existing platform were integrated at small scales within weeks, but testing the thing for safety and efficacy took months

And so many people have been killed by this overly conservative testing; phase ~<2.5 was enough.

Yizahi5y ago

Additional reading (was posted here some time ago):

https://blogs.sciencemag.org/pipeline/archives/2021/02/02/my...

Why manufacturing of these vaccines is a hard part.

jldugger5y ago

Liken it to the 4kb demoscene: it's amazing what can be done with a little bit of information, as long as you don't have to describe the machine running it.

Or the distribution method, or even really invent the thing, since you're mostly just copying someone else's work. Plus it doesn't have to even do anything. In fact, doing anything might be a problem, so best to just sit there and look menacing (and spikey).

GuB-425y ago

> Liken it to the 4kb demoscene

Coincidentally, the mRNA sequences for both vaccines are about 4kb (kilobase) long.

lettergram5y ago

It really is that “simple.”

Getting it designed and building it is more difficult.

At its core, it’s a piece of mRNA that creates a protein. That code gets translated into a protein (often those are relatively short). That protein then triggers your body's immune response, which trains it to attack COVID-19.

Inject this mRNA into a cell and it’ll create the protein. Anything can be injected at this point once the mechanism for injection is developed.

wombatpm5y ago

Which makes me wonder. Could you place the entire virus genome in these liposomes and get them to hijack the machinery to make an entire virus? Like a plasmid but for viral structures?

flobosg5y ago

Sequencing technologies have improved immensely over the last decade and a half. And, in this particular case, getting the sample RNA is incredibly easy, since its purity and integrity in the vial is quite high.

shellfishgene5y ago

I didn't look at the details of how they sequenced it, but given that there are chemically modified bases in the mRNA vaccines there is a chance the normal methods for sequencing (and the first step of translating to DNA) don't work. Well, I guess in practice they did.

mattnewton5y ago

I think it's a bit like a private key- the difficulty is in finding some combination that works in an absolutely massive space of possible proteins, not necessarily in the length of the protein.

DecoPerson5y ago

Check out this video by The Thought Emporium to see how far we’ve come in these matters:

https://youtu.be/J3FcbFqSoQY

This should hopefully provide you with some useful perspective.

gerdesj5y ago

"but I'm utterly amazed at how simple this _appears_."

Biology is a funny old thing. You can look at that concise description - the orange and so on blocks of a few letters and a few short groupings.

Now ATCG are basic building blocks but they consist of quite a lot of stuff. I think it's a bit more complex than that because this is RNA not DNA so ATCG might not be quite right. Each of those bases is horrifically complicated depending on scale. Search "ATCG" - this is a good start: https://en.wikipedia.org/wiki/Nucleobase

Now dive into one of those bases and decompose it to its constituent atoms. Now look at the maths around this stuff. It gets quite complicated, quite quickly.

That said, the fact that a bloody complicated thingie can be described so concisely is absolutely amazing and as you say it looks so simple.

flemhans5y ago

It'd be cool to make an easy-to-use interface, still.

puzzlingcaptcha5y ago

Here is a breakdown https://berthub.eu/articles/posts/reverse-engineering-source... discussed previously https://news.ycombinator.com/item?id=25538820

tablespoon5y ago

> This is somewhat of a problem for our vaccine - it needs to sneak past our immune system. Over many years of experimentation, it was found that if the U in RNA is replaced by a slightly modified molecule, our immune system loses interest. For real.

> So in the BioNTech/Pfizer vaccine, every U has been replaced by 1-methyl-3’-pseudouridylyl, denoted by Ψ. The really clever bit is that although this replacement Ψ placates (calms) our immune system, it is accepted as a normal U by relevant parts of the cell.

Neat.

purple_ferret5y ago

Any individual protein doesn't seem that complex since it's just a combination of some 20 amino acids, but the variations are endless:

"Since each of the 20 amino acids is chemically distinct and each can, in principle, occur at any position in a protein chain, there are 20 × 20 × 20 × 20 = 160,000 different possible polypeptide chains four amino acids long, or 20n different possible polypeptide chains n amino acids long. For a typical protein length of about 300 amino acids, more than 10^390 (20^300) different polypeptide chains could theoretically be made. This is such an enormous number that to produce just one molecule of each kind would require many more atoms than exist in the universe."

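The counting in the quote is easy to check with exact integers. A minimal sketch, assuming only the 20 amino acids the quote mentions; the function name is made up for illustration:

```python
AMINO_ACIDS = 20  # number of standard amino acids, as stated in the quote

def chain_count(n):
    """Number of distinct chains of length n when each position is independent."""
    return AMINO_ACIDS ** n

print(chain_count(4))                   # chains four amino acids long
print(len(str(chain_count(300))) - 1)   # power of ten for chains 300 long
```

Python integers have arbitrary precision, so the 300-residue count is computed exactly rather than as a floating-point estimate.
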
abfan11275y ago

Proteins are also unique in that not just their sequence matters, but also their physical shape. Two proteins can have the same sequence but a different physical shape, and therefore have different impacts on the body's chemistry. I started a PhD researching DSP methods for matching protein sequences and locations of amino acids. Fun stuff.

Gatsky5y ago

Then there are also post translational modifications, like addition of acetyl or phosphate groups, and sugars to the protein (glycoproteins).

I mean, I can understand how an eye or a brain can evolve by natural selection, but I’m still stunned by abiogenesis. I guess we’ll never know for sure how it all started.

MauranKilom5y ago

The exponentiation signs got lost in your quote. Would you mind adding them back in?

phreeza5y ago

People tend to think of genetic code as a sort of assembly language which is very verbose, but I wonder if the correct way to view it is in fact a very terse domain-specific language, because it actually depends on the entire complex machinery of the cell to be present in order to work, which in itself contains a lot of information?

jldugger5y ago

> I wonder if the correct way to view it is in fact a very terse domain-specific language

Honestly, na. It's pretty verbose. There's a lot of weird ass things in there like "Skip basepairs until you find the matching terminating sequence" (I think it's AG .* GA but it's been a decade since my bioinformatics course), but you still have to include the non-AA-coding basepairs in the middle of that.

Compensating for that is the fact that there are like, multiple independent programs; if a ribosome is offset by a single base pair, the result is entirely different. If it runs the other strand, the result is different. And instead of crashing like any program would, biology just learns to use all of those possible encodings. In part, this works because there are 64 possible codons but only 20 amino acids, and the redundancy allows a substitution to affect only some of the offsets.
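
The reading-frame point can be seen on a toy example. A rough sketch, assuming the standard genetic code written with DNA letters; the 12-base sequence is invented for illustration and is not from any vaccine, and the opposite strand is left out:

```python
from itertools import product

# Standard genetic code in DNA letters (T instead of U); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate whole codons of seq, starting at offset `frame` (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

toy = "ATGGCCAAGTGA"  # hypothetical sequence
for frame in range(3):
    print(frame, translate(toy, frame))
```

Shifting the start by one base changes every codon, so each of the three frames gives an unrelated peptide.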

Tuna-Fish5y ago

Yes. Another important metaphor is that the common idea of DNA as blueprints is entirely wrong. It's not blueprints, it's a recipe. A blueprint describes what something is. A recipe describes the steps needed to make something, making use of a lot of complex existing machinery and parts with only a reference to them.

airstrike5y ago

> and I can easily fit it on my screen.

...with GATACCA right in the middle, but unfortunately with no GATTACA that I could find.

staplung5y ago

Heh. Technically, there isn't even GATTACA in there since it's RNA and hence all the T's are actually U's. It's just convention to use the T's. GAUUACA doesn't have the same ring to it.

andagainagain5y ago

I'm estimating roughly 90-ish characters in a row, roughly 40 rows encoding the spike protein. So about 3600 base pairs. There are 3 base pairs per amino acid, so that's 1200 amino acids.

For comparison, the smallest chain that they technically call a protein is 100 amino acids; that's an arbitrary limit to separate proteins from enzymes. So this thing isn't tiny tiny.

But Titin (also called connectin), a giant protein responsible for passive elasticity in muscles, is ~27,000-35,000 amino acids. So this thing isn't even close to the biggest proteins out there.
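
The back-of-the-envelope estimate above, written out. The row and column counts are the eyeballed figures from this comment, not measured values:

```python
chars_per_row = 90   # eyeballed from the PDF, per the comment
rows = 40            # eyeballed from the PDF, per the comment

bases = chars_per_row * rows   # letters in the coding region
amino_acids = bases // 3       # one amino acid per three-letter codon

titin_low, titin_high = 27_000, 35_000  # range quoted in the comment
print(bases, amino_acids)
print(titin_low / amino_acids, titin_high / amino_acids)  # how many times longer titin is
```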

flobosg5y ago

> that's an arbitrary limit to separate proteins from enzymes

Do you mean “to separate polypeptides from proteins”? Enzymatic activity has nothing to do with size. For example, one of the smallest enzymes in humans has 62 amino acid residues. And, under certain conditions, even single amino acids can be catalytic.

But yeah, the polypeptide-protein threshold can get fuzzy, especially with the recent advances in miniprotein characterization.

weinzierl5y ago

> "Instead it's no bigger than a large paragraph of text, and I can easily fit it on my screen."

When I saw it, I thought that it could almost fit in a tweet, so I just did it:

https://twitter.com/weinzierl/status/1376807707957719041?s=2...

The sequence takes 16 tweets, 15 if you don't split at line endings and remove spaces (4175 nucleobases / 280 nucleobases/tweet ~ 14.9 tweets).

lifthrasiir5y ago

Or you can use base2048 [1] to compress it down to 3 tweets (4175 nucleobases * 2 bits per nucleobase / 3080 bits per base2048 tweet = 2.7 tweets).

[1] https://github.com/qntm/base2048/
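
Both tweet counts follow from the numbers quoted in these two comments. A small sketch that redoes the arithmetic and shows what "2 bits per nucleobase" means in practice; the `pack`/`unpack` helpers are hypothetical and have nothing to do with the base2048 library's own API:

```python
import math

BASES = 4175                     # sequence length quoted above
TWEET_CHARS = 280
BITS_PER_BASE = 2                # four letters fit in two bits
BITS_PER_BASE2048_TWEET = 3080   # figure quoted above (280 characters x 11 bits)

plain_tweets = math.ceil(BASES / TWEET_CHARS)
packed_tweets = math.ceil(BASES * BITS_PER_BASE / BITS_PER_BASE2048_TWEET)
print(plain_tweets, packed_tweets)

LETTERS = "ACGT"

def pack(seq):
    """Pack a sequence into one integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | LETTERS.index(base)
    return value

def unpack(value, length):
    """Inverse of pack(); the length is needed because leading A's pack to zero bits."""
    out = []
    for _ in range(length):
        out.append(LETTERS[value & 3])
        value >>= 2
    return "".join(reversed(out))
```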

xjlin05y ago

"but I'm utterly amazed at how simple this _appears_."

Reminds me of the joke about the consultant engineer who knows where to mark the X with the chalk. LOL

anxrn5y ago

Not Moderna, but this [1] was a very useful primer on grokking how the Pfizer vaccine works, especially for computer programmers.

[1] https://berthub.eu/articles/posts/reverse-engineering-source...

nraynaud5y ago

the way I see it we're just at the beginning, and we're mainly copy/pasting a lot of code, we understand some small parts, and we're generally in the teenage years of genome programming.

I don't know how long it will be before we get a bit more serious with it, but geneticists have a big obstacle in their understanding: any change might need a thousand-strong lifelong population study to be understood. That's way crappier than dumping the assembly or only having the documentation in Chinese.

I will add that moreover the developers might have been even more conservative in their code because they knew it was going for large scale deployment, they probably avoided the cutting edge as much as they could.

ohmyzee5y ago

Bravo! Nice execution of the tips from yesterday's article!

https://www.cs.purdue.edu/homes/dec/essay.criticize.html

softwaredoug5y ago

Great quote from Maurice Hilleman, creator of many (most?) of our childhood vaccines goes something like “Don’t be smart. Instead be careful and accurate”

Lots of these things aren’t complicated. It’s the careful systematic testing and public trust building that’s the hard part.

learnstats25y ago

The genetic code itself is reasonably comparable to ASCII in complexity - every 6 bits is the code for one amino acid in a string, which will fold itself into the required protein.
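
The "6 bits" figure falls out of the codon arithmetic: three positions, four letters each. A short sketch, assuming the standard genetic code table, which also counts how many codons map to each amino acid:

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code in DNA letters; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

bits_per_codon = math.log2(len(BASES) ** 3)       # 4^3 codons
codons_per_meaning = Counter(CODON_TABLE.values())

print(bits_per_codon, len(CODON_TABLE))
print(codons_per_meaning.most_common())
```

Unlike ASCII, the mapping is many-to-one: most amino acids have several codons, which is the redundancy mentioned elsewhere in the thread.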

devenvdev5y ago

I remember a lot of features and especially bug fixes where I had to change one line of code, it took hours to figure out how exactly though. I guess this is kinda similar?

gremlinsinc5y ago

The way it reads like source code, truly makes me circle back to the idea we're all living in a simulation.

biolurker15y ago

Mathematical truths about abstract notions of string theory fit in a line.

fiftyfifty5y ago

The New York Times published an article last year with the entire genome of the SARS-CoV-2 virus, with a breakdown of different sections to explain what protein the RNA codes for and what that protein does. Like you said it was amazing that it all fit within an [albeit long] newspaper article. It doesn't surprise me that the RNA for the vaccine, which only targets a single protein, is even smaller than that. Here's the NY Times article I was referring to:

https://www.nytimes.com/interactive/2020/04/03/science/coron...

amluto5y ago

It appears simple, but a whole lot of work went into producing that string even pre-COVID. Some of it is generic in the sense that it might apply to any mRNA vaccine. Some is quite specific:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584442/

There’s also (IIRC, no citation right now) prior work suggesting that coronavirus vaccines against the spike are likely to be effective and that vaccines against the N protein might be counterproductive.

fnord775y ago

each one of those letters represents a ~15 atom molecule, so in a way it is a compressed representation

pyinstallwoes5y ago

See RadVac: https://radvac.org/

Make your own, open-source. Really cool.

A user on lesswrong made their own (with no prior experience): https://www.lesswrong.com/posts/niQ3heWwF6SydhS7R/making-vac...

hfjfktmtkrn5y ago

It's not really that simple.

Only two companies in the world succeeded; the French company Sanofi, which also tried making an mRNA vaccine, failed.

WheelsAtLarge5y ago

True, most pharmaceuticals can't do it now but given the right knowledge, which is known, it can be done relatively fast. I suspect in the next few years there will be many companies that will be able to replicate and advance the process.

fermienrico5y ago

It’s like looking at the binary file and saying “that’s pretty simple” while ignoring the massive amount of machinery that allows us to run that file and use it (CPUs, Motherboards, computers, etc).

I presume a whole bunch goes into making a vaccine and this is just the tip of the iceberg.

Black1015y ago

so, explain it to me ?

csense5y ago· 9 in thread

The Human Genome Project was completed almost two decades ago, and somebody solved the protein folding problem recently.

Why are we still doing genetics at the machine code level? Shouldn't we have some compilers, assemblers and linkers by now?

6nf5y ago

Protein folding is not solved, that headline was overstating the actual achievement by Google's protein folding solution.

Yizahi5y ago

If I remember correctly, "solving" protein folding was essentially a high-probability prediction that state A transforms to state B with some reasonably high chance, on a big dataset. Or something like that anyway. It's as far from high-level work with genetics as manually creating nanotubes a few molecules long in a lab is from industrial production.

lambdadmitry5y ago

The most fundamental reason for that is that it's just not amenable to the human mind. We are quite primitive actually, being able to hold only a handful of "things" in our mind at any one time and relying on abstraction to think of more complex things. However, you can't abstract much in biology; there is no locality or separation of concerns, everything affects everything.

Take that piece of RNA. An intuitive mental model is that it's some form of "instruction" or a bunch of instructions, isn't it? It's also wrong, because it just encodes a protein that acts the way it does only because of its shape (that is, one of its potential energy local minima) and the shape of other proteins around it. That shape is only weakly local; it can be affected by far-away sections of the peptide sequence. So it's almost impossible to systematically break it down; you have to consider and model things as a whole, which is insanely complex both computationally and cognitively.

If you want a good mental model of how it works, imagine you assemble a thing from metal balls and springs. You take a few thousand balls and connect most of them with springs of different strengths. You then take this thing and throw it on the floor; it will assume a shape that is implicitly encoded in spring strengths, its environment, and the way you've assembled it. You can even make it change shape if you poke it the right way. That's how biology works in a nutshell; it's a nightmare to design anything for systems like that. Again, you can't simplify and break down and encapsulate and abstract like you do in programming.

zero_deg_kevin5y ago

Because the problem is significantly more complicated than sequencing and folding.

WJW5y ago

I feel this XKCD describes the situation particularly well: https://xkcd.com/1831/

neuronic5y ago

Maybe after 4 billion years of evolving our code we will get it right.

baby5y ago

My thought exactly. So this thing is like a VM with a bunch of primitive opcodes, why can’t someone write a higher-level language or at least some gadgets?

WJW5y ago

The problem with trying to program genetics is that there is a bunch of code already running on the system and every variable is a global. You can't just start up a new program with minimal impact on the stuff that is already running, like you can in most human-made computers. Also don't forget that the extremely simplified version of the running system looks like this: https://www.sigmaaldrich.com/technical-documents/articles/bi...

asdff5y ago

Because it's a harder problem than it seems at face value.

koeng5y ago· 8 in thread

The thinking behind attaching a PDF with colors and not a Genbank file is why we can't have nice things in biotechnology.

rubatuga5y ago

Wait, you mean you don't extract genomic data from Excel? The MARCH1 gene brings many interesting surprises.

perl4ever5y ago

Excel finally has a facility for manipulating data that keeps it where you put it. It also incorporates a fairly decent functional programming language. It's called Power Query, not to be confused with all the other things that MS has named starting with "Power" and have no relationship at all and are mostly awful.

The only real annoyance I have with it is that the editor window is modal, like it blocks all the spreadsheets you have open on your machine, and it's primitive even compared to VBA, especially for debugging.

It's not just that it's given me the experience of "this is the way a spreadsheet or BI tool should work" but also "this is the way SQL should work". It's a little cumbersome to do the standard SQL-type operations, but the clean integration of functions means you can implement anything that's missing. Like say, Oracle has grouping sets - you can, and I did, just write a function to do that. I always felt that having a separate procedural language in your database was wrong, but I'd never seen the alternative until now. And I've been falling in love with higher order functions.

chromatin5y ago

I am fond of September 2, myself.

For those not in the know:

https://genomebiology.biomedcentral.com/articles/10.1186/s13...

julienchastang5y ago

Exactly. FAIR (Findable, Accessible, Interoperable and Reusable) principles are at a loss here [1]. The "Reusable" part seems to be especially problematic as the sequence is buried in a PDF file though all aspects of FAIR are compromised here. Edit: It looks like there is now a PR to address this issue [2]

[1] https://www.nature.com/articles/sdata201618

[2] https://github.com/NAalytics/Assemblies-of-putative-SARS-CoV...

ImaCake5y ago

Things are getting better, but it's still so, so bad. The funny thing about that Nature article is that I recently had to parse an HTML table from a recent Nature article. Thank god pd.read_html did a decent job and I then only needed another hour to hunt down all the typos and weird text issues.

brian_herman5y ago

Here you go! https://github.com/brianherman/Assemblies-of-putative-SARS-C...

shellfishgene5y ago

If there is no annotation or metadata FASTA format is usually preferred ;)
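
For anyone who has not seen it, FASTA is just a `>` header line followed by the sequence wrapped over several lines. A minimal writer sketch; the function name, record name and 60-column width are arbitrary choices for illustration:

```python
def to_fasta(name, seq, width=60):
    """Return one FASTA record: '>' header line, then seq wrapped at `width`."""
    lines = [">" + name]
    for i in range(0, len(seq), width):
        lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"

# Hypothetical record name and placeholder sequence, for illustration only.
print(to_fasta("example_record", "ACGT" * 20), end="")
```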

flobosg5y ago

My thoughts exactly!

Somewhere, Margaret O. Dayhoff is weeping.

kart235y ago· 8 in thread

What are the purple and blue sections after the stop codon for? I read a little about the 3' region, but for the vaccine, are these sections taken from a particular natural human sequence, or specially engineered for something else?

iso13375y ago

It's the 3' (3-prime) UTR (un-translated region). It can affect the translation of the mRNA.

https://en.wikipedia.org/wiki/Three_prime_untranslated_regio....

The next thing is the poly-A tail:

https://en.wikipedia.org/wiki/Polyadenylation

BLASTing the 3' UTR, we see that ~50% of it was copied from the human mitochondria

tldr, extra regulatory signals (often not well understood)

dnautics5y ago

Doesn't really make sense for it to copy from the mitochondrion; I presume this thing gets expressed in the cytosol.

fabian2k5y ago

The 3' and 5' untranslated regions are the parts of the mRNA directly before and after the part that encodes the actual protein. So they are themselves not translated into amino acids.

What they actually do can vary, but essentially they can provide places for other things to bind and influence what happens with the mRNA. There are some fancier cases like riboswitches, but you don't see those in humans. The stuff at the start and end of the mRNA also determines stability of the mRNA.

flobosg5y ago

In the BioNTech/Pfizer mRNA, the 3' (or the latter) half of the 3'-UTR comes from human mitochondrial 12S rRNA.

The Moderna one has the 3'-UTR of the alpha subunit of human hemoglobin.

hfjfktmtkrn5y ago

Among other things that purple region determines the "priority"/"intensity" of the whole sequence.

You want it as high as possible to make as much spike protein as possible.

It's proprietary information, mostly they try various possibilities until they find one with high expression.

Asparagirl5y ago

So it’s the mRNA version of !important in CSS?

layoutIfNeeded5y ago

That’s the copyright notice.

_theory_5y ago

That's the part that makes you randomly yell, "Hail Bill Gates!"

andrewcl5y ago· 7 in thread

Cool, but it's the lipid delivery system that is the secret sauce. This is equivalent to giving the source code without a compiler to build it.

airhead9695y ago

Wouldn't the "compiler" be the bioreactor used to mass-produce it and the "installer" be the lipid encapsulation? :)

ineedasername5y ago

Maybe the booster shot can be done with a simple apt-get update.

divbzero5y ago

In this case the bioreactor “compiler” is actually our own cells, which read out the mRNA “source code” and translate it into protein. The lipid encapsulation delivers the mRNA to our cells, so perhaps it’s more analogous to a network protocol that delivers source code intact across firewalls and other defenses.

snuxoll5y ago

InstallGene by Flexera

DrAwdeOccarim5y ago

But wasn't the whole Pfizer/BioNTech "secret sauce" leaked online after the EMA was hacked?

subroutine5y ago

Meh, they probably just used lipofectamine (which has been around since the 90s) or something very similar.

https://en.wikipedia.org/wiki/Lipofectamine

https://www.thermofisher.com/us/en/home/brands/product-brand...

Duller-Finite5y ago

lipofectamine is used for in vitro transfection, not in vivo gene delivery. The vaccines use lipid nanoparticles rather than liposomes

sp1rit5y ago· 7 in thread

If my little knowledge from biology class serves me correctly, RNA uses Udenine instead of Thymine. But in this document it uses T.

Can somebody explain to me why?

Laforet5y ago

The convention of genomic research is to present all RNA sequences as equivalent cDNA sequences, as this will be the output of most common sequencing platforms.

https://bioinformatics.stackexchange.com/questions/11353/why...
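
Going between the two notations is a plain letter substitution. A tiny sketch with made-up function names; it only swaps T and U and says nothing about chemically modified bases:

```python
def dna_notation_to_rna(seq):
    """Rewrite a sequence given in DNA letters (T) using the RNA letter (U)."""
    return seq.upper().replace("T", "U")

def rna_to_dna_notation(seq):
    """Inverse substitution: U back to T."""
    return seq.upper().replace("U", "T")

print(dna_notation_to_rna("GATTACA"))
```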

koeng5y ago

DNA is way more stable than RNA. Since you can easily synthesize RNA from DNA, and DNA synthesis technology is much more mature, folks normally synthesize DNA and then derive/make the RNA from it. That makes most researchers default to DNA 5' to 3', even when talking about RNA.

feanaro5y ago

You probably mean uracil, not udenine (which doesn't exist AFAIK).

sp1rit5y ago

Yeah, English is my second language. I just thought of a Translation that sounded reasonable rather than looking it up.

shellfishgene5y ago

Note that independently of the notation used the mRNA of those vaccines use even more "weird" bases, such as 1-methyl-3’-pseudouridylyl, to make the vaccine mRNA not be detected by the immune system [1].

[1] https://berthub.eu/articles/posts/reverse-engineering-source...

rolph5y ago

DNA uses base pairs [A,T] and [G,C]; this code is for a piece of DNA. If you keep a DNA sequence in vials for later use, that is much more stable and easier to manipulate, and to repair when corrupted.

Normally RNA in vivo is complexed with proteins that prevent RNA from folding and annealing into a structure that is not compatible with translation to protein. In the vaccine this isn't happening; this is why RNA is hard to work with and the vaccine must be kept so cold.

This is not to say that DNA is simple to work with, but it solves problems if you don't need direct access to RNA.

Duller-Finite5y ago

RNA uses uracil/uridine rather than thymine, but uridine is actually quite immunogenic. That's what has prevented people from using mRNA as a therapy until recently, when the founders of BioNTech figured out that they could use pseudouridine (abbreviated as Ψ) instead. See [1] for more information.

[1]https://www.wired.co.uk/article/mrna-coronavirus-vaccine-pfi...

VectorLock5y ago· 6 in thread

People joked a lot about "injectable source code / machine code" but it is kind of interesting injecting yourself with something that has the source on github.

dnautics5y ago

Note that this is a sequencing result, so it will lack a lot of nonstandard RNA tricks that these companies are or might be using, like pseudouridines, or fluorobases. I think those would have to be disclosed in the patent.

VectorLock5y ago

Less like original source code, more like a clean room reverse engineering.

vmception5y ago

We aren’t that different from machines, we just need to know more about the CPU and all the co-processors and how the logic gates interact

But for now we can inject code to trigger protein configuration via the immune system

anyfoo5y ago

> We aren’t that different from machines, we just need to know more about the CPU and all the co-processors and how the logic gates interact

Except it is unfortunately not that simple, because it assumes that distinct components such as CPU, co-processors and even logic gates exist in that context, as is totally reasonable to assume on devices created by humans. Abstracting complex machines into distinct components is a proven strategy to engineer a system, but it's not a necessity for functioning systems to exist.

In the case of natural organisms, they "just" need to work. They don't have a blueprint, and they don't need to be organized in a way that allows for easy understanding by looking at individual parts in separation.

Consider also the difference between machine learning through neural networks ("we stuff a lot of training data in there and get what we want eventually, we hardly understand what the model does or why it fails"), and a QR code reader ("we carefully designed the format from the top down, including e.g. framing, error correction, and several invariants like rotation; if a QR code does not get recognized, we can usually tell exactly where and why it failed").

3 more replies

_joel5y ago

Does that mean a cytokine storm is the equivalent of a buffer overflow or a DoS?

3 more replies

calylex5y ago

Just because you see the RNA/DNA sequence on Github doesn't mean anything, DNA sequencing has been around since at least the early 70s [0]. Many pharmaceutical drugs already employ such techniques.

[0] https://en.wikipedia.org/wiki/DNA_sequencing#History

joeyh5y ago· 5 in thread

My first thought was `wdiff pdizer moderna`. It's short enough to post here in its entirety, but I guess I had better not; anyway, it's easy enough to extract from the pdf. Add a space after every letter and wdiff can find the common sequences nicely.

Short excerpt for flavor, this is from near the beginning:

A[-G-]AGA{+A+}GAA{+ATATAAGAC+}CCCG{+GCGCCG+}CCACCATGTTCGTGTTCCTGGTGCTGCTGCC[-T-]{+C+}
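
For readers without `wdiff` at hand, a rough equivalent can be sketched with Python's standard `difflib`. This is only an illustration, not what joeyh ran: the function name `wdiff_like` and the two short input strings are made up for the example, and `difflib` may group the differences differently from `wdiff`.

```python
from difflib import SequenceMatcher


def wdiff_like(a: str, b: str) -> str:
    """Render a character-level diff in wdiff's [-deleted-]{+inserted+} style."""
    out = []
    # autojunk=False stops difflib from treating frequent bases as "junk"
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(a[i1:i2])
            continue
        if i2 > i1:  # characters only present in the first sequence
            out.append("[-" + a[i1:i2] + "-]")
        if j2 > j1:  # characters only present in the second sequence
            out.append("{+" + b[j1:j2] + "+}")
    return "".join(out)


# Hypothetical short inputs, chosen only to exercise the function
first = "AGAGAACCCGCCACCATG"
second = "AAGAAATATAAGACCCCGGCGCCGCCACCATG"
print(wdiff_like(first, second))
```

Working on characters directly avoids the "add a space after every letter" step, since `difflib` compares arbitrary sequences rather than whitespace-separated words.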

flobosg5y ago

A pairwise sequence alignment done with `needle` starts like this:

  BioNTech_Pfiz      1 -----------GAGAATAAACTAGTATTCTTCTGGTCCCCACAGACTCAG     39
                                  |||||.|.|..||||                |||   ||
  Moderna            1 GGGAAATAAGAGAGAAAAGAAGAGTA----------------AGA---AG     31
  
  BioNTech_Pfiz     40 AGAGA----AC-------CCGCCACCATGTTCGTGTTCCTGGTGCTGCTG     78
                       |.|.|    ||       ||||||||||||||||||||||||||||||||
  Moderna           32 AAATATAAGACCCCGGCGCCGCCACCATGTTCGTGTTCCTGGTGCTGCTG     81
  
  BioNTech_Pfiz     79 CCTCTGGTGTCCAGCCAGTGTGTGAACCTGACCACCAGAACACAGCTGCC    128
                       ||.||||||..|||||||||.|||||||||||||||.|.||.||||||||
  Moderna           82 CCCCTGGTGAGCAGCCAGTGCGTGAACCTGACCACCCGGACCCAGCTGCC    131
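
The idea behind a global pairwise alignment like the one above can be sketched with a textbook Needleman-Wunsch implementation. This is a simplified illustration, not how EMBOSS `needle` is implemented: it assumes a flat gap penalty and simple match/mismatch scores (the values below are arbitrary), whereas `needle` uses a scoring matrix with separate gap-open and gap-extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for aligning a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

The table is quadratic in the sequence lengths, which is fine for two mRNAs of about 4,000 bases but is the reason heuristic tools exist for database-scale searches.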

type_enthusiast5y ago

Knowing nothing about biotech – if Moderna and Pfizer were working from the same sequencing data, why would their resulting vaccine mRNA sequences be different? Even slightly?

Edit: I guess what I'm asking is: presumably these vaccines both target the spike protein. Do both of these sequences express the same protein? Or is there a "close enough!" thing in the immune system, where it can be a little different and still be targeted by the immune system?

7 more replies

koeng5y ago

Us folks in biotech have a special tool just for this :) https://blast.ncbi.nlm.nih.gov/Blast.cgi

Unfortunately, the core algorithm dates back to 1990, so it can be real slow in some cases. Biotech takes a while to improve :(

flobosg5y ago

I think you meant https://www.ebi.ac.uk/Tools/psa/emboss_needle/ or https://www.ebi.ac.uk/Tools/psa/emboss_water/ ;-)

asdff5y ago

You can also run blast locally if you need to throw more hardware at it.

jonplackett5y ago· 5 in thread

Rather disappointingly, neither sequence includes the string 'GATTACA'

ImaCake5y ago

A given combination of 7 bases has a probability of occurring of 1/16,384. Since the COVID genome is about 30k bases long I guess you have a pretty good chance of it appearing in there somewhere. This assumes uniformity, which of course is not true. COVID’s genome is under crazy intense selection pressure!
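
The back-of-the-envelope estimate above can be written out as a small calculation. It is a sketch under the same uniformity assumption the comment flags as unrealistic, and it additionally treats overlapping windows as independent. The genome length of 29,903 bases is the length of the NC_045512 reference sequence; the function names are made up for the example.

```python
def expected_hits(genome_length: int, k: int) -> float:
    """Expected number of occurrences of one specific k-mer in a random genome."""
    windows = genome_length - k + 1  # number of positions where a k-mer can start
    return windows / 4 ** k


def chance_of_at_least_one(genome_length: int, k: int) -> float:
    """Probability of at least one occurrence, treating windows as independent."""
    windows = genome_length - k + 1
    return 1 - (1 - 0.25 ** k) ** windows


GENOME_LENGTH = 29_903  # length of the NC_045512 reference genome, in bases

print(4 ** 7)                                     # number of distinct 7-mers
print(expected_hits(GENOME_LENGTH, 7))            # expected count of e.g. GATTACA
print(chance_of_at_least_one(GENOME_LENGTH, 7))   # chance it shows up at all
```

Under these assumptions the expected count is about 1.8 and the chance of at least one occurrence is roughly 84%, so "pretty good chance" holds, though a real genome is far from uniformly random.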

shellfishgene5y ago

The sequence GATTCA appears 4 times in the reference version of the COVID genome :) (Go to https://www.ncbi.nlm.nih.gov/nuccore/NC_045512, pick "Find in this Sequence" on the right)

1 more reply

calebm5y ago

That would have been a killer easter egg (possibly literally).

joe456432345y ago

What's special about this string?

Duber5y ago

It's the title of a cult film: https://www.imdb.com/title/tt0119177/

1 more reply

wonderwonder5y ago· 5 in thread

We are simply programmable machines; it's pretty interesting that all of human life can be reduced down to 30k editable microservices.

8note5y ago

That gives me the feeling that those reflexion models could help improve our understanding of those microservices.

hutzlibu5y ago

"its pretty interesting that all of human life can be reduced down to 30k editable microservices."

I don't know much about DNA and co, but it sounds like microservice is not the right metaphor. Rather just 30k of source code?

Because a microservice is something that is already compiled and running..

wonderwonder5y ago

Was looking at it as each gene is a microservice and performs a role. Those microservices can be added to, edited / eliminated or swapped out.

qbasic_forever5y ago

Sure, but if you took that 30k of data and dropped it on a planet just like earth it would still take 10k years or so for us to build civilizations as we know it again.

ngcc_hk5y ago

Not 10k years: it would first need to go through the million-year scale (RNA, heat, UV, then DNA, from no oxygen to oxygen, etc.), then millions of years of evolving. The scale is a bit off.

1 more reply

yrral5y ago· 5 in thread

Related: Here's an article from late last year describing and explaining the source code of the Pfizer vaccine:

https://berthub.eu/articles/posts/reverse-engineering-source...

It's a very interesting read and I hope the author makes another post explaining the differences of the two mrna vaccines.

throwawaysea5y ago

From that link:

> The injection contains volatile genetic material that describes the famous SARS-CoV-2 ‘Spike’ protein. Through clever chemical means, the vaccine manages to get this genetic material into some of our cells.

> These then dutifully start producing SARS-CoV-2 Spike proteins in large enough quantities that our immune system springs into action. Confronted with Spike proteins, and (importantly) tell-tale signs that cells have been taken over, our immune system develops a powerful response against multiple aspects of the Spike protein AND the production process.

What happens to the "volatile genetic material" at the end of this? Does it just linger in the body indefinitely? Or does it somehow get destroyed (and what does that mean)? From my reading of the above excerpt, it's the produced spike proteins that get destroyed but not the original genetic material that's injected. The reason I'm asking is to understand how the vaccine designers determine if there are any long-term effects of having this artificial material inside your body. They couldn't have tested it over a long time frame given how quickly all this moved.

fabian2k5y ago

The mRNA is stable for a few hours or so, it is both chemically unstable in solution under the conditions in a cell and also actively degraded by various mechanisms.

atleta5y ago

Read the article, it answers your question in detail:

> The very end of mRNA is polyadenylated. This is a fancy way of saying it ends on a lot of AAAAAAAAAAAAAAAAAAA. Even mRNA has had enough of 2020 it appears.

> mRNA can be reused many times, but as this happens, it also loses some of the A’s at the end. Once the A’s run out, the mRNA is no longer functional and gets discarded. In this way, the ‘poly-A’ tail is protection from degradation.

Also, your cells continuously make mRNAs, depending on what proteins they need to synthesize. And those (have to) get discarded too. And also this is what happens to the actual viral RNA when the virus attacks you for real.

1 more reply

outworlder5y ago

> The reason I'm asking is to understand how the vaccine designers determine if there are any long-term effects of having this artificial material inside your body

The properties of mRNA are well known and have been for decades. Your cells are constantly producing more from the nucleus. It degrades, even more so when it gets translated. That's the beauty of this, it's self-limiting.

The only 'artificial' thing about it is the special base that's added to avoid detection by the immune system. Everything else is the exact same compounds present in your cells.

1 more reply

alexobenauer5y ago

It is worth noting that the studies are still ongoing. The Phase 3 trials are two years.

ur-whale5y ago· 5 in thread

What this does, as a non-biotech person, I believe I understand at a high level: plonk this code into a ribosome and out comes the desired protein.

What I don't understand is:

   a) how the m-RNA code relates to the produced protein (i.e. I can read C code and get an idea of what it does fairly quickly, but can the same be said of m-RNA and the resulting protein)?

   b) how did they get their hands on that code in the first place? Do the coronaviruses use m-RNA as well? Was then a coronavirus somehow "dissected" to get at the spike protein "source code"?

koeng5y ago

Answers:

a) From the mRNA you can learn the amino acid sequence of the protein very quickly. You absolutely cannot (yet) learn the function of the protein from that sequence - normally, people just do comparisons with proteins whose functions ARE known. Oftentimes in enzymes there are "domains" or little functional regions that stay consistent over long periods of time, so that's a good way to assign function (given knowledge of other proteins in the same family)

b) Yep. Every virus at some point in their lifecycle use mRNA. You can just sequence the virus and get all that data (I've done that on SARS-COV-2, it's honestly pretty easy). Then you just do homology alignment (as stated above) and you can figure out approximately what each gene does.

The problem of de-novo protein prediction is ONE OF THE HARDEST PROBLEMS IN BIOTECH, but things like getting the amino acid sequence, doing homology searches, sequencing viruses, etc., are basic biotech, and I'd expect an eager high schooler or undergrad to be able to do them.

ur-whale5y ago

Thanks !

azernik5y ago

a) I don't know if protein-folding software is good enough to figure out the exact structure of the resulting protein given just the gene, but I suspect you could figure out through the equivalent of the strings command - looking for sub-chains of the protein, and looking for matching sequences in the gene

b) Coronaviruses happen to be RNA viruses; that is, their genomes are RNA rather than DNA. DNA viruses also exist and are common. We got full genomes from sequencing early in the pandemic, and continue to use it to monitor the evolution of the virus (see e.g. [1], where the results are available for download). Sequencing is very cheap and easy these days - you take a sample from a patient, use chemicals to break down all the cell membranes and such, sequence all of the DNA and RNA in it, and look through the results for a virus genome (i.e. something that isn't a human chromosome and isn't a known virus or bacterial genome). "m"RNA is more a description of the function than the chemical - tRNA and rRNA are short snippets of RNA used for manufacturing purposes inside the cell, while mRNA is the long chunks that actually carry information from the DNA to the protein manufacturing sites. Virus RNA basically functions as imposter mRNA, getting those manufacturing systems to make more viruses.

[1] https://www.ncbi.nlm.nih.gov/datasets/coronavirus/genomes/ - SARS-CoV-2 is the COVID-19 virus. As of my fetch, there are 71,509 full sequences of the virus, reflecting slight mutations over time and space.

flobosg5y ago

a) Yes, you can translate a mature[1] mRNA sequence, codon by codon, from the start until the stop codon, and it will give you the sequence of the protein it encodes.

b) Coronaviruses have a RNA genome. Researchers extracted it from wild-type viruses and then sequenced it.

[1]: mRNAs can undergo several maturation steps, such as splicing, which removes regions that won’t be translated into protein.
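
The codon-by-codon translation described in (a) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: it uses the standard genetic code, starts at the first `ATG` it finds, and ignores everything else a real cell does. The function name `translate` is made up for the example, and the test fragment is taken from the Moderna row of the `needle` alignment posted earlier in the thread.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first start codon until a stop codon or the end."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA spelling
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Fragment around the start codon, copied from the Moderna row of the alignment above
fragment = (
    "GCCACCATGTTCGTGTTCCTGGTGCTGCTG"
    "CCCCTGGTGAGCAGCCAGTGCGTGAACCTGACCACCCGGACCCAGCTGCC"
)
print(translate(fragment))
```

The output is a string of one-letter amino acid codes. As koeng notes, that gives the protein's sequence quickly but says nothing by itself about its structure or function.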

grey4135y ago

Everyone else has had good answers, but I'm also going to note that we knew a ton about covid's general molecular biology well before it ever came into existence. Covid (more properly, SARS-CoV-2) is a coronavirus. Coronaviruses have been studied for some time since some of them cause common colds, and studied very intensely since 2002 when SARS showed pandemic potential. So when Covid showed up, there was a ton of pre-established information and expertise available to help every element of the pandemic response.

elliekelly5y ago· 4 in thread

I’m a little confused by the title? Looking at the document, it seems to me (knowing next to nothing about this field) it includes both Pfizer and Moderna’s protein spike sequence in figures 1 and 2, respectively. Is that correct?

It’s also interesting the way it’s worded: that the sequence was “assembled from $vaccine”. Does that mean whoever published this has backed into these sequences rather than having gathered this information directly from the source(s)?

phcordner5y ago

You are correct. The researchers here sequenced each vaccine starting with the bit of vaccine left in the vial after administration. The goal was to get a raw sequence of the Moderna mRNA component so it can be easily filtered out as being a signal of therapeutic origin. Pfizer's sequence has already been published; it's included here to confirm that the result achieved experimentally matches the published sequence.

flobosg5y ago

The authors reverse engineered the sequences of the vaccines, obtaining them from the remaining mRNA present in the vials.

“Assembly” in this case means that they merged several short sequences they obtained, each representing a fragment of the whole mRNA sequence.

hfjfktmtkrn5y ago

They sequenced vaccine leftover remaining in used vials.

So reverse engineering basically.

usrusr5y ago

And reverse engineering only sounds dramatic until you take a step back and acknowledge that it's what they literally do all the time. Only that usually the sequences they read are not the outcome of some human development effort but of naturally occurring evolutionary processes.

aty268OP5y ago· 3 in thread

'A group of Stanford researchers has hacked Moderna’s messenger RNA (mRNA) vaccine for the novel coronavirus, Motherboard first reported on Monday, and published its entire genetic sequence on the open-source code repository Github.'

https://gizmodo.com/stanford-scientists-post-entire-mrna-seq...

throwawaysea5y ago

What does "hacked" mean here? The article makes it sound like this wasn't something illegal:

> Fire and Shoura told Motherboard that they had received permission from the FDA to collect scraps of vaccines that wouldn’t have otherwise been used from empty vials and that they’d notified Moderna in advance of their plans to publish the sequence without receiving any objection in turn.

Also:

> The research team told Motherboard that they didn’t “reverse engineer” the vaccine, they simply “posted the putative sequence of two synthetic RNA molecules that have become sufficiently prevalent in the general environment of medicine and human biology in 2021.”

I'm not familiar enough with how these sequences work to understand what's being discussed. Is it simply that they took a sample of the vaccine and studied its composition using some standard machine/process?

cwkoss5y ago

It means the gizmodo author is trying to get more views.

flobosg5y ago

> Is it simply that they took a sample of the vaccine and studied its composition using some standard machine/process?

That’s exactly what they did.

nsxwolf5y ago· 3 in thread

ELI5, Why are the sequences different if they result in the same spike protein?

rnestler5y ago

Maybe one could compare it to having different recipes for the same cake. Or different source code to solve the same problem.

ssijak5y ago

Different codons can encode the same amino acids, so different sequences can encode the same protein.

shakow5y ago

I didn't check it was the only explanation, but the DNA -> protein encoding is surjective.

obilgic5y ago· 3 in thread

so how are the first and the second dose different?

meepmorp5y ago

They're the same. It's just a second dose as a booster.

takeda5y ago

I believe only the Sputnik vaccine has different first and second doses, but that vaccine is in a different category (the same one as AstraZeneca and Johnson & Johnson). The reason is that these vaccines use a vector (adenovirus), and there's a risk that the body will develop antibodies against the vector, so the second dose might not be as effective.

jonbaer5y ago

"Some vaccines require two doses because the immune response to the first dose is rather weak. The second dose helps to better reinforce this immune response." - I would have to think over time that could be optimized somehow to just require one w/ ML and test results, etc.

zappo29385y ago· 2 in thread

Wow. Looks like it is analogous to having a header on a TCP packet. [0] Here is an animation of mRNA encoding translated to proteins inside a ribosome. [1]

"The ribosome is composed of one large and one small sub unit that assemble around the messenger RNA, which then passes through the ribosome like a computer tape. The amino acid building blocks, that's the small glowing red molecules, are carried into the ribosome attached to specific transfer RNAs; that's the larger green molecules also referred to as tRNA. The small sub unit of the ribosome positions the mRNA so that it can be read in groups of three letters known as a codon."

Very analogous indeed.

[0] https://xerocrypt.wordpress.com/2014/07/22/how-to-read-almos...

[1] https://www.youtube.com/watch?v=TfYf_rPWUdY

retrac5y ago

Some parts of gene transcription are so straightforward one can almost be tricked into thinking it has the logic of a computer program. It may be an illusion. To stretch the metaphor, TCP parsers don't match probabilistically along the entire length of the packet in parallel, and they don't interpret the same part of a packet as data in some contexts, and a header in others.

robbiep5y ago

I ended up majoring in biochemistry and molecular biology in my undergrad because I was browsing on Wikipedia one day and came across an article written on an E. Coli variant that had sentences like:

01J3 E. coli has a DNA polymerase that contains 3’-5’ proofreading capability and 5’-3’ error correcting with a polymerisation rate of 50bps

I’ve made the above up because I have never since been able to find a Wikipedia page that as succinctly pointed out to me that biology was a machine, and I was hooked.

1 more reply

mrfusion5y ago· 2 in thread

The lipid container is weird to me. Is that all it takes to send instructions inside a cell? Seems like a security hole. Why haven’t viruses evolved to have a lipid container?

inportb5y ago

> Why haven’t viruses evolved to have a lipid container?

They have. https://en.wikipedia.org/wiki/Viral_envelope

jforman5y ago

I was going to say you can't get very far without a protein vehicle, but then I remembered that's quite incorrect:

https://en.wikipedia.org/wiki/Retrotransposon

The injection is important, however, as it gets the genetic material past a whole lot of nucleases that cover your epithelia.

mrfusion5y ago· 2 in thread

So what moves the new protein out of your cells once the rna is processed? Don’t most proteins stay inside the cell?

lowdanie5y ago

There is a system that transports protein fragments to the cell surface and “presents” them to the immune cells: https://en.m.wikipedia.org/wiki/Antigen_presentation

mrfusion5y ago

Thanks. So what makes that happen in this case? Is it because internally the cell doesn’t recognize the protein? Or it does this for all proteins it makes? Does say some hemoglobin get transported to the cell surface?

1 more reply

karolkozub5y ago· 1 in thread

It looks like a machine code snippet. I wonder if we'll develop high level languages and compilers for genetic code in the future.

ImaCake5y ago

I imagine we have tools in that direction, but nothing complete. Unlike math and computers, biological systems don’t really go from a uniform set of simple rules to emergent complexity - there is a whole lot of sideways complexity thrown in.

Something that might fit the computation vision of your comment are the various Ontologies for bioinformatics. The Gene Ontology is probably the most complete, although it lags many years behind the literature.

http://geneontology.org/

jturolla5y ago· 1 in thread

Please someone... create some abstraction language for this bio-assembly code. Can we make LLVM compile this? :joy:

jakeogh5y ago

Check out https://github.com/clasp-developers/clasp

"Clasp: Common Lisp using LLVM and C++ for Designing Molecules": https://www.youtube.com/watch?v=0rSMt1pAlbE

dooopy5y ago· 1 in thread

I compared the spike encoding regions, and it looks like they're quite different...I wonder if the codons wind up coding for different amino acids. And who got it right?

flobosg5y ago

Their codon compositions were optimized differently, but both coding regions translate to the same amino acid sequence.
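
This can be checked on a small scale with the two 50-base rows from the `needle` alignment earlier in the thread, which happen to start on a codon boundary. The sketch below assumes the standard genetic code; the helper names are made up for the example, and it only covers this short fragment, not the full coding regions.

```python
from itertools import product

# Standard genetic code in the conventional TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

# Third rows of the alignment above (positions 79-128 and 82-131)
BIONTECH = "CCTCTGGTGTCCAGCCAGTGTGTGAACCTGACCACCAGAACACAGCTGCC"
MODERNA = "CCCCTGGTGAGCAGCCAGTGCGTGAACCTGACCACCCGGACCCAGCTGCC"


def translate_in_frame(seq: str) -> str:
    """Translate every complete codon, assuming the sequence starts in frame."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def differing_codons(a: str, b: str) -> int:
    """Count codon positions where the two nucleotide sequences differ."""
    usable = min(len(a), len(b)) - 2
    return sum(1 for i in range(0, usable, 3) if a[i:i + 3] != b[i:i + 3])


print(translate_in_frame(BIONTECH))
print(translate_in_frame(MODERNA))
print(differing_codons(BIONTECH, MODERNA))
```

Both fragments print the same amino acid string even though several codons differ, for example `TCC` versus `AGC`, which both encode serine.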

tibbydudeza5y ago· 1 in thread

So you have a header/footer sequence that we sort of know is required (remember the MZ and chksum for .EXE files) but we have no idea what the bits in between do, except that we can read the letters and that they were copied in part from the actual virus.

6nf5y ago

We do know that bit in the middle encodes the structural spike protein of the virus

verytrivial5y ago· 1 in thread

There are people who could memorize this. And it would weirdly be more useful than digits of π!

whitepaint5y ago

I think memorizing any of these two is pretty much totally useless.

stevefrench935y ago· 1 in thread

I wouldn't install beta software on a production system though.

rossdavidh5y ago

Well we do with anti-malware stuff, when a 0-day comes out and we know there are exploits in the wild and beta software is all we've got.

1 more reply

singularity20015y ago· 1 in thread

tangential: do biologists sometimes use some form of base 64 encoding for their triplets? so instead of AAG.TCA.GGA just g5F or something?

other than the obvious advantage of being shorter, it would also be easier to read: the boundaries would be unambiguous and each char would correspond directly to an amino acid (if applicable/coding)

jebus9895y ago

Proteins are written in standardised IUPAC amino acid codes that carry some semantic meaning, e.g. Alanine: A, Glycine: G etc. Also viral genomes often have overlapping transcription with shifted open reading frames. Biology is not as simple as you think.
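
To make the question concrete: there are 4^3 = 64 possible triplets, so a one-character-per-codon packing is mechanically easy. The sketch below is purely hypothetical. The codon-to-character assignment is arbitrary, the names `pack` and `unpack` are made up, and this is not an encoding biologists use; as noted above, the established shorthand is the one-letter amino acid code.

```python
import string
from itertools import product

# The 64 characters of the usual base64 alphabet
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Arbitrary assignment: codons in alphabetical order map to alphabet positions 0..63
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
ENCODE = dict(zip(CODONS, B64_ALPHABET))
DECODE = {char: codon for codon, char in ENCODE.items()}


def pack(seq: str) -> str:
    """Encode each complete codon as one character; trailing bases are dropped."""
    usable = len(seq) - len(seq) % 3
    return "".join(ENCODE[seq[i:i + 3]] for i in range(0, usable, 3))


def unpack(packed: str) -> str:
    """Expand each character back into its three-base codon."""
    return "".join(DECODE[char] for char in packed)


packed = pack("AAGTCAGGA")
print(packed)
print(unpack(packed))
```

Such a packing fixes a single reading frame, which is exactly what breaks down for the overlapping, frame-shifted reading frames jebus989 mentions.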

brian_herman5y ago· 1 in thread

https://github.com/brianherman/Assemblies-of-putative-SARS-C... I posted some txt files with the lines removed and stuff.

flobosg5y ago

If you have the time, it would be nice to transfer the data to the commonly used Genbank format.

3 more replies

ibraheemdev5y ago· 1 in thread

Is this all another medical company needs to start manufacturing and selling the vaccine themselves? Or is this sequence licensed/proprietary in some way?

akkawwakka5y ago

No. The RNA still needs to be fit in a lipid nanoparticle which is Moderna and BioNTech’s secret sauce.

rjvir5y ago· 1 in thread

This should be an NFT, I'd love to own an NFT of the RNA sequence of the Moderna vaccine.

cwkoss5y ago

No reason you can't make your own NFT of it. Heck, if you promise to pay at least 1.337 ETH I'll figure out how to do it myself and make it for you. :-P

1 more reply

plattyp5y ago· 1 in thread

Who would have thought it'd be this simple

  if covid?(dna)
    block_virus(dna)
  end

rantwasp5y ago

read the article linked in the thread. it actually does not work against all of the covid virus. it works against the spikes.

so, the virus is sort of like a ball with these spikes on top (that’s where the corona name comes from) and the vaccine helps your body develop antibodies against the spikes. so when the virus gets in your body, it actually receives a “haircut” which leads to it no longer being able to enter the cells and hijack their internals to produce more viruses.

it’s extremely clever, but it also means that your code is wrong ;))

flemhans5y ago

Despite how complex this really is, and how many "gotchas" there might be when using this repository, it's nice that it gets a shitload of attention. As a united humanity we should strive to solve our common problems.

bionhoward5y ago

we wrote some code last year to build a big Trie of the whole transcriptome -- you could use it to fuzzy-search to see if this mRNA is within some edit distance of any piece of normal human RNA, because then it could theoretically cause side effects via RNA interference. stopped the project because I can't afford to develop a gene therapy right now, but the fuzzy search worked

https://github.com/bionicles/coronavirus

to make the trie use the function here. the variable K is the length of the Kmers (runs of RNA). Larger values are gonna take a lot longer. ( warning: big job, uses multiprocessing...pypy recommended for speed ) https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7...

then you could use this recursive function to generate potential matches within some cutoff https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7...

the function right below it converts the generator to a list. then you could save that

enjoy
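
The general idea of a k-mer trie with fuzzy lookup can be sketched as follows. This is not the code from the linked repository: the function names and the toy sequence are made up for the example, and the search only allows substitutions (Hamming distance), whereas a full edit distance would also handle insertions and deletions.

```python
def build_trie(sequence: str, k: int) -> dict:
    """Insert every k-mer of the sequence into a nested-dict trie."""
    root = {}
    for i in range(len(sequence) - k + 1):
        node = root
        for base in sequence[i:i + k]:
            node = node.setdefault(base, {})
    return root


def fuzzy_matches(node: dict, query: str, max_mismatches: int, prefix: str = ""):
    """Yield stored k-mers that differ from the query in at most max_mismatches positions."""
    if not query:
        yield prefix
        return
    for base, child in node.items():
        cost = 0 if base == query[0] else 1
        if cost <= max_mismatches:
            # Descend only while the mismatch budget allows it
            yield from fuzzy_matches(child, query[1:], max_mismatches - cost, prefix + base)


# Toy reference sequence, standing in for a transcriptome
trie = build_trie("GATTACAGATCACA", 4)
print(sorted(fuzzy_matches(trie, "GATT", 0)))
print(sorted(fuzzy_matches(trie, "GATT", 1)))
```

The mismatch budget prunes whole subtrees early, which is the main reason to use a trie here instead of comparing the query against every k-mer.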

spullara5y ago

I highly recommend reading about Ribosomes. They are made up of two pieces that were likely independent at some time. It becomes quite clear that "life" began as a machine that all it could do was replicate itself:

https://en.wikipedia.org/wiki/Ribosome

You can think of RNA as a copy of a section of DNA. They look very much like computer programs except rather than producing code, the Ribosome can read them and translate each codon for an amino acid into its corresponding actual amino acid that it then binds together into a protein. The execution engine is the environment of the cell. All highly probabilistic rather than deterministic. I can't imagine any programmer not finding them completely fascinating.

ineedasername5y ago

It's also short enough to post the whole thing to Wikipedia, so that's probably inevitable along with some very entertaining edit wars.

flobosg5y ago

> So how different is the mRNA in the Moderna, BioNTech/Pfizer & CureVac vaccines? There are 1274 codon positions. 808 are identical across all 3 vaccines. 103 are unique to Moderna, 249 unique to BioNTech, 230 to CureVac

https://twitter.com/PowerDNS_Bert/status/1375091898797453326

mushroomzulu5y ago

https://www.instagram.com/tv/CIYYq_rCV9F/?utm_source=ig_web_...

em3rgent0rdr5y ago

Are there any visual compilers that simulate the process of using these sequences to assemble a protein?

narrator5y ago

So I guess Josiah Zayner has to pick up on this now and do a DIY Moderna COVID vaccine video. He already did a DIY vaccine video with full open source documentation on how to do it yourself.

http://www.josiahzayner.com/2020/12/i-made-covid-19-vaccine-...

The_rationalist5y ago

I would love to see the output structure from Alphafold of this RNA source code

pknerd5y ago

Can someone give me the link of FASTA files of these sequences?

husamia5y ago

if you have understanding of how the sequence mutates then you can predict what the next strain is going to be and design spike protein that matches it.

stjohnswarts5y ago

ELI5 could this be used by "evil governments" to make designer pathogens to release during doomsday situations (say by North Korean leaders in their suicide bunkers if things went badly) ?

sktrdie5y ago

No package.json found, won't install.

StaticRice5y ago

Archive.org mirror: https://web.archive.org/web/20210326214140/https://raw.githu...

anonu5y ago

This is amazing. It appears quite "simple" - of course I know nothing about this part of the sciences.

I do think back to the early days of Covid when there were all these predictions around when a vaccine would show up. It seemed like there was knowledge that the mRNA platform would be the likely solution and probably by April we knew a vaccine would be possible - it just took 6+ months to test.

Thinking about that timeline amazes me.

peter3035y ago

One of Moderna's cofounders, MIT Prof. Robert Langer, was profiled on 60 Minutes a few years back as MIT's most prolific patent holder. He specialized in nanoparticle delivery systems to any desired internal tissue. One can deliver medicine, nutrients, diagnostics, etc. where and when they want. Vaccines are just a small subset of these applications.

djmips5y ago

Where's the JSON versions?

squarefoot5y ago

As a software/hardware guy who knows less than zero about the subject: is this something that (given the right resources) makes possible to replicate the vaccines? I mean in countries where they can't afford enough vaccines but already have or could invest in the ability to replicate them without caring about patents.

aden1ne5y ago

Why not in fasta format?

omlet5y ago

Where is the 5G stack?

a-dub5y ago

i'm a dna noob: is it possible to do the growing and sampling thing to get the sequence from a sample of the vaccine or does the bubble of fat get in the way?

p0rkbelly5y ago

obligatory:

"I could have done this in a weekend"

person_of_color5y ago

How long before we can 3d print an mRNA vaccine?

bvanderveen5y ago

> .docx.pdf

Cargo-cult much?

savrajsingh5y ago

My question is does the Johnson & Johnson DNA-based vaccine encode for the exact same spike protein, or a different one they chose to target? From this PDF I conclude both the moderna and Pfizer vaccines target the same protein.

omlet5y ago

Where is the code about 5G modem?



2 more replies

dylan6045y ago

> put together their sequences in a weekend

meh, I could do that over a weekend never sounded so scary, or impressive at the same time. That weekend just so happened to stand on the shoulders of prior decades of research though.

i guess this is big pharma's version of `apt-get install`

3 more replies

wespiser_20185y ago

1 more reply

outworlder5y ago

Of note, the immune system is pretty good at destroying foreign mRNA so you also need to evade it.

This article is pretty good: https://berthub.eu/articles/posts/reverse-engineering-source...

2 more replies

mschuster915y ago

1 more reply

schoen5y ago

> Hence they feel safe releasing this. Their moat is not the gene sequence, their moat is everything else.

Edit: See https://news.ycombinator.com/item?id=26628594 for more substantive discussion about this.

1 more reply

MuffinFlavored5y ago

> but testing the thing for safety and efficacy took months

What kind of tweaks were made from "the version they threw together in a weekend" to "the version that is in production now"? What's a typical "mRNA" feedback iteration loop like?

2 more replies

amelius5y ago

Would it be possible to use the same delivery mechanism for other mRNA sequences?

1 more reply

The_rationalist5y ago

Sounds like a problem you solve once and for all, for any vaccine. And also that this problem was already solved decades ago (e.g. viral vectors).

2 more replies

Yizahi5y ago

Additional reading (was posted here some time ago):

https://blogs.sciencemag.org/pipeline/archives/2021/02/02/my...

Why manufacturing of these vaccines is a hard part.

jldugger5y ago

Liken it to the 4kb demoscene: it's amazing what can be done with a little bit of information, as long as you don't have to describe the machine running it.

GuB-425y ago

> Liken it to the 4kb demoscene

Coincidentally, the mRNA sequences for both vaccines are about 4kb (kilobase) long.

lettergram5y ago

It really is that “simple.”

Getting it designed and building it is more difficult.

Inject this mRNA into a cell and it’ll create the protein. Anything can be injected at this point once the mechanism for injection is developed

wombatpm5y ago

Which makes me wonder. Could you place the entire virus genome in these liposomes and get them to hijack the machinery to make an entire virus? Like a plasmid but for viral structures?

flobosg5y ago

shellfishgene5y ago

mattnewton5y ago

I think it's a bit like a private key- the difficulty is in finding some combination that works in an absolutely massive space of possible proteins, not necessarily in the length of the protein.

DecoPerson5y ago

Check out this video by The Thought Emporium to see how far we’ve come in these matters:

https://youtu.be/J3FcbFqSoQY

This should hopefully provide you with some useful perspective.

gerdesj5y ago

"but I'm utterly amazed at how simple this _appears_."

Biology is a funny old thing. You can look at that concise description - the orange and so on blocks of a few letters and a few short groupings.

Now dive into one of those bases and decompose it to its constituent atoms. Now look at the maths around this stuff. It gets quite complicated, quite quickly.

That said, the fact that a bloody complicated thingie can be described so concisely is absolutely amazing and as you say it looks so simple.

flemhans5y ago

It'd be cool to make an easy-to-use interface, still.

puzzlingcaptcha5y ago

Here is a breakdown https://berthub.eu/articles/posts/reverse-engineering-source... discussed previously https://news.ycombinator.com/item?id=25538820

tablespoon5y ago

Neat.

purple_ferret5y ago

Any individual protein doesn't seem that complex since it's just a combination of some 20 amino acids, but the variations are endless:

abfan11275y ago

Gatsky5y ago

Then there are also post translational modifications, like addition of acetyl or phosphate groups, and sugars to the protein (glycoproteins).

I mean, I can understand how an eye or a brain can evolve by natural selection, but I’m still stunned by abiogenesis. I guess we’ll never know for sure how it all started.

MauranKilom5y ago

The exponentiation signs got lost in your quote. Would you mind adding them back in?

phreeza5y ago

jldugger5y ago

> I wonder if the correct way to view it is in fact a very terse domain-specific language

Tuna-Fish5y ago

airstrike5y ago

> and I can easily fit it on my screen.

...with GATACCA right in the middle, but unfortunately with no GATTACA that I could find.

staplung5y ago

Heh. Technically, there isn't even GATTACA in there since it's RNA and hence all the T's are actually U's. It's just convention to use the T's. GAUUACA doesn't have the same ring to it.
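
A minimal sketch of the notation point above, assuming the sequence is handled as a plain Python string. The helper names `to_rna` and `find_motif` and the example sequence are made up for illustration; nothing here is taken from the released file.

```python
def to_rna(cdna: str) -> str:
    """Rewrite a sequence given in DNA letters (T) using RNA letters (U)."""
    return cdna.upper().replace("T", "U")


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Made-up example sequence, not taken from either vaccine.
example = "CCGATACCAGGGATTACATT"
print(find_motif(example, "GATACCA"))          # search in DNA notation
print(find_motif(to_rna(example), "GAUUACA"))  # same idea in RNA notation
```

Searching the RNA-notation string for `GATTACA` finds nothing, which is the whole joke.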

andagainagain5y ago

I'm estimating roughly 90-ish characters in a row, roughly 40 rows encoding the spike protein. So about 3600 base pairs. There are 3 base pairs per amino acid, so that's 1200 amino acids.

For comparison, the smallest chain that they technically call a protein is 100 amino acids; that's an arbitrary limit to separate proteins from enzymes. So this thing isn't tiny tiny.

But Titin (also called connectin), a giant protein responsible for passive elasticity in muscles, is ~27,000-35,000 amino acids. So this thing isn't even close to the biggest proteins out there.

flobosg5y ago

> that's an arbitrary limit to separate proteins from enzymes

But yeah, the polypeptide-protein threshold can get fuzzy, especially with the recent advances in miniprotein characterization.

weinzierl5y ago

> "Instead it's no bigger than a large paragraph of text, and I can easily fit it on my screen."

When I saw it, I thought that it could almost fit in a tweet, so I just did it:

https://twitter.com/weinzierl/status/1376807707957719041?s=2...

The sequence takes 16 tweets, 15 if you don't split at line endings and remove spaces (4175 nucleobases / 280 nucleobases/tweet ~ 14.9 tweets).

lifthrasiir5y ago

Or you can use base2048 [1] to compress it down to 3 tweets (4175 nucleobases * 2 bits per nucleobase / 3080 bits per base2048 tweet = 2.7 tweets).

[1] https://github.com/qntm/base2048/
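
The arithmetic in the two comments above can be checked with a rough sketch. It packs each nucleobase into 2 bits and redoes the tweet counts; the A/C/G/T ordering and the `pack`/`unpack` helpers are assumptions for illustration, and the 3080-bits-per-tweet figure is simply taken from the comment rather than from the base2048 library itself.

```python
import math

BASES = "ACGT"  # assumed ordering: A=0, C=1, G=2, T=3 (2 bits each)


def pack(sequence: str) -> bytes:
    """Pack nucleobases into bytes, four bases per byte."""
    bits = 0
    for base in sequence:
        bits = (bits << 2) | BASES.index(base)
    n_bytes = math.ceil(len(sequence) * 2 / 8)
    # Left-align the final partial byte by padding with zero bits.
    bits <<= n_bytes * 8 - len(sequence) * 2
    return bits.to_bytes(n_bytes, "big")


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); the length is needed to drop the zero padding."""
    bits = int.from_bytes(data, "big") >> (len(data) * 8 - length * 2)
    return "".join(BASES[(bits >> (2 * i)) & 3] for i in reversed(range(length)))


N_BASES = 4175  # sequence length quoted in the thread
print(math.ceil(N_BASES / 280))       # tweets at one base per character
print(math.ceil(N_BASES * 2 / 3080))  # tweets at 3080 bits per tweet
print(len(pack("A" * N_BASES)))       # bytes needed at 2 bits per base
```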

xjlin05y ago

"but I'm utterly amazed at how simple this _appears_."

Reminds me of the joke about the consulting engineer who knows where to mark the X with the chalk. LOL

anxrn5y ago

Not Moderna, but this [1] was a very useful primer on grokking how the Pfizer vaccine works, especially for computer programmers.

[1] https://berthub.eu/articles/posts/reverse-engineering-source...

nraynaud5y ago

the way I see it we're just at the beginning, and we're mainly copy/pasting a lot of code, we understand some small parts, and generally in the teenage years of genome programming.

ohmyzee5y ago

Bravo! Nice execution of the tips from yesterday's article!

https://www.cs.purdue.edu/homes/dec/essay.criticize.html

softwaredoug5y ago

Great quote from Maurice Hilleman, creator of many (most?) of our childhood vaccines goes something like “Don’t be smart. Instead be careful and accurate”

Lots of these things aren’t complicated. It’s the careful systematic testing and public trust building that’s the hard part.

learnstats25y ago

The genetic code itself is reasonably comparable to ASCII in complexity - every 6 bits is the code for one amino acid in a string, which will fold itself into the required protein.

devenvdev5y ago

I remember a lot of features and especially bug fixes where I had to change one line of code, it took hours to figure out how exactly though. I guess this is kinda similar?

gremlinsinc5y ago

The way it reads like source code, truly makes me circle back to the idea we're all living in a simulation.

biolurker15y ago

Mathematical truths about abstract notions of string theory fit in a line.

fiftyfifty5y ago

https://www.nytimes.com/interactive/2020/04/03/science/coron...

amluto5y ago

It appears simple, but a whole lot of work went into producing that string even pre-COVID. Some of it is generic in the sense that it might apply to any mRNA vaccine. Some is quite specific:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584442/

fnord775y ago

each one of those letters represents a ~15 atom molecule, so in a way it is a compressed representation

pyinstallwoes5y ago

See RadVac: https://radvac.org/

Make your own, open-source. Really cool.

A user on lesswrong made their own (with no prior experience): https://www.lesswrong.com/posts/niQ3heWwF6SydhS7R/making-vac...

hfjfktmtkrn5y ago

It's not really that simple.

Only two companies in the world succeeded; the French company Sanofi, which also tried making an mRNA vaccine, failed.

WheelsAtLarge5y ago

fermienrico5y ago

I presume a whole bunch goes into making a vaccine and this is just the tip of the iceberg.

Black1015y ago

so, explain it to me ?

csense5y ago· 9 in thread

The Human Genome Project was completed almost two decades ago, and somebody solved the protein folding problem recently.

Why are we still doing genetics at the machine code level? Shouldn't we have some compilers, assemblers and linkers by now?

6nf5y ago

Protein folding is not solved, that headline was overstating the actual achievement by Google's protein folding solution.

Yizahi5y ago

lambdadmitry5y ago

zero_deg_kevin5y ago

Because the problem is significantly more complicated than sequencing and folding.

WJW5y ago

I feel this XKCD describes the situation particularly well: https://xkcd.com/1831/

neuronic5y ago

Maybe after 4 billion years of evolving our code we will get it right.

baby5y ago

My thought exactly. So this thing is like a VM with a bunch of primitive opcodes. Why can’t someone write a higher-level language or at least some gadgets?

WJW5y ago

asdff5y ago

Because it's a harder problem than it seems at face value.

koeng5y ago· 8 in thread

The thinking behind attaching a PDF with colors and not a Genbank file is why we can't have nice things in biotechnology.

rubatuga5y ago

Wait, you mean you don't extract genomic data from Excel? The MARCH1 gene brings many interesting surprises.

perl4ever5y ago

chromatin5y ago

I am fond of September 2, myself.

For those not in the know:

https://genomebiology.biomedcentral.com/articles/10.1186/s13...

julienchastang5y ago

[1] https://www.nature.com/articles/sdata201618

[2] https://github.com/NAalytics/Assemblies-of-putative-SARS-CoV...

ImaCake5y ago

brian_herman5y ago

Here you go! https://github.com/brianherman/Assemblies-of-putative-SARS-C...

shellfishgene5y ago

If there is no annotation or metadata FASTA format is usually preferred ;)

flobosg5y ago

My thoughts exactly!

Somewhere, Margaret O. Dayhoff is weeping.

kart235y ago· 8 in thread

iso13375y ago

It's the 3' (3-prime) UTR (un-translated region). It can affect the translation of the mRNA.

https://en.wikipedia.org/wiki/Three_prime_untranslated_regio....

The next thing is the poly-A tail:

https://en.wikipedia.org/wiki/Polyadenylation

BLASTing the 3' UTR, we see that ~50% of it was copied from human mitochondria

tldr, extra regulatory signals (often not well understood)

dnautics5y ago

Doesn't really make sense for it to copy from the mitochondrion; I presume this thing gets expressed in the cytosol.

fabian2k5y ago

The 3' and 5' untranslated regions are the parts of the mRNA directly before and after the part that encodes the actual protein. So they are themselves not translated into amino acids.

flobosg5y ago

In the BioNTech/Pfizer mRNA, the 3' (or the latter) half of the 3'-UTR comes from human mitochondrial 12S rRNA.

The Moderna one has the 3'-UTR of the alpha subunit of human hemoglobin.

hfjfktmtkrn5y ago

Among other things that purple region determines the "priority"/"intensity" of the whole sequence.

You want it as high as possible to make as much spike protein as possible.

It's proprietary information, mostly they try various possibilities until they find one with high expression.

Asparagirl5y ago

So it’s the mRNA version of !important in CSS?

layoutIfNeeded5y ago

That’s the copyright notice.

_theory_5y ago

That's the part that makes you randomly yell, "Hail Bill Gates!"

andrewcl5y ago· 7 in thread

Cool, but it's the lipid delivery system that is the secret sauce. This is equivalent to giving the source code without a compiler to build it.

airhead9695y ago

Wouldn't the "compiler" be the bioreactor used to mass-produce it and the "installer" be the lipid encapsulation? :)

ineedasername5y ago

Maybe the booster shot can be done with a simple apt-get update.

divbzero5y ago

snuxoll5y ago

InstallGene by Flexera

DrAwdeOccarim5y ago

But wasn't the whole Pfizer/BioNTech "secret sauce" leaked online after the EMA was hacked?

subroutine5y ago

Meh, they probably just used lipofectamine (which has been around since the 90s) or something very similar.

https://en.wikipedia.org/wiki/Lipofectamine

https://www.thermofisher.com/us/en/home/brands/product-brand...

Duller-Finite5y ago

lipofectamine is used for in vitro transfection, not in vivo gene delivery. The vaccines use lipid nanoparticles rather than liposomes

sp1rit5y ago· 7 in thread

If my little knowledge from biology class serves me correctly, RNA uses Udenine instead of Thymine. But this document uses T.

Can somebody explain to me why?

Laforet5y ago

The convention in genomic research is to present all RNA sequences as equivalent cDNA sequences, as this will be the output of most common sequencing platforms.

https://bioinformatics.stackexchange.com/questions/11353/why...

koeng5y ago

feanaro5y ago

You probably mean uracil, not udenine (which doesn't exist AFAIK).

sp1rit5y ago

Yeah, English is my second language. I just thought of a Translation that sounded reasonable rather than looking it up.

shellfishgene5y ago

[1] https://berthub.eu/articles/posts/reverse-engineering-source...

rolph5y ago

DNA uses base pairs [A,T] and [G,C]; this code is for a piece of DNA. If you keep a DNA sequence in vials for later use, that is much more stable and easier to manipulate, and to repair when corrupted.

This is not to say that DNA is simple to work with, but it solves problems if you don't need direct access to RNA.

Duller-Finite5y ago

[1]https://www.wired.co.uk/article/mrna-coronavirus-vaccine-pfi...

VectorLock5y ago· 6 in thread

People joked a lot about "injectable source code / machine code" but it is kind of interesting injecting yourself with something that has the source on github.

dnautics5y ago

VectorLock5y ago

Less like original source code, more like a clean room reverse engineering.

vmception5y ago

We aren’t that different from machines, we just need to know more about the CPU and all the co-processors and how the logic gates interact

But for now we can inject code to trigger protein configuration via the immune system

anyfoo5y ago

> We aren’t that different from machines, we just need to know more about the CPU and all the co-processors and how the logic gates interact

_joel5y ago

Does that mean a cytokine storm is the equivalent of a buffer overflow or a DoS?

calylex5y ago

Just because you see the RNA/DNA sequence on Github doesn't mean anything, DNA sequencing has been around since at least the early 70s [0]. Many pharmaceutical drugs already employ such techniques.

[0] https://en.wikipedia.org/wiki/DNA_sequencing#History

joeyh5y ago· 5 in thread

Short excerpt for flavor, this is from near the beginning:

A[-G-]AGA{+A+}GAA{+ATATAAGAC+}CCCG{+GCGCCG+}CCACCATGTTCGTGTTCCTGGTGCTGCTGCC[-T-]{+C+}

flobosg5y ago

A pairwise sequence alignment done with `needle` starts like this:

  BioNTech_Pfiz      1 -----------GAGAATAAACTAGTATTCTTCTGGTCCCCACAGACTCAG     39
                                  |||||.|.|..||||                |||   ||
  Moderna            1 GGGAAATAAGAGAGAAAAGAAGAGTA----------------AGA---AG     31
  
  BioNTech_Pfiz     40 AGAGA----AC-------CCGCCACCATGTTCGTGTTCCTGGTGCTGCTG     78
                       |.|.|    ||       ||||||||||||||||||||||||||||||||
  Moderna           32 AAATATAAGACCCCGGCGCCGCCACCATGTTCGTGTTCCTGGTGCTGCTG     81
  
  BioNTech_Pfiz     79 CCTCTGGTGTCCAGCCAGTGTGTGAACCTGACCACCAGAACACAGCTGCC    128
                       ||.||||||..|||||||||.|||||||||||||||.|.||.||||||||
  Moderna           82 CCCCTGGTGAGCAGCCAGTGCGTGAACCTGACCACCCGGACCCAGCTGCC    131
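
For readers wondering what `needle` does: it performs a Needleman-Wunsch global alignment. The following is a simplified sketch of that algorithm, not a reimplementation of the EMBOSS tool. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration; the real tool uses a substitution matrix and separate gap-open and gap-extend penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table costs O(len(a) * len(b)) time and memory, which is fine for two ~4 kb sequences.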

type_enthusiast5y ago

Knowing nothing about biotech – if Moderna and Pfizer were working from the same sequencing data, why would their resulting vaccine mRNA sequences be different? Even slightly?

koeng5y ago

Us folks in biotech have a special tool just for this :) https://blast.ncbi.nlm.nih.gov/Blast.cgi

Unfortunately, the core algorithm dates back to 1990, so it can be real slow in some cases. Biotech takes a while to improve :(

flobosg5y ago

I think you meant https://www.ebi.ac.uk/Tools/psa/emboss_needle/ or https://www.ebi.ac.uk/Tools/psa/emboss_water/ ;-)

asdff5y ago

You can also run blast locally if you need to throw more hardware at it.

jonplackett5y ago· 5 in thread

Rather disappointingly, neither sequence includes the string 'GATTACA'

ImaCake5y ago

shellfishgene5y ago

The sequence GATTCA appears 4 times in the reference version of the COVID genome :) (Go to https://www.ncbi.nlm.nih.gov/nuccore/NC_045512, pick "Find in this Sequence" on the right)

calebm5y ago

That would have been a killer easter egg (possibly literally).

joe456432345y ago

Whats special about this string?

Duber5y ago

It's the title of a cult film: https://www.imdb.com/title/tt0119177/

wonderwonder5y ago· 5 in thread

We are simply programmable machines, its pretty interesting that all of human life can be reduced down to 30k editable microservices.

8note5y ago

That gives me the feeling that those reflexion models could help improve our understanding of those microservices

hutzlibu5y ago

"its pretty interesting that all of human life can be reduced down to 30k editable microservices."

I don't know much about DNA and co, but it sounds like microservice is not the right metaphor. Rather just 30k of source code?

Because a microservice is something that is already compiled and running..

wonderwonder5y ago

Was looking at it as each gene is a microservice and performs a role. Those microservices can be added to, edited / eliminated or swapped out.

qbasic_forever5y ago

Sure, but if you took that 30k of data and dropped it on a planet just like earth it would still take 10k years or so for us to build civilizations as we know it again.

ngcc_hk5y ago

Not 10k year as it needs to go through the million years scale - rna, hot, uv then dna ... with no oxygen to oxygen etc. Then million of years of evolving ... scale is a bit off.

yrral5y ago· 5 in thread

Related: Here's an article from late last year describing and explaining the source code of the Pfizer vaccine:

https://berthub.eu/articles/posts/reverse-engineering-source...

It's a very interesting read and I hope the author makes another post explaining the differences between the two mRNA vaccines.

throwawaysea5y ago

From that link:

fabian2k5y ago

The mRNA is stable for a few hours or so, it is both chemically unstable in solution under the conditions in a cell and also actively degraded by various mechanisms.

atleta5y ago

Read the article, it answers your question in detail:

> The very end of mRNA is polyadenylated. This is a fancy way of saying it ends on a lot of AAAAAAAAAAAAAAAAAAA. Even mRNA has had enough of 2020 it appears.

outworlder5y ago

> The reason I'm asking is to understand how the vaccine designers determine if there are any long-term effects of having this artificial material inside your body

The only 'artificial' thing about it is the special base that's added to avoid detection by the immune system. Everything else is the exact same compounds present in your cells.

alexobenauer5y ago

It is worth noting that the studies are still ongoing. The Phase 3 trials are two years.

ur-whale5y ago· 5 in thread

What this does, as a non-biotech person, I believe I understand at a high level: plonk this code into a ribosome and out comes the desired protein.

What I don't understand is:

   a) how the m-RNA code relates to the produced protein (i.e. I can read C code and get an idea of what it does fairly quickly, but can the same be said of m-RNA and the resulting protein)?

   b) how did they get their hands on that code in the first place? Do the coronaviruses use m-RNA as well? Was then a coronavirus somehow "dissected" to get at the spike protein "source code"?

koeng5y ago

Answers:

ur-whale5y ago

Thanks !

azernik5y ago

flobosg5y ago

a) Yes, you can translate a mature[1] mRNA sequence, codon by codon, from the start until the stop codon, and it will give you the sequence of the protein it encodes.

b) Coronaviruses have an RNA genome. Researchers extracted it from wild-type viruses and then sequenced it.

[1]: mRNAs can undergo several maturation steps, such as splicing, which removes regions that won’t be translated into protein.
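
A minimal sketch of the codon-by-codon translation described in (a), using the standard genetic code. Taking the first ATG/AUG as the start codon is a simplifying assumption (real start-site selection depends on sequence context), and the `translate` helper is a hypothetical name, not part of any library.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first ATG/AUG up to (not including) the first stop codon."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA notation
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


# The first 8 codons of the spike coding region as shown in the alignment in
# this thread, preceded by a few 5' UTR bases and followed by a stop codon
# (TGA) appended here purely for illustration.
print(translate("GCCACCATGTTCGTGTTCCTGGTGCTGCTGTGA"))  # MFVFLVLL
```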

grey4135y ago

elliekelly5y ago· 4 in thread

phcordner5y ago

flobosg5y ago

The authors reverse engineered the sequences of the vaccines, obtaining them from the remaining mRNA present in the vials.

“Assembly” in this case means that they merged several short sequences they obtained, each representing a fragment of the whole mRNA sequence.

hfjfktmtkrn5y ago

They sequenced vaccine leftover remaining in used vials.

So reverse engineering basically.

usrusr5y ago

aty268OP5y ago· 3 in thread

https://gizmodo.com/stanford-scientists-post-entire-mrna-seq...

throwawaysea5y ago

What does "hacked" mean here? The article makes it sound like this wasn't something illegal:

Also:

cwkoss5y ago

It means the gizmodo author is trying to get more views.

flobosg5y ago

> Is it simply that they took a sample of the vaccine and studied its composition using some standard machine/process?

That’s exactly what they did.

nsxwolf5y ago· 3 in thread

ELI5, Why are the sequences different if they result in the same spike protein?

rnestler5y ago

Maybe one could compare it to having different recipes for the same cake. Or different source code to solve the same problem.

ssijak5y ago

Different codons can encode the same amino acids, so different sequences can encode the same protein.

shakow5y ago

I didn't check whether it's the only explanation, but the DNA -> protein encoding is surjective and not injective: several codons map to the same amino acid.
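
To make the many-to-one mapping concrete, here is a small sketch. It inverts the standard genetic code and then translates the two coding-region fragments shown in the `needle` alignment earlier in this thread. The fragments start at the ATG and are only the first ~70 bases, so this illustrates the idea and is not a comparison of the full sequences; `translate_frame` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid (or stop) -> every codon that encodes it.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)


def translate_frame(seq: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Fragments copied from the alignment in this thread, starting at the ATG.
biontech = "ATGTTCGTGTTCCTGGTGCTGCTGCCTCTGGTGTCCAGCCAGTGTGTGAACCTGACCACCAGAACACAGCTGCC"
moderna = "ATGTTCGTGTTCCTGGTGCTGCTGCCCCTGGTGAGCAGCCAGTGCGTGAACCTGACCACCCGGACCCAGCTGCC"

print(len(CODON_TABLE), "codons map onto", len(codons_for), "symbols")
print("Leucine codons:", codons_for["L"])
print(biontech == moderna, translate_frame(biontech) == translate_frame(moderna))
```

The nucleotide strings differ in several places, yet both fragments translate to the same peptide.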

obilgic5y ago· 3 in thread

so how are the first and the second dose different?

meepmorp5y ago

They're the same. It's just a second dose as a booster.

takeda5y ago

jonbaer5y ago

zappo29385y ago· 2 in thread

Wow Looks like it is analogous to having a header on a TCP packet. [0] Here is an animation of mRNA encoding translated to proteins inside a ribosome. [1]

Very analogous indeed.

[0] https://xerocrypt.wordpress.com/2014/07/22/how-to-read-almos...

[1] https://www.youtube.com/watch?v=TfYf_rPWUdY

retrac5y ago

robbiep5y ago

I ended up majoring in biochemistry and molecular biology in my undergrad because I was browsing on Wikipedia one day and came across an article written on an E. Coli variant that had sentences like:

01J3 E. Coli has a DNA Polymerase that contains 3’-5’ proofreading capability and 5’-3’ error correcting with a polymerisation rate of 50bps

I’ve made the above up because I have never since been able to find a Wikipedia page that as succinctly pointed out to me that biology was a machine, and I was hooked.

mrfusion5y ago· 2 in thread

The lipid container is weird to me. Is that all it takes to send instructions inside a cell? Seems like a security hole. Why haven’t viruses evolved to have a lipid container?

inportb5y ago

> Why haven’t viruses evolved to have a lipid container?

They have. https://en.wikipedia.org/wiki/Viral_envelope

jforman5y ago

I was going to say you can't get very far without a protein vehicle, but then I remembered that's quite incorrect:

https://en.wikipedia.org/wiki/Retrotransposon

The injection is important, however, as it gets the genetic material past a whole lot of nucleases that cover your epithelia.

mrfusion5y ago· 2 in thread

So what moves the new protein out of your cells once the rna is processed? Don’t most proteins stay inside the cell?

lowdanie5y ago

There is a system that transports protein fragments to the cell surface and “presents” them to the immune cells: https://en.m.wikipedia.org/wiki/Antigen_presentation

mrfusion5y ago

karolkozub5y ago· 1 in thread

It looks like a machine code snippet. I wonder if we'll develop high level languages and compilers for genetic code in the future.

ImaCake5y ago

http://geneontology.org/

jturolla5y ago· 1 in thread

Please someone... create some abstraction language for this bio-assembly code. Can we make LLVM compile this? :joy:

jakeogh5y ago

Check out https://github.com/clasp-developers/clasp

"Clasp: Common Lisp using LLVM and C++ for Designing Molecules": https://www.youtube.com/watch?v=0rSMt1pAlbE

dooopy5y ago· 1 in thread

I compared the spike encoding regions, and it looks like they're quite different...I wonder if the codons wind up coding for different amino acids. And who got it right?

flobosg5y ago

Their codon compositions were optimized differently, but both coding regions translate to the same amino acid sequence.

tibbydudeza5y ago· 1 in thread

6nf5y ago

We do know that bit in the middle encodes the structural spike protein of the virus

verytrivial5y ago· 1 in thread

There are people who could memorize this. And it would weirdly be more useful than digits of π!

whitepaint5y ago

I think memorizing either of these two is pretty much totally useless.

stevefrench935y ago· 1 in thread

I wouldn't install beta software on a production system though.

rossdavidh5y ago

Well we do with anti-malware stuff, when a 0-day comes out and we know there are exploits in the wild and beta software is all we've got.

singularity20015y ago· 1 in thread

tangential: do biologists sometimes use some form of base 64 encoding for their triplets? so instead of AAG.TCA.GGA just g5F or something?
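
As a sketch of what the question describes: there are 4^3 = 64 possible triplets, so each one fits exactly one character of the base64 alphabet. The A/C/G/T ordering and the helper names below are assumptions for illustration; this only borrows the base64 alphabet and is not an established bioinformatics format.

```python
import string

# Standard base64 alphabet: 64 symbols, one per possible codon.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
BASES = "ACGT"  # assumed ordering; any fixed ordering would work


def codons_to_b64(seq: str) -> str:
    """Encode each complete triplet as one base64-alphabet character (6 bits)."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        a, b, c = (BASES.index(x) for x in seq[i:i + 3])
        out.append(B64[a * 16 + b * 4 + c])
    return "".join(out)


def b64_to_codons(text: str) -> str:
    """Inverse of codons_to_b64()."""
    out = []
    for ch in text:
        n = B64.index(ch)
        out.append(BASES[n >> 4] + BASES[(n >> 2) & 3] + BASES[n & 3])
    return "".join(out)


print(codons_to_b64("AAGTCAGGA"))  # C0o
print(b64_to_codons("C0o"))        # AAGTCAGGA
```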

jebus9895y ago

brian_herman5y ago· 1 in thread

https://github.com/brianherman/Assemblies-of-putative-SARS-C... I posted some txt files with the lines removed and stuff.

flobosg5y ago

If you have the time, it would be nice to transfer the data to the commonly used Genbank format.

ibraheemdev5y ago· 1 in thread

Is this all another medical company needs to start manufacturing and selling the vaccine themselves? Or is this sequence licensed/proprietary in some way?

akkawwakka5y ago

No. The RNA still needs to be fit in a lipid nanoparticle which is Moderna and BioNTech’s secret sauce.

rjvir5y ago· 1 in thread

This should be an NFT, I'd love to own an NFT of the RNA sequence of the Moderna vaccine.

cwkoss5y ago

No reason you can't make your own NFT of it. Heck, if you promise to pay at least 1.337 ETH I'll figure out how to do it myself and make it for you. :-P

plattyp5y ago· 1 in thread

Who would have thought it'd be this simple

  if covid?(dna)
    block_virus(dna)
  end

rantwasp5y ago

read the article linked in the thread. it actually does not work against all of the covid virus. it works against the spikes.

it’s extremely clever, but it also means that your code is wrong ;))

flemhans5y ago

bionhoward5y ago

https://github.com/bionicles/coronavirus

then you could use this recursive function to generate potential matches within some cutoff https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7...

the function right below it converts the generator to a list. then you could save that

enjoy

spullara5y ago

https://en.wikipedia.org/wiki/Ribosome

ineedasername5y ago

It's also short enough to post the whole thing to Wikipedia, so that's probably inevitable along with some very entertaining edit wars.

flobosg5y ago

https://twitter.com/PowerDNS_Bert/status/1375091898797453326

mushroomzulu5y ago

https://www.instagram.com/tv/CIYYq_rCV9F/?utm_source=ig_web_...

em3rgent0rdr5y ago

Are there any visual compilers that simulate the process of using these sequences to assemble a protein?

narrator5y ago

So I guess Josiah Zayner has to pick up on this now and do a DIY Moderna COVID vaccine video. He already did a DIY vaccine video with full open source documentation on how to do it yourself.

http://www.josiahzayner.com/2020/12/i-made-covid-19-vaccine-...

The_rationalist5y ago

I would love to see the output structure from Alphafold of this RNA source code

pknerd5y ago

Can someone give me the link of FASTA files of these sequences?

husamia5y ago

if you have understanding of how the sequence mutates then you can predict what the next strain is going to be and design spike protein that matches it.

stjohnswarts5y ago

ELI5 could this be used by "evil governments" to make designer pathogens to release during doomsday situations (say by North Korean leaders in their suicide bunkers if things went badly) ?

sktrdie5y ago

No package.json found, won't install.

StaticRice5y ago

Archive.org mirror: https://web.archive.org/web/20210326214140/https://raw.githu...

anonu5y ago

This is amazing. It appears quite "simple" - of course I know nothing about this part of the sciences.

Thinking about that timeline amazes me.

peter3035y ago

djmips5y ago

Where's the JSON versions?

squarefoot5y ago

aden1ne5y ago

Why not in fasta format?

omlet5y ago

Where is the 5G stack?

a-dub5y ago

i'm a dna noob: is it possible to do the growing and sampling thing to get the sequence from a sample of the vaccine or does the bubble of fat get in the way?

p0rkbelly5y ago

obligatory:

"I could have done this in a weekend"

person_of_color5y ago

How long before we can 3d print an mRNA vaccine?

bvanderveen5y ago

> .docx.pdf

Cargo-cult much?

savrajsingh5y ago

omlet5y ago

Where is the code about 5G modem?
