^Most recent discussion I’ve seen.
I worked in genomics and left this year because you’re underpaid and often disregarded as “IT help” that assists the wildly over-educated and underpaid people driving the actual research in 95% of cases.
[1] – https://dantelabs.com/
at best you waste your time, at worst you will find all kinds of things that are not there
it is the Silicon Valley hacker mentality that thinks life is some sort of computer where you can fiddle with parameters
learn some biology first, then you can marvel at it and realize just how absurdly simplistic it is to think you can read anything out of some random letter
But it's a fun subject, and as the technology develops, middle layers will disappear and then the money from expertise will become better.
The number of people who are both capable software developers and have a good understanding of cellular biology is quite small and will probably remain so for the foreseeable future.
In biotech, the end goal is a physical product or a service performed by a doctor or another highly paid professional. Those don't scale as well as software. The ratio of users to developers is also low. You are likely developing software for many niche tasks, which does not scale either.
And if you are considering roles in academia, your productivity is not going to be high enough to justify a competitive salary. Productivity, in monetary terms, is defined by the amount of money you can bring in, either directly or indirectly. In academia, that usually means grants. You may be able to argue successfully to a funding agency that one software engineer is worth two postdocs, but not four.
Then there are also engineers from XKCD 1831 https://xkcd.com/1831
How do you handle one genomic variant affecting dozens of different RNA transcripts and isoforms? How do you handle tissue-specific expression? LD haplotype blocks? Frequency across populations and reference choice? Sample handling affecting read depth? Mixed direction of effects in phenotype-genotype? The critical (and, IMO, beautiful) feature of bioinfo is that it requires understanding why your dataset can rarely be considered clean and as simple as _observation name_ and _observation value_. To succeed it is usually critical to know a lot about the observation metadata, which is not collected in the dataset. Hopefully in the future it will be better curated and less esoteric.
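To make the "not just name and value" point concrete, here is a minimal Python sketch of what one variant record might have to carry. Everything in it is hypothetical: the class names, field names, identifiers and numbers are invented for illustration and do not come from any real annotation tool or dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranscriptEffect:
    transcript_id: str                   # hypothetical identifier
    consequence: str                     # e.g. "missense", "intronic"
    tissue_expression: Dict[str, float]  # tissue -> made-up expression level


@dataclass
class VariantRecord:
    variant_id: str
    reference_build: str                     # which reference the position refers to
    population_frequency: Dict[str, float]   # population label -> allele frequency
    read_depth: int                          # depends on sample handling
    effects: List[TranscriptEffect] = field(default_factory=list)

    def consequences_in(self, tissue: str, min_expression: float = 1.0) -> List[str]:
        """Consequences on transcripts that are actually expressed in a tissue."""
        return sorted({
            e.consequence
            for e in self.effects
            if e.tissue_expression.get(tissue, 0.0) >= min_expression
        })


# One variant, two transcripts, different story per tissue (all values invented).
variant = VariantRecord(
    variant_id="var-001",
    reference_build="build-A",
    population_frequency={"pop1": 0.12, "pop2": 0.01},
    read_depth=34,
    effects=[
        TranscriptEffect("tx-1", "missense", {"liver": 12.0, "brain": 0.2}),
        TranscriptEffect("tx-2", "intronic", {"liver": 0.1, "brain": 8.5}),
    ],
)

print(variant.consequences_in("liver"))  # which effects matter in liver
print(variant.consequences_in("brain"))  # which effects matter in brain
```

The same variant answers "what does it do?" differently depending on the tissue you ask about, and that is before LD, population frequency or read depth enter the picture.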
A new generation of bioinformaticians and computational biologists is using Rust, Go, and the web to create, share and deliver.
Check out nextclade.org
As soon as you start touching science, everything is important.
- the DNA that doesn’t code for proteins but makes up the vast majority of human DNA
- the intron regions of genes that are transcribed into RNA but then spliced out of the RNA and not translated into protein, and are 5x larger than the coding parts
Those two things alone are absolutely critical to understand to interpret a genome sequence. Of course there is much more.
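For the programmers reading along, the intron point can be sketched in a few lines of Python. This is a toy under stated assumptions: the sequence is invented, the exon coordinates are simply given rather than predicted, and real splicing is nowhere near this tidy.

```python
from typing import List, Tuple


def transcribe(dna_coding_strand: str) -> str:
    """Coding-strand DNA -> pre-mRNA: same letters, with T replaced by U."""
    return dna_coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Keep only the exon intervals (0-based, end-exclusive); introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up gene: exon 1 + intron + exon 2.
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
exons = [(0, 6), (18, 24)]  # assumed known here; finding these is the hard part

pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons)

intron_fraction = 1 - len(mature_mrna) / len(pre_mrna)
print(pre_mrna)
print(mature_mrna)
print(f"{intron_fraction:.0%} of the transcript was intron")
```

In this toy half the transcript is thrown away before any protein is made, so reading the raw genome letter by letter tells you little until you know which intervals survive.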
1) The Moderna vaccine was made with the help of Illumina genome sequencing. They were able to sequence the virus and send that sequence of nucleotides over to Moderna for them to develop the vaccine, turning a classic biology problem into a software problem and reducing the need for them to bring the virus in house.
2) Illumina has a cancer screening test called Galleri that can identify a bunch of cancers from a blood test. It identifies mutated DNA released by cancer cells. This is huge: if we can identify cancer before someone even starts to show symptoms, the chances of having a useful treatment go up dramatically.
Disclaimer: I work for Illumina, views my own.
I wrote some more about why genomics is cool from a technical point of view here (truly big data, hardware-accelerated bioinformatics): https://dddiaz.com/post/genomics-is-cool/
Having Turing complete programmatic control over biological systems has an absolutely endless list of transformative applications.
Imagine being able to program bacteria that can "infect" the patient and attack tumor cells, or act as fodder to keep autoimmune disease in check.
Or let's say we could program stem cells into "liver repair mode" to go and differentiate into new liver cells.
Then there are the implications for things like drug synthesis: the ability to programmatically control enzyme levels to compile more or less arbitrary biosynthetic pathways into fast-growing photosynthetic algae, turning CO2, water and sunlight into medicine.
It's still a long way off being at that level of applicability, but man oh man it's gonna change everything.
Imagine a software heisenbug, but instead it's a life form that you can't kill -9.
The idea of tailor-made medicines in a vat is awesome, but creating a bacterium to "specially target" certain cells seems like a disaster waiting to happen.
"The Amazon CloudFront distribution is configured to block access from your country."
And yeah it shows - one contrived example after another, and honestly not a great description of anything.
If you want to truly understand genomics you have to understand how biology works. And honestly it’s great info for anyone, even if you’re not getting into genomics or whatever. Why would you not want a working model of how life is put together? In that case I’d just recommend dusting off a biochem or cell bio textbook and reading just the first 5-8 chapters. Typically they lay it out very simply from basic principles, and the authors have far more experience, understanding and writing help than this weird tutorial course thing.
I once tried reading a few chapters of a bioinformatics book explaining DNA, RNA, protein creation, etc. The basic idea seems very simple but to my mind they explained it non-systematically with too many words. There seems to be an internal information structure in these RNA- and DNA- related processes that was not being concisely presented and it seemed that if the writers presented the material in terms of computer-science concepts, so much time could be saved.
For example, the central dogma of DNA transcribed to RNA translated to protein seems simple, but it's not.
In almost every instance, there are vague 'rules' and many many exceptions to these rules. For example, often coding regions in genes start with an ATG, but sometimes they don't. Sometimes splice sites (where the non-coding parts of transcripts called introns are chopped out) can be predicted, but a portion of the transcripts are not spliced at predicted sites for no obvious reason. Sometimes the predictions are just wrong. Sometimes the generated proteins are modified at specific locations which impacts their function, but again, sometimes not. Even whether the gene itself is 'switched on' (i.e. able to be transcribed) is impacted by many many things, such as unidentified transcription factors, or whether the chromosomal location itself is accessible or not. There are many many other things that impact the process.
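A small Python sketch shows how quickly the "rule" version breaks down. It is a naive illustration, not how real gene finders work: it takes the first ATG, reads codons with the standard genetic code until a stop, and ignores the other strand, splicing, alternative start codons and everything else. The input sequences are made up.

```python
from itertools import product
from typing import Optional

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def naive_translate(dna: str) -> Optional[str]:
    """Translate from the first ATG to the first stop codon.

    Returns None when there is no ATG at all, which is exactly the case
    the textbook rule silently fails on.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


with_atg = naive_translate("CCATGGCCAAATAAGG")     # ATG GCC AAA TAA
without_atg = naive_translate("CCCTGGCCAAATAAGG")  # no ATG anywhere

print(with_atg)
print(without_atg)
```

The first call finds a short peptide; the second returns nothing, even though a coding region that starts with something other than ATG would look just like this to the naive rule. Every other "sometimes" in the paragraph above would need its own special case.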
There is no simple underlying concept as the system is not designed, it evolved and is quite different among different organisms, and even in different tissues or timepoints in the same organism. As long as it works and provides enough benefit to avoid negative selection, that's enough.
It's a mess, which makes it interesting.
You are absolutely correct, there’s an information-theoretic underpinning of genomics and systems biology that’s rarely if ever tackled in textbooks, but (a) neither does this course tackle it, and (b) you can’t just skip the biochem basics and jump to that. That’s like trying to become a physicist without learning math.
There's nothing about sequencing by synthesis, how blocked nucleotides are added one after another, how pictures of the fluorescent nucleotides on the flow cells are image-analysed, etc.
This site looks like an ELI5 kind of treatment.
Amusingly that's literally like 80% true. Water is just a really big deal in biochem.
https://www.biostarhandbook.com/
I have learned so much from it.
It is an introduction to what it is like to do genomics in a scientific environment. The content at the link the OP posted appears to be an oversimplified, high-level and naive overview.
https://www.edx.org/course/introduction-to-biology-the-secre...
by Professor Eric Lander
> Introduction to Biology - The Secret of Life
> Explore the secret of life through the basics of biochemistry, genetics, molecular biology, recombinant DNA, genomics and rational medicine.
It's really well done and genomics is the focus. I took many dozens of edX and Coursera courses over the years, and I would say this is in the top 5% of the courses there.
I don't understand the phrase "from a programmer's perspective", or "for Engineers" in the title on top.
I say this as a programmer who studied CS but also took numerous life science courses throughout my life. If you want to learn biology, you study biology. What does a "programmer's view", or an engineer's, have to do with it? You use the correct tool for the job, and having a background in both, I don't see this working out well, more like the opposite actually.
The point of looking at biology for an engineer or programmer should be to broaden one's horizons, not to take the internal models one has built for a completely different field and apply them to one that really is not like that at all. IMO it's best to forget all computer metaphors here.
----
By the way, since there was something about this yesterday, there also is this course: https://www.edx.org/course/principles-of-biochemistry - it too is very good. A good knowledge of organic chemistry is a prerequisite, but there are plenty of equally interesting course resources for that available too, including even Khan Academy (https://www.khanacademy.org/science/organic-chemistry), or to give a(nother) random link, https://ocw.mit.edu/courses/5-12-organic-chemistry-i-spring-...
Biology becomes a lot more fun with this foundation already established in one's head.
I saw a recent Lex Fridman podcast where the guest talks about "bioelectric patterns" and somehow getting a worm to grow a second head by messing with those patterns. I would absolutely start on this course now if it was a realistic pathway to doing something like that.
There is no REPL for the cell. No tinkering allowed.
When Marvin Minsky was growing up in New York, neighborhood pharmacists owned fluoroscopes. He said those fluoroscopes were like “great black boxes” to him and that “those kinds of black boxes don't exist for kids anymore.”
Unfortunately, each step ends up being extremely challenging, there's tons of noise, and the cost of each Read, Eval and Print is far higher than in a programming language. Further, the "system" is running 38,000 other "threads", all of which have direct read and write access to your data. Some of those threads consider your data to be the enemy and cut it up, while others are just randomly spamming your console with unnecessary debug log messages.
We have actually reached the point where some scientists have synthetically created a novel chromosome, and used a preexisting cell to bootstrap the new genome so that the cell eventually contains only protein from the new genome. To me, that represents a step beyond tinkering: it means we can create synthetic lifeforms with exactly and only the details we want, which makes studying them and engineering them far easier.
Interestingly, even though this tech exists, nobody has found any interesting use for it and it's not even really used to probe biology.
A better example would be gene therapy, which has been developing slowly over decades. A single person died in a trial in the 90s and that stopped development (that's the regulation part you're referring to) for decades. In other trials that don't include gene therapy, patients routinely die and they're just a statistic.
The pay is exactly where the market is. There are a ton of wet-lab people wanting to get into “data”. And the industry is less lucrative than showing ads like Google does.
Genomics stretches vastly beyond this - assembly and annotation to start with.
I'd argue the most interesting problem space for software engineers is outside of what is covered in the document.
For me, coming from a SWE background, the computational skills are very easy to pick up, especially if you work with bioinformaticians you can ask questions. It’s the genomics knowledge that is very difficult for an engineer to acquire.
To get that basic biology foundation, another post mentioned an edX intro biology course, which would be a terrific start, or just get a recent university-level intro biology textbook. It's not terribly difficult material and you'll be in far better shape than after reading a biology-for-laypersons pamphlet.