In Spain, for example, we have a private system but it is extremely inefficient in some areas (and very good in others). Of course, you can have private insurance, but you still have to pay your social security. Curiously, the only ones who can decide which system they want are the public servants...
You cannot really avoid the fundamental constraints - anywhere in the world, there are only so many doctors and so much money available for treatments. IDK if the USA has a shortage of doctors, but plenty of European countries do. A country like Romania just cannot pay its doctors wages big enough to stop them from seeking employment elsewhere, where they will earn five to ten times as much (UK, Germany, Switzerland). As a result, local hospitals are seriously understaffed.
Where I live, having personal connections to good doctors gives you an advantage - you will be examined and treated faster. Then there is outright nepotism.
The outgroups are different from those in America, but there are always people for whom the system sucks.
Gattaca shows eugenics has been so vilified that the audience will root for a character who selfishly commits fraud, risking lives and scientific progress for his own vanity.
The really scary fact is that there would be no need for a police state and segregation. The genetically enhanced would just completely dominate an open and fair competition.
There's a guy on YouTube doing DIY gene therapy to treat his lactose intolerance, so it's not exactly science fiction.
https://youtube.com/c/thethoughtemporium
He’s got a ton of other interesting projects, the DIY gene therapy is just one that stands out because it seems so risky.
If someone steals my DNA I can't stop them. But I can at least avoid being swept up in large scale DNA scanning and tracking efforts.
Edit: I used to help Google fund researchers like Joe Derisi and others who develop technology to do this, and some of the people I worked with in my academic career are quite good at identifying serial killers from 30 year old DNA. If you're downvoting because you think I'm making this up, you're wrong. If you're downvoting because you don't think large-scale individual detection using genetic sampling of the environment is possible, you're wrong. If you're downvoting because you think you couldn't do a whole genome sequence of an individual using a sample collected in the wild, you're wrong. If you're downvoting because you think this is a terrible idea (morally, ethically), that's fine but I didn't say anything about my own moral or ethical beliefs about this.
It's simply factually correct to say that large-scale individual sample collection (on the order of tens of thousands, if not hundreds of thousands, of individuals in a country the size of the US) is possible. All the technology is there to do this.
https://www.latimes.com/california/story/2020-12-08/man-in-t...
I'm really hoping someone will work on an open source "23andme@home" solution that ties all this together in an accessible way.
The results have been pretty astounding. I found markers that pointed to poor response to a specific blood thinner my grandfather was put on before he passed. Currently I'm researching the cluster of Bipolar / ADHD / SAD symptoms I experience that all seem to trace back to a certain genotype of circadian rhythm genes I have (thank you, Sci Hub). To boot, some of the studies I've come across were done on Han Chinese populations that match my ancestry.
Perhaps going too far down this rabbit hole poses a self-diagnosis risk, but the correlations to my family history and my own life experience working with doctors to diagnose and treat symptoms are pretty undeniable. And given that your run-of-the-mill psychiatrist is going to treat you off of a DSM checklist, I feel much more confident knowing there have been genomic studies to back things up, since my doctor isn't up to date on this research, and finding one who is would be difficult and expensive. I've shared the papers with my doc and he's been supportive; sometimes I feel like I should be getting a discount on services rendered.
https://store.nanoporetech.com/us/minion.html
https://www.extremetech.com/extreme/190409-minion-usb-stick-...
On some of their larger devices (e.g., the PromethION), they've moved outright to a "we lend you the device for free, you buy the consumables" model.
https://en.wikipedia.org/wiki/Nanopore#Inorganic
https://nanoporetech.com/how-it-works/types-of-nanopores
You also still, in that case, need a gel box + ladder + loading dye + SYBR Safe or whatever, so it’s still not nothing.
Then you’d need a program like bwa (http://bio-bwa.sourceforge.net/) to map your data.
Then use https://samtools.github.io/bcftools/howtos/variant-calling.h... or something else to produce variants from the mapping results.
Then compare your resultant VCF file to something like dbSNP: https://www.ncbi.nlm.nih.gov/snp/
At this point you can start generating a raw version of a 23andMe report.
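To make that last step concrete, here's a toy sketch in Python of the final join: variant calls cross-referenced against a local annotation table, the way a raw 23andMe-style report joins your genotypes against dbSNP. Everything here (rsIDs, traits, positions) is a made-up placeholder, not a real annotation.

```python
# Toy sketch: join VCF-ish variant calls against a local table of known SNP
# annotations. The rsIDs and traits are fabricated placeholders.
VCF_LINES = """\
chr2\t1362\trs0000001\tG\tA\t0/1
chr11\t5226\trs0000002\tC\tT\t1/1
"""

# hypothetical annotation table standing in for a dbSNP/ClinVar lookup
ANNOTATIONS = {
    "rs0000001": "example trait A (placeholder)",
    "rs0000002": "example trait B (placeholder)",
}

def report(vcf_text, annotations):
    """Turn (chrom, pos, rsid, ref, alt, genotype) rows into report rows."""
    rows = []
    for line in vcf_text.strip().splitlines():
        chrom, pos, rsid, ref, alt, gt = line.split("\t")
        # expand "0/1"-style genotype into the actual bases, e.g. "GA"
        genotype = gt.replace("0", ref).replace("1", alt).replace("/", "")
        rows.append((rsid, genotype, annotations.get(rsid, "no annotation")))
    return rows

for rsid, genotype, note in report(VCF_LINES, ANNOTATIONS):
    print(rsid, genotype, note)
```

A real pipeline would stream a multi-gigabyte VCF and query actual dbSNP/ClinVar dumps, but the join logic is essentially this.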
When I worked on https://github.com/iontorrent/tmap we thought it would be a good idea to do something like a “local alignment” (using https://en.wikipedia.org/wiki/Smith–Waterman_algorithm) after doing a lookup into a Burrows–Wheeler transform on a substring of the “read.”
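For anyone curious what the "local alignment" half looks like, here's a minimal Smith-Waterman score computation in Python (the scoring constants are arbitrary illustrative choices; tmap's actual implementation is heavily optimized C):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (score only,
    no traceback). O(len(a) * len(b)) dynamic programming."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are floored at 0 so bad regions reset
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "ATTAC"))  # 10: "ATTAC" matches exactly (5 matches x 2)
print(smith_waterman("AAAA", "TTTT"))      # 0: nothing aligns locally
```

In the two-stage scheme described above, the BWT lookup narrows down candidate locations cheaply, and a pass like this scores each candidate precisely.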
https://www.ebay.com/itm/265148387179
Nanopore is still not quite ready for precise, high-accuracy sequencing. Give it another five years.
https://genomebiology.biomedcentral.com/articles/10.1186/s13...
I guess there are limits to ensemble methods if the underlying accuracy doesn't increase. I don't work on gene sequencing algorithms but from what I understand of ML ensemble techniques, there are certain assumptions regarding the underlying independence of the errors. The errors for nanopore should be uniform but I am not sure. Any molecular biologist here care to comment?
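The independence assumption is indeed the crux. As a toy illustration (not a model of real basecallers): if N calls of the same base err independently with probability p, a majority vote drives the error down fast; if every call shares the same systematic bias (as with homopolymer errors), voting gains nothing.

```python
from math import comb

def majority_vote_error(p, n):
    """Probability that a majority of n independent calls (n odd), each
    wrong with probability p, is wrong."""
    k = n // 2 + 1  # votes needed for a wrong majority
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With independent errors, voting helps fast:
print(majority_vote_error(0.10, 1))           # 0.1
print(majority_vote_error(0.10, 15) < 0.001)  # True
# With fully correlated (systematic) errors, every call is wrong together,
# so the ensemble error stays at p no matter how large n gets.
```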
There are two components that drive sequencing error rate. 1) The chemistry behind the sequencing (for nanopore sequencing this is the "feeding DNA through a pore" bit) 2) the method to convert raw signal into DNA sequence (this is called "base calling").
The gold-standard in terms of error profile for sequencing is currently the Illumina short read platform. Illumina machines are really just microscopes (TIRF scopes for optics folks) that sequence DNA by visualizing incorporation of dye-labeled nucleotides into the sequenced molecule(s) (Imagine a really slow PCR [1]). Each base is labeled with a different color, then when a molecule has a match it makes a colored spot on the slide that the machine can read (see here for more info & details of newer chemistry that use fewer colors [2]). This whole process is mediated by DNA polymerase which itself has a very low error rate. Another important point is that DNA sequenced on the illumina platform (called a "library") tends to be from "amplified" template DNA, meaning the DNA will have been processed and potentially be missing chemical modifications on the bases that could be present in the organism. This works to Illumina's advantage, because when trying to answer the question of "what is the DNA sequence?" we want the ground-truth DNA, not the modification state.
In contrast, Nanopore sequencing works by feeding a long strand of DNA through a pore and measuring the change in electrical current through the pore (watch the cool video [3]). For the current set of nanopore flowcells, 8 bases of DNA sit in the pore at a time, meaning the current at each timestep is a product of 8 nucleotides in aggregate. This also means that the pore "sees" each base 8 times, but always in the context of an additional 7. In order to basecall from the raw signal, it's not as easy as saying "blue = A", instead, you have to deconvolve each base from a complex signal. As you might imagine, the folks at Oxford Nanopore & broader research community have turned to machine learning-based base callers to solve this problem, and they work quite well [4]. But they are not perfect. Deconvolving runs of the same base (e.g. "AAAAAAA") is difficult because without well-defined signal changes between bases, the caller has a hard time deciding how many bases it has seen, so a common error mode for nanopore sequencing is to create insertions/deletions at places in the genome with low nucleotide diversity. Another interesting reason is that most Nanopore library preps are often performed on unamplified DNA, and so in addition to normal A/T/G/C nucleotides, the template DNA can also contain bases with chemical modifications. For example, in bacteria, A's are often methylated, and in Humans, C can have all kinds of different modifications (5-methyl-cytosine, 5-hydroxymethyl-cytosine, etc. etc.) and each different modification affects the signal in the nanopore. Therefore, basecallers that weren't trained on modified bases will produce basecalling errors in the presence of base modifications.
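That homopolymer failure mode is easy to see in a toy model: assign each k-mer an arbitrary "current level" and slide the window along the sequence. Inside a run like AAAAAAA the window contents never change, so the signal goes flat and the run length becomes ambiguous. (The level function below is a made-up stand-in for a real measured k-mer model.)

```python
# Toy nanopore signal model: each k-mer sitting in the pore maps to one
# current level. The arbitrary hash-like function is a stand-in for a
# real measured k-mer model.
def toy_levels(seq, k=4):
    def level(kmer):
        return sum((ord(base) * (i + 13)) % 97 for i, base in enumerate(kmer))
    return [level(seq[i:i + k]) for i in range(len(seq) - k + 1)]

mixed = toy_levels("ACGTACGGTTAC")
homopolymer = toy_levels("AAAAAAAAAA")

print(len(set(mixed)) > 1)        # True: transitions produce distinct levels
print(len(set(homopolymer)))      # 1: flat signal, run length is ambiguous
```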
For both Illumina and Nanopore basecallers, each base is assigned a quality score that indicates the probability that the basecaller produced an incorrect value. This is called a Q-score, defined as Q = -10 * log10(P), where P is the error probability (i.e. Q / 10 is the order of magnitude of the error probability) [5]. For example, a Q-score of 10 means an error rate of 1 in 10, while a Q-score of 50 means an error rate of 1 in 100,000. For Illumina sequencing, >95% of the reads have a Q-score > 30 (i.e. 1 in 1000 errors), while Nanopore reads tend to have lower average Q-scores (~Q20, i.e. 1 in 100 errors). For genetics, where 1 base difference can mean the difference between a severe disease allele and a normal variant, 1 in 100 won't cut it.
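The Q-score arithmetic is easy to sanity-check:

```python
from math import log10

def q_from_p(p):
    """Phred quality from error probability: Q = -10 * log10(P)."""
    return -10 * log10(p)

def p_from_q(q):
    """Error probability from Phred quality: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

print(p_from_q(30))           # 0.001 -> 1 error in 1000 (good Illumina read)
print(p_from_q(20))           # 0.01  -> 1 error in 100 (typical nanopore read)
print(round(q_from_p(1e-5)))  # 50
```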
The current-gen Nanopore flowcell chemistry (R9.4.1) is what most people are talking about when they discuss Nanopore error rates, but they've just released a new pore type and made some basecaller upgrades that improve the accuracy to what they call "Q20+", with some claims of Q>30. From the data I've seen it's impressive; I just haven't gotten my hands on one yet to see for myself [6]. I think the comment saying "wait 5 years" is an overestimate, but if you want to genotype yourself today, I'd just pay someone for Illumina sequencing and process the fastq files yourself if you really want to do it as a learning exercise.
I've unintentionally written an essay, so I'll stop here, but real quick to your other point re: rerunning the sample N times and using the repeats for error correction. This won't work the way you're thinking, because a "sample" is actually a collection of DNA molecules that are sampled randomly by the sequencer. You have no way of knowing that the same read between runs was actually from the same molecule, so you can't error correct this way. Incidentally, a totally different sequencing platform from Pacific Biosciences uses this strategy by doing some really cool chemistry, but I'll spare you the second essay (google "PacBio HiFi" or "circular consensus reads" if you're interested).
[1] https://en.wikipedia.org/wiki/Polymerase_chain_reaction
[2] https://www.ecseq.com/support/ngs/do-you-have-two-colors-or-...
[3] https://www.youtube.com/watch?v=RcP85JHLmnI
[4] This paper is a tad out of date, but Ryan Wick always writes extremely clear papers: https://genomebiology.biomedcentral.com/articles/10.1186/s13...
[5] https://www.illumina.com/documents/products/technotes/techno...
[6] https://nanoporetech.com/about-us/news/oxford-nanopore-tech-...
Edit: reformatted links for clarity.
If this can sequence flora, fungi, and human DNA for about 10k, I'd buy it just to experiment and deep dive. That low a barrier to entry is itself interesting.
And I feel like nanopore is the VR of DNA sequencing: it's always just another few years off.
Is this also true for nanopores in protein sequencing? This HN comment from a few weeks back [1] pointed out recent progress but perhaps the tech is still not quite there.
But this could enable things like finding relatives, which is what I got out of the comment about 23andme. Instead of all the data being centralized, storage and comparison could be distributed.
Not sure what you are concerned about. What would you expect a bad actor to do with your DNA sequences? I'm genuinely curious.
To actually sequence DNA with this USB thingy you need to prepare a so-called sequencing library, and for that you need a fairly well-equipped lab, expensive reagents, and years of practice and skill... a mid-level biology Ph.D. can prepare these.
In addition, the flowcells sold by Oxford Nanopore often malfunction and the whole run is a bust... (they have behaved like this since 2014, so no, the technology does not seem to be improving a whole lot)
On one hand, I would love to learn something new about my body.
On the other hand, what if the results tell me that I am predisposed to some horrible untreatable disease? Will I spend the rest of my days observing every little pain or discomfort and thinking "is this IT?"
1. A completely genetically determined disease; a rare 100%-going-to-happen deal. (Which you would probably know about already, because your mother or grandfather died from it...)
2. Some significant, but abstract risk modification.
With 1., you would know you will get sick/die some time in the future, allowing you to live your life accordingly: die without regrets, prepared, and so on. You can take that into consideration when planning for a family, taking job offers, or procrastinating on the good life with work and retirement plans. Burn bright.
With 2., there is a very, very high chance lifestyle choices influence the stated risk, as obviously not everybody who has the polymorphism gets sick. So you can get your ass up, exercise, quit smoking and drinking, reduce stress, get regular check-ups, ..., and avoid getting sick or reduce the impact/progression in case you do.
I think, logically, knowing is always better than not knowing. But I understand that anxiety tells a different story.
"Inaction breeds doubt and fear. Action breeds confidence and courage. If you want to conquer fear, do not sit home and think about it. Go out and get busy." --Dale Carnegie
"You gain strength, courage and confidence by every experience in which you really stop to look fear in the face. You are able to say to yourself, 'I have lived through this horror. I can take the next thing that comes along.' You must do the thing you think you cannot do." --Eleanor Roosevelt
"Fear is the path to the Dark Side. Fear leads to anger, anger leads to hate, hate leads to suffering." --Yoda
"The brave man is not he who does not feel afraid, but he who conquers that fear." --Nelson Mandela
"Nothing in life is to be feared. It is only to be understood." --Marie Curie
"The key to change... is to let go of fear." --Roseanne Cash
"He who is not everyday conquering some fear has not learned the secret of life." --Ralph Waldo Emerson
"We should all start to live before we get too old. Fear is stupid. So are regrets." --Marilyn Monroe
"Fear keeps us focused on the past or worried about the future. If we can acknowledge our fear, we can realize that right now we are okay. Right now, today, we are still alive, and our bodies are working marvelously. Our eyes can still see the beautiful sky. Our ears can still hear the voices of our loved ones." --Thich Nhat Hanh
Perhaps a trusted middleman would be a solution: "just don't tell me about anything that is totally beyond my control".
Now I'm a Data Engineer doing backend work in public sector. :)
Here are some press releases related to articles I published during my PhD:
https://physics.illinois.edu/news/article/34064
https://www.sciencedaily.com/releases/2014/10/141014095320.h...
When I looked I was interested, but was turned off when I saw that the cost far outstripped commercial sequencing services.
London, UK https://biohackspace.org/
Brooklyn, NY, https://www.genspace.org/
Baltimore, MD, https://bugssonline.org/
Australia, https://foundry.bio/
https://abarry.org/dna-sequencing-in-our-extra-bedroom/
http://blog.booleanbiotech.com/sequencing-at-home-with-flong...
What if you could take the (binary) data file of your DNA and use it as input in the (recently remastered) Monster Rancher games to generate a monster? Apparently those games use external user-provided data (music CDs, game discs, etc.) to generate the monsters the player then trains and uses (something I only recently learned about through gaming livestreams).
I'd actually like to see the level of jank that would come out of something like that.
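A plausible way such a feature could work (purely hypothetical sketch, not how Monster Rancher actually does it): hash the input bytes into a seed and generate stats deterministically, so the same genome file always yields the same monster.

```python
import hashlib
import random

def monster_from_bytes(data: bytes):
    """Derive deterministic 'monster' stats from any byte blob (e.g. a
    FASTA/FASTQ file). Same input -> same monster, like seeding from a CD."""
    seed = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    rng = random.Random(seed)
    return {
        "power": rng.randint(1, 99),
        "speed": rng.randint(1, 99),
        "intelligence": rng.randint(1, 99),
        "species": rng.choice(["golem", "dragon", "jelly", "pixie"]),
    }

m1 = monster_from_bytes(b"ACGTACGT")  # stand-in for a real genome file
m2 = monster_from_bytes(b"ACGTACGT")
print(m1 == m2)  # True: fully determined by the input bytes
```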
Also, your DNA is bootstrapped from your mother's cells. And the prenatal environment has quite a large effect on development, so your simulation might end up quite different from you if we only started with your DNA.
For example, today we can already predict eye color and other phenotypes from DNA.
If you are able to observe enough samples of cell growth and their associated DNA, you probably can model and predict the statistics of a cell from its DNA. Because the cell is itself the result of a lot of chemical processes, the law of large numbers will help smooth those statistics.
Given that we have a lot of cells, the collective behavior is probably entirely governed by these statistics.
We could even do that without knowing anything about DNA at all. Or predict tomorrow's weather without satellites and computers.
I think you are a bit too enthusiastic about statistics, or too naive about complexity.
What was unthinkable 50 years ago, playing chess better than a human, is now trivial for a $100 device.
And it's not necessarily the case that to simulate the growth of a human you'd need to simulate the entirety of the chemical reactions in all 50 trillion cells.
Even if we improved computing hardware many orders of magnitude beyond all reasonable predictions, it's unlikely that the calculations would be able to simulate all the necessary details; most of our simulations now rely on many approximations due to hardware limitations.
As to the question of "what level of fidelity is required to turn a FASTQ of somebody's genome into an accurate model of the resulting human, with some sort of realistic environment also provided", that's so far beyond what is even remotely comprehensible it's not worth speculating about in terms of science fact; it's just fiction.
I see open-source implementations of BWT-based indexes (FM-Index/FMtree) out there. Out of curiosity, does anyone know of anything using BWTs for compact indexes in more everyday uses (like full-text search), or alternately reasons it doesn't really work outside the genome-alignment use case? Likely it only 'pays for itself' if you really need the space savings (like, it's what makes an index fit in RAM) or else we'd see it in use more places. It'd still be kinda neat to actually see those tradeoffs.
The BWT sees strings as integer sequences. Either "ABC" and "abc" are two unrelated strings, or you normalize before building the index and lose the ability to distinguish between the two.
Search proceeds character-by-character backwards, jumping arbitrarily around the BWT using the same LF-mapping function as when inverting the BWT. You get cache misses for every character.
BWT construction is expensive, because you want a single BWT for the entire string collection. There is a ridiculous number of papers on BWT construction, as well as on updating and merging existing BWTs, but the problem has still not been solved adequately. If your data is measured in gigabytes, you can just pay the price and build the index, but a few terabytes seems to be the practical upper limit for the current approaches.
You can of course partition the data and build multiple indexes, but then you have to search for each pattern in each index. There is no way to partition the data in a way that different indexes would be responsible for different queries.
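At toy scale, the whole mechanism fits in a few lines of Python: build the BWT by naive rotation sorting, then count pattern occurrences with LF-mapping backward search. (Real indexes replace the naive occ() scan with precomputed rank structures; as noted above, construction at scale is the hard part.)

```python
from collections import Counter

def bwt(s):
    """Burrows-Wheeler transform via naive rotation sort (toy scale only)."""
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def count_occurrences(bw, pattern):
    """FM-index-style backward search: count matches of pattern in the
    original text using only its BWT."""
    counts = Counter(bw)
    C, total = {}, 0
    for c in sorted(counts):   # C[c] = number of symbols smaller than c
        C[c] = total
        total += counts[c]
    def occ(c, i):             # naive rank; real indexes precompute this
        return bw[:i].count(c)
    lo, hi = 0, len(bw)
    for c in reversed(pattern):   # character-by-character, backwards
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

bw = bwt("banana")
print(bw)                            # annb$aa
print(count_occurrences(bw, "ana"))  # 2
print(count_occurrences(bw, "nab"))  # 0
```

Each step of the loop jumps to an essentially arbitrary region of the BWT, which is exactly the cache-miss-per-character behavior mentioned above.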
Last time it was analyzed the conclusion was that there was nothing actionable.
I guess in your case where nothing actionable is found it's benign. It will be the cases where there are risk factors for late onset things - cancer, diabetes, heart disease etc. where it would get sticky.
As for the case where nothing actionable is found: it's not benign. It's absence of information, not information of absence.
In some people's minds, making a better society is the first and most obvious thing to do with technology like this, not an accidental consequence of inconvenience. Fortunately, enough of those people are active in the world to make Main Street different from Wall Street, at least sometimes.
They sell swab kits directly, or via NFT purchase, for ~$500 for 30x near-complete sequencing (that's 30 passes covering over 99.9% of the genome, vs 0.2% for 23andMe et al.). The results are stored in an encrypted AMD SEV-E vault to be accessed by big pharma or individuals, only for specific markers, in exchange for the $GENE token paid directly to the genome owner. Figures touted are $50-80 per request. This token is burned as kits are sold, can be staked, offers rewards like DAO membership, and can be gifted to charities researching specific diseases in various populations. It can act as a form of UBI in unbanked populations and puts your DNA back in your control.
To me it's the best use of web3 tech I've come across, so disclaimer: I am invested and a DAO member, but it's early in the project still. They are not quite ready for mass marketing. They are moving over to Polygon for very low transaction fees in January and will be launching the first joint NFT/kit sale (the next season might include personal genetically generated art) to fill the vaults with 10k sequenced genomes. They are already over halfway there through work with charities, but 10k is the magic number before big pharma can start making queries. Right now they are quietly building and preparing before marketing plans kick in later in Q1.
Take a look at https://genomes.io where everything is explained in more detail, the team are presented and the tokenomics set out.
TL;DR: for $500 right now you can get your entire genome sequenced and stored in a vault to earn you passive income, if you agree to each query. But wait for the NFT instead of buying directly; it will have more perks.