In Spain, for example, we have a private system but it is extremely inefficient in some areas (and very good in others). Of course, you can have private insurance, but you still have to pay your social security. Curiously, the only ones who can decide which system they want are the public servants...
You cannot really avoid the fundamental constraints - anywhere in the world, there are only so many doctors and so much money available for treatments. IDK if the USA has a shortage of doctors, but plenty of European countries do. A country like Romania just cannot pay its doctors wages big enough to stop them from seeking employment elsewhere, where they will earn five to ten times as much (UK, Germany, Switzerland). As a result, local hospitals are seriously understaffed.
Where I live, having personal connections to good doctors gives you an advantage - you will be examined and treated faster. Then there is outright nepotism.
The outgroups are different from those in America, but there are always people for whom the system sucks.
Gattaca shows eugenics has been so vilified that the audience will root for a character who selfishly commits fraud, risking lives and scientific progress for his own vanity.
The really scary fact is that there would be no need for a police state and segregation. The genetically enhanced would just completely dominate an open and fair competition.
There's a guy on YouTube doing DIY gene therapy to treat his lactose intolerance, so it's not exactly science fiction.
https://youtube.com/c/thethoughtemporium
He’s got a ton of other interesting projects, the DIY gene therapy is just one that stands out because it seems so risky.
If someone steals my DNA I can't stop them. But I can at least avoid being swept up in large scale DNA scanning and tracking efforts.
Edit: I used to help Google fund researchers like Joe Derisi and others who develop technology to do this, and some of the people I worked with in my academic career are quite good at identifying serial killers from 30 year old DNA. If you're downvoting because you think I'm making this up, you're wrong. If you're downvoting because you don't think large-scale individual detection using genetic sampling of the environment is possible, you're wrong. If you're downvoting because you think you couldn't do a whole genome sequence of an individual using a sample collected in the wild, you're wrong. If you're downvoting because you think this is a terrible idea (morally, ethically), that's fine but I didn't say anything about my own moral or ethical beliefs about this.
It's simply factually correct to say that large-scale individual sample collection (on the order of tens of thousands, if not hundreds of thousands, of individuals in a country the size of the US) is possible. All the technology is there to do this.
https://www.latimes.com/california/story/2020-12-08/man-in-t...
I'm really hoping someone will work on an open source "23andme@home" solution that ties all this together in an accessible way.
The results have been pretty astounding. I found markers that pointed to poor response to a specific blood thinner my grandfather was put on before he passed. Currently I'm researching the cluster of Bipolar / ADHD / SAD symptoms I experience that all seem to trace back to a certain genotype of circadian rhythm genes I have (thank you, Sci Hub). To boot, some of the studies I've come across were done on Han Chinese populations that match my ancestry.
Perhaps going too far down this rabbit hole poses a self-diagnosis risk, but the correlations to my family history and my own life experience working with doctors to diagnose and treat symptoms are pretty undeniable. And given that your run-of-the-mill psychiatrist is going to treat you off of a DSM checklist, I feel much more confident knowing there have been genomic studies to back things up, since my doctor isn't up to date on this research, and finding one who is would be difficult and expensive. I've shared the papers with my doc and he's been supportive; sometimes I feel like I should be getting a discount on services rendered.
https://store.nanoporetech.com/us/minion.html
https://www.extremetech.com/extreme/190409-minion-usb-stick-...
On some of their larger devices (e.g., the PromethION), they've moved outright to a "we lend you the device for free, you buy the consumables" model.
https://en.wikipedia.org/wiki/Nanopore#Inorganic
https://nanoporetech.com/how-it-works/types-of-nanopores
You also still, in that case, need a gel box + ladder + loading dye + SYBR Safe or whatever, so it’s still not nothing.
Then you’d need a program like bwa (http://bio-bwa.sourceforge.net/) to map your data.
Then use https://samtools.github.io/bcftools/howtos/variant-calling.h... or something else to produce variants from the mapping results.
Then compare your resultant VCF file to something like dbSNP: https://www.ncbi.nlm.nih.gov/snp/
At this point you can start generating a raw version of a 23andMe report.
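To make that last step concrete, here's a toy sketch in Python of the final join: variant calls cross-referenced against a local annotation table, the way a raw 23andMe-style report joins your genotypes against dbSNP. Everything here (rsIDs, traits, positions) is a made-up placeholder, not a real annotation.

```python
# Toy sketch: join VCF-ish variant calls against a local table of known SNP
# annotations. The rsIDs and traits are fabricated placeholders.
VCF_LINES = """\
chr2\t1362\trs0000001\tG\tA\t0/1
chr11\t5226\trs0000002\tC\tT\t1/1
"""

# hypothetical annotation table standing in for a dbSNP/ClinVar lookup
ANNOTATIONS = {
    "rs0000001": "example trait A (placeholder)",
    "rs0000002": "example trait B (placeholder)",
}

def report(vcf_text, annotations):
    """Turn (chrom, pos, rsid, ref, alt, genotype) rows into report rows."""
    rows = []
    for line in vcf_text.strip().splitlines():
        chrom, pos, rsid, ref, alt, gt = line.split("\t")
        # expand "0/1"-style genotype into the actual bases, e.g. "GA"
        genotype = gt.replace("0", ref).replace("1", alt).replace("/", "")
        rows.append((rsid, genotype, annotations.get(rsid, "no annotation")))
    return rows

for rsid, genotype, note in report(VCF_LINES, ANNOTATIONS):
    print(rsid, genotype, note)
```

A real pipeline would stream a multi-gigabyte VCF and query actual dbSNP/ClinVar dumps, but the join logic is essentially this.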
When I worked on https://github.com/iontorrent/tmap we thought it would be a good idea to do something like a “local alignment” (using https://en.wikipedia.org/wiki/Smith–Waterman_algorithm) after doing a lookup into a Burrows–Wheeler transform on a substring of the “read.”
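For anyone curious what the "local alignment" half looks like, here's a minimal Smith-Waterman score computation in Python (the scoring constants are arbitrary illustrative choices; tmap's actual implementation is heavily optimized C):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (score only,
    no traceback). O(len(a) * len(b)) dynamic programming."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are floored at 0 so bad regions reset
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "ATTAC"))  # 10: "ATTAC" matches exactly (5 matches x 2)
print(smith_waterman("AAAA", "TTTT"))      # 0: nothing aligns locally
```

In the two-stage scheme described above, the BWT lookup narrows down candidate locations cheaply, and a pass like this scores each candidate precisely.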
https://www.ebay.com/itm/265148387179
Nanopore is still not quite ready for precise, high-accuracy sequencing. Give it another five years.
https://genomebiology.biomedcentral.com/articles/10.1186/s13...
I guess there are limits to ensemble methods if the underlying accuracy doesn't increase. I don't work on gene sequencing algorithms but from what I understand of ML ensemble techniques, there are certain assumptions regarding the underlying independence of the errors. The errors for nanopore should be uniform but I am not sure. Any molecular biologist here care to comment?
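The independence assumption is indeed the crux. As a toy illustration (not a model of real basecallers): if N calls of the same base err independently with probability p, a majority vote drives the error down fast; if every call shares the same systematic bias (as with homopolymer errors), voting gains nothing.

```python
from math import comb

def majority_vote_error(p, n):
    """Probability that a majority of n independent calls (n odd), each
    wrong with probability p, is wrong."""
    k = n // 2 + 1  # votes needed for a wrong majority
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With independent errors, voting helps fast:
print(majority_vote_error(0.10, 1))           # 0.1
print(majority_vote_error(0.10, 15) < 0.001)  # True
# With fully correlated (systematic) errors, every call is wrong together,
# so the ensemble error stays at p no matter how large n gets.
```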
There are two components that drive sequencing error rate. 1) The chemistry behind the sequencing (for nanopore sequencing this is the "feeding DNA through a pore" bit) 2) the method to convert raw signal into DNA sequence (this is called "base calling").
The gold-standard in terms of error profile for sequencing is currently the Illumina short read platform. Illumina machines are really just microscopes (TIRF scopes for optics folks) that sequence DNA by visualizing incorporation of dye-labeled nucleotides into the sequenced molecule(s) (Imagine a really slow PCR [1]). Each base is labeled with a different color, then when a molecule has a match it makes a colored spot on the slide that the machine can read (see here for more info & details of newer chemistry that use fewer colors [2]). This whole process is mediated by DNA polymerase which itself has a very low error rate. Another important point is that DNA sequenced on the illumina platform (called a "library") tends to be from "amplified" template DNA, meaning the DNA will have been processed and potentially be missing chemical modifications on the bases that could be present in the organism. This works to Illumina's advantage, because when trying to answer the question of "what is the DNA sequence?" we want the ground-truth DNA, not the modification state.
In contrast, Nanopore sequencing works by feeding a long strand of DNA through a pore and measuring the change in electrical current through the pore (watch the cool video [3]). For the current set of nanopore flowcells, 8 bases of DNA sit in the pore at a time, meaning the current at each timestep is a product of 8 nucleotides in aggregate. This also means that the pore "sees" each base 8 times, but always in the context of an additional 7. In order to basecall from the raw signal, it's not as easy as saying "blue = A", instead, you have to deconvolve each base from a complex signal. As you might imagine, the folks at Oxford Nanopore & broader research community have turned to machine learning-based base callers to solve this problem, and they work quite well [4]. But they are not perfect. Deconvolving runs of the same base (e.g. "AAAAAAA") is difficult because without well-defined signal changes between bases, the caller has a hard time deciding how many bases it has seen, so a common error mode for nanopore sequencing is to create insertions/deletions at places in the genome with low nucleotide diversity. Another interesting reason is that most Nanopore library preps are often performed on unamplified DNA, and so in addition to normal A/T/G/C nucleotides, the template DNA can also contain bases with chemical modifications. For example, in bacteria, A's are often methylated, and in Humans, C can have all kinds of different modifications (5-methyl-cytosine, 5-hydroxymethyl-cytosine, etc. etc.) and each different modification affects the signal in the nanopore. Therefore, basecallers that weren't trained on modified bases will produce basecalling errors in the presence of base modifications.
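That homopolymer failure mode is easy to see in a toy model: assign each k-mer an arbitrary "current level" and slide the window along the sequence. Inside a run like AAAAAAA the window contents never change, so the signal goes flat and the run length becomes ambiguous. (The level function below is a made-up stand-in for a real measured k-mer model.)

```python
# Toy nanopore signal model: each k-mer sitting in the pore maps to one
# current level. The arbitrary hash-like function is a stand-in for a
# real measured k-mer model.
def toy_levels(seq, k=4):
    def level(kmer):
        return sum((ord(base) * (i + 13)) % 97 for i, base in enumerate(kmer))
    return [level(seq[i:i + k]) for i in range(len(seq) - k + 1)]

mixed = toy_levels("ACGTACGGTTAC")
homopolymer = toy_levels("AAAAAAAAAA")

print(len(set(mixed)) > 1)        # True: transitions produce distinct levels
print(len(set(homopolymer)))      # 1: flat signal, run length is ambiguous
```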
For both Illumina and Nanopore basecallers, each base is assigned a quality score that indicates the probability that the basecaller produced an incorrect value. This is called a Q-score, defined as Q = -10 * log10(P), where P is the error probability (i.e. Q / 10 is the order of magnitude of the error probability) [5]. For example, a Q-score of 10 means an error rate of 1 in 10, while a Q-score of 50 means an error rate of 1 in 100,000. For Illumina sequencing, >95% of the reads have a Q-score > 30 (i.e. 1 in 1000 errors), while Nanopore reads tend to have lower average Q-scores (~Q20, i.e. 1 in 100 errors). For genetics, where 1 base difference can mean the difference between a severe disease allele and a normal variant, 1 in 100 won't cut it.
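The Q-score arithmetic is easy to sanity-check:

```python
from math import log10

def q_from_p(p):
    """Phred quality from error probability: Q = -10 * log10(P)."""
    return -10 * log10(p)

def p_from_q(q):
    """Error probability from Phred quality: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

print(p_from_q(30))           # 0.001 -> 1 error in 1000 (good Illumina read)
print(p_from_q(20))           # 0.01  -> 1 error in 100 (typical nanopore read)
print(round(q_from_p(1e-5)))  # 50
```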
The current-gen Nanopore flowcell chemistry (R9.4.1) is what most people are talking about when they discuss Nanopore error rates, but they've just released a new pore type and made some basecaller upgrades that improve the accuracy to what they call "Q20+", with some claims of Q>30. From the data I've seen it's impressive; I just haven't gotten my hands on one yet to see for myself [6]. I think the comment saying "wait 5 years" is an overestimate, but if you want to genotype yourself today, I'd just pay someone for Illumina sequencing and process the fastq files yourself if you really want to do it as a learning exercise.
I've unintentionally written an essay, so I'll stop here, but real quick to your other point re: rerunning the sample N times and using the repeats for error correction. This won't work the way you're thinking, because a "sample" is actually a collection of DNA molecules that are sampled randomly by the sequencer. You have no way of knowing that the same read between runs was actually from the same molecule, so you can't error correct this way. Incidentally, a totally different sequencing platform from Pacific Biosciences uses this strategy by doing some really cool chemistry, but I'll spare you the second essay (google "PacBio HiFi" or "circular consensus reads" if you're interested).
[1] https://en.wikipedia.org/wiki/Polymerase_chain_reaction
[2] https://www.ecseq.com/support/ngs/do-you-have-two-colors-or-...
[3] https://www.youtube.com/watch?v=RcP85JHLmnI
[4] This paper is a tad out of date, but Ryan Wick always writes extremely clear papers: https://genomebiology.biomedcentral.com/articles/10.1186/s13...
[5] https://www.illumina.com/documents/products/technotes/techno...
[6] https://nanoporetech.com/about-us/news/oxford-nanopore-tech-...
Edit: reformatted links for clarity.
If this can sequence flora, fungi, and human DNA for about 10k, I'd buy it just to experiment and deep dive. That low a barrier to entry is itself interesting.
And I feel like nanopore is the VR of DNA sequencing: it's always just another few years off.
Is this also true for nanopores in protein sequencing? This HN comment from a few weeks back [1] pointed out recent progress but perhaps the tech is still not quite there.
But this could enable things like finding relatives, which is what I got out of the comment about 23andme. Instead of all the data being centralized, storage and comparison could be distributed.
Not sure what you are concerned about. What would you expect a bad actor to do with your DNA sequences? I'm genuinely curious.
To actually sequence DNA with this USB thingy you need to prepare a so-called sequencing library, and for that you need a fairly well-equipped lab, expensive reagents, and years of practice and skill... a mid-level biology Ph.D. can prepare these.
In addition, the flowcells sold by Oxford Nanopore often malfunction and the whole run is a bust... (they have behaved like this since 2014, so no, the technology does not seem to be improving a whole lot)
On one hand, I would love to learn something new about my body.
On the other hand, what if the results tell me that I am predisposed to some horrible untreatable disease? Will I spend the rest of my days observing every little pain or discomfort and thinking "is this IT?"
1. A completely genetically determined disease; a rare 100%-going-to-happen deal. (Which you would probably know about already, because your mother or grandfather died from it...)
2. Some significant, but abstract risk modification.
With 1., you would know you will get sick/die some time in the future, allowing you to live your life accordingly: die without regrets, prepared, and so on. You can take that into consideration when planning for a family, taking job offers, or procrastinating on the good life with work and retirement plans. Burn bright.
With 2., there is a very, very high chance lifestyle choices influence the stated risk, as obviously not everybody who has the polymorphism gets sick. So you can get your ass up, exercise, quit smoking and drinking, reduce stress, get regular check-ups, ..., and avoid getting sick or reduce the impact/progression in case you do.
I think, logically, knowing is always better than not knowing. But I understand that anxiety tells a different story.
"Inaction breeds doubt and fear. Action breeds confidence and courage. If you want to conquer fear, do not sit home and think about it. Go out and get busy." --Dale Carnegie
"You gain strength, courage and confidence by every experience in which you really stop to look fear in the face. You are able to say to yourself, 'I have lived through this horror. I can take the next thing that comes along.' You must do the thing you think you cannot do." --Eleanor Roosevelt
"Fear is the path to the Dark Side. Fear leads to anger, anger leads to hate, hate leads to suffering." --Yoda
"The brave man is not he who does not feel afraid, but he who conquers that fear." --Nelson Mandela
"Nothing in life is to be feared. It is only to be understood." --Marie Curie
"The key to change... is to let go of fear." --Roseanne Cash
"He who is not everyday conquering some fear has not learned the secret of life." --Ralph Waldo Emerson
"We should all start to live before we get too old. Fear is stupid. So are regrets." --Marilyn Monroe
"Fear keeps us focused on the past or worried about the future. If we can acknowledge our fear, we can realize that right now we are okay. Right now, today, we are still alive, and our bodies are working marvelously. Our eyes can still see the beautiful sky. Our ears can still hear the voices of our loved ones." --Thich Nhat Hanh
Perhaps a trusted middleman would be a solution: "just don't tell me about anything that is totally beyond my control".
Now I'm a Data Engineer doing backend work in public sector. :)
Here are some press releases related to articles I published during my PhD:
https://physics.illinois.edu/news/article/34064
https://www.sciencedaily.com/releases/2014/10/141014095320.h...
When I looked I was interested, but was turned off when I saw that the cost far outstripped commercial sequencing services.
London, UK https://biohackspace.org/
Brooklyn, NY, https://www.genspace.org/
Baltimore, MD, https://bugssonline.org/
Australia, https://foundry.bio/
https://abarry.org/dna-sequencing-in-our-extra-bedroom/
http://blog.booleanbiotech.com/sequencing-at-home-with-flong...
What if you could take the (binary) data file of your DNA and use it as input in the (recently remastered) Monster Rancher games to generate a monster? Apparently those games use external user-provided data (music CDs, game discs, etc.) to generate the monsters the player then trains and uses (something I only recently learned about through gaming livestreams).
I'd actually like to see the level of jank that would come out of something like that.
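A plausible way such a feature could work (purely hypothetical sketch, not how Monster Rancher actually does it): hash the input bytes into a seed and generate stats deterministically, so the same genome file always yields the same monster.

```python
import hashlib
import random

def monster_from_bytes(data: bytes):
    """Derive deterministic 'monster' stats from any byte blob (e.g. a
    FASTA/FASTQ file). Same input -> same monster, like seeding from a CD."""
    seed = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    rng = random.Random(seed)
    return {
        "power": rng.randint(1, 99),
        "speed": rng.randint(1, 99),
        "intelligence": rng.randint(1, 99),
        "species": rng.choice(["golem", "dragon", "jelly", "pixie"]),
    }

m1 = monster_from_bytes(b"ACGTACGT")  # stand-in for a real genome file
m2 = monster_from_bytes(b"ACGTACGT")
print(m1 == m2)  # True: fully determined by the input bytes
```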
Also, your DNA is bootstrapped from your mother's cells. And the prenatal environment has quite a large effect on development, so your simulation might end up quite different from you if we only started with your DNA.
For example, today we can already predict eye color and other phenotypes from DNA.
If you are able to observe enough samples of cell growth and their associated DNA, you probably can model and predict the statistics of a cell from its DNA. Because the cell is itself the result of a lot of chemical processes, the law of large numbers will help smooth those statistics.
Given that we have a lot of cells, the collective behavior is probably entirely governed by these statistics.
We could even do that without knowing anything about DNA at all. Or predict tomorrow's weather without satellites and computers.
I think you are a bit too enthusiastic about statistics, or too naive about complexity.
What was unthinkable 50 years ago, playing chess better than a human, is now trivial for a $100 device.
And it's not necessarily the case that to simulate the growth of a human you'd need to simulate the entirety of the chemical reactions in all 50 trillion cells.
Even if we improved computing hardware many orders of magnitude beyond all reasonable predictions, it's unlikely that the calculations would be able to simulate all the necessary details; most of our simulations now rely on many approximations due to hardware limitations.
As to the question of "what level of fidelity is required to turn a FASTQ of somebody's genome into an accurate model of the resulting human, with some sort of realistic environment also provided", that's so far beyond what is even remotely comprehensible it's not worth speculating about in terms of science fact; it's just fiction.
I see open-source implementations of BWT-based indexes (FM-Index/FMtree) out there. Out of curiosity, does anyone know of anything using BWTs for compact indexes in more everyday uses (like full-text search), or alternately reasons it doesn't really work outside the genome-alignment use case? Likely it only 'pays for itself' if you really need the space savings (like, it's what makes an index fit in RAM) or else we'd see it in use more places. It'd still be kinda neat to actually see those tradeoffs.
The BWT sees strings as integer sequences. Either "ABC" and "abc" are two unrelated strings, or you normalize before building the index and lose the ability to distinguish between the two.
Search proceeds character-by-character backwards, jumping arbitrarily around the BWT using the same LF-mapping function as when inverting the BWT. You get cache misses for every character.
BWT construction is expensive, because you want a single BWT for the entire string collection. There is a ridiculous number of papers on BWT construction, as well as on updating and merging existing BWTs, but the problem has still not been solved adequately. If your data is measured in gigabytes, you can just pay the price and build the index, but a few terabytes seems to be the practical upper limit for the current approaches.
You can of course partition the data and build multiple indexes, but then you have to search for each pattern in each index. There is no way to partition the data in a way that different indexes would be responsible for different queries.
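At toy scale, the whole mechanism fits in a few lines of Python: build the BWT by naive rotation sorting, then count pattern occurrences with LF-mapping backward search. (Real indexes replace the naive occ() scan with precomputed rank structures; as noted above, construction at scale is the hard part.)

```python
from collections import Counter

def bwt(s):
    """Burrows-Wheeler transform via naive rotation sort (toy scale only)."""
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def count_occurrences(bw, pattern):
    """FM-index-style backward search: count matches of pattern in the
    original text using only its BWT."""
    counts = Counter(bw)
    C, total = {}, 0
    for c in sorted(counts):   # C[c] = number of symbols smaller than c
        C[c] = total
        total += counts[c]
    def occ(c, i):             # naive rank; real indexes precompute this
        return bw[:i].count(c)
    lo, hi = 0, len(bw)
    for c in reversed(pattern):   # character-by-character, backwards
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

bw = bwt("banana")
print(bw)                            # annb$aa
print(count_occurrences(bw, "ana"))  # 2
print(count_occurrences(bw, "nab"))  # 0
```

Each step of the loop jumps to an essentially arbitrary region of the BWT, which is exactly the cache-miss-per-character behavior mentioned above.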
Last time it was analyzed the conclusion was that there was nothing actionable.
I guess in your case where nothing actionable is found it's benign. It will be the cases where there are risk factors for late onset things - cancer, diabetes, heart disease etc. where it would get sticky.
As for the case where nothing actionable is found: it's not benign. It's absence of information, not information of absence.
In some people's minds, making a better society is the first and most obvious thing to do with technology like this, not an accidental consequence of inconvenience. Fortunately, enough of those people are active in the world to make Main Street different from Wall Street, at least sometimes.
They sell swab kits directly, or via NFT purchase, for ~$500 for 30x near-complete sequencing (that's 30 passes covering over 99.9% of the genome, vs 0.2% for 23andMe et al.). The results are stored in an encrypted AMD SEV-E vault to be accessed by big pharma or individuals, only for specific markers, in exchange for the $GENE token paid directly to the genome owner. Figures touted are $50-80 per request. This token is burned as kits are sold, can be staked, offers rewards like DAO membership, and can be gifted to charities researching specific diseases in various populations. It can act as a form of UBI in unbanked populations and puts your DNA back in your control.
To me it's the best use of web3 tech I've come across, so disclaimer: I am invested and a DAO member, but it's early in the project still. They are not quite ready for mass marketing. They are moving over to Polygon for very low transaction fees in January and will be launching the first joint NFT/kit sale (the next season might include personal genetically generated art) to fill the vaults with 10k sequenced genomes. They are already over halfway there through work with charities, but 10k is the magic number before big pharma can start making queries. Right now they are quietly building and preparing before marketing plans kick in later in Q1.
Take a look at https://genomes.io where everything is explained in more detail, the team are presented and the tokenomics set out.
TL;DR: for $500 right now you can get your entire genome sequenced and stored in a vault to earn you passive income, if you agree to each query. But wait for the NFT instead of buying directly; it will have more perks.