When someone finds fault with the way a field conducts itself, I would implore them to constructively influence that field. You might be surprised how many are actually sympathetic to your concerns.
I'm not dismissing this author's concerns: to do that would really require knowing the molecular biology field (which is more than sequencing, it turns out). I do neuroscience right now, and programming can be a problem for some. But a constructive suggestion to change can have much more impact than a long rant.
[0] http://www.runmycode.org/data/MetaSite/upload/nature10836.pd...
It's a similar issue. I think statisticians are taking constructive steps to correct their path, since you know, ML is the new sexy thing. Bioinformatics could take a much longer time to self-correct though.
Although, as I mentioned in an earlier comment, Fred seems to be in a prime position to disrupt the bioinformatics field, since he seems to know all the problems that afflict it.
http://books.google.com/ngrams/graph?content=machine+learnin...
In my experience, what happens is that biologists define the science, and they depend on the computer scientists / engineers to implement solutions to their computational problems. The computational people depend on the biologists to validate whatever results they produce. The iteration cycle can be painfully slow, especially for people used to telling machines what they want them to do, and getting results immediately. The proposition of changing that dynamic is not alluring to most people, but I still hope there will be some who try.
I spent five years working in bioinformatics, and this is exactly the attitude of both the researchers and the other developers on the projects I worked on. It was very frustrating.
My single most limited resource is programmer time. My time and the time of other people who work with me. I have access to loads of computers that sit idle all the time, even if it is on nights and weekends. There is zero opportunity cost to me in using these computers more fully. I have enough human work to do that I can wait for the results without having any wait states.
There can be a big opportunity cost in trying to rework a workflow so that it is more efficient and then test it thoroughly to ensure correctness. Doing this may seem more appealing to someone who is interested primarily in computational efficiency. But I am more interested in research efficiency, and so are my employers and funders.
Hi, I recognize your name as a legit bioinformatician, am a huge fan of the lab that you're currently in, and others should listen to you.
I'd like to add that for many projects, general reusable software engineering is not necessarily a huge advantage. Instead of verifying a single implementation, it's often better for somebody to reimplement the idea from scratch; if a second implementation in a different language written by a different programmer gets the same results, this is a much more thorough validation of the software than going over prototype software line by line.
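A toy sketch of that validation-by-reimplementation idea, assuming a trivial made-up computation (GC content); in practice the second implementation would be written by a different person, ideally in a different language:

```python
# Two independently written implementations of the same computation.
# Agreement on shared inputs is the validation signal, instead of
# reviewing either implementation line by line.

def gc_content_count(seq):
    """First implementation: count G and C explicitly."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def gc_content_scan(seq):
    """Second, independent implementation: a single pass over the bases."""
    s = seq.upper()
    return sum(1 for base in s if base in "GC") / len(s)

# Cross-check both implementations on the same test data.
for seq in ["ACGT", "GGGCCC", "atatatgc"]:
    assert abs(gc_content_count(seq) - gc_content_scan(seq)) < 1e-12
```

If the two disagree on any input, at least one of them is wrong, which is exactly the kind of signal a line-by-line review of prototype code often fails to produce.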
Also, I've seen way too many software engineers come in with an enterprisey attitude of establishing all sorts of crazy infrastructure and get absolutely no work done. If Java is your idea of a good time, it's unlikely that you'll be an effective researcher (though it's not unheard of), because it's not good at maximizing single-programmer output, and not good at maximizing I/O or CPU or string processing. In research it's best to get results, fail fast fast fast, and move on to the next idea. If you're lucky, 1 in 20 will work out. Publish your crap, and if it's a good idea, it will be worth polishing the turd later, but it's better to explore the field than to spend too much time on an uninteresting area.
The only time you worry about efficiency is when it enables a whole other level of analysis. So, for example, UCSC does most of their work in C, including an entire web app and framework written in C, because when they were doing the draft assembly of the human genome a decade ago on a small cluster of computers that they scrounged from secretaries' desks over the summer, Perl wouldn't cut it.
I proposed, implemented, and tested an 8-line change to our alignment tool that saved 6% of CPU time. It took me two days, most of which was my spare time at home. This one program was using 15 CPU-years every month. Nobody cared. It never went into production. I started interviewing for a new job and left shortly after that.
The tools are written by (in my experience) very smart bioinformaticians who aren't taught much computer science in school (you get a smattering, but mostly it's biology, math, chemistry, etc.). Ex:
http://catalog.njit.edu/undergraduate/programs/bioinformatic...
http://www.bme.ucsc.edu/bioinformatics/curriculum#LowerDivis...
http://advanced.jhu.edu/academic/biotechnology/ms-in-bioinfo...
The tools themselves are written by smart non-programmers (a very dangerous combination), and so you get all sorts of unusual conventions that make sense only to the author or organization that wrote them, anti-patterns that would make a career programmer cringe, and a design that looks good to no one and is barely usable.
Then, as he said, they get grants to spend millions of dollars on giant clusters of computers to manage the data that is stored and queried in a really inefficient way.
There's really no incentive to make better software because that's not how the industry gets paid. You get a grant to sequence genome "X". After it's done? You publish your results and move on. Sure, you carve out a bit for overhead but most of it goes to new hardware (disk arrays, grid computing, oh my).
I often remarked that if I had enough money, there would be a killing to be made writing genome software with a proper visual and user experience design, combined with a deep computer science background. My perfect team would be a CS person, a geneticist, a UX designer, and a visual designer. Could crank out a really brilliant full-stack product that would blow away anything else out there (from sequencing to assembly to annotation and then cataloging/subsequent search and comparison).
Except, I realized that most folks using this software are in non-profits, research labs, and universities, so - no, there in fact is not a killing to be made. No one would buy it.
I wrote a post about why GATK, one of the most popular bioinformatics tools in next-generation sequencing, should not be put into a clinical pipeline:
http://blog.goldenhelix.com/?p=1534
In terms of your ideal software strategy, I can speak to that as well, as I am actually attempting to do almost exactly what you're suggesting. My team all have master's degrees in CS & stats, with a focus on kick-ass CG visualization and UX.
We released a free genome browser (visualization of NGS data and public annotations) that reflects this:
http://www.goldenhelix.com/GenomeBrowse/
But you're right, selling software in this field is a very weird thing. It's almost B2B, but academics are not businesses and their alternative is always to throw more Post-Doc man-power at the problem or slog it out with open source tools (which many do).
That said, we've been building our business (in Montana) over the last 10 years through the GWAS era selling statistical software and are looking optimistically into the era of sequencing having a huge impact on health care.
I've seen you link to your blog post a couple of times now, and I still think it's misleading. I do wonder whether your conflict of interest (selling competing software) has led you to come to a pretty unreasonable conclusion. (My conflict of interest is that I have a Broad affiliation, though I'm not a GATK developer.)
In your blog post, you received output from 23andme. The GATK was part of the processing pipeline that they used. What you received from 23andme indicated that you had a loss of function indel in a gene. However, it turns out that upon re-analysis, that was not present in your genome; it was just present in the genome of someone else processed at the same time as you.
Somehow, the conclusion that you draw is that the GATK should not be used in a clinical pipeline. This is hugely problematic:
1) It's not clear that there were any errors made by the GATK. Someone at 23andme said it was a GATK error, but the difference between "user error" and "software error" can be blurred for advantage. It's open source, so can someone demonstrate where this bug was fixed, if it ever existed?
2) Now let's assume that there was truly a bug. Is it not the job of the entity using the software to check it to ensure quality? An appropriate suite of test data would surely have caught the erroneous output. Wouldn't it be as fair, if not more so, to say that 23andme should not be used for clinical purposes, since they don't do a good job of paying attention to their output?
Your blog post shows, for sure, a failure at 23andme. Depending on whether the erroneous output was purely due to 23andme or if the GATK had a bug in production code, your post shows an interesting system failure: an alignment of mistakes at 23andme and in the GATK. But I really don't think it remotely supports the argument that the GATK is unsuitable for use in a clinical sequencing pipeline.
In my experience, this applies to accounting software, sensor data, computer-aided design, print manufacturing, healthcare, etc.
I imagine there are phases of maturity, something akin to CMM/SEI. Eventually there are enough people with a foot on both sides to bridge the gap.
It just takes time.
Maybe it's still in the early going, but I do see how it's going to be real difficult making a living doing this. OTOH, companies like CLC Bio seem like they're doing well for themselves...
So I disagree with you on your very last sentence (agree with the rest).
The trick is, academics often have excess manpower capacity in the form of grad students and post-docs. Even though personnel is usually one of the highest expenses on any given grant, they often don't look at ways to improve the efficiency of their research man-hours.
That's not a blanket rule, as we have definitely had success with the value proposition of research efficiency, but in general, a lot of the things businesses adopt to improve project time (like Theory of Constraints project management, or Mindset/Skillset/Toolset matching of personnel) are of no interest to academic researchers.
As for whether there's "a killing to be made", it's kind of unclear so far.
For example, it isn't true at all that microarray data is worthless. The early data was bad, and it was very over-hyped, but with a decade of optimization of the measurement technologies, better experimental designs, and better statistical methods, genome-wide expression analysis became a routine and ubiquitous tool.
The claim that sequencing isn't important is ridiculous. It's the scaffold to which all of biological research can be attached.
However:
There is a great deal of obfuscation, and reinventing well-known algorithms under different names (perhaps often inadvertently). There's also a lot of low-quality drivel on tool implementations or complete nonsense. This is driven largely by the need in academia to publish.
The other side of this problem is that in general, CS and computer scientists don't get much respect in biology. People care about Nature/Science/Cell papers, not about CS conference abstracts. Despite bioinformatics/computational biology not really being a new field anymore, the cultures are still very different.
Bioinformatics is hard, but too many careerists take advantage of difficulties and uncertainty to publish as many papers as they can get away with.
Minor quibble: genome assembly is definitely still an open problem that's computationally difficult. So is robust high dimension inference, but that falls more under statistics.
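A minimal sketch of why assembly stays hard, using a toy de Bruijn graph in Python (the reads and k are invented; real assemblers also handle sequencing errors, reverse complements, and billions of reads):

```python
# Build a de Bruijn graph from short reads: each k-mer becomes an edge
# between its (k-1)-mer prefix and suffix. Assembly then amounts to
# finding a walk through the graph, and repeated k-mers make that walk
# ambiguous, which is the core computational difficulty.
from collections import defaultdict

def de_bruijn(reads, k):
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn(reads, 4)
# Overlapping reads share nodes; any repeated (k-1)-mer gets multiple
# outgoing edges, so the original sequence is no longer uniquely
# recoverable from the graph alone.
```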
I've wanted to leave at least a dozen times too, for the better pay, for working with programmers that can teach me something, and to not have my work be interrupted by academic politics. But the people pissed at the status quo are the ones that are smart enough to see it's broken and try to fix it, and if we all leave, science is really fucked.
[0] http://www.johndcook.com/blog/2010/10/19/buggy-simulation-co...
"Must be an expert in 18 technologies" "Must have a PHD in Computer Science or Molecular Biology" "Must have 12 years experience and post doctoral training" "Pay: $30,000"
It's delusional because they take the requirements it took for them to get a job in molecular biology (long PhD, post-doc, very low pay for first jobs) and apply them wholesale to all fields that may be able to aid in their pursuits. Especially when it comes to software engineering, where it can often be extremely difficult to explain why you did not pursue a PhD.
In my geographic area, this salary range is somewhat below corporate IT work (say 10% to 15%), but generally higher than the typical university software dev job listing. The university is really bad about listing jobs and job requirements with laughable salaries. I have seen (in other departments) web app dev jobs that require significant front-end and back-end skillsets/experience and then pop a salary that is a full 50% less than entry-level jobs for CS undergrads.
One problem is that hiring departments in that position will find someone to hire at that rate, so they think it was correct. From personal experience, I can verify that "good on-paper" candidates with exceptional credentials (say an MS in CS and a bunch of experience) from other depts who look to join our team are unable to write any code at the whiteboard at all (say a for loop in Java to println something). But to be fair, a recent interview cycle one of my teammates ran produced exactly two candidates out of 16 who could do this, and only one of those could write a SQL statement that required a simple inner join. Most of those folks were external, so it's not just a problem inside the institution.
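For reference, the two whiteboard tasks described above are roughly the following, sketched here in Python with an in-memory SQLite database rather than Java (the table names and data are invented for illustration):

```python
import sqlite3

# Task 1: the trivial loop (print the numbers 0 through 4).
for i in range(5):
    print(i)

# Task 2: the simple inner join, matching people to their departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Genomics'), (2, 'IT');
    INSERT INTO emp  VALUES (1, 'Ada', 1), (2, 'Grace', 2);
""")
rows = conn.execute(
    "SELECT emp.name, dept.name FROM emp "
    "INNER JOIN dept ON emp.dept_id = dept.id "
    "ORDER BY emp.id"
).fetchall()
print(rows)  # [('Ada', 'Genomics'), ('Grace', 'IT')]
```

That the majority of credentialed candidates stumble on exercises at this level is what makes the mismatch between job requirements and actual screening so striking.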
I have a number of cynical and embarrassing opinions about this situation.
The whiteboard is only useful as an aid in explaining an algorithm. If a candidate can do that without the whiteboard, even better.
It's only delusional if they can't find people to fill the jobs. The idea that, as an outsider, you know what requirements they should use in their hiring process better than they do is perhaps more delusional.
I worked in bioinformatics for more than 10 years before I moved on, and in my experience they do have a lot of trouble finding people to fill positions, especially outside of massive government-funded groups like the NIH. This often results in passing on competent software engineers with a B.Sc. who don't meet the requirements, in favor of PhD-level biology graduates who have taken a year or so of undergrad computer science courses. In my experience, this leads to many of the problems discussed (and exaggerated) by the OP. While some of these people are smart and produce good work, much of the time they produce poor-quality software that gets the job done, but as inefficiently as possible, and they leave a code base that is virtually unusable. Overall, I mostly just wanted to say that it's a mindset they REALLY need to get past for the long-term success of the industry.
If you really feel strongly about something, write it dispassionately (normally some time after the event) and treat it like a dissertation, backed with case studies and citations.
Sh*tty data? Comes from the community. If the data and algorithms are so poor, and the author so superior, he should have been able to improve the circumstances.
This whole screed reads like an entitled individual who entered a profession, didn't get the glory, oh and yeah, academia doesn't pay well.
In the realm of bioinformatics, let's ignore the work done on the human genome and the like.
Why? Aren't you assuming a lot about the incentives? What if the ground truth is simply that all the results are false due to a melange of bad practices? Do you think he'll get tenure for that? (That was a rhetorical question to which the answer is 'no'.) Then you know there's at least one very obvious way in which he could not improve the circumstances of poor data & algorithms.
He discusses this specifically in the rant. Are you saying he's wrong?
Was anyone asking him to? Was anyone paying him to? No? Then it's an uphill battle and also not his responsibility. Leaving is saner.
Academia rewards journal publication and does not adequately reward programming and data collection and analysis, although these are indispensable activities that can be as difficult and profound as crafting a research paper. At least the National Science Foundation has done researchers a small favor by changing the NSF biosketch format in mid-January to better accommodate the contributions of programmers and "data scientists": the old category Publications has been replaced with Products.
Naming is important to administrators and bureaucrats. It can be easy to underestimate the extent to which names matter to them. Now there is a category under which the contribution of a programmer can be recognized for the purpose of academic advancement. Previously one had to force-fit programming under Synergistic Activities or otherwise stretch or violate the NSF biosketch format. This is a small step, but it does show some understanding that the increasingly necessary contributions of scientific programmers ought to be recognized. The alternative is attrition. Like the author of the article, programmers will go where their accomplishments are recognized.
Still, reforming old attitudes is like retraining Pavlov's dogs. Scientific programmers are lumped in with "IT guys." IT as in ITIL: the platitudinous, highly non-mathematical service as a service as a service Information Technocracy Indoctrination Library. There is little comprehension that computer science has specialized. For many academics, scientific programmers are interchangeable IT guys who do help desk work, system and network administration, build websites, run GIS analyses, write scientific software and get Gmail and Google Calendar synchronization running on Blackberries. It is as if scientists themselves could be satisfied if their colleagues were hired as "scientists" or "natural philosophers" with no further qualification, as opposed to "vulcanologist" or "meteorologist" (to a first order of approximation).
"[bioinformatics] software is written to be inefficient, to use memory poorly, and the cry goes up for bigger, faster machines! [...]"
Well, the author is heading for a very bitter surprise...
- This guy clearly has a limited understanding of the field. This quote is laughable: "There are only two computationally difficult problems in bioinformatics, sequence alignment and phylogenetic tree construction."
- As a bioinformatician, I feel sorry for this guy. Just like any other field, there are shitty places to work. If I was stuck in a lab where a demanding PI with no computer skills kept throwing the results of poorly designed experiments at me and asking for miracles, I'd be a little bitter too.
- Just like any other field, there are also lots of places that are great places to work and are churning out some pretty goddamn amazing code and science. I'm working in cancer genomics, and we've already done work where the results of our bioinformatic analyses have saved people's lives. Here's one high-profile example that got a lot of good press. (http://www.nytimes.com/2012/07/08/health/in-gene-sequencing-...)
- I'm in the field of bioinformatics to improve human health and understand deep biological questions. I care about reproducibility and accuracy in my code, but 90% of the time, I could give a rat's ass about performance. I'm trying to find the answer to a question, and if I can get that answer in a reasonable amount of time, then the code is good enough. This is especially true when you consider that 3/4 of the things I do are one-off analyses with code that will never be used again. (largely because 3/4 of experiments fail - science is messy and hard like that). If given a choice between dicking around for two weeks to make my code perfect, or cranking out something that works in 2 hours, I'll pretty much always choose the latter. ("Premature optimization is the root of all evil (or at least most of it) in programming." --Donald Knuth)
- That said, when we do come up with some useful and widely applicable code, we do our best to optimize it, put it into pipelines with robust testing, and open-source it, so that the community can use it. If his lab never did that, they're rapidly falling behind the rest of the field.
- As for his assertion that bad code and obscure file formats are job security through obscurity, I'm going to call bullshit. For many years, the field lacked people with real CS training, so you got a lot of biologists reading a perl book in their spare time and hacking together some ugly, but functional solutions. Sure, in some ways that was less than optimal, but hell, it got us the human genome. The field is beginning to mature, and you're starting to see better code and standard formats as more computationally-savvy people move in. No one will argue that things couldn't be improved, but attributing it to unethical behavior or malice is just ridiculous.
tl;dr: Bitter guy with some kind of bone to pick doesn't really understand or accurately depict the state of the field.
This is the only bad point that a lot of people are aligned with.
The more time a program needs to finish, the more time you will need to run it again with some other dataset, and in turn - more time to find the right answer.
I really feel that people with scientific and mathematics backgrounds should learn proper programming (not take a course in some language, but have actual experience). Design patterns, data structures, best practices, and memory consumption are all things that should be known before a person starts submitting code for these kinds of projects.
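One concrete example of the memory-consumption habit in question, sketched as a minimal FASTA reader (the format handling is simplified for illustration): stream a large file record by record instead of slurping it all into memory.

```python
# Yield (header, sequence) pairs one at a time, so memory use stays
# roughly constant no matter how large the input file is. Slurping a
# multi-gigabyte FASTA file into one string is a common beginner mistake.

def read_fasta(path):
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)

# Usage: process one record at a time.
# for name, seq in read_fasta("genome.fa"):
#     ...
```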
I'm very interested in bioinformatics, but sadly don't know as much about the field as I'd like.
These are just two of many questions (biased towards my research interests, of course). It is really funny that he mentions sequence alignment and phylogenetics as the two big problems, because people generally consider these to be boring, uncool, solved-well-enough-for-our-purposes problems nowadays and just trust the algorithms described by Durbin decades ago. It sounds like the writer really doesn't know bioinformatics that well...
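To give a sense of how "textbook" that material is: the global-alignment score (Needleman-Wunsch, the dynamic program covered in Durbin et al.) fits in a few lines of Python; the scoring values here are arbitrary toy choices, not anything from a real scoring matrix:

```python
# Global alignment score by dynamic programming: dp[i][j] is the best
# score aligning the first i characters of a with the first j of b.

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):            # gaps all along a
        dp[i][0] = i * gap
    for j in range(1, cols):            # gaps all along b
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

print(nw_score("GATTACA", "GCATGCU"))
```

The hard part in practice isn't this recurrence; it's doing something like it at the scale of billions of reads, which is why production aligners use heuristics and indexes instead.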
Definitely a computationally difficult problem, because while naive approaches work, they produce crappy results, wasting the results of tens of thousands of dollars of experiments. I see a big move towards applying statistical/machine-learning methods and graph-theory stuff in our field.
A lot of the rants in the original article are correct with regard to prototyping and throwaway code. That's because researchers are rushing to get an MVP out. The truly good ones get turned into (usually open-source) products, where the code quality hopefully improves a fair bit.
If you're a CS person who's interested or considering a move into bioinfo, I wrote a blog post about it recently: http://www.joewandy.com/2013/01/getting-into-bioinformatics....
Any solid factual resources besides the references mentioned in this justified rant?
See there for answers to your question, eg:
* Best resources to learn molecular biology for a computer scientist. [1]
* What are the best bioinformatics course materials and videos (available online)? [2]
"A Hitchhikers Guide to Next Generation Sequencing"
Part1: http://blog.goldenhelix.com/?p=423
The fact of the matter is that through high-throughput sequencing, microarrays, what have you, generation of biologically-meaningful results is possible.
There are a lot of problems in bioinformatics that need to be solved. Github has helped. More bioinformaticians are learning about good software development practices, and journal reviewers are becoming more aware of the merits of sharing source code.
I find it curious that he stops to salute ecologists, since I was in an ecology lab. I liked my labmates and our perspective, but we didn't have any magical ability to avoid the problems he alludes to here.
I think a lot of his frustration comes down to not being more involved in the planning process. That's not a new problem. R.A. Fisher put it this way in 1938: “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
Perhaps the idea that we can have bioinformatics specialists who wait for data is just wrong. Should we blame PIs who don't want to give up control to their specialists, or the specialists who don't push harder, earlier? Ultimately the problem will only be solved as more people with these skills move up the ranks. But the whole idea that we need more specialists working on smaller chunks of the problem may be broken from the start (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1183512/).
Surely this means there's a goldmine waiting there for someone to produce a non-broken toolchain for bioinformatics?
Or is it even possible to produce standard tools? Maybe all the labs are too bespoke?
6 years ago using CVS or something like that was novel. Now not using Git is. Big improvement!
Problems are still interesting and challenging.
Biologists are almost never good coders, if they can code at all. But that's not what they do; they signed up for pipettes, not Python.
It's the programmers who wrote said shitty code who are to blame, but you can't hate under-paid and over-worked PhD students who write this code even though it usually has nothing to do with their thesis (the math/algorithm is the main part; the deployable implementation is usually not the most important).
If you want good code and organized/accountable databases, go to industry. There's nothing new about this transition. The IMPORTANT part is that industry gives back to academia. So when you get an office with windows and a working coffee machine, remember to help make some PhD student's life a little easier by making part of your code open source.
Surely they can't get that far without having some kind of sensible method?
1. I agree that SE standards and good coding practice are completely absent in the bioinformatics world. I remember being asked to improve the speed of some sequence alignment tools and realizing that the source code was originally Delphi that had been run through a C++ converter. No comments, single monolithic file. The vast majority of the bioinformatics code I worked with was poorly written/documented Perl. In addition, a lot of bioinformatics guys don't understand SE process, so rather than having a coordinated engineering effort, you end up with a lot of "cowboy coding", with guys writing the same thing over and over.
2. I agree that productivity is very slow. This is a side product of research itself, though. In the "real world" (quoted), where people need to sell software, time is the enemy. It's important to work together quickly to get a good product to market. In the research world, you get 2- to 5-year grants and no one seems to have much of a fire under them to get anything done (hey, we're good for 5 years!). You would think that people would be motivated to cure cancer quickly (etc.), but it's not really the case. Research moves at a snail's pace, and so do the productivity expectations of the bioinformatics group.
3. I disagree that research results from the scientists are garbage. Yes, it's true that some experiments get screwed up. However, if you have a lot of people running those experiments over and over, the bad experiments clearly become outliers. Replication in the scientific community is good because it protects against bad data this way. The author must have had a particularly bad experience.
4. Something the author didn't mention that I think is important to understand: most scientists have no idea how to utilize software engineering resources. The pure biologists are many times the boss, and don't really understand how to run a software division like bioinformatics. Many times a bioinformatics group is run by PhDs in CS who have never worked in industry and don't know anything about good SE practice or how to run a software project. A lot of the problems in the bioinformatics industry are directly related to poor management. Wherever you go, you're going to have team members who have trouble programming, trouble with their work ethic, or trouble following direction. But in a bioinformatics environment where these individuals are given free rein and are not working as a cohesive unit, you can see why there is so much terrible code and duplication.
Yes, industry typically pays more than academia. Yes, most molecular biologists cannot code and rely on bioinformatics support. Yes, biological data is often noisy. Yes, code in bioinformatics is often research-grade (poorly implemented, poorly documented, often not available). These are all good points that have been made many times more potently by others in the field like C. Titus Brown (http://ivory.idyll.org/blog/category/science.html). But they are not universal truths, and exceptions to these trends abound. Show me an academic research software system in any field outside of biology that is as functional and robust as the UCSC genome browser (serving >500,000 requests a day) or the NCBI's PubMed (serving ~200,000 requests a day). To conclude from common shortcomings of academic research programming that bioinformatics is a "computational shit heap" is unjustified and far from an accurate assessment of the reality of the field.
From looking into this guy a bit (I'd never heard of him before today in my 10+ years in the field), my take on what is going on here is that this is the rant of a disgruntled physicist/mathematician and self-proclaimed perfectionist (https://documents.epfl.ch/users/r/ro/ross/www/values.html) who moved into biology but did not establish himself in the field. From what I can tell contrasting his CV (https://documents.epfl.ch/users/r/ro/ross/www/cv.pdf) with his LinkedIn profile (http://www.linkedin.com/pub/frederick-ross/13/81a/47), it does not appear that he completed his PhD after several years of work, which is always a sign of something going awry and of someone having had a bad personal experience in academic research. I think this is the most important light to interpret this blog post in, rather than as an indictment of the field.
That said, I would also like to see bioinformatics die (or at least wither) and be replaced by computational biology (see the differences between the two fields here: http://rbaltman.wordpress.com/2009/02/18/bioinformatics-comp...). Many of the problems that Ross has apparently experienced come from the fact that most biologists cannot code, and therefore two brains (the biologist's and the programmer's) are required to solve problems in biology that require computing. This leads to an abundance of technical and social problems, which, as someone who can speak fluently to both communities, pains me to see happen on a regular basis. Once the culture of biology shifts to see programming as an essential skill (like using a microscope or a pipette), biological problems can be solved by one brain, and the problems created by miscommunication, differences in expectations, differences in background, etc. will be minimized, and situations like this will become less common.
I for one am very bullish that bioinformatics/computational biology is still the biggest growth area in biology, which is the biggest domain of academic research, and highly recommend that students move into this area (http://caseybergman.wordpress.com/2012/07/31/top-n-reasons-t...). Clearly, academic research is not for everyone. If you are unlucky, can't hack it, or greener pastures come your way, so be it. Such is life. But programming in biology ain't going away anytime soon, and with one less body taking up a job in this domain, it looks like prospects have just gotten that little bit better for the rest of us.
Just another data point for someone contemplating a career in BINF, although some purists might say that my work did not really fall under the same category.
"Ept" means effective, as in "inept".
I don't understand this part:
> No one seems to have pointed out that this makes your database a reflection of your database, not a reflection of reality. Pull out an annotation in GenBank today and it’s not very long odds that it’s completely wrong.
In fact this entire article seems to be a rant on why bioinformatics as a field is rotting. But instead of ranting, surely something can be done about it?
Shouldn't we as hackers see this as an opportunity to revolutionize the field?
Rants like this, and providing interviews to third parties, are actually one of the more positive things that he could bring to the table: it provides information to people who aren't aware and inspires motivation in people who aren't entangled.
Then again, I am in no position to judge what Fred should or should not do
http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-a...
Maybe bioinformatics is not the place to aim for great informatics. We do bioinformatics because of love of science first and foremost. This is frontier land, the wild west, and it pays to play quick and dirty. I would suggest to hang on to some best practices, e.g. modularity, TDD and BDD, but forget about appreciation. Dirty Harry, as a bioinformatician you are on your own.
To be honest, in industry it is not much different. These days, coders are carpenters. If you really want to be a diva, learn to sing instead.
More money, good on you. But starting off your critique of your former colleagues with "technically ept people"... you're not going to get a lot of sympathy for the correctness of your work.
from the OED:
ept, adj. Pronunciation: /ɛpt/ Etymology: Back-formation < inept adj.
Used as a deliberate antonym of ‘inept’: adroit, appropriate, effective.
1938 E. B. White Let. Oct. (1976) 183 I am much obliged..to you for your warm, courteous, and ept treatment of a rather weak, skinny subject.
1966 Time 30 Sept. 7/1 With the exception of one or two semantic twisters, I think it is a first-rate job—definitely ept, ane and ert.
1976 N.Y. Times Mag. 6 June 15 The obvious answer is summed up by a White House official's sardonic crack: ‘Politically, we're not very ept.’
Etymology is straight from Latin: ineptus, which is prefix in- plus aptus (fitting or suitable). Interestingly there's also inapt which is quite similar.
edit: aheilbut's research on this is much more thorough.
Have you checked out synthetic biology? Will it be easy to understand when you have a degree in bioinformatics?