That said, it's possible to find work that's respected and pays well. Most of that kind of work is happening in the context of startups or freelancing. My favorite example of this is Robert Edgar: he's a freelance computational biologist with over 100k citations who has made a living for the past 20 years by selling licenses to his bioinformatics software (https://scholar.google.com/citations?user=RzVMRc0AAAAJ).
To find those kinds of jobs, I'd try YC's Work at a Startup, Flagship Pioneering's portfolio companies, and emailing founders of companies that have a bioinformatics component (my email is in my profile!).
I think the field's issues stem from it being a new and growing space. We do need better tooling, respect for engineering, and established best practices, but that seems to have been the case in the past for other domains that moved from research to industry – including software engineering itself.
To me, as an outside observer of this area, something doesn't add up. It sounds like software tooling is desperately needed to advance the entire field across the board, yet it seems that few startups or founders are attempting to tackle this problem, or, if they exist, they aren't having much of an impact (yet?).
One would imagine that all of this inefficiency, suffering and bottlenecking of incredible therapies to cure diseases and advance human knowledge would be a siren call for capital allocators to unlock value by solving this pain point -- but here we are, in some cases still 20 years and counting.
I can buy the argument that FAANGs have had amazing compensation packages over the past 2 decades, but this still doesn't address the reason why nobody else has bothered or been able to "disrupt" (yes, air quotes) the industry in this regard and harvest such seemingly low-hanging fruit.
I see a few comments talking about the PI and grant-funding model -- but if the promised value was sufficiently large then I find it hard to believe that this wouldn't have been a competitive candidate alongside other recent buzzword-laden investment trends such as blockchain & AI that pulled down so much VC funding over the past decade.
Clearly, I'm missing a piece of the value puzzle as to why founders and startups are few and far between to specifically address the dire straits that biological software engineering (computational biology, bioinformatics, systems biology, etc.) finds itself in.
This is hardly the only place in society where X is desperately needed, but people don't want to pay for it, and continue to suffer through the underprovision of X.
Here, software engineering is simply not viewed as prestigious or important, or often, even as valuable. The science is viewed as valuable. A lack of bugs or rigorous software engineering practices... nobody cares.
Some of it is that a PhD is a prestige competition, and dirty engineers being comped on par with people who spent (cough wasted cough) a decade or more of their life in college/post-docs just won't do.
And a piece of it is the scale of the investment needed. Imagine a couple million LOCs with only manual testing. Your two weeks of writing tests is a tiny drop in the bucket. Retrofitting reasonable software dev standards on these projects is enormously expensive.
Finally, there's an inescapable volume issue. Suppose I build a hot new confocal microscope and I sell 300 of them for low hundreds of thousands each. I have 2 teams of devs ($2M/year/team) for 2 years on analysis code, for an $8M investment. That's $27k/machine. That's real tough math to make work. Whereas Google pays Gmail engineers really well, in part because they spread those costs over a billion users.
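The per-machine figure is plain amortization. A quick check of the arithmetic, using only the hypothetical numbers from the comment above (not real market data); the billion-user line reuses the same $8M purely for contrast and is not Gmail's actual budget:

```python
def cost_per_unit(teams: int, cost_per_team_year: float, years: float, units: int) -> float:
    """Spread the total software spend evenly over every unit sold."""
    return teams * cost_per_team_year * years / units


# Hypothetical figures from the comment: 2 teams at $2M/year each, for 2 years,
# amortized over 300 microscopes.
total = 2 * 2_000_000 * 2
per_machine = cost_per_unit(2, 2_000_000, 2, 300)

# Same hypothetical $8M spread over a billion users, for contrast only.
per_user = cost_per_unit(2, 2_000_000, 2, 1_000_000_000)

print(f"total software spend: ${total:,}")           # $8,000,000
print(f"cost per machine:     ${per_machine:,.0f}")  # $26,667, i.e. ~$27k
print(f"cost per user:        ${per_user:.3f}")      # $0.008
```

The only lever that changes the per-unit picture is the denominator, which is the commenter's point about volume.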
Why so few tooling companies - is there actually a market for good software in science? For there to be such a market most scientists would have to care about the correctness of their results, and care enough to spend grant money on improvements. They all claim to care, but observation of actual working practices points to the opposite too much of the time (of course there are some good apples!).
In 2020 I got interested in research about COVID, so over the next couple of years I read a lot of papers and source code coming out of the health world. I also talked to some scientists and a coder who worked alongside scientists. He'd worked on malaria research before deciding to change fields because it was so corrupt. He also told me about an attempt to recruit a coder who'd worked on climate models who turned out to be quitting science entirely, for the same reason. The same anti-patterns would crop up repeatedly:
- Programs would turn out to contain serious bugs that totally altered their output when fixed, but it would be ignored because nobody wants to retract papers. Instead scientists would lie or BS about the nature of the errors e.g. claiming huge result changes were actually small and irrelevant.
- Validation would often be non-existent or based on circular reasoning. As a consequence there are either no tests or the tests are meaningless.
- Code is often write-once, run-once. Journals happily accept papers that propose an entirely ad-hoc and situation specific hypothesis that doesn't generalize at all, so very similar code is constantly being written then thrown away by hundreds of different isolated and competing groups.
These issues will sooner or later cause honest programmers to doubt their role. What's the point in fixing bugs if the system doesn't care about incorrect results? How do you know your refactoring was correct if there are no unit tests and nobody can even tell you how to write them? How do you get people to use tools with better error checking if the only thing users care about is convenience of development? How do you create widely adopted abstractions beyond trivial data wrangling if the scientists are effectively being paid by LOC written?
The validation issue is especially neuralgic. Scientists will check whether a program they wrote works by simply eyeballing the output and deciding that it looks right. How do they know it looks right? Based on their expertise; you wouldn't understand, it's far too complicated for a non-scientist. Where does that expertise come from? From reading papers with graphs in them. Where do those graphs come from? More unvalidated programs. What's missing in a disturbing number of cases is real-world data, or an acceptance that real data takes precedence over predicted data. Example from [1]: "we believe in checking models against each other, as it's the best way to understand which models work best in what circumstances". Another [2]: "There is agreement in the literature that comparing the results of different models provides important evidence of validity and increases model credibility".
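The circularity is easy to demonstrate: two models can agree closely with each other while both miss the observations badly, so inter-model agreement on its own says little about validity. A minimal sketch using root-mean-square error; the series below are invented toy numbers for illustration only, not real data:

```python
import math


def rmse(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Invented toy series, NOT real data.
observed = [10.0, 12.0, 15.0, 20.0]
model_a = [30.0, 35.0, 41.0, 50.0]
model_b = [31.0, 34.0, 42.0, 49.0]

print(f"model A vs model B:  {rmse(model_a, model_b):.2f}")  # 1.00: the models "agree"
print(f"model A vs observed: {rmse(model_a, observed):.2f}")  # ~25: both are far off
```

Only the second comparison, against observations, can tell you whether either model is right.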
There are a bunch of people in this thread saying things like, oh, I'd love to help humanity but don't want to take the pay cut. To anyone thinking of going into science I'd strongly suggest you start by taking a few days to download papers from the lab you're thinking of joining and carefully checking them for mistakes, logical inconsistencies, absurd assumptions or assertions etc. Check the citations, ensure they actually support the claim being made. That sort of thing. If they have code on github go read it. Otherwise you might end up taking a huge pay cut only to discover that the lab or even whole field you've joined has simply become a self-reinforcing exercise in grant application, in which the software exists mostly for show.
On that note :) I'm starting a forward-looking research lab at UofT to advance massive-scale (think petabytes) genetic analyses and am looking to find the right few individuals who have a similar vision. It's difficult to find passionate engineers with a solid CS and HPC background who are willing to meet halfway and work _together_ with biologists in getting the analysis right. Robert does this _very_ well, and that's why we recently co-wrote a landmark Nature paper: https://www.nature.com/articles/s41586-021-04332-2.
Job post: https://jobrxiv.org/job/university-of-toronto-27778-full-sta...
For example, the Broad Institute is super interesting, but having applied there several times, I can say their hiring is esoteric, to say the least. They pay well below market, and their process is opaque, slow, and sometimes downright non-communicative. They are also not really open to remote work, so you've got to move there and commute to the heart of Cambridge. Budgets are set by folks maybe a couple of years out of a PhD program, who also make technical decisions about the software design (the latter is an assumption based on my experience in similar places).
These organizations are also pretty traditional in their selection of stacks. Good luck trying to use a functional-first language, aside from maybe Scala (usually lots of Java stacks), and be prepared to write lots of Python, the only language that exists to many scientists. I once saw a Python signature (function name and arguments) spill over 10-20 lines, in a file over 10,000 lines long. They had given up on another software stack because “it wasn’t working for them”.
This is all painting with broad strokes, of course. But I think scientific organizations that would embrace software as a major component of their technological and scientific development would do well. There’s a lot of opportunity.
Good luck trying to use a functional-first language at any company (be it in bioinformatics or otherwise).
And the coming years will be interesting: Rust is putting a lot of functional bits on the map. Just as closures were an obscure thing 10 years ago, there might be a rise in abstraction in the mainstream.
Quote I liked (can't find attribution; maybe Alan Perlis?):
"If your function has 10 arguments, you're missing some."
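One common reading of the quote is that a long parameter list usually hides abstractions that haven't been named yet. A hypothetical sketch of grouping parameters that travel together into small config objects; the function and field names are invented for illustration and are not taken from the 10,000-line file mentioned above:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScoringParams:
    """Hypothetical alignment-scoring knobs that are always tuned together."""
    min_score: int = 30
    gap_open: int = 5
    gap_extend: int = 2


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical execution settings, unrelated to the science itself."""
    threads: int = 4
    output_format: str = "sam"


def align(reference: str, reads: list[str],
          scoring: ScoringParams = ScoringParams(),
          run: RunConfig = RunConfig()) -> dict:
    """Placeholder for a real aligner: only reports the settings it would use."""
    return {
        "reference": reference,
        "n_reads": len(reads),
        "min_score": scoring.min_score,
        "gap_penalties": (scoring.gap_open, scoring.gap_extend),
        "threads": run.threads,
        "output_format": run.output_format,
    }


# Tweak one knob without restating every other default.
strict = replace(ScoringParams(), min_score=40)
summary = align("chr1.fa", ["ACGT", "GGCA"], scoring=strict)
```

Because the dataclasses are frozen, sharing one default instance across calls is safe, and `dataclasses.replace` gives a modified copy instead of mutating shared state.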
While they've moved away from it in the last few years, the Broad Institute had a huge investment in Scala. It's been in use there since at least 2010 and I believe longer. The primary software department was almost entirely Scala based for several years. That same department had pockets of Clojure as well.
I’m in the “bunch of software people together” department so it’s not as insular or PI driven as working in a lab.
I still mostly like the role but it has become more generic over the years as the department acquiesced to the working ways & programming languages of outside private funders.
2. As a SWE, how deep into biology/genetics concepts have you had to go during your tenure?
More broadly...shortages like this aren't because SWEs just love ad-tracking and hate health improvements. People need to be willing to pay for these services (case in point: the jobs link for AWS has 3 links which appear unrelated to genomics, for Microsoft there are 2 for interns, and for Google it's empty).
There are a lot of opportunities in biotech for SWEs, and many firms (though not all) really do respect the power of software. Worth looking around if you're interested in the area.
FWIW, I really loved the Broad: the people, my direct line manager and co-workers. The management was horrible, and the management at DSP (not the line folks/managers) was the worst.
When I joined, it was run really like an academic center. Like, literally, in my lab, if I wanted to go into the lab and pipette and do library prep, the wet-lab scientist would teach me, and vice versa. It was literally a place where anybody could pivot their career to anything. We worked on NIAID/NIH grants and went to conferences even as SWEs, and I felt we were doin' important research: not just pipeline monkeys, but actually performing important analyses like RNASeq differential analysis, ChIPSeq peak calling, metagenomics etc. on publications alongside PhD scientists, even though I didn't have the academic credentials. Groups within the Broad were running courses for software engineers to learn biology... and you could literally take off in the middle of the day to go across the street to the Stata Center to attend lectures on ML or audit Comp Bio at MIT. Nobody would bat an eye. 75% of my group got a Master's degree on the job; some months we spent more time on classes than on actual work.
The culture shifted somewhere around 2018-2019, when they brought in new management to run DSP (Data Science Platform, where most SWEs likely end up). The DSP management (not the chief guy, but the lieutenants) ran the "tech playbook": get PMs and Scrum/Agile coaches in; make software engineers and comp biologists line workers. A lot of my fav. people either left or got pushed out. A lot of institutional knowledge about sequencing and biology got lost; self-driven people left, and line workers for Portal web development and data pipeline management came in. It got to the point where I once presented to the software engineers of DSP and nobody in the room even knew the basics, like what long or short reads are. I left soon afterwards for a place where I wouldn't be siloed. I wouldn't recommend the Broad to anybody now... unless you're working for an academic group. Avoid platform groups (DSP mostly; other platforms are still good) if you want to learn & grow.
Like others have pointed out, this really makes the engineer's end of the bargain sound like janitorial work. There's no lack of fields where researchers and engineers both sit at the table from the beginning to pick which projects to pursue and how to implement them.
I don't think you should interpret it that way. Another take would be that it's like collaborating with a domain expert outside your specialization.
What's important is that your potential impact as an engineer can grow as you become more knowledgeable in the relevant bio. Most of the scientists I've worked with were happy to teach background (and some were just exceptional, fun times if you also found the field interesting as I did!). Obviously some allowance must be made for differences in culture from org to org, and that likely accounts for some of the disappointed voices - but I'm not convinced this is endemic to the field as opposed to organization-specific. Just like with an opportunity at any particular company, do your research.
Incidentally, working on a well-defined engineering+optimization problem, if you are lucky enough to bump into one, is just candy for lots of engineering types. OK, a quick & simple one: a scientist I worked with was doing some analysis that involved intersecting piles of genomic intervals with each other, which was taking many hours for a single run - super painful to tweak and re-execute. Our team showed them how to use interval trees and made these available, integrated into our internal tools, and the problem transformed into ~10 min runs. See, a wee bit of comp-sci and suddenly you're the domain expert. And appropriately appreciated!
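The comment doesn't say which interval-tree implementation that team used; what follows is a minimal sketch of one standard variant, a static centered interval tree, assuming half-open `[start, end)` coordinates on a single chromosome and non-empty queries (`qs < qe`). Instead of comparing every interval against every other, you build the tree once over one set and query it with each interval of the other set:

```python
class IntervalTree:
    """Static centered interval tree over half-open [start, end) intervals.

    Build once, then query many times: roughly O(log n + k) per query,
    where k is the number of overlaps reported.
    """

    def __init__(self, intervals):
        ivs = sorted(intervals)
        self.center = None
        self.left = self.right = None
        self.by_start, self.by_end = [], []
        if not ivs:
            return
        # Median start: the interval it comes from always contains the center,
        # so every node stores at least one interval and recursion terminates.
        self.center = ivs[len(ivs) // 2][0]
        left, right = [], []
        for s, e in ivs:
            if e <= self.center:
                left.append((s, e))           # entirely before the center
            elif s > self.center:
                right.append((s, e))          # entirely after the center
            else:
                self.by_start.append((s, e))  # contains the center; stays sorted by start
        self.by_end = sorted(self.by_start, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def overlapping(self, qs, qe):
        """Return every stored interval overlapping the query [qs, qe)."""
        out = []
        self._collect(qs, qe, out)
        return out

    def _collect(self, qs, qe, out):
        c = self.center
        if c is None:
            return
        if qe <= c:
            # Query lies left of the center: only stored intervals starting
            # before qe can overlap, and nothing in the right subtree can.
            for s, e in self.by_start:
                if s >= qe:
                    break
                out.append((s, e))
            if self.left:
                self.left._collect(qs, qe, out)
        elif qs > c:
            # Query lies right of the center: mirror image, scanning by end.
            for s, e in self.by_end:
                if e <= qs:
                    break
                out.append((s, e))
            if self.right:
                self.right._collect(qs, qe, out)
        else:
            # Query straddles the center: every stored interval overlaps it.
            out.extend(self.by_start)
            if self.left:
                self.left._collect(qs, qe, out)
            if self.right:
                self.right._collect(qs, qe, out)


# Toy coordinates on one chromosome; in practice build one tree per chromosome.
genes = [(100, 500), (400, 900), (1200, 1500)]
tree = IntervalTree(genes)
for peak in [(450, 460), (950, 1000), (1490, 1600)]:
    print(peak, sorted(tree.overlapping(*peak)))
# (450, 460) [(100, 500), (400, 900)]
# (950, 1000) []
# (1490, 1600) [(1200, 1500)]
```

The speedup comes from replacing an all-pairs comparison with one tree build plus a logarithmic-ish lookup per query interval.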
When I said that software engineers add in the speed and reliability, I didn't mean they _only_ add in the speed and reliability: just that these two tenets of good software engineering were accounted for in this "correct" way of doing things (as opposed to the state of most genomics software that I described above).
However, I can see how my phrasing could give the wrong impression about the contributions an engineer makes when the biologist and engineer sit down to create the real thing together. In a positive environment, both sides (biologists and software engineers) share enough information with one another that either can make contributions to the scientific or software engineering domain.
Which, if there's no standard for the field and you're working outside a given field, makes it necessary to pair up with someone who can develop field standards to include in the grant(s). It's hard to find and compete for scarce applicants with limited resources.
aka startups vs. big-company funding for pure research labs (Bell Labs, Xerox PARC, etc.)
How many of us have heard from some guy who has said, essentially, "I have an idea for the next <fill in the blank with whatever is hot>, and I've already sketched out a prototype. There's just the small matter of programming and we'll make $LARGE_SUM"?
Imagine this happening in the business world. A partnership between SMEs and software engineers. Oh, we do this all the time, that's why software engineers get paid well: we turn ideas into working code. Anyone ever heard of a product manager "banging out a prototype" and then handing it off to the software engineers to rewrite?
The more I re-read the passage from the article, the worse it sounds.
But the role of the software engineer after that is invaluable in making that idea accessible and reproducible.
Academia? Yeah they're going to be one of the last to realize, PIs don't want to cede any power in their little fiefdom. Very familiar with the dynamics there.
So, unless research is groundbreaking/exploratory across disciplines, supporting disciplines tend to be extremely limited by the PI's research interests.
aka (wording sanitized a bit) tends to come across as tightwad / ham-fisted
Treating a position with a limited pool of applicants as if there were too many applicants is always a recipe for issues.
I think that is already accepted as good practice, and the way most people in the field work, which is part of the reason why the field is in this shoddy state right now. Because in reality, most of the time those engineers don't exist and it will never advance to the second stage, but will still be used regardless. And even if you manage to find an engineer for your team, the same problem exists in many layers down your stack.
As with most other kinds of software, the biologists should be treated as customers (or trained up to be skilled-enough engineers), as it is done in other disciplines. To create good accounting software you also wouldn't propose to have the accountant write the initial version of the software, would you?
> Many of the projects that are critical to the foundation of genomics are reaching or have eclipsed the ten-year mark. How much longer can we expect these individuals to single-handedly maintain these code bases?
What you propose sounds more like "hey, be the next idiot that commits to maintaining critical software for nothing", rather than any systemic change. The ugly secret of bioinformatics is the same one as in broader tech: Most of it runs on the backs of unpaid OSS maintainers (in this case a handful of motivated PIs that carve out some of their time for that).
If you want to have good software in the sciences, you first have to solve the OSS funding problem.
PS: the `user-select: none;` on your page is really annoying
Accounting is a bit different, because it has already been invented. There are standards and best practices for it. In bioinformatics, writing software is often a research activity. You write software to determine what the software should do, and then you adjust your ideas and rewrite it. The person writing the first version(s) of the software is a researcher – at least in practice if not by job title.
For many specific problem sets in the natural science informatics disciplines, you can just stay up-to-date on ML trends and release a new paper that applies them every few years, in an almost automatable way.
I fully agree that the software used is really bad in general, but what is worse is the level of IT literacy among the PhDs and post-docs from the biology side. (Also statistics; I guess a lot of p-hacking is the result of authors simply being clueless…)
After finishing my thesis, I was offered a position to stay and work at the lab. After thinking about it and accepting, I was told that funding wasn’t secured yet, but that it should come “any day now”…
Thanks, but no thanks.
Anyway, I fully see the need for professional software engineers in this field, but job security and even job availability (aside from the low salaries) in academia is abysmal, so I don’t think the current situation will change any time soon.
When everything is a mess of ad hoc Perl, Python and R scripts to solve unique one-off problems, you might well find that there isn't a sufficiently common subset of functionality that people are prepared to pay money for. That is, while the need may be there, the business case may not be. It might be that most of the field are quite content with the status quo.
It's easier and cheaper to get some poorly-trained PhD students to wrangle badly-written and poorly-maintainable scripts than it is to pay a company to provide a robust and well-written solution instead. The "indentured labour" also distorts the supporting ecosystem. [I say this after having done a PhD in biomedical science.]
I remember one of my colleagues asking me to help him get some special software from a particular group working [for DNA methylation analysis]. They wanted $10K for it, IIRC. It was a complete mess, wouldn't work, had no documentation, and it was in such a state that I didn't trust it was genuinely functional. For a one-off, maybe $10K was worth it, but if you only have 2-3 customers worldwide who will pay, it's not a viable business even if the product works perfectly, let alone if it's a fragile disaster that barely works at all.
You need to feel out the particular person you'll be reporting to on how well they personally respect and understand the role, and also whether they'll have clout/funding and have your back if the org turns out to be rough (think AMZN). And also try to feel out respect within the organization, and some of the people/teams with whom you'll be collaborating.
You also need to check compensation, so you don't wind up a low-paid person who later discovers they're competing for local house offers with others in the org who are getting big-bucks TC (plus consulting on the side).
You also probably have to be OK with never being the star (like you hypothetically could someday be in a software company). Supporting actors should still get respect and get paid.
Find the right science situation, and you might have much more positive impact on the world than you could have in a software company, while also being happy and comfortable.
Some more quick off-the-cuff comments about this (sorry for run-ons, but I need to get back to my weekend)...
* RESPECT -- Whether or not the organization is university-affiliated, a lot of the researchers and administrators might have only worked in academia-like environments before. Academia is very hierarchical, software engineering might be considered commodity technician or support staff, and the high-status people almost certainly don't understand your discipline, though they might think they do. (They often think software is relatively easy grunt work, and that software people just have oversized egos, which has some truth to it, but not that much.)
(Some real-life instances of this I've heard of include: someone with no understanding overriding software engineering technical decisions, because a colleague from their academic caste made an offhand comment, and they assume an academic who hasn't even looked at the system knows more than an experienced practitioner developing it; not wanting to include people who made key software contributions as coauthors on a paper for a software system, but making sure professors who had near-zero involvement were included; scientists openly speaking of the software people as having commodity interchangeable skillsets, in a way they'd never speak about peers in their domain; getting an unsalvageable monstrosity of pasted-together incompatible frameworks and Stack Overflow posts done by a summer intern, dumped on a software engineer to "clean up" or "extend", and being unable to convince anyone that this is orders of magnitude harder to fix than to just make a viable system in the same time the intern took; in an academic environment, a grad student being higher status than key software people, and bossing them around with bad decisions, while treating their own obligations like homework they were trying to sneak past a grader rather than as a system that has to actually work.)
* COMPENSATION -- Related to the above. If you're very experienced and marketable in tech, and would be making key enabling contributions, are you getting paid like it?
(The most recent life sciences software engineering opportunity I talked with was at a high-profile organization; they needed FAANG-like Staff/Principal experience in multiple areas, all in one person, for key bespoke computational infrastructure on which a lot was riding. When we got to salary, it was capped at less than new grads were getting offered elsewhere, despite being in a top HCOLA city. The recruiter half-heartedly argued about it being for the science, etc. I said, if they're thinking of this as an academic non-profit, that would be OK, so long as everyone there is making this level of money. But that wasn't the case: the science domain people were considered the valuable assets, making good money, and software was seen as more a commodity support skill by whoever set the pay grade. Maybe within a decade that will agree with the market, and everyone will decide that someone who can learn organic chemistry should get paid more than someone who doesn't seem to do much more than fingerpaint in a Web framework builder and type nonsense in Jira, :) and maybe then most software people will be thankful for any job at all, but not yet.)
(I did actually look at a science company with a strong software tech company influence. But, though they claimed to be rethinking how the tech company did things, they seemed to carbon-copy the single most obvious bad side of that company. Talking with colleagues after I withdrew my application, the gossip was that they were getting lots of software people who'd burnt out on the tech company. So I guess maybe the rethinking was on what had been bothering those people, who were already at the tech company, and so who weren't entirely representative of the talent pool that included people for whom the tech company had showstoppers.)
There is next to zero demand for tool development internally. I do it on the side of "normal" IT data management because I love high performance computing, algorithms, and multithreaded hackery. But even at my large, well-funded institution, there isn't a specific role where that is all that you do by design.
I do suck at marketing, meaning that despite having some success with big improvements in research tools that folks have definitely appreciated, no one comes to me asking for help with better engineering of genomic applications. Partly that is because many researchers may only know R, so they default to whatever packages are already available in Bioconductor, install those, throw the resulting mash-up for their current research effort onto the compute cluster, and simply wait hours or days for the jobs to finish.
PIs are often insulated from software engineering problems too - if work is completed before the next bi-weekly meeting and update session, well, it must be ok.
DNA bases aren't viewed as a base-4 number system encoded in binary that can be transformed into an abstract software language, where one can select the abstraction level of choice. Much like musical notation isn't viewed as a numeric system.
Then again, most software engineers don't view systems as numerical language development either.
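Loosely illustrating the base-4 point from the exchange above: with a hypothetical mapping of A=0, C=1, G=2, T=3 (a common convention, not a universal standard), each base fits in 2 bits, a sequence becomes an integer, and an operation like reverse-complementing turns into bit arithmetic. A minimal sketch that accepts only the letters A, C, G, T (anything else, such as N, raises a `KeyError`):

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical 2-bit code per base
DECODE = "ACGT"


def pack(seq: str) -> int:
    """Read a DNA string as a base-4 number, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack; the length is needed because leading A's encode as 0."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement(value: int, length: int) -> int:
    """Under this mapping the complement of code x is x ^ 0b11 (A<->T, C<->G)."""
    out = 0
    for _ in range(length):
        out = (out << 2) | ((value & 0b11) ^ 0b11)
        value >>= 2
    return out


code = pack("GATTACA")
print(code)                                    # 9156
print(unpack(reverse_complement(code, 7), 7))  # TGTAATC
```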
- The compensation absolutely does not match the workload and education required.
- The sheer number of disreputable PIs and their unrealistic goals for software.
- The data is likely questionable and often underpowered.
- Institutional politics. Everywhere.
- Marketing ("Curing Cancer"). The role is actually just juggling various bioinformatics file formats.
Your other points are spot on. This one I want to address specifically. The file formats. Academics love their incredibly over-engineered file formats. MARC. SGML. DICOM. HL7. RDF. Those are just the ones I know. Universally, they try to cover every corner case that anyone could ever imagine. Academics absolutely love their ontologies. Just implementing one of them is a nightmare. Going from one to another is an exercise in the philosophies of ontologies.
The downside is it's a giant hellscape of unstructured, poorly specified formats where data types are barely specified at all or if they are most of the schema is published on some rambling blog post by some rando scientist. You will spend most of your time understanding it by empirical reverse engineering of the data that you are trying to deal with.
DICOM is for radiology.
RDF and SGML, well, they're from the same era as XML, so yeah.
Good thing there are lots of competing implementations! It would be a shame if these files were actually portable.
I need an advanced degree for that?
Nothing like putting that boilerplate pablum on research grant proposals. Either that or something about green energy. Some PIs just want to play with ligands, man.
The person that runs a research lab, which at a university is usually a (tenured) Professor.
https://news.ycombinator.com/item?id=31577376
https://www.nature.com/articles/d41586-022-01516-2
> "Fundamentally, RSEs build software to support scientific research. They generally don’t have research questions of their own — they develop the computer tools to help other people to do cool things."
That said: a lot of the comments are spot on. You're working in a field where the hard scientists and business people rule and you're a helper. Maybe they're grateful for your help OR maybe they regard you as an overpaid lab assistant. After all, they have PhDs and postdocs, and you don't.
I've never actually worked in that field. I'd guess that it might be very satisfying, despite the low pay. Or not.
It was my first full-time job, and by far the most chill. People were great. The PI was laid back, the whole lab went out for beers every now and then - and not because of a mandatory startup-style 'bonding' event. We genuinely enjoyed each other's company and hung out outside of work. I never had that in any other job, which were/are all commercial operations.
The vibe and the power structure felt very different. More level. There weren't any purely managerial roles; everyone was doing at least a bit of 'science'. And even junior ICs like me got to coach undergrads every now and then. Most of the operational budget comes from grants, on which you have to deliver. The pay is not amazing, so most employees really are in it for the science.
Or I was still young and naive and was lucky that both layers of management were nice people.
Ultimately I left, as the grant money couldn't keep up with the offers I was getting.
It is still the job I am most proud of. I love talking about it, and it really sucks that even a well-funded lab can't really afford market engineering rates.
It's no different in finance, healthcare, genomics etc. I'd love to work in a setting where I'm paired with an SME product manager in a domain I have no clue about and they respect my work and I respect theirs and we are partners.
This is one of the biggest factors that made software/internet companies explode. They respected people who build software. They didn't need to. A bunch of MBAs could have easily just decided that the best way to run the company was to treat the people building the product as a cost center. Many did. I think that's probably one of the reasons for the lack of innovation and the downfall of many old tech companies like HP/IBM.
The ones that treated SWEs properly and valued them accordingly did very well.
Your comment reminds me to be thankful that at many software companies engineering, product, and design do respect each other as equal partners. I totally agree that to do otherwise is business suicide.
But, YMMV.
Everything you don't understand looks complicated from the outside.
As to the work environment, it seems to be extremely varied depending on the lab and team you're on. I came from a number of years doing web development in marketing and finance before joining an R1 university research lab, and in many ways the day-to-day is quite similar in both fields. You are not the 'go-to' person for most things, but with that said, even as an individual contributor I feel my voice is heard on technical decisions where appropriate. As for pay, it's the biggest aspect that will make me leave at some point. If you do not have a PhD, or even a degree in my case, you can't expect to get paid a lot. As to the speculation on the satisfaction of the work, it is indeed deeply satisfying!
I got to have a conversation with one of the hero donors that gave a kidney biopsy after a life-saving transplant. It's hard to overstate just how impactful your work feels when talking to someone like that. Even as a small cog in the larger machine (our lab is around 50 strong with many people being at the top of their sub-fields), the end results of the effort will be massive improvements in individuals quality of life, this alone makes it quite easy to get out of bed in the morning.
This definitely was the culture when I started working in the field 6 years ago. However, the culture has shifted (at least where I work) to where biologists and engineers are equal partners that work together on solving these problems. For those organizations that are not this way, I think they’re going to have to change if they want to innovate.
These days more biotech companies are computationally/software focused. They understand that to pull in strong talent they're not operating in the same academic science world.
This isn't just genomics, by the way. Scientific computing folks are very similar.
Everyone who says you're the hired help and treated about as well as a secretary that the organization dislikes is dead on. At best, you're viewed as an overpaid cost center.
Which is sad, because I'd love to work in these areas... but I'm not giving up 66% to 75% of my income as charity to private corporations.
Tangential, but what are the chemistry prereqs to grasp this book?
It's essentially an upper-level undergraduate textbook.
I would argue that, to understand the book, you specifically don't need to know electrochemistry, organic chemistry, analytical chemistry, organometallics, spectroscopy, or even physical chemistry.
Doing fundamental research is a taller order. But lots of software, tools, pipelines, etc. need maintainers, optimizations...
Despite my interest, I’ve found that landing a job in this field at my desired compensation level is very difficult, especially if you don't have the "correct" academic background. Who does a double degree in computer science and forensic genealogy? I’m sure some people do, but for $75k/yr you’d think the companies would need to adjust their expectations.
I looked. There are zero full-time, remote roles that don't require previous genomics experience at any of the companies listed.
Tooling roles in SWE in every other field are highly regarded. Why not here?
They have a hard time having someone with a BS or MS making 50-75k more than a freshly-minted PhD.
I just left a job in pharma because I cannot do it anymore (salary being a big one, but my experiences reflect many in this thread).
They spent 500k on a consulting company to build a few NGS processing pipelines, built using a framework I was unfamiliar with. I refactored one of them and was able to improve its runtime by 60% in a couple of weeks. I was paid in the low 100s.
They would rather contract out the high-paid work and pay orders of magnitude more for it.
It depends on the role. It worked out really well for me when I got to drop in and do piecework on lots of projects in different fields. Working on a larger software development project can be really painful and demoralizing because the people running it don't really understand how the sausage gets made.
Can you really trust the scientific results if they depend on software made by people who don't care about code quality?
I also experienced "software engineers" who had no idea what they were doing being given more credence because they had a PhD in some bio-related field. Oh, you got a PhD in some molecular aspect of some tiny piece of biology, and that makes you qualified to build big data systems? It did not. Apparently what that gives you is an adherence to reading decades-old textbooks about database design. It was like working with a first-year software engineering undergrad from twenty years ago.
To be fair, it looks like the same can be said for machine learning. Many software engineers I know are in the "machine learning space", but report that they are just operations support for data scientists, and don't actually get to learn about, let alone be involved with creating, the models they support.
If you are a software engineer, work in a software company, where engineering is the value proposition.
I used to work in genomics and computational biology. It was incredibly interesting. But it's university research and gets paid as such. 2-year time-limited contracts, lots of interns and students, extremely low salaries.
I'm interested in contributing to this field. I have significant experience in 3D graphics, game engines, compilers and language runtimes. I'm a competent low-level engineer.
There are a lot of red flags in this thread about adverse working conditions, but I'm operating under the assumption that there are a handful of companies out there that work with a software-minded approach, i.e., that respect SWEs for who they are and what they do. If you represent one such company and are looking for engineers who have a keen eye for performance and architecture, I'd love to hear from you.
jesse@scallywag.software
https://scallywag.software/resume.html
EDIT: Largely interested in remote roles, but could relocate for the right offer.
What happens when the world's most brilliant minds do something other than making us click on more ads?
There are probably hundreds of tools written for this, but no clear winner so far. The traditional software engineering approaches like git and CI/CD seem too heavyweight (or rather too low-level), especially during development. IMHO there could be space for a fully remote/cloud solution, optimized for writing and maintaining pipelines, where one would code, debug, and deploy from the browser.
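Most of those tools share one core idea: a pipeline is a dependency graph of steps, executed in topological order. Here's a minimal sketch of just that idea using Python's standard-library `graphlib` (3.9+). The `run_pipeline` helper and the step names are hypothetical; real workflow managers layer caching, retries, and cluster/cloud execution on top of this.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# A step receives the results of its upstream steps, keyed by step name.
Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Run every step after all of its dependencies (hypothetical helper).

    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps.get(name, set())}
        results[name] = steps[name](upstream)
    return results


# Hypothetical three-step pipeline: align -> sort -> call variants.
steps = {
    "align": lambda up: "sample.bam",
    "sort": lambda up: up["align"].replace(".bam", ".sorted.bam"),
    "call": lambda up: up["sort"].replace(".sorted.bam", ".vcf"),
}
deps = {"align": set(), "sort": {"align"}, "call": {"sort"}}
print(run_pipeline(steps, deps)["call"])  # sample.vcf
```

Keeping the graph explicit is what lets a tool rerun only the steps downstream of a change, which is most of what "pipeline framework" means in practice.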
There's also more need for integrating "multi-omics" data, where you have data from multiple assays (gene expression, phospho-proteomics, lipidomics, epigenetics, small RNA expression, etc.) with the goal of combining all these different assay results from various levels of gene regulation to get closer to figuring out the actual mechanisms of complex processes. Building on that, we can also do single-cell multi-omics to some extent, where you have results from different sequencing-based assays at the level of the same individual cell. This is still pretty limited, but it's exciting and advancing pretty quickly. It will eventually be combined with things like spatial transcriptomics, which is useful for mapping out what's going on in heterogeneous tissue samples like tumors, so we'll end up with spatial single-cell multi-omics. At that point you're looking at 1) some quantitative trait for multiple genes/loci/molecules, often 10k+ such features at the same time per assay, 2) multiple assays, such as DNA accessibility and gene expression, in 3) single cells, of which you might have 10k in a single sample, 4) across a physical tissue sample where individual cells are spatially mapped, and where you probably want to figure out how cells might influence the state of those around them, and 5) in multiple different samples, where you might want to compare disease vs. control, or look for correlation with heterogeneity of results within one group.
There's a lot of public data already available for single-cell gene expression projects if you want to get a feel for how these things are structured and how passable (but not amazing) the existing tooling is. One of the main repositories for this data is the NCBI's SRA https://www.ncbi.nlm.nih.gov/sra, but you'll quickly notice that searching and browsing is not as easy as you might think, because one of the main limiting factors in bioinformatics is how bad everyone is at keeping terminology consistent. For many bioinformaticians, the majority of their time is spent in the data cleaning phase. It's awful. Sometimes the experimental parameters make it into SRA or GEO, but sometimes you have to read through the associated paper to pull them out. Often only large consortium projects like The Cancer Genome Atlas (TCGA) or the Genotype-Tissue Expression project (GTEx), which have enough funding for staff dedicated to data management, end up publishing datasets that are easy to "consume" without having to jump through a whole bunch of hoops to figure out how the data was produced.
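A lot of that cleaning boils down to mapping free-text metadata onto consistent terms. Here's a minimal sketch, assuming a hand-written synonym table. The `CANONICAL` entries and the helper names are hypothetical; a real project would map onto a shared ontology rather than a dict, and would review the unmapped values by hand instead of guessing.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical synonym table for an "organism" column.
CANONICAL = {
    "homo sapiens": "Homo sapiens",
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "mus musculus": "Mus musculus",
    "mouse": "Mus musculus",
}


def normalize_label(raw: str) -> Optional[str]:
    """Trim, lower-case and collapse whitespace, then look up the canonical term."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return CANONICAL.get(key)


def normalize_column(values: Iterable[str]) -> Tuple[List[Optional[str]], Set[str]]:
    """Normalize every value; unmapped ones become None and are reported for review."""
    normalized: List[Optional[str]] = []
    unmapped: Set[str] = set()
    for value in values:
        label = normalize_label(value)
        if label is None:
            unmapped.add(value)
        normalized.append(label)
    return normalized, unmapped


print(normalize_column(["Human", " homo  sapiens", "mouse", "zebrafish"]))
```

Returning the unmapped set, rather than silently passing values through, is the point: it makes the inconsistency visible instead of letting it leak into the analysis.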
I have a BS/MS in bioinformatics and I'm presently a PhD candidate in genetics and computational biology defending in February.
The endemic disease of the field is the leadership. A leadership made up of Principal Investigators forged in academia appears simply incapable of producing anything other than articles (or equivalents thereof).
Things the study said would not work never worked; e.g., biologists wanted data kept "temporarily" private, say until it was published, and, as the psychologists predicted, they never freely shared it.
But here is the biggest thing, which I will try to paraphrase:
Biology is an observational science; the work is in interpretation, which lends itself to group dynamics, politics, leaders, etc.
This is at odds with math/CS, which is constructive: if something can be proved, then that is that.
So when a CS person states a fact from their perspective, a biologist might see it as just another opinion subject to hierarchical ranking.
So I would argue it is a function of the individual's proclivities and the correlated training in the cultural environment they end up in.
So a healthy work environment, whether in academia or industry, could value both fact and opinion, each with a complementary role.
But as a longtime academic, I am now sadly looking towards industry.
They have more cash to play with, but their leadership fails in the same pattern.
The offer was $38k a year. About two days later, I got my second offer, $50k from a game company, and then a real offer, $60k, which I took. This was in the late 1990s.
That was 20+ years ago, of course, but I sort of wonder if things have changed. I frankly think a lot of SWE work for fundamentally evil, socially destructive companies, and I honestly don't think you have to to earn a good living, but you also don't have to work for companies that deliriously underpay you.
Software development becomes important when certain data processing methods have been standardised, e.g., mapping sequencing data to the mouse or human genome, differential expression analysis, PCA visualisations.
It was an enlightening experience. While I was the programming expert with a CS degree, I wasn't trusted with anything, because I didn't have a PhD or a background in bioinformatics. However, I did get to work with lots of smart people, and I fixed and improved the code and processes that the PhD-level statisticians and bioinformaticians used.
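To give a flavour of one such standardised step, here's a minimal sketch of counts-per-million (CPM) normalisation followed by a log2 fold change between two samples. It's illustrative only: established differential expression tools such as DESeq2 or edgeR fit proper statistical models with replicates and dispersion estimates, which this deliberately omits. The function names are hypothetical.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts per gene to counts per million counted reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_change(
    treated: Dict[str, int], control: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """log2(treated CPM / control CPM) for genes present in both samples.

    The pseudocount is added to both CPM values so genes with zero reads
    do not cause division by zero or log(0).
    """
    t, c = cpm(treated), cpm(control)
    shared = t.keys() & c.keys()
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in shared}


print(log2_fold_change({"A": 500, "B": 500}, {"A": 250, "B": 750}))
```

Normalising by library size first matters because two samples sequenced to different depths would otherwise show spurious fold changes for every gene.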
It is a real joy to work in hard science, with brilliant people who love their work. I learned a ton and gained a healthy respect for the people that do this kind of work.
However, the downsides are pretty bad. Pay and compensation are awful. Most people, myself included, could have made as good if not better pay waiting tables. There end up being different levels of people: administrators, principal investigators, and lab workers (peons). Unless you are an admin or a high-level PI, you're not going to be getting much money.
Everybody lives and dies by the grant. If funding dries up, you will be out of a job.
Ethics. We CS people are woefully undereducated on ethics. You will find yourself asking why we can't simply do something; often the answer will be ethics.
Regulations. Like ethics, you will have to bend to regulations. It's not a bad thing, just a different thing.
Unless you find yourself in an admin role, you will just be another lab peon. It's not a bad place to be, but you will never be at the top of the totem pole.
Loads and loads of ego. You will work with very smart and sometimes unreasonable people. Learning to navigate this with tact is important.
The other posters are not wrong about compensation. Total compensation is off by a factor of two to three.
However, it is absolutely possible to work with a group of top-notch engineers on serious distributed systems & compilers in service of an excellent scientific-user experience. I know because I do. We are lucky to have a PI who respects and hires a diversity of expertise within his lab.
I enjoy being deeply embedded with our users. I do not have to guess what they need or want because I help them do it every day.
I also enjoy enmeshing engineering with statistics, mathematics, and biology. Work is more interesting when so many disciplines conspire towards the end of improved human health.
The problem is that the organizations involved in this sort of work often still consider software development as a cost center and therefore do not offer competitive salaries.
This field needs marketing, product and project managers (for-profit or non-profit variety) that could figure out:
1. what product to build to have the biggest impact
2. how to build it.
Once 1. and 2. are clear, it will be equally clear that if you only have a bunch of scientists you won't get a great product: nobody will build the product, everyone will build a prototype.
So then it will follow that the project needs to hire (=attract) software engineers to be in charge of software, and attracting software engineers means giving them competitive compensation.
That's their whole life: "where did you do your PhD? Who did you do your postdoc under?"
Many world-class hackers would do pretty poorly on those questions.
One of the borderline fraudulent aspects of the field is the pretense that method publications are real software.
That is, you come up with a breakthrough statistical or algorithmic method, you get it to run exactly once based on whatever random walk of exploratory code got you to a result that looks better than competing/prior methods, and then you dump your workspace into a script, put it on GitHub, and pretend this is something anybody else could or should responsibly use in your Tier 1 publication. The minute the publication is accepted there is zero benefit to the authors in maintaining the software, and in fact it's better if nobody can run it, because that way they can't disprove your results. Then naturally nobody can get it to work afterwards, and 50% of software engineering time and effort goes into trying to run code that can't and never will work outside the context it was created in; but you have to try, because this is now the accepted best-practice method of doing X or Y based on its publication.
The bigger problem is that this whole cycle actually shapes the view of software engineering by academics to the point where they really do think that most software engineering is a waste of time. A small number of 10x engineers manage to prosper in the environment, but it's mainly because they have the sheer technical capability to deal with ALL of that while still doing something useful, and it actually makes the problem worse because the academics then see that as the baseline for software engineering capability.
Sometimes I really don't understand. Much of the field's code does not even have testing, and it is baffling for me to think how the results are believed to be correct in the first place if there is no rigorous testing.
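Even a handful of known-answer and round-trip checks go a long way. Here's a minimal sketch for a hypothetical reverse-complement helper. IUPAC ambiguity codes other than N are deliberately rejected rather than handled, which is an assumption of this sketch.

```python
# Map each base to its Watson-Crick complement; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    if any(ch not in "ACGTNacgtn" for ch in seq):
        raise ValueError(f"unsupported base in {seq!r}")
    return seq.translate(_COMPLEMENT)[::-1]


def test_reverse_complement() -> None:
    # Known answers.
    assert reverse_complement("ATGC") == "GCAT"
    assert reverse_complement("") == ""
    # The EcoRI site GAATTC is its own reverse complement.
    assert reverse_complement("GAATTC") == "GAATTC"
    # Case is preserved.
    assert reverse_complement("acgN") == "Ncgt"
    # Round trip: applying it twice returns the input.
    for s in ["A", "ACGTTGCA", "NNNACGT", "ggccAATT"]:
        assert reverse_complement(reverse_complement(s)) == s
    # Unsupported characters fail loudly instead of passing through silently.
    try:
        reverse_complement("ACGU")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for RNA base U")


test_reverse_complement()
print("all reverse-complement tests passed")
```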
I remember Manolis Kellis sprinkled some pretty interesting genomic questions into his Algorithm class's problem sets. There were a number of cool problems about optimally aligning strings, searching within text, etc.
This was like 15 years ago and I haven't kept up with the discipline at all. But is there still algorithmic low hanging fruit?
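For a flavour of the string-alignment problems mentioned above, here's a minimal sketch of Needleman-Wunsch global alignment scoring with a linear gap penalty. The scoring values are illustrative defaults, not taken from any particular course, and only the score is computed (no traceback). Keeping just two DP rows makes memory O(len(b)).

```python
def global_alignment_score(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> int:
    """Optimal global alignment score of a and b (Needleman-Wunsch, linear gaps)."""
    # prev[j] = best score aligning a[:i-1] with b[:j]; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(
                diag,               # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 2 (ACGT vs A-GT)
```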
I do keep reading about an ongoing series of problems with Microsoft Excel distorting analyses in the scientific literature (https://www.nature.com/articles/d41586-021-02211-4) and wondering if the tooling is having trouble?
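The best-known Excel failure is autoformatting: gene symbols such as SEPT2 or MARCH1 get silently converted to dates (HGNC has since renamed such genes, e.g. to SEPTIN2 and MARCHF1). A cheap guard is to scan a gene-symbol column for date-like strings before analysis. Here's a minimal heuristic sketch; the exact mangled form depends on Excel version and locale, so the "2-Sep" / "Sep-02" patterns below are an assumption and not exhaustive.

```python
import re
from typing import Iterable, List

_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Assumed mangled forms: "2-Sep" (day-month) or "Sep-02" (month-day).
_DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{_MONTHS})|(?:{_MONTHS})-\d{{1,2}})$", re.IGNORECASE
)


def flag_date_like_symbols(symbols: Iterable[str]) -> List[str]:
    """Return entries that look like dates rather than gene symbols (heuristic)."""
    return [s for s in symbols if _DATE_LIKE.match(s.strip())]


print(flag_date_like_symbols(["TP53", "2-Sep", "1-Mar", "BRCA1", "Sep-02", "SEPTIN2"]))
```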
Algorithmic bioinformatics has become a separate research field, because there are so many low-hanging fruit. Biotech companies create new instruments producing new kinds of data, researchers find new uses for the data, and new algorithmic problems emerge all the time. There is also a steady migration of people from theoretical computer science to bioinformatics, because it's often easier to get research funding for something bioinformatics-related than for pure CS.
I would say no unless looking at the frontiers of what is done in the wet lab which might require new analytical tools. But this stuff is probably much easier for and much better aligned with someone doing CS in academia.
My impression is that there is quite some space for ML-based approaches, including DL. But even there I would not call it low-hanging.
The product (Dragen) has been around for a few years and now will be integrated in the new generation of sequencers. Extremely impressive technology and a better fit for the niche compared to GPU-based solutions I have seen. More downstream processing and analytics is sometimes closer to traditional ML and naturally there are lots of GPU-based algos.
True, but working in academia is very VERY different from working in a tech/product company.
Who wants to fix other peoples code mess? This is a no-no if you want to promote a job opening.
The original authors' quirks get enshrined in the code base, and it's nigh impossible to fix them until they leave the company that commercialized it.
It's the same reason why there's a lack of qualified computer science teachers in schools.
Shoot me a message at raphael@atomic.ai if you want to learn more.
So I spent more than 8 years as a SWE at Google, and now work here with both experimental biologists and machine learning scientists. And yes, a lot of the concerns mentioned in this thread are also things I have had anxiety about.
Most obvious to me, being a software engineer at Google felt like being the center of the universe. Coming here, the focus is the scientific research. And yes, the scientists all managed to complete their PhDs so they don't necessarily need me to unblock them every second of their day. But contrary to my expectations, this has been remarkably freeing. I think one particularly important part of our company that makes this work is that, even on the science side, we're multidisciplinary (at a high level, emphasizing both experimental biology and ML.) And so engineering feeling like another arm of that multi-discipline nature is fairly... natural.
The reason I feel it's freeing, and the reason I enjoy working here, is also the greatest challenge. Because the scientists are focused on the science, because they respect me and trust me to figure it out, and because they aren't constantly blocked by me, my job is mostly about dreaming extremely expansively about what I can do to reduce toil and make the scientists more productive. Of course they have feedback and input, but how I use my time and what I build is ultimately my decision because I am the engineer. And I have been able to do some things I am very proud of, like rolling out Bazel and Kubernetes and finding ways to seamlessly bring them into the cloud (we're even multi-cloud now without them even noticing!). On the other hand, it's very challenging because when you work on a product, say Google Photos, as a SWE, you always have some direct tether to the product ("what should we build next? ahhhh, well I guess we could just embed Stable Diffusion and a million people would immediately play with it"). At Atomic, my tether is very ambiguous. If I do my job successfully, they'll be able to do research more quickly (or more effectively?), and eventually we'll be able to produce a therapeutic that hopefully changes the world. Identifying what I can do today to speed up that far-off outcome is very challenging, but it is a far more interesting challenge than gluing some pre-existing software into my UI or running A/B tests to turn a red button blue.
If, like me, you enjoy being given ownership over incredibly ambiguous problems, please do reach out!
This role focuses on directly partnering with the biologists: https://boards.greenhouse.io/atomai/jobs/4726839004
This role is expansive cloud infra: https://boards.greenhouse.io/atomai/jobs/4531035004
And this role is directly partnering with the ML scientists: https://boards.greenhouse.io/atomai/jobs/4191285004
I'm a nerd about everything - I love learning, and this field is incredible for it. The complexity and depth of biological systems dwarfs what we're doing in the software industry. I work with brilliant people doing absolutely fascinating work, and I get to learn more every day. At the same time, I get to build things that make a genuine contribution to the people I'm working with - I can see the value and impact of my skillset in a way that was a lot harder when I was working at a software company. The leverage that good software folks can provide to folks outside the industry is almost impossible to overstate - our ability to scale up what the practitioners in the field are doing can offer an almost category change in what they can attempt.
At the same time, there's still really, really knotty software problems to be had - computer science has benefited quite a lot from our ability to segment and structure our problems, but biology doesn't allow for that - everything that we're working with is operating at every scale, from molecular interactions up through genomics into protein design and folding and into metabolic modelling. Add to that that the data structures you're dealing with can vary from a few characters up to a couple megabytes (within the same represented "object"), distant elements within the same object can interact meaningfully, the objects themselves tend to be embedded in larger structures with which they meaningfully interact, and you've got some fiendishly complex problems.
And at the end of all that, you've got a field which offers a legitimate possibility of helping us move past petrochemicals; an enormous expansion in the kinds, potency, and specificity of healthcare; and a new and novel set of tools for shaping our world. It's an incredibly exciting place to be, and I've found people are genuinely thrilled to have good software folks along.
Our company values software quality and we're very product focussed. We're actively hiring in London: https://news.ycombinator.com/item?id=33423547
We had bad attrition to both more interesting and higher paying work. (I left for both after a year at the consultancy)
If you're into bioinformatics or genomics, but aren't excited about an academic setting, take a peek: https://recruiting2.ultipro.com/ARU1000ARUP/JobBoard/62cc791...
We hire fully remote positions and starting salaries are about US$100k.
But I've known several financially successful developers who have gone back for a PhD in bioinformatics and genomics, and, after getting over their distaste for existing tools, have made important and well-recognized contributions. But they did not make more money.
https://wiki.debian.org/DebianGenomics https://blends.debian.org/med/tasks/ https://blends.debian.org/science/tasks/
Google and Microsoft probably know how to make software?
Side note: why does this page have user-select: none on body? It's annoying; what does it accomplish?
Bioinformaticians come in two flavors. Those that studied biology and then took up coding and then the even rarer computer scientists who learned biology. The latter are so rare that they are almost all professors or founders or work at Deep Mind etc... Then, there are the biomedical engineers, etc...
The computer scientists will go off and solve protein folding after the bioinformaticians and chemists have worked on it for years. I am exaggerating a little here, and I imagine there were plenty of bioinformaticians on the AlphaFold team, but the fundamental breakthrough was DNNs.
A research software engineer will develop the mathematics to describe things, then use numerical methods to write software that determines things.
It's been a challenging topic to learn about, because most of the information comes from Computer Science papers and articles where the information is presented in a very formal, mathematical way, which I am just not used to.
Normally when thinking about data structures and algorithms, we're mostly concerned with optimizing for speed. Space complexity is not usually as big of a consideration. Succinct data structures are all about creating ways to achieve good runtime performance while representing the data in a "compressed" format. I think this comes in handy when doing things like DNA sequencing since data sets are so large.
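A core building block of many succinct structures (for example the FM-index used inside read aligners) is a bit vector that quickly answers rank queries: "how many 1 bits occur before position i?" Here's a simplified sketch. It stores one cumulative count per 64-bit word, so rank is O(1), but the extra space is about n/64 integers rather than the o(n) bits of a true succinct structure; Jacobson-style designs get there with a two-level superblock/block layout. The class name is hypothetical.

```python
from typing import List, Sequence


class RankBitVector:
    """Bit vector supporting rank1(i) = number of 1 bits in positions [0, i)."""

    WORD = 64

    def __init__(self, bits: Sequence[int]) -> None:
        self.n = len(bits)
        # Pack bits into 64-bit words: bit `offset` of word k holds bits[64*k + offset].
        self.words: List[int] = []
        for start in range(0, self.n, self.WORD):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.WORD]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
        # cumulative[k] = number of 1 bits in words[0..k).
        self.cumulative = [0]
        for word in self.words:
            self.cumulative.append(self.cumulative[-1] + bin(word).count("1"))

    def rank1(self, i: int) -> int:
        if not 0 <= i <= self.n:
            raise IndexError(f"position {i} out of range 0..{self.n}")
        word_idx, offset = divmod(i, self.WORD)
        count = self.cumulative[word_idx]
        if offset:
            # Count 1 bits among the low `offset` bits of the partial word.
            count += bin(self.words[word_idx] & ((1 << offset) - 1)).count("1")
        return count


bv = RankBitVector([1, 0, 1, 1, 0])
print(bv.rank1(4))  # 3
```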
I'm excited to check out some of the links in the post, and in case anyone else is interested in learning more about succinct data structures, here are a few resources I'd recommend:
Prof. Ben Langmead's YouTube channel: https://www.youtube.com/user/BenLangmead/featured
Alex Bowe's blog has some good content: https://www.alexbowe.com/articles/
Prof. Erik Demaine's "succinct" lectures from his adv. data structures course at MIT on YouTube: https://www.youtube.com/watch?v=3Y2weLDiUWw
Edward Kmett's Haskell live coding session going into some details about succinct: https://www.youtube.com/watch?v=9MKEmNNJgFc
There are also a lot of research papers, which you should be able to find by searching for "succinct data structures" (Jacobson, Munro, Brodnik, Raman, Rao, Navarro, Sadakane, just to name a few). I at least have a basic CS undergraduate degree, and many of these papers are over my head, but I have still been able to slowly understand more and more. Some I had to purchase.
I basically work in EdTech. The company is not an EdTech company, it's a education services company. I was hired on to develop software that we couldn't find in the market[0].
In the process of building this thing, we've been attending and speaking at conferences in our industry, and I'm seeing a lot of the same stories: academia is trying to do research, the research fundamentally requires software to make it happen, and the quality of the software can have a huge impact on results, but because software development is tangential to the research goals, there's little to no allocation for software developers. This leaves the researchers to cobble together a solution that maybe kinda fulfills their need, not corky, and certainly not perpetually (a lot of reliance on trial software and services).
We would love to offer our software to researchers in our field. We've gotten feedback from several that what we are building is exactly the sort of thing they need. But they have no money, and even if we were in a position to give it away for free, we can't even make those connections come to fruition.
So I don't know what to do. I really am thinking of starting to give it away for free, because at least we'd benefit from more research results in our field proving the efficacy of our approach. But that's a really slow burn.
[0] Specifics don't matter, but if you're curious, I make a VR environment for foreign language training emphasizing culture.
A. What are the broad and medium goals you work on?
B. What are your daily activities? How do they fit into (A)?
C. What does nonprofit genomics vs for profit look like from a revenue standpoint?
D. What specific technologies/stacks are you using?
E. The CRUD frontend+backend+database to serve users (and sell ads) is pretty ubiquitous in 'tech', with some branches. How does your field compare?
SWEs cannot write code that maps equations that may change daily completely due to modeling / assumptions change.
Too much focus on modularizing, premature optimization, useless unit testing etc. Who cares about all these if the underlying model is wrong?
If things are stable enough to go into production then the code should leave academia and be re-written properly by SWEs, not by clueless bio phds.
fixed the title.
You’ll be the janitor cleaning up their 20k LoC, one file Python with zero abstraction.
If this is already a thing at a FAANG, it will be worse at a pure science shop.
aka, you need to be able to develop the DNA / DNA-number-system equivalent of things (i.e., something other than a binary / punch-card, block-based number system), such as:
Tree-sitter
Nyquist: https://en.wikipedia.org/wiki/Nyquist_(programming_language)
slippery chicken: https://ccrma.stanford.edu/workshops/algorithmic-composition-with-slippery-chicken
but you wind up doing the equivalent of automated statistical analysis, because the focus is NOT to develop a software package/system. Short, broader take on the subject: what programming groups don't get about applicative programming vs. ALGOL/block programming.
I think this is a little generous. Engineers of all stripes should take responsibility for their work. If they say, "Yes I can add methylation analysis in three weeks," then they should make sure that means it's made well, with tests and all. I've definitely encountered people who don't communicate the scale of the task, and for most of them it's because they don't do software engineering; they do informatics scripting.
virtual or reprogram the robotic arm?
Genomics companies: consider paying more.
It feels like it’s going very slowly, though. The field just really depends on its Unix-philosophy tools; there are a lot of gzipped text files piped through bash scripts and tools like awk and grep. It works, mostly, but there is a lot of weirdness.
It sounds like academia is simply too toxic, entitled, full of itself and hierarchical to provide an environment in which good software practices can thrive.
Working in research in general doesn't seem to pay that well, or the well-paid jobs are few and far between.
Maybe it's a sector ready to be disrupted by a startup with quality developers, but I have yet to see disruption based on improving code quality. It's a tangential aspect as well and doesn't impact the actual business much.
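For comparison, here's roughly what a `zcat reads.fastq.gz | awk 'NR % 4 == 2'`-style step looks like as a small streaming parser with explicit validation. It's a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text, failing loudly on malformed input."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Stream records from a gzip-compressed FASTQ file without decompressing to disk."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fastq(handle)
```

Counting reads would then be something like `sum(1 for _ in read_fastq_gz("reads.fastq.gz"))`, where `reads.fastq.gz` is a placeholder path. Unlike the awk one-liner, a truncated or misaligned file raises an error instead of quietly producing shifted output.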
Where do they live? What do they do?
Like, do you need a genome interpreter? Does one exist? Are there any open source products used by the field currently? I know the names of the programs and items I'd look at to get started in AI, for example. But for genomics, it's a total mystery.
https://wiki.debian.org/DebianGenomics https://blends.debian.org/med/tasks/ https://blends.debian.org/science/tasks/
PS. Feel free to reach out. Email in profile. I’ll be happy to email around the subject.
For me, working in the field is worth doing because I have come to a place in my life where I value doing something useful more than I value other things. You really can't put a value on being able to get up every single day and know that you are actually doing something good for the world that day. And the pay, while less than your absolute highest potential, is still a really good salary compared to most of society.
Plus you do get a lot of freedom and autonomy, and exposure to absolutely fascinating research and biology, and if you want to dabble in academia, it's surprisingly easy if you have a supportive group.
I appreciate the authors honesty. Been there, take it easy bud.
First the comp, most people think about the income they get as in levels.fyi TC. IMHO, The no. 1 value add is working for an academic center is the freedom in both time and spirit you get in pursuing your interest and the side ventures & hustles which eventually compounds. The hours are very reasonable in academia and in most places, you can take classes internally on campus or get reimbursed for it and get supportive managers who let you take time off from work to study. Or just great WLB to pursue something you really enjoy. And this compounds both spiritually and financially.
Just a data point of one, I took an online data science degree whilst working like 15hrs/week and 25hrs on classes. From the classes, I got the bug to apply data science I learned on my degree and on the genomics analysis job to apply to the financial markets/automated trading. Now over the past 4 years, I've achieved CAGR of 35%, and sharpe of 2.5 where my options trading portfolio capital gains outsizes consistently my W-2 pay and keep me par on L5 of FAANG engineer. To give you an idea, my other co-workers have gone into side-hustles real estate (not sure about now) or running day-care to great success. Yes because you have that much free time.
Now, autonomy and academic stimulation: I would not give them up for the world, even if I were doing this for free. Previously I was working for a "hot tech" company where I was bored out of my mind cranking out CRUD widgets, re-learning JS frameworks every year, and attending BS lunch 'n' learn sessions on new crappy libraries with hipster names. In genomics, you get to apply traditional stats techniques (Bioconductor) and deep learning techniques (TensorFlow, AlphaFold, GANs), learn the latest sequencing protocols (scRNA-seq, ChIP-seq, CRISPR screens), and learn the biology domain too (immunology, viral responses, cell regulatory networks, synthetic biology). It's like having a front-row seat at the movies, or courtside at a basketball game, where scientific evolution is happening. You're learning something new every day, and you are at the center of it all, as PIs and wet-lab bench scientists all depend on you to perform the analysis and build the pipelines... and 8 years in, I'm still excited; my only disappointment is that I will never learn it all.
Obv. a subjective data point of one, but I just want to add it in case somebody out there is on the fence. Yes, sometimes you can truly have it all.
https://github.com/TimothyStiles/poly
A large part of my project's community are devs that want to get into the field but can't tolerate the ridiculously low pay, laughably bad management, disrespect, and what amounts to 40+ years of technical debt that's endemic to biotech software.
I've had companies here in the Bay Area offer me 100K a year with a straight face. I've had companies tell me during interviews that they're looking for someone to help "set up GitHub". I've seen job listings for low-paid web dev positions that require applicants to have PhDs.
The reality is that, except for a growing handful of places, management straight up won't know the difference between IT and software engineers. It's what I call the naive buyers problem.
The demand for software engineers in biotech is generated by naive buyers that don't know what they need, why they need it, or how to get it.
Benchling and Recursion Pharmaceuticals have reputations in the industry for paying "standard software salaries". So do the research divisions at places like DeepMind/Microsoft/Google, but in my experience there are even new multi-billion-dollar institutes where senior management has never heard the term devops.
Most places advertise for "data scientist" positions, or some analog, instead of software engineers. This is mostly because upper management has never met an actual practicing software engineer in a professional setting. Many come from academia, where the culture and work requirements heavily disincentivize standard software engineering practices.
It's also not uncommon for a biotech company to either have a very underqualified CTO, whose main programming experience is the ML-research-like work they did during their PhD, or not have one at all, which has huge downstream consequences.
This week a software engineer trying to make the switch to biotech actually DM'd me to ask why they were seeing a ton of data science / ML job positions but no software engineering / devops positions.
They were worried that these companies were trying to save on costs by forcing their data scientists to create infrastructure but it's actually worse than that. Most of these companies aren't even aware that there's supposed to be infrastructure.
Despite all of this, the future is looking better, and I'm starting to find new companies and positions that are, well... reasonable. I learned about this thread last night at a party from a friend who works at one of these companies. There's a small, strong new wave of companies and developers out there pushing biotech software forward. Hopefully some (including myself) make it big while pushing the idea that better tech equals better biotech.
1. Compensation – In academia, you will likely take a big salary hit (much of this has been discussed). There are a few exceptions, such as newer institutes like Chan Zuckerberg and Arc Institute, that are paying much more competitive salaries though. In well-backed startups and larger biotech/pharma, cash is likely equal to (or often more than) software comp elsewhere – the bigger hit you take is usually in equity – no one has been able to match FAANG on total comp with RSUs in the mix. Startups can provide options, but they're not very fungible. For example, we benchmark salaries against what comparable A16Z pre-public non-bio companies use, as well as stats from the broader SV SWE salary datasets. There are startups in bio that pay even more to lure talent.
2. Research vs Product – Over the last decade, there have been a bunch of highly profitable tech companies and large, well-funded new startups (e.g., Calico, Altos, DeepMind) trying to take on bio as the next frontier. These places (like those named in the blog post) pay very competitively. Thus far, these places have often turned into a big mess, because it's hard to deliver products (like drugs) in a mostly academic-y atmosphere. I don't think anyone has really cracked this nut yet (or whether it's even possible).
3. Culture of SW importance – In a lot of startups, this has changed quite a bit over the last 5 years; there are lots of software- and data-science-first startups. In larger pharma/biotech, though, I think the centrality of drug discovery takes a lot more oxygen than software, which is often thought of as an innovation bet, and different places have different levels of long-term commitment to it.
4. I think one important difference is the type of company. There are many software companies in healthcare/bio that build software products supporting R&D, healthcare, drug development, etc. Many of them have done quite well (e.g., Benchling and Komodo Health in the A16Z portfolio alone) and are basically software companies that happen to be in bio. There are many others, like most drug discovery companies (including us), where software and data science are enabling, but the product is ultimately drugs. For a lot of SWEs, this becomes problematic, because people often want the satisfaction of pushing externally deployed software products into the world. The heroes and heroines of this world are often drug hunters rather than tool developers, and this has cultural consequences as well. Some people are perfectly happy with this (getting a lot of satisfaction out of enabling new drugs to treat serious disease), but a lot of folks aren't.
5. The current biotech crash has been bigger and more sustained than the tech crash thus far. High interest rates impact this industry much more than others, because new drugs, which drive a large part of the industry, usually take a decade or more to develop before revenue starts flowing. This is less of an issue in healthtech companies, which can often deploy much more quickly (90% of healthcare costs are not drugs).
6. Finally, there are many happy SWEs and data scientists in bio at companies that value software, and they can build good careers there, building products that ultimately help human health in new ways. It's a pretty amazing time in biology, with a suite of new technologies to read, interpret, write, edit, and deploy molecules/DNA/cells that are really unlocking many of the mysteries of human disease. I feel lucky every day that we get to continue building in this space.
https://web.archive.org/web/20221119162905/https://claymcleo...
If I can get a better salary and working conditions at some crappy no-name startup, why would I choose to work at an organization that respects my craft so little they haven't bothered to maintain their software for a decade?
> why would I choose to work at an organization that respects my craft so little they haven't bothered to maintain their software for a decade
This is changing in my experience, albeit slowly. And really, this is what I'm calling on us, as a community, to do better on.
The reason you _would_ work at these organizations is because (1) the subject-matter is really interesting, (2) there are hard problems to be solved, and (3) you wake up every morning knowing that you are working on something that will have an impact on the lives of people around the world.
At least those are my reasons :)
Amen, couldn't have said it better myself. I'm sure working on a genomics project that aims to eradicate some killer disease is very worthy and all, but you need to live and provide for your family while you're doing it.
High-potential areas like genomics that are behind on software are amazing places for talented software people with a giving attitude to have a big positive impact.
Initial code would still be developed by SMEs, who:
- Don't understand most programming abstractions
- Don't see the advantage of a clean codebase
- Would rather go back to their code spaghetti mess than help figure out why some corner cases behave differently in a fresh codebase
- Would still submit changes to their code spaghetti mess and expect you to apply them to the cleaned codebase
I did what the author suggested (not in genomics, but in a different research-heavy scientific field) for a while and would not recommend it to anyone.
And that's not even taking compensation and work conditions into account.
The only reasons he gave me for still working there are that:
1- the workload is fairly low
2- he has a lot of autonomy
3- he shows up every day around noon and leaves at 5PM