What do we typically do in academic biomedical research in this situation?
The lead PI looks around the lab and finds a grad student or postdoc who knows how to turn on a computer and, if very lucky, has also had 6 months of experience noodling around with R or Python. This grad student or postdoc is then charged with running statistical analyses without any training whatsoever in data science. What is an outlier anyway? What do you mean by "normalize"? What is metadata, exactly?
You get my drift: it is newbies in data science and programming (often 40- and 50-year-olds) leading novices (20- and 30-year-olds) to the slaughter. Might contribute to some lack of replicability ;-)
And it has been this way in the majority of academic labs since I started using CP/M on an Apple II in 1980 at UC Davis, in an electrophysiology lab in Psychology; through the first Macs I set up at Yale in a developmental neurobiology lab in 1984; and up to the point at which I set up my own lab in neurogenetics at the University of Tennessee with a pair of Mac IIs in 1989 and $150,000 in set-up funds — just enough for me to hire one very inexperienced technician to help me do everything.
So in this context I hope all of you can appreciate that ANY help in bringing some real data science into mom-and-pop laboratories would be a huge, huge boon.
And please god, let it be FOSS.