What IS a huge problem is the almost complete lack of systematically acquired quantitative data on human health (and diseases) for a very large number (1 million subjects) of diverse humans WITH multiple deep-tissue biopsies (yes, essentially impossible) that are suitable for multiomics at many ages/stages and across many environments. (Note, we can do this using mice.)
Some specific examples/questions to drive this point home: What is the largest study of mRNA expression in humans? ANSWER: The small but very expensive NIH GTEx study (a maximum n of about 1,000 Americans). This study acquired postmortem biopsies for just over 50 tissues. And what is the largest study of protein expression in humans across tissues? Oh sorry, this has never been done, although we know proteins are the workhorses of life. What about lipids, metabolites, metagenomics, epigenomics? Sorry again, there is no systematically acquired data at all.
What we have instead is a very large cottage industry of lab-level studies that are structurally incoherent.
Some brag about the massive biomedical data we have, but it is truly a ghost, and most real data evaporates within a few years.
Here is my rant on fundamental data design flaws and fundamental data integration flaws in biomedical research:
Herding Cats: The Sociology of Data Integration https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2751652/