The problem with getting more and more specific about my experience and expertise is that I get closer and closer to doxxing myself. Honestly, it may be time to retire this account again.
Also, it's worth noting that I am not claiming my experience is the be-all and end-all. I am stating that, from my experience, I have an incredible distrust of many studies published with 'amazing' results until they have been peer reviewed, preferably against disparate datasets.
My experience is in outcome-based studies of the effects of drugs/treatments/regimens in oncology and oncology-adjacent fields. This includes drugs given alongside traditional cancer regimens to help manage adverse events and toxicities.
Other fields may not have this reproducibility problem. Mine does. Even if the study design is perfect, and I can't imagine most are, the data itself can be questionable.
Consider - what dataset would you use to determine whether patients taking Keytruda had a higher incidence of high blood pressure?
You can use data from an EHR, licensed for de-identified studies, but EHR data is a burnt-down trailer park of questionableness, and its use in studies has been laughed at at many conferences.
You can use data from individually enrolled patients (as in a clinical study), but then the cost is extremely high compared with a non-interventional observational study that uses other data sources. The quality of the data is likely to be higher, but since it costs more to collect, maybe you are only enrolling in a few regions, and those regions may have a higher prevalence or incidence of high blood pressure anyway. Troublesome.
What about insurance data? You can get it cheaply, you can get it across the country, and if you have high blood pressure, GOOD doctors are likely to medicate you with a drug meant to treat it. Seems good, right? And it is, as you can generally infer high-blood-pressure->treatment-with-drug-x. So for a yes/no study it can help. But what if the underlying condition itself causes high blood pressure, and we want to tell whether the drug causes a HIGHER rate of high blood pressure than other drugs do? Insurance data by itself may not be enough to tell you this.
So what do you do? You are stuck with no great answers.... and this is assuming your study design is perfect.
So you buy multiple datasets in some third-party health marketplace, and someone gets the great idea to combine the datasets to increase the n value. Well, too bad those datasets have a high overlap: the same patients get counted more than once. So you have attributed a higher power to the study than is warranted.
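To put rough numbers on the overlap problem, here is a minimal sketch in Python. Everything in it is made up for illustration: the two datasets, the amount of overlap, and the 30% incidence are assumptions, and it assumes you can actually match patients across datasets by some shared ID, which is often not possible with de-identified data.

```python
import math

def pooled_sample_sizes(ids_a, ids_b):
    """Return (naive_n, unique_n) for two collections of patient IDs."""
    naive_n = len(ids_a) + len(ids_b)          # what you get by stacking the files
    unique_n = len(set(ids_a) | set(ids_b))    # distinct patients actually present
    return naive_n, unique_n

def standard_error(p, n):
    """Standard error of an observed proportion p over n independent patients."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical purchased datasets: 10,000 patients each, 6,000 of them shared.
dataset_a = range(0, 10_000)
dataset_b = range(4_000, 14_000)

naive_n, unique_n = pooled_sample_sizes(dataset_a, dataset_b)

p = 0.30  # assumed incidence of high blood pressure, for illustration only
se_naive = standard_error(p, naive_n)    # what the combined study would report
se_actual = standard_error(p, unique_n)  # what the distinct patients support

print(f"naive n = {naive_n}, unique n = {unique_n}")
print(f"reported SE is {se_naive / se_actual:.0%} of what it should be")
```

The stacked file claims 20,000 patients when only 14,000 distinct people are in it, so the reported standard error comes out narrower than the data supports. That is the inflated power in a nutshell.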
I hope this explains more about the concerns I have. Though, I suppose, it may mean that this account is now dead. I will have to think further. Anyway, hope you have a great day.