That coin-flip prediction example seems incomplete to me. Here's how I would extend it to relate to the topic at hand: you ask a researcher to predict the distribution of heads and tails for a specific coin, which is secretly a trick coin with heads on both sides. The researcher trains a prediction algorithm on thousands of coin flips performed on many randomly selected coins in the wild. Upon testing your coin, they find that 100% of the outcomes are heads, which is statistically very different from the training data (as confirmed by the appropriate statistical tests). You seem to advocate uncritical reporting of this result, with no allowance for discussing possible explanations beyond the strict study flow of "predict 50% ± X% heads, flip coin N times, observe 100% heads, report result, end of story". A responsible scientist would discuss possible explanations for the anomalous result, including biased instrumentation, biased training data, biased sampling, uncontrolled variables, and so on.
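To put a rough number on "statistically very different", here is a minimal sketch of an exact two-sided binomial test against a fair-coin null. The function name `binomial_two_sided_p` and the choice of N = 20 flips are my own assumptions for illustration, and the fair-coin null is a simplification of the "50% ± X%" prediction. Note what the test does and does not deliver: it says the result is anomalous, not why.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null hypothesis P(heads) = p.
    """
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


n = 20  # hypothetical number of flips; the original comment only says "N"
p_value = binomial_two_sided_p(heads=n, flips=n)
print(f"{n}/{n} heads under a fair-coin null: p = {p_value:.3g}")
```

A tiny p-value like this rejects "this coin behaves like the training coins" and nothing more. Telling a two-headed coin apart from a biased flipping procedure or an unrepresentative training sample is the interpretive step.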
Uncritically collecting data, grinding through statistical tests, and reporting p-values/effect sizes/confidence intervals is not science. Researchers are fully expected to interpret their results in the greater context of their field, while of course maintaining statistical rigour, which is necessary but not sufficient to do good science.