A more general observation: If your conclusion after reading a bunch of studies is "wow I really don't understand the fancy math they're doing here" then usually you should do the work to understand that math before you conclude that it's all a load of crap. Not always, of course, but usually.
EA types spend a lot of time talking with subject matter experts, see e.g. https://www.givewell.org/international/technical/programs/vi...
So then you throw your hands in the air, trying to get others to internalize a discussion culture where you can, as a normal thing, claim to be (or be assumed by default to be) only superficially informed.
Even if someone knows a lot about foundational things like philosophy and what they want, statements have a certain feel to them regardless of their mathematical truth, and at some point you do the cost-benefit calculation: you might be superficial, but you should learn about other topics first, because you know you can gain more valuable information from them in less time.
---
Pearl's causality, as far as I can see, is best modelled with cdr/car + NBG as embedded-agency foundations + computability in the mix + signal theory (time as discrete evolution from one state in the chain to another; signal theory is relevant when you have multiple agents, or the environment and PARTS of you run on different clocks, or something more complex), i.e. as part of formal embedded agency. It doesn't feel too meaningful without that, except epistemologically (what-can-we-know type questions), where causality might be a good lens, especially for filtering inquiries.
While this is true, putting the onus on the reader to understand a lot of advanced math makes it easy to avoid scrutiny: just increase the complexity of your math until the only people who could ever critique you are the intersection of PhD-level mathematicians and experts in the field your paper actually pertains to. Anyone can say whatever they want and assure you they must be right because they know more math than everyone else interested in the problem.
Instead of, "Understand the math before you conclude that it's all a load of crap," I would say, if it's an unreasonable level of complexity for the particular problem and you can't find a large body of other papers doing something similar with the same problem, just ignore it.
[0] edit: Hansen was awarded the Nobel Memorial Prize in Economics in 2013 for GMM, not that that means it can't fail, but clearly a lot of people have found it useful.
> The "controlling for" thing relies on a lot of subtle assumptions and can break in all kinds of weird ways. Here's[1] a technical explanation of some of the pitfalls; here's[2] a set of deconstructions of regressions that break in weird ways.
[1] https://journals.plos.org/plosone/article?id=10.1371/journal...
[2] https://www.cold-takes.com/phil-birnbaums-regression-analysi...
To me this seems to demonstrate a stronger understanding of regression analysis than 90+% of scientists who use the technique.
> "generalized method of moments" approaches to cross-country analysis (of e.g. the effectiveness of aid)
Which is an entirely reasonable criticism. GMM is a complex mathematical process; wiki suggests [0] that it assumes data generated by a weakly stationary, ergodic stochastic process of multivariate normal variables. There are a lot of ways that real-world data on aid distribution might be non-ergodic, non-stationary, non-normally distributed, or even deterministic!
Verifying that a paper has used a parameter estimation technique like that properly is not a trivial task even for someone who understands GMM quite well. A reader can't be expected to follow what the implications are from reading a study; there is a strong element of trust.
[0] https://en.wikipedia.org/wiki/Generalized_method_of_moments
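To make the trust problem concrete, here's a toy two-step GMM in Python (numpy/scipy only; a minimal sketch, not the econometric pipeline any of these papers actually use). The third moment condition imposes the symmetry you'd have under normality, and feeding it skewed data makes Hansen's J-statistic blow up, which is exactly the kind of assumption failure a reader has to know to even look for.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=5000)  # skewed, i.e. not normal

def moments(theta, x):
    mu, sigma2 = theta
    # Moment conditions: E[x - mu] = 0, E[(x - mu)^2 - sigma2] = 0, and a
    # third, E[(x - mu)^3] = 0, which assumes symmetry (false for gamma data).
    return np.column_stack([x - mu, (x - mu) ** 2 - sigma2, (x - mu) ** 3])

def objective(theta, x, W):
    gbar = moments(theta, x).mean(axis=0)  # sample average of the moments
    return gbar @ W @ gbar

# Two-step GMM: identity weights first, then the optimal weighting matrix.
step1 = minimize(objective, x0=[1.0, 1.0], args=(x, np.eye(3)), method="Nelder-Mead")
g = moments(step1.x, x)
W = np.linalg.inv(g.T @ g / len(x))
step2 = minimize(objective, x0=step1.x, args=(x, W), method="Nelder-Mead")

# Hansen's J test: n * objective ~ chi^2(1) if the moment conditions hold.
J = len(x) * objective(step2.x, x, W)
print(step2.x, J)  # J is huge here, flagging the violated symmetry assumption
```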
E.g. I certainly found myself agreeing with his points about observational studies, and there are plenty of real-world examples you can point to where experts have been led astray by these kinds of studies (e.g. alcohol consumption recommendations, egg/cholesterol recommendations, etc.)
But when he talked about his reservations re "the wheat" studies, they seemed really weak to me and semi-bizarre:
1. Regarding "The paper doesn't make it easy to replicate its analysis": I mean, no shit, Sherlock. The whole point is that it would be prohibitively expensive or unethical to carry out these experiments for real, so we rely on these "natural" experiments to reach better conclusions.
2. "There was other weird stuff going on (e.g., changes in census data collection methods), during the strange historical event, so it's a little hard to generalize." First, this seems kind of hand-wavy (not all natural experiments have this issue), but second and more importantly, of course it's hard to "generalize" these kinds of experiments because their value in the first place is that they're trying to tease out one specific variable at a specific point in time.
3. The third bullet point just seemed like it could be summarized as "news flash, academics like to endlessly argue about shit."
I think the fundamental problem when looking for "does X cause Y", is that in the real world these are complex systems: lots of other things cause Y too (or can reduce its chances), so you're only ever able to make some statistical statement, e.g. X makes Y Z% more likely, on average. But even then, suppose there is some thing that could make Y Z% more likely among some specific sub-population, but make it some percent less likely in another sub-population (not an exact analogy but my understanding is that most people don't really need to worry about cholesterol in eggs, but a sub-population of people is very reactive to dietary cholesterol).
Basically, it feels like the author is looking for some definitive, unambiguous "does X cause Y", but that's not really how complex systems work.
One of the first times I got interested in reading medical studies was when I saw a bunch of headlines announcing that a randomized controlled trial had proved that echinacea was ineffective for treating respiratory problems. This surprised me, because I'd always been a dogmatic drinker of echinacea tea whenever I had a cold, and had thought that it helped. But then again, I come from a culture of damn dirty hippies, so I was open to being wrong about it. Rather than rely on the headlines, I decided to dig up the study itself.
Here's what the study actually found: that rubbing an echinacea-infused ointment on your wrists has no effect on respiratory health.
Er... yeah, no shit, Sherlock. Literally nobody uses echinacea that way. You've just falsified a total straw-man of a hypothesis, and based on the number of headlines generated off the back of this, I think it's reasonable to presume there was some kind of funded apparatus for disseminating that bogus result.
Ever since then, I've learned not to trust the headlines when it comes to trials, reserving judgment until I've looked at the methodology. When I do, a lot come up short.
For some areas of research, truly understanding causality is essentially impossible - if well-controlled experiments are impossible and the list of possible colliders and confounders is unknowable.
The key problem is that any causal relation can be an illusion caused by some other, unobserved relation!
This means that in order to show fully valid causal effect estimates, we need to
- measure precisely
- measure all relevant variables
- actively NOT control for harmful variables (e.g. colliders, which create false correlations when conditioned on)
I heartily recommend The Book of Why [1] by Pearl and Mackenzie for deeper reading, and the "haunted DAG" in McElreath's wonderful Statistical Rethinking.
Most things we learn about DAGs and causality are frustrating, but simulating a DAG (e.g. with lavaan in R) is a technique that actually helps in understanding when and how those assumptions make sense. That's (to me) a key part of making causality productive.
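Here's what "simulating a DAG" buys you, as a minimal numpy sketch (my translation; the comment above uses lavaan in R): generate X -> C <- Y with no edge between X and Y, and watch a spurious association appear the moment you condition on the collider.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# DAG: X -> C <- Y, and no edge between X and Y (true causal effect is zero).
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + rng.normal(scale=0.5, size=n)  # C is a collider

def ols_coef(target, *predictors):
    """Slope on the first predictor in an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(target)), *predictors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta[1]

print(ols_coef(y, x))     # ~0.0: X and Y really are independent
print(ols_coef(y, x, c))  # strongly negative: conditioning on C invents a link
```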
It all assumes you can divide the world cleanly into variables that can be the nodes of your DAG. The philosopher Nancy Cartwright talks about this a lot, but it’s also a practical problem.
You can make the argument, from correlational data, that bridges and train tracks cause truck accidents. And more importantly, if you act as though they do when designing roadways, you actually will decrease truck accidents. But it's odd, by the common-sense meaning of causality, to claim that a stationary object is acting upon a mobile object...
There is also this great book on causality in ML, but it's a much heavier read:
Chernozhukov, V., Hansen, C., Kallus, N., Spindler, M., & Syrgkanis, V. (2025). Applied Causal Inference Powered by ML and AI.
"The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data" (https://pmc.ncbi.nlm.nih.gov/articles/PMC5888052/)
"Does water kill? A call for less casual causal inferences" (https://pmc.ncbi.nlm.nih.gov/articles/PMC5207342/)
--
Can we nevertheless extract causality from correlation?
I would argue that, theoretically, we cannot. Practically speaking, however, we frequently settle for “very, very convincing correlations” as indicative of causation. A correlation may be persuasively described as causation if three conditions are met:
Completeness: The association itself (R²) is 100%. When we observe X, we always observe Y.
No bias: The association between X and Y is not affected by a third, omitted variable, Z.
Temporality: X temporally precedes Y.
The problem comes when we try to do so practically, because reality is full of surprising detail.
> No bias: The association between X and Y is not affected by a third, omitted variable, Z.
This is, practically speaking, the difficult condition. I'm not so convinced the others are necessary (practically speaking, anyway) but you should read Pearl if you're into this!
- No colliders have been included in the analysis, which would introduce the appearance of a causal relationship that does not exist
And as another commenter already pointed out: you can't really rule out the existence of an unknown Z
------------------ Confounders ------------------
A variable that affects both the exposure and the outcome. It is a common cause of both variables.
Role: Confounders can create a spurious association between the exposure and outcome if not properly controlled for. They are typically addressed by controlling for them in statistical models, such as regression analysis, to reduce bias and estimate the true causal effect.
Example: Age is a common confounder in many studies because it can affect both the exposure (e.g., smoking) and the outcome (e.g., lung cancer).
------------------ Colliders ------------------
A variable that is causally influenced by two or more other variables. In graphical models, it is represented as a node where the arrowheads from these variables "collide."
Role: Colliders do not inherently create an association between the variables that influence them. However, conditioning on a collider (e.g., through stratification or regression) can introduce a non-causal association between these variables, leading to collider bias.
Example: If both smoking and lung cancer affect quality of life, quality of life is a collider. Conditioning on quality of life could create a biased association between smoking and lung cancer.
------------------ Differences ------------------
Direction of Causality: Confounders cause both the exposure and the outcome, while colliders are caused by both the exposure and the outcome.
Statistical Handling: Confounders should be controlled for to reduce bias, whereas controlling for colliders can introduce bias.
Graphical Representation: In Directed Acyclic Graphs (DAGs), confounders have arrows pointing away from them to both the exposure and outcome, while colliders have arrows pointing towards them from both the exposure and outcome.
------------------ Managing ------------------
Directed Acyclic Graphs (DAGs): These are useful tools for identifying and distinguishing between confounders and colliders. They help in understanding the causal structure of the variables involved.
Statistical Methods: For confounders, methods like regression analysis are effective for controlling their effects. For colliders, avoiding conditioning on them is crucial to prevent collider bias.
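Both rules are easy to verify numerically. A minimal sketch in plain numpy (all coefficients made up for the demo): omitting the confounder biases the exposure estimate upward, and adding it as a covariate recovers the true effect. Running the same machinery on a collider, as described above, would do the opposite and introduce bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Confounder structure: age -> smoking and age -> risk; true smoking effect 0.3.
age = rng.normal(size=n)
smoking = 0.8 * age + rng.normal(size=n)
risk = 0.3 * smoking + 0.5 * age + rng.normal(size=n)

def coef(target, *predictors):
    """Slope on the first predictor in an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(target)), *predictors])
    return np.linalg.lstsq(X, target, rcond=None)[0][1]

print(coef(risk, smoking))       # ~0.54: biased, because age is omitted
print(coef(risk, smoking, age))  # ~0.30: controlling for the confounder works
```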
It takes a specific topic (here, the health effects of red meat) and explains how each type of study can provide information without proving anything. It helped me a lot in understanding the science around nutrition, where you never have perfect studies.
I see this all the time in people’s interpretation of nutrition research: they do exactly as this article suggests, fall back to the “intuitive option”, and go on some woo diet that they eventually give up because they start feeling awful.
I would disagree that observational study designs should be thrown out the window or that it makes sense to, as this article seems to do, lump cross-sectional ecological data in with prospective cohort studies.
Things often “make intuitive sense” only because of these study designs. We used to get kids to smoke pipes to stave off chest infections because it made “intuitive sense” and it’s only because of observational studies that we now believe smoking causes lung cancer.
In nutrition science, the direction of evidence from prospective cohort studies agrees with that from RCTs on matched intake comparisons 92% of the time. If we take RCTs to be the “gold standard” of evidence that best tracks reality, it seems a little odd that these deeply flawed observational studies we should apparently disregard do such a good job of coming to the correct conclusions.
https://bmcmedicine.biomedcentral.com/articles/10.1186/s1291...
First, experiments have their own varieties of horrors. Many are small-N, with selective data reporting, and lack external validity; that is, the thing you really want to randomize is difficult or impossible to randomize, so researchers randomize something else as a proxy that's not at all the same. Other times there are complex effects that distort the interpretation of the causal pathway implied by the experiment.
Second, sometimes it's important to show that any association exists. There are cases where it's pretty clear an association is non-existent, based on observational data and covariate analysis. You just don't hear about those because people stop talking about them because of the null effects. So there's a kind of survivorship bias in the way results are discussed, especially in the popular literature.
It's easy to handwave about limitations of studies, it's much harder to create studies that provide evidence, for logical, practical, and ethical reasons. Why you'd want less information about an important phenomenon isn't clear to me.
This is an interesting example, because I don’t know of any studies (although there probably are some, if only old ones) specifically about whether smoking pipes staves off lung infections, but the “intuitive sense” answer has changed because of adjacent evidence. And in this case, it’s not the lung cancer evidence that makes it intuitively unlikely that pipe smoking would be helpful, but a broader understanding of what causes lung infections, and what tobacco smoke contains and doesn’t contain.
But about causality. Long ago (old cars) I had a friend who told me that most mornings his car would not start until he opened the hood and wrapped some wires with tape (off with the old tape on with the new). Then the car would start. Every now and then it would take two wraps. Hmmm.
After he demonstrated this, I decided to try to help. I followed the wires that were wrapped. Two of them. To my surprise they were not connected at either end. This was insane, and yet his study - and my own observation - demonstrated that wrapping these two wires which were completely disconnected caused his car to start. Now there is causality for you.
Except that if you have a more complex model of cars, there is a sane explanation. Again, this is an old car with a carburetor. In case you don't know, this is a little bowl of gas that provides a combustible mix of air and gas. If there is too much gas then your car won't work. The mix is controlled by a little float that regulates the level of gas in the little bowl. Toilet tanks work on the same principle.
If your float is bad (or there are other issues), your engine gets too much gas, is "flooded", and you have to wait until much of it evaporates. So if you flood your engine and then go and wrap some wires, it may be that your car starts right up, the excess gas having evaporated in the meantime.
So I rebuilt the carburetor and my friend never had that problem again.
The moral of the story is that I had a better "model" of how cars work. But in the back of my mind I am aware that my model may be (or have been) just as deficient. Did you know that we are bombarded from space by an unknown type of neutrino that stops electricity from working unless there is a little pool of some liquid nearby, or it is Thursday? I am going to do a study of this.
There are very good reasons to understand how frail our ability to understand causality is. And we are talking about simple things here. The scientific method is about EXPERIMENTS. Yes, I did that in caps. Doing things. We have deeply complex situations we need to understand, and in my opinion, studies do not help.
You didn't show causality, though. You never randomized anything. His study and your observation was purely observational. At no point did you open the hood, get ready to wrap the wires, and flip a coin to decide whether to wrap the wires or do a placebo wrapping somewhere else.
Had you done that, you would have found, per your ultimate explanation, that the wrapping made no causal difference: you did the procedure, and either way, the car turned on. Hence, there is no causality for you.
Causality eventually demands a "theory" for full explanatory power and understanding. Theories have premises, involve inference, and make predictions. Otherwise we get ad-hoc models of phenomena via observations, which is a great start but ends up as an oversimplification. X causes Y, but what caused X? Why did X cause Y and not Z? Models represent phenomena while theories explain them. We start with models, and our curiosity eventually leads to a theory. See [1] for a great read from a physicist turned quant.
[1] https://www.amazon.com/Models-Behaving-Badly-Confusing-Illus...
Your comparison to placebo is very apt: Giving medication to a patient (vs not giving anything) causes them to get better, but it might be the "giving a pill" part instead of the "ingesting medication" part that matters.
If you could reproduce it, it would usually be intermittent. Eventually you would learn, “when I do X, my character will Y, but only sometimes.”
This is due to the real command being a subset of, or a slight variation on, what you thought was correct, which you sometimes perform by accident.
Even when it’s ephemeral and seemingly random, I still find these things valuable. It’s better to be able to reproduce it sometimes than never. Answering the question “is doing this better than random?” (at, say, 95% confidence) can help you throw away a bad hypothesis. Most people don’t realize that when they provide evidence for causality they are competing with random. If they had instead done jumping jacks or said a prayer to the engine gods X times, the correlation between the wires and the engine might suddenly seem much weaker.
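"Competing with random" can be made literal with a permutation test. A minimal sketch with an entirely made-up starting log: if wrapping the wires were doing anything, the observed difference in start rates should land in the tail of the label-shuffled distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical log, one entry per morning: 1 = car started. In truth the car
# starts 60% of the time regardless of whether the wires were wrapped.
wrapped = rng.binomial(1, 0.6, size=40)
unwrapped = rng.binomial(1, 0.6, size=40)
observed = wrapped.mean() - unwrapped.mean()

# Permutation test: shuffle the wrapped/unwrapped labels to see "random".
pooled = np.concatenate([wrapped, unwrapped])
diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    diffs.append(pooled[:40].mean() - pooled[40:].mean())
p = float(np.mean(np.abs(diffs) >= abs(observed)))
print(observed, p)  # large p-value: wire-wrapping doesn't beat random
```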
Once you have one hypothesis you can test it against others, and I believe that's powerful, provided it's done systematically and with at least a mild understanding of probability and error. Also, a hypothesis without a theory isn't scientific in the first place. Why did your friend wrap the wires to begin with?
It's okay to act at random until we find some effect, but then we also need to take the time to step back (as you did) and ask "WHY did this happen?", at which point you can begin the process with a fresh hypothesis.
I feel when we are taught the scientific method in elementary school it doesn’t stick for most of us, even engineers. Especially non-engineer folks. It seems at first blush like some truisms strung together, but that simplicity hides very powerful capabilities and subtle edge cases.
There are all manner of observable, reproducible behaviors in nature that we barely have an explanation of. Those things remain observable and reproducible whether we can tell a tidy story about why they happen.
In a very meaningful sense, the local healer applying poultices formulated from generations of experimentation is using science much as the medical doctor is (assuming, of course, they're taking notes, passing on the discoveries, and the results are reproducible). The doctor having tied their results to the "germ theory of medicine" vs. the local healer having tied theirs to "the Earth Mother's energies impregnate the wound" is an irrelevant distinction until (and unless) a need comes along to unify the theory to some other observable outcomes.
Doctors routinely prescribe medications that have no randomized clinical trials supporting their use. In those cases, clinical experience replaces trial data; they "know" the drugs work because all the patients have effectively been trial subjects over a span of decades.
IMO the moral of that story is that the S at the end of "experiments" means more than just repeating the same thing. Fixing the carburetor was the second, and vastly more informative, experiment, but your friend could also have tried variations on what he was doing, which would have uncovered the time component.
Science digs into problems, so the most important part of meta-analysis, which is often ignored, is asking whether the question even makes sense in a generic context. Just as crucial is narrowing down the effect size, which may be dwarfed by random noise in some experiments, etc.
If my parser gets nulls when it should get non-nulls, then I first need to find where they could even be coming from, i.e. get a better understanding of the model I'm working with.
If you want to predict Y and you know X, you can use data that tell you when they happen together.
If you are trying to cause (or prevent) Y, it's harder. If you can't do experiments (e.g. macroeconomics), it's borderline impossible.
That sounds very much like a skill issue, because it can. You call out what you consider to be possible confounders as independent variables (covariates). You can then use regression analysis to estimate the individual contribution from each confounder, and control for them by essentially filtering out that contribution.
Is reality harder than that? Yes. Much. The world of science isn't 9th grade math, sorry. You are not entitled to understand everything deeply with 5 minutes of mediocre effort.
You mean they don't cluster the data into sets of overlapping bins where the controlled attribute has approximately the same value and then look for the presence of an XY relationship within the bins instead of across them?
I don't actually know how the method you suggest compares in the limit of finer bins. It's possible it might only achieve similar results?
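For what it's worth, they do converge. A minimal sketch (coefficients invented for the demo) comparing regression adjustment against the within-bin approach: with narrow enough bins, the stratified slope approaches the regression estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
z = rng.normal(size=n)                      # confounder
x = z + rng.normal(size=n)                  # exposure, driven partly by z
y = 0.5 * x + 2.0 * z + rng.normal(size=n)  # true x->y effect is 0.5

# Regression adjustment: include z as a covariate.
X = np.column_stack([np.ones(n), x, z])
print(np.linalg.lstsq(X, y, rcond=None)[0][1])  # ~0.5

# Stratification: fit the x->y slope within narrow quantile bins of z,
# then take a sample-size-weighted average across bins.
edges = np.quantile(z, np.linspace(0, 1, 51)[1:-1])
bins = np.digitize(z, edges)
slopes, weights = [], []
for b in np.unique(bins):
    m = bins == b
    slopes.append(np.polyfit(x[m], y[m], 1)[0])  # within-bin slope
    weights.append(m.sum())
print(np.average(slopes, weights=weights))  # also ~0.5 as bins get finer
```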
Good primer on both here: https://www.mynutritionscience.com/p/statistical-adjustment
This is a frustrating type of issue. Dismissing something with "I don't understand this, but I don't believe it" isn't the sort of thing I want to be doing. However, I don't have any desire to waste time trying to understand what someone has done (and did they really understand what they were doing themselves?) when it's clear that the effect isn't cleanly isolated in the data and no amount of mathematics is going to change that.
The causality is always present; we just don't have the processing power to ensure with 100% certainty that all relevant factors are accounted for and all spurious factors dismissed.
Direct quote from the author of this post, and I couldn't agree more, particularly about the post itself.
Does X cause Y? An in-depth evidence review - https://news.ycombinator.com/item?id=30613882 - March 2022 (3 comments)
If a study is publicly funded, there should be a minimum requirement: it must include at least two research arms, one with an experimentally manipulated variable and a proper control condition. Furthermore, no study should be considered conclusive until its findings have been successfully replicated, demonstrating a consistent predictive effect. This isn’t an unreasonable demand; it’s the foundation of real science. Yet in clinical psychology, spineless researchers and overly cautious and/or power-crazed ethics committees have effectively neutered most studies into passive, observational, and ultimately useless exercises in statistical storytelling.
And for the love of all that is scientific, we need to stop the obsession with p-values. Statistical significance is meaningless if it doesn’t translate into real-world impact. Instead of reporting p-values as if they prove anything on their own, researchers should prioritize effect sizes that demonstrate meaningful clinical relevance. Otherwise, we’re left with a field drowning in “statistically significant” noise—impressive on paper but useless in practice.
What worth is a result with p<0.01 when the 10 previous articles with negative results were never actually written?
To make a career, you need to discover quirky, counterintuitive findings that can be turned into TED talks and 'one weird trick' clickbait. You become a big deal once you start providing fodder for the annoying "well, actually..." guy to drop on people at a dinner party or in a Reddit comment section.
If you define everything to be "inside", then causality disappears because intervention disappears.
A healthy respect for the difficulties of determining causality is beneficial. But irrational skepticism that ignores the evidence of strong observational research simply replaces it with... what, exactly? That's how we ended up with a 71-year-old anti-vaccine conspiracist as health secretary.
Instead you were drawn to a topic which seemed ambiguous, which had multiple possible interpretations, multiple plausible angles, and on which nobody could agree. You didn’t explicitly know these things starting out, but they were embedded in the very circumstances which caused you to investigate the subject further.
Yes, determining causation is sometimes hard, but it is also sometimes very easy. However, very easy answers are not interesting ones, and so we find ourselves here.
Baby boom as solar panel sales skyrocket.
Causality is a largely orthogonal problem to frequentist/bayesian - it makes everything harder, not just one of those!
I mean, that's maths: either approach has to give the same results, as they come from the same theory. Bayes' theorem is just a theorem; use it explicitly or not, the numbers will be the same because the axioms are the same.