Understanding Causality: The Importance of Randomized Experiments
Establishing causality is a powerful capability, allowing one to assert relationships with the word "because." For example, we can say our sales surged because we revamped our website. Similarly, a reduction in crime rates can be attributed to new preventive measures. Identifying causal relationships accurately is vital for data-informed decision-making in both business—optimizing operations—and government—ensuring effective use of taxpayer money. This article series will explore four statistical methodologies that lend scientific rigor to assertions of causality.
Only through a systematic approach to causality can we justifiably use "because."
The four techniques we will examine include:
- Randomized experiments
- Instrumental variables
- Regression discontinuity
- Difference-in-differences
This first installment focuses on the gold standard in scientific inquiry: randomized experiments.
Correlation Does Not Imply Causation
You may have encountered the phrase "correlation does not imply causation." Just because two events occur together doesn't mean one causes the other. A classic example is the correlation between the number of people who drown in swimming pools each year and the number of films featuring Nicolas Cage released that year.
Intuitively, this appears to be a misleading correlation. It seems absurd to posit a causal link between these two occurrences. But how would one determine the direction of causation? Is it that Nicolas Cage's films are so dismal that they drive viewers to drown? Or could it be that drownings somehow lead to an increase in his film roles? Perhaps an external factor, such as an economic downturn, is the true cause, inducing despair and prompting Cage to act in more films to boost his earnings.
To tackle questions of causality, robust statistical methods are essential. The most straightforward and effective of these is the randomized experiment.
The Potential Outcome Model
Before discussing randomized experiments, it's essential to introduce the framework for analyzing causality known as the Potential Outcome Model. Much of its terminology is borrowed from medical research, where causality estimation is often referred to as treatment evaluation. This nomenclature stems from the model’s early focus on assessing whether administering treatment to ill patients causes an improvement in their health.
Let's establish some notation. In the Potential Outcome Model, we focus on two variables:
- Cᵢ ∈ {0, 1}, indicating whether individual i received the treatment,
- Yᵢ, the outcome for individual i.
Here, Cᵢ = 1 might refer to treating a sick patient with a drug, where Yᵢ could represent the viral load in that patient. Alternatively, Cᵢ = 1 could represent showing a user a new website layout instead of the old one (Cᵢ = 0), with Yᵢ measuring the time spent on the site. Regardless of whether we're assessing a drug's effectiveness or the impact of a website redesign, the same principles apply.
The outcome for any individual can take one of two values, termed potential outcomes:
- Yᵢ¹ indicates the outcome if individual i was treated (when Cᵢ = 1),
- Yᵢ⁰ indicates the outcome if individual i was not treated (when Cᵢ = 0).
Naturally, a single individual cannot be treated and untreated at the same time, so we only observe one of these values. The unobserved value is known as the counterfactual. For instance, if user i experiences the new website layout and spends 50 seconds on it, we have Yᵢ¹ = 50, but we lack the counterfactual Yᵢ⁰ indicating how long they would have spent under the old layout.
Imagine if we could traverse alternate realities and observe this person's outcome in both scenarios: treated and untreated. Then, the treatment effect for individual i could be calculated as:
TEᵢ = Yᵢ¹ - Yᵢ⁰
If our user would have spent only Yᵢ⁰ = 30 seconds with the old layout, the causal effect of switching to the new layout would be 50 - 30 = 20 seconds for that specific user.
If we had similar data from many individuals, we could average these effects to estimate the Average Treatment Effect (ATE):
ATE = E[Yᵢ¹ - Yᵢ⁰]
In most instances, however, researchers are less interested in ATE. The typical objective is to apply the treatment to larger populations if it proves effective. For example, if a drug is shown to aid recovery, we would want to distribute it widely. If a new website layout increases user engagement, we would implement it for all users. Therefore, the key metric we seek is the Average Treatment Effect on the Treated (ATET), which is simply the average effect of the treatment among those who received it:
ATET = E[Yᵢ¹ - Yᵢ⁰ | Cᵢ = 1]
If we could compute ATET, we would have the average causal impact of the treatment. Unfortunately, this is not feasible since we cannot observe the necessary counterfactuals.
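Although the counterfactuals are unobservable in real data, a quick simulation can make the notation concrete, because in a simulation we get to generate both potential outcomes for every individual. Here is a minimal sketch in Python; all the numbers, including the roughly 20-second effect of the new layout, are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Potential outcomes: seconds on the site under the old (Y0) and new (Y1) layout.
y0 = rng.normal(loc=30, scale=10, size=n)       # outcome if NOT treated
y1 = y0 + rng.normal(loc=20, scale=5, size=n)   # outcome if treated (true effect ~20 s)

# Treatment indicator: here simply half the users, chosen arbitrarily.
c = rng.integers(0, 2, size=n)

te = y1 - y0                   # individual treatment effects TE_i
ate = te.mean()                # ATE  = E[Y1 - Y0]
atet = te[c == 1].mean()       # ATET = E[Y1 - Y0 | C = 1]

print(f"ATE ≈ {ate:.2f} s, ATET ≈ {atet:.2f} s")   # both ≈ 20 s in this toy setup
```

In real data we only ever observe y1 for the treated and y0 for the untreated, which is exactly why the quantities above cannot be computed directly.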
Naive Comparison?
What if we merely look at the difference in average outcomes between those treated and those untreated?
Naive comparison = E[Yᵢ¹ | Cᵢ = 1] - E[Yᵢ⁰ | Cᵢ = 0]
For instance, if patients receiving the drug average 90 virus cells per cm³ of tissue, while those who did not average 140, does this demonstrate the drug's causal influence? Sadly, it does not. To understand why, we can manipulate the equation by adding and subtracting the counterfactual term E[Yᵢ⁰ | Cᵢ = 1], the average outcome the treated individuals would have had without treatment. This does not alter the equation:

Naive comparison = E[Yᵢ¹ | Cᵢ = 1] - E[Yᵢ⁰ | Cᵢ = 1] + E[Yᵢ⁰ | Cᵢ = 1] - E[Yᵢ⁰ | Cᵢ = 0]
= ATET + selection bias

Writing it this way reveals that the naive comparison equals the ATET plus an additional term, E[Yᵢ⁰ | Cᵢ = 1] - E[Yᵢ⁰ | Cᵢ = 0], known as selection bias. If the selection bias were zero, the naive comparison would yield the treatment's causal effect.
Causal effect estimation hinges on eliminating selection bias.
Typically, selection bias is not zero. It arises from the reasons individuals choose to receive treatment, and addressing it presents a significant challenge in establishing causality. Fortunately, there is a solution.
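To see where selection bias comes from, consider a hedged sketch of the drug example: the numbers below are invented, and the key assumption is that sicker patients (those with a higher baseline viral load) are more likely to receive the drug.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

baseline = rng.normal(140, 30, size=n)   # viral load (cells/cm3) without any drug
y0 = baseline                            # potential outcome if untreated
y1 = baseline - 30                       # the drug truly removes ~30 cells/cm3

# Selection: the sicker the patient, the more likely they are to be treated.
p_treat = 1 / (1 + np.exp(-(baseline - 140) / 15))
c = rng.random(n) < p_treat

naive = y1[c].mean() - y0[~c].mean()            # E[Y1 | C=1] - E[Y0 | C=0]
atet = (y1 - y0)[c].mean()                      # true effect on the treated (-30)
selection_bias = y0[c].mean() - y0[~c].mean()   # E[Y0 | C=1] - E[Y0 | C=0]

print(f"naive = {naive:.1f}, ATET = {atet:.1f}, selection bias = {selection_bias:.1f}")
```

The printout confirms the decomposition above: the naive comparison equals the ATET plus the selection bias, so the drug looks less effective than it really is simply because the treated patients started out sicker.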
Enter Randomized Experiments
The solution lies in randomized experiments, which are remarkably straightforward. The key is to randomly assign participants into two groups: a treatment group that receives the intervention and a control group that does not.
The rationale is that, if the sample is sufficiently large and the assignment is genuinely random, all characteristics unrelated to the treatment—and potentially influencing outcomes—will be evenly distributed across both groups. Consequently, both groups will include similar proportions of various traits, such as height, intelligence, and age. The only distinction between the treatment and control groups will be the treatment itself.
In other words, in the absence of the treatment, both groups should be statistically identical:
E[Yᵢ⁰ | Cᵢ = 0] = E[Yᵢ⁰ | Cᵢ = 1].
This indicates that the average outcome in the treated group would match that of the control group if neither had received the treatment. Notably, these two terms are exactly the ones that constituted the selection bias in the previous formula. Since they are equal, the selection bias cancels out. Therefore, in a randomized experiment, the naive difference in average outcomes between the treated and untreated groups is the ATET: the causal effect of the treatment on those who received it.
In a randomized experiment, the difference in average outcomes between treated and untreated groups represents the causal effect!
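Continuing the same invented drug example, we can check this claim in the simulation: replacing the sickness-driven assignment with a coin flip makes the selection bias collapse to roughly zero, so the plain difference in means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

baseline = rng.normal(140, 30, size=n)
y0 = baseline
y1 = baseline - 30                       # true causal effect: -30 cells/cm3

c = rng.random(n) < 0.5                  # random assignment: a fair coin flip

diff_in_means = y1[c].mean() - y0[~c].mean()    # the "naive" comparison
selection_bias = y0[c].mean() - y0[~c].mean()   # ≈ 0 under randomization

print(f"difference in means = {diff_in_means:.1f}, selection bias = {selection_bias:.2f}")
```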
The Fly in the Ointment
While randomized experiments are a powerful tool, they necessitate appropriate design, a sufficiently large sample, and true randomization to accurately establish causal effects. In fields like medical research, randomized trials are standard practice, enabling the approval of new drugs by demonstrating that patients recover because of the treatment administered. In marketing, A/B testing is also a form of randomized experimentation, where a subset of users experiences a design change, allowing us to attribute any observed increase in engagement directly to that change.
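As a sketch of how such an A/B test might be analyzed (with simulated engagement data and an assumed 3-second true lift, not real numbers), one can compare the two groups' average time on site and gauge the uncertainty with a two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
old = rng.normal(30, 12, size=5_000)     # control group: old layout, time on site (s)
new = rng.normal(33, 12, size=5_000)     # treatment group: new layout

effect = new.mean() - old.mean()
t_stat, p_value = stats.ttest_ind(new, old, equal_var=False)   # Welch's t-test

print(f"estimated effect = {effect:.1f} s, p-value = {p_value:.4f}")
```

Because users were assigned to the layouts at random, a statistically significant difference here can be interpreted causally rather than as a mere correlation.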
However, there are limitations. Randomized experiments may not always be feasible or ethical, especially in social sciences and economics. For instance, can one impose a tax reform on a random sample to assess its impact on unemployment? It is impractical and unethical to assign children to be bullied to study long-term effects on success. In such cases, alternative methodologies must be employed to estimate causal impacts.
Next, we will explore the instrumental variables method.
Thanks for reading!
If you enjoyed this article, consider subscribing for email updates on my latest writings. By becoming a Medium member, you can support my work and gain unlimited access to stories from other authors as well.
Want to stay updated on the rapidly evolving fields of machine learning and AI? Check out my new newsletter, AI Pulse. For consulting inquiries, feel free to reach out or book a 1:1 session here.
You might also be interested in my other articles. Can't decide? Here are a few suggestions:
6 Useful Probability Distributions with Applications to Data Science Problems
A practical overview with examples and Python code.
The Gentlest of Introductions to Bayesian Data Analysis
8 Tips for Object-Oriented Programming in Python
Follow these steps to make your Python classes bulletproof.