Synthetic Control and Experimentation Culture

When standard experiments fail: utilizing Synthetic Controls, managing experimentation culture, and understanding various treatment effects.

While randomized controlled experiments (A/B testing) remain the gold standard for drawing causal inferences, there are many scenarios where they are deeply flawed or impossible to implement.

Standard experimental techniques, from individual-level A/B testing to spatial/cluster randomization and switchback experiments, become unworkable when units interfere with one another (spillovers), when the treatment cannot be randomized at all, or when only a single aggregate unit such as a city or country receives the treatment.

In these cases, we rely on causal inference techniques designed for observational data. Arguably the most important innovation in policy evaluation in recent decades is Synthetic Control.

Uber Cash Trips: The Spillover Effect

Uber launched in the US as a card-only service but later expanded into cash-heavy markets like Latin America and India. While accepting cash unlocked new rider segments, it introduced operational friction—specifically, drivers having to carry change and Uber struggling to collect its commissions.

To reduce this friction, Uber considered showing drivers the payment method upfront, and ran an experiment to measure the impact on trip acceptance rates and unpaid service fees.

However, standard A/B testing fails here due to network interference (the spillover effect). If drivers in the treatment group prefer cash and systematically decline card trips, they consume the supply of cash trips. The control group is then starved of cash trips, skewing its metrics even though control drivers never see the payment type.

What about a switchback experiment? We can fix a city and alternate it between treatment and control over different time intervals; see, for example, DoorDash's algorithm-change experiment and Lyft's surge-pricing subsidy experiment. Note that those features are not user-facing: they can be silently deployed.

(Switchback experiment diagram from Uber, PyData Amsterdam 2019.)

Cash trips, however, are a user-facing feature. A driver would sit in both the control and treatment groups across different time buckets and notice the flag flipping, so we cannot use a switchback for this experiment. Even for algorithmic changes, users may notice the switching these days, which makes switchbacks harder to run; for example, Nick Jones at Uber mentioned that they could not use a switchback for their surge-pricing algorithm change experiment.
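The mechanics of a switchback can be sketched in a few lines: instead of randomizing individual users, whole (city, time bucket) pairs are deterministically assigned to a variant, so everyone in the same market and hour sees the same condition. The function and salt below are hypothetical, not Uber's implementation:

```python
import hashlib

def switchback_assignment(city: str, bucket_start_hour: int, salt: str = "expt-v1") -> str:
    """Assign an entire (city, time bucket) pair to treatment or control.

    Every driver in the same city and hour bucket gets the same variant,
    which is what lets a switchback tolerate within-market interference.
    """
    key = f"{salt}:{city}:{bucket_start_hour}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return "treatment" if digest % 2 == 0 else "control"

# The same city flips between variants across consecutive time buckets:
print([switchback_assignment("amsterdam", h) for h in range(6)])
```

Hashing (rather than storing assignments) keeps the mapping deterministic and reproducible; changing the salt re-randomizes the schedule for a new experiment.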

Synthetic Control


When to use this

The Main Idea

Example


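The main idea in miniature: build a "synthetic" version of the treated market as a convex combination of untreated donor markets, with weights chosen so the combination tracks the treated market's pre-treatment outcomes; the post-treatment gap between actual and synthetic is the estimated effect. A minimal sketch on simulated data (market counts, noise levels, and the +8 effect are all hypothetical), using scipy for the constrained least squares:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated weekly trips: 1 treated market, 4 donor (untreated) markets.
T_pre, T_post = 30, 10
donors_pre = rng.normal(100, 5, size=(T_pre, 4))
true_w = np.array([0.5, 0.3, 0.2, 0.0])
treated_pre = donors_pre @ true_w + rng.normal(0, 1, T_pre)

def loss(w):
    # Pre-treatment fit: squared error between treated and synthetic series.
    return np.sum((treated_pre - donors_pre @ w) ** 2)

# Weights constrained to the simplex: non-negative and summing to 1.
res = minimize(
    loss,
    x0=np.full(4, 0.25),
    bounds=[(0, 1)] * 4,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    method="SLSQP",
)
w_hat = res.x

# Post-period: the treatment adds +8 trips/week to the treated market.
donors_post = rng.normal(100, 5, size=(T_post, 4))
treated_post = donors_post @ true_w + 8 + rng.normal(0, 1, T_post)
synthetic_post = donors_post @ w_hat

# The estimated effect is the actual-minus-synthetic gap after treatment.
effect = (treated_post - synthetic_post).mean()
print(round(effect, 1))
```

The simplex constraint is what distinguishes synthetic control from ordinary regression: weights stay interpretable (a weighted average of real markets) and the method cannot extrapolate outside the donor pool.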
Experimentation Culture

When running experiments in the tech industry, the goal shifts away from purely "scientific" rigidity toward maximizing learning per unit of traffic and time.

Classical Power Analysis vs. Discovery-Driven Experimentation

In a traditional scientific approach, experiments are sized via power analysis: you fix a minimum detectable effect $X$, choose the sample size $N$ needed to detect it at the desired power and significance level, and run the test to completion before looking at the results.

The Problem: This wastes time and traffic. Tech companies have to prioritize velocity: they have an endless backlog of features to test, and the binding constraint is available user traffic (sample size).

The Solution: Discovery-driven or Adaptive Experimentation. We “peek” at the results smartly using upper and lower dynamic thresholds. If a product is clearly amazing early on, we stop the experiment, declare victory, and ship it. If it’s clearly a dud or causing harm, we shut it down immediately to save sample size for the next idea.
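A toy illustration of this peeking logic, checking a running z-statistic against upper and lower stopping thresholds every few thousand users. The thresholds and rates here are hypothetical, and a production system would use a properly calibrated sequential method (group-sequential boundaries or always-valid p-values) since naive peeking at a fixed threshold inflates false positives:

```python
import random

def peeking_experiment(p_control, p_treatment, upper=3.0, lower=-3.0,
                       check_every=1000, max_n=50_000, seed=0):
    """Two-arm Bernoulli test that peeks at a z-statistic every
    `check_every` users per arm and stops early at a threshold."""
    rng = random.Random(seed)
    succ_c = succ_t = n = 0
    while n < max_n:
        for _ in range(check_every):
            succ_c += rng.random() < p_control
            succ_t += rng.random() < p_treatment
        n += check_every
        pc, pt = succ_c / n, succ_t / n
        pooled = (succ_c + succ_t) / (2 * n)
        se = (2 * pooled * (1 - pooled) / n) ** 0.5
        if se == 0:
            continue
        z = (pt - pc) / se
        if z >= upper:          # clearly amazing: ship it early
            return "ship", n
        if z <= lower:          # clearly harmful: kill it early
            return "kill", n
    return "inconclusive", n

# A genuinely better variant (13% vs 10% conversion) ships early,
# freeing the remaining traffic for the next idea in the backlog.
decision, n = peeking_experiment(0.10, 0.13)
print(decision, n)
```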

The Universal Holdout

Standard A/B tests isolate single features, making it impossible to answer: “What is the aggregate effect of everything we shipped this quarter on long-term retention?”

To measure long-term, multi-feature impact, companies utilize a Universal Holdout. At the beginning of a quarter (or year), a small set of users is completely held back from receiving any new products or experiments. At the end of the time frame, analyzing the difference between this universal control group and the general population reveals the cumulative impact of the entire product roadmap.
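Measuring the holdout at quarter's end reduces to a difference in means with a confidence interval. The retention numbers and group sizes below are invented for illustration:

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical end-of-quarter retention scores: the holdout received
# no new features all quarter; everyone else got the full roadmap.
holdout = [rng.gauss(0.60, 0.10) for _ in range(2_000)]
exposed = [rng.gauss(0.63, 0.10) for _ in range(50_000)]

# Cumulative lift of the whole roadmap, with a 95% normal-approx CI.
diff = statistics.fmean(exposed) - statistics.fmean(holdout)
se = (statistics.variance(exposed) / len(exposed)
      + statistics.variance(holdout) / len(holdout)) ** 0.5
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"cumulative lift: {diff:.3f} (95% CI {lo:.3f}..{hi:.3f})")
```

Note the asymmetric sizes: the holdout is kept small because every user in it forgoes a quarter of improvements, which is itself a cost.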


Miscellaneous Topics

Ethics in Experimentation

When does an experiment cross ethical boundaries?

Specific Treatment Effects

Depending on the randomization design and on whether users comply with their assignment, evaluating causality yields different definitions of "effect":

- ATE (Average Treatment Effect): the average effect over the whole population.
- ATT (Average Treatment effect on the Treated): the average effect among those who actually received the treatment.
- ITT (Intention-To-Treat): the effect of being assigned to treatment, regardless of whether the user actually took it up.
- LATE (Local Average Treatment Effect): the effect among compliers, recovered by scaling the ITT by the compliance rate.
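Under one-sided noncompliance, the intention-to-treat (ITT) effect compares groups as assigned, and dividing it by the take-up rate recovers the local average treatment effect (LATE), the Wald/IV estimator. A simulated sketch (the +3 effect and 60% compliance are hypothetical):

```python
import random

rng = random.Random(7)

# Encouragement design: assignment is random, but only ~60% of the
# assigned group actually takes the treatment (one-sided noncompliance).
n = 100_000
records = []
for _ in range(n):
    assigned = rng.random() < 0.5
    complies = assigned and rng.random() < 0.6
    outcome = rng.gauss(10, 2) + (3 if complies else 0)  # true effect: +3
    records.append((assigned, complies, outcome))

treat = [y for a, c, y in records if a]
ctrl = [y for a, c, y in records if not a]

# ITT: effect of assignment (diluted by noncompliers to roughly 0.6 * 3).
itt = sum(treat) / len(treat) - sum(ctrl) / len(ctrl)

# LATE: rescale by the compliance rate to recover the effect on compliers.
take_up = sum(c for a, c, _ in records if a) / len(treat)
late = itt / take_up
print(round(itt, 2), round(late, 2))
```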