A/B Testing Metrics

A comprehensive guide to selecting and evaluating metrics in A/B testing and online experimentation.

A/B testing (or online experimentation) is the gold standard for causal inference in tech and business. A key challenge in A/B testing is deciding what to measure. The choice of metrics determines whether we can accurately assess the causal impact of a new feature or change.

Types of Metrics

When designing an experiment, metrics generally fall into different roles:

1. Primary Metric

The single metric the experiment is designed to move; the launch decision hinges on it.

Example: Click-Through Rate (CTR)

2. Secondary Metrics

Additional metrics that help interpret the result and explain why the primary metric moved (e.g. sessions per user, time on page).

3. Guardrail Metrics

Metrics that must not regress even if the primary metric improves (e.g. page load latency, error rate, unsubscribe rate).
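For a ratio metric like CTR, the standard analysis compares the two arms with a two-proportion z-test. The sketch below is a minimal, self-contained version using only the standard library; the function name and the example counts are illustrative, not from the source.

```python
import math

def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in CTR between control (A) and treatment (B)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF (expressed with math.erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: a 5.0% vs 5.6% CTR on 20,000 views per arm
z, p = two_proportion_ztest(clicks_a=1000, views_a=20000,
                            clicks_b=1120, views_b=20000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these illustrative numbers the lift is statistically significant at the conventional 0.05 level; with smaller samples the same relative lift would not be.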

A metric in an A/B test is only useful if it reliably distinguishes real effects from noise. It should be:

  1. Simple and Understandable: It should be explainable in one sentence, so stakeholders can easily grasp what it measures and make informed decisions when the experiment concludes.
  2. Actionable: A significant movement must lead to a decision. For example, short-term revenue is easy to move (just raise prices), but doing so loses customers and sacrifices the long-term goal, so short-term revenue alone is not actionable.
  3. Sensitive (Statistical Power): The metric must be sensitive enough to detect meaningful changes caused by the treatment. If the metric is too noisy (high variance), it will be difficult to achieve statistical significance.
  4. Robust: The metric should not be overly sensitive to outliers or unrelated systemic variations.
  5. Timely: The metric must manifest quickly enough to be measured within the duration of the experiment. For example, “Customer lifetime value over 5 years” is a poor A/B test metric because it takes 5 years to measure; a good proxy metric would be “User retention after 7 days”.
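Sensitivity (property 3) can be checked before the experiment starts with a standard power calculation: given a baseline rate and the minimum detectable effect (MDE) you care about, how many users per arm are needed? A minimal sketch, assuming a two-sided z-test on proportions; the function name and the example rates are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, mde, alpha=0.05, power=0.8):
    """Approximate users per arm to detect an absolute lift `mde` over
    baseline rate `p_base` with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    p_alt = p_base + mde
    # Sum of the Bernoulli variances under baseline and alternative
    var = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde ** 2)

# Hypothetical scenario: 5% baseline CTR, want to detect a +0.5pp lift
n = sample_size_per_arm(p_base=0.05, mde=0.005)
print(n)
```

Note how quickly the required sample grows as the MDE shrinks: halving the detectable effect roughly quadruples the sample size, which is why noisy metrics fail the sensitivity criterion.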

Statistical Properties

When analyzing metrics, we typically rely on the normal approximation justified by the Central Limit Theorem: for sufficiently large samples, the distribution of a metric's sample mean is approximately normal, which is what underpins standard significance tests and confidence intervals.
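The CLT claim is easy to verify by simulation. The sketch below repeatedly draws a Bernoulli metric (think per-user click indicator at a hypothetical 5% rate) and checks that the sample means concentrate around p with spread close to the theoretical sqrt(p(1-p)/n); all parameters are illustrative.

```python
import random
import statistics

random.seed(0)

# Simulate many sample means of a Bernoulli(p) metric, e.g. CTR per bucket.
# By the CLT, the sample mean is approximately Normal(p, p*(1-p)/n).
p, n, reps = 0.05, 2000, 1000
means = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

print(statistics.mean(means))   # close to p = 0.05
print(statistics.stdev(means))  # close to sqrt(p*(1-p)/n) ~= 0.0049
```

In practice this is why metric analysis reduces to z-tests and normal confidence intervals once samples are large, and why heavy-tailed metrics (e.g. revenue per user) need larger samples or variance-reduction techniques before the approximation is trustworthy.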
