A review of Qian and Murphy (2011) on formulating individualized treatment rules via conditional outcome maximization with performance guarantees.
The data are assumed to come from a randomized trial with covariates $X \in \mathcal{X}$, treatment $A \in \mathcal{A}$, and response $R$; the known randomization distribution of $A$ given $X$ is denoted by $p(\cdot \mid X)$.
The objective is to formulate an individualized treatment rule (ITR), defined as a deterministic decision rule $d$ from $\mathcal{X}$ into the treatment space $\mathcal{A}$, judged by its Value, the expected response when treatments are assigned according to the rule:
\begin{equation*} V(d)=E\left[\frac{1_{A=d(X)}}{p(A \mid X)}R\right] \end{equation*}
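The Value admits a direct plug-in estimate from randomized-trial data: average $1_{A_i=d(X_i)} R_i / p(A_i \mid X_i)$ over the sample. A minimal sketch (the toy data-generating process and function names are illustrative assumptions, not from the paper):

```python
import numpy as np

def estimate_value(d, X, A, R, propensity):
    """Inverse-probability-weighted estimate of V(d):
    average of 1{A_i = d(X_i)} * R_i / p(A_i | X_i)."""
    match = (A == d(X)).astype(float)
    return np.mean(match * R / propensity)

# Toy randomized trial: A is a fair coin flip, and the response is
# R = X if treated (A = 1), R = -X if not, plus a little noise.
rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
R = X * (2 * A - 1) + rng.normal(scale=0.1, size=n)
p = np.full(n, 0.5)  # known randomization probability p(a | x)

v_opt = estimate_value(lambda x: (x > 0).astype(int), X, A, R, p)       # treat iff x > 0
v_none = estimate_value(lambda x: np.zeros_like(x, dtype=int), X, A, R, p)  # never treat
```

Up to Monte Carlo error, `v_opt` estimates $E|X| \approx 0.80$ while never treating has Value $0$, so the estimator cleanly separates the two rules.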
Instead of direct Value maximization, the authors take a conditional outcome approach. Writing $Q_0(x,a) = E[R \mid X = x, A = a]$ for the conditional mean response, they prove that “an optimal ITR satisfies $d_0(X) \in \arg\max_{a\in\mathcal{A}} Q_0(X,a)$ a.s.” (p. 86). Because the optimal policy is characterized as the population conditional outcome maximizer, the rule becomes simple: predict the outcome under each treatment and pick the treatment with the highest prediction (pp. 83, 86).
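As a sketch of this predict-then-maximize recipe, one can fit a linear model for $Q_0$ with a treatment interaction and read the rule off the fitted treatment-effect term (the data-generating process and names here are illustrative assumptions, not the paper's):

```python
import numpy as np

# Illustrative model: R = 1 + 0.5*X + A*(X - 0.2) + noise, so the true
# treatment effect is X - 0.2 and the optimal rule treats iff X > 0.2.
rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(-1, 1, size=n)
A = rng.integers(0, 2, size=n)
R = 1 + 0.5 * X + A * (X - 0.2) + rng.normal(scale=0.1, size=n)

# Fit Q-hat(x, a) by least squares on the basis (1, x, a, a*x).
design = np.column_stack([np.ones(n), X, A, A * X])
beta, *_ = np.linalg.lstsq(design, R, rcond=None)

def d_hat(x):
    # argmax over a: treat iff Q-hat(x,1) - Q-hat(x,0) = beta[2] + beta[3]*x > 0
    return (beta[2] + beta[3] * x > 0).astype(int)
```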
Why does this work? Theorem 3.1 shows that the “estimated ITR will be of high quality (i.e., have high Value) if we can estimate $Q_0$ accurately” (p. 88): it bounds the loss in Value by the estimation error for the conditional mean outcome, with a rate governed by a margin condition (pp. 96, 102). The margin essentially “measures the difference in mean responses between the optimal treatment(s) and the best suboptimal treatment(s) at $x$” (p. 128).
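To see in miniature why prediction accuracy controls Value, here is a standard two-line argument giving a cruder, margin-free bound (the paper's Theorem 3.1 sharpens the rate using the margin condition). For any $Q$ with $d_Q(x) \in \arg\max_a Q(x,a)$, using $V(d) = E[Q_0(X, d(X))]$,
\begin{align*}
V(d_0) - V(d_Q) &= E\left[Q_0(X, d_0(X)) - Q_0(X, d_Q(X))\right] \\
&\le E\left[Q_0(X, d_0(X)) - Q(X, d_0(X)) + Q(X, d_Q(X)) - Q_0(X, d_Q(X))\right] \\
&\le 2\,E\left[\max_{a \in \mathcal{A}}\left|Q(X,a) - Q_0(X,a)\right|\right],
\end{align*}
where the first inequality holds because $Q(X, d_Q(X)) \ge Q(X, d_0(X))$ by definition of $d_Q$.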
There is one major trap. If your approximation space doesn’t contain the true model, “minimizing the prediction error may not result in the ITR… that maximizes the Value” (p. 152). This occurs “when the approximation space $\mathcal{Q}$ does not provide a treatment effect term close to the treatment effect term in $Q_0$” (p. 153).
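A small simulation makes the trap concrete: when the approximation space forces the treatment-effect term to be linear in $x$ but the true effect is not, the prediction-error minimizer can imply a rule with strictly lower Value than another rule available in the same space (this example is constructed for illustration and is not taken from the paper):

```python
import numpy as np

# X takes values -1, 0, 1; the true treatment effect is +2 at x = 0 and
# -1 at x = +/-1, which no linear-in-x effect term can represent.
rng = np.random.default_rng(2)
n = 100_000
X = rng.choice([-1.0, 0.0, 1.0], size=n, p=[0.25, 0.5, 0.25])
A = rng.integers(0, 2, size=n)
tau = np.where(X == 0, 2.0, -1.0)           # true treatment effect
R = tau * A + rng.normal(scale=0.1, size=n)

# Least squares over the misspecified class Q(x, a) = b0 + b1*x + a*(b2 + b3*x).
design = np.column_stack([np.ones(n), X, A, A * X])
b, *_ = np.linalg.lstsq(design, R, rcond=None)

def gain(d):
    # Value gain over "never treat", computed from the known tau.
    return np.mean(tau * d)

# The best-prediction fit projects tau onto {b2 + b3*x}, giving roughly
# b2 = 0.5, b3 = 0, so its rule treats everyone.
d_ls = (b[2] + b[3] * X > 0).astype(float)
# Another rule in the same class (b2 = 0.5, b3 = 1): treat iff x > -0.5.
d_alt = (0.5 + X > 0).astype(float)
```

Up to Monte Carlo error, `gain(d_ls)` $\approx 0.5$ while `gain(d_alt)` $\approx 0.75$: the prediction-error minimizer is not the Value maximizer within the class.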
To “deal with the mismatch between minimizing the prediction error and maximizing the Value”, the authors consider “a large linear approximation space” built from a basis expansion (p. 162), so the space is rich enough to approximate the treatment-effect term well. Fitting it with $l_1$-penalized least squares keeps the regression tractable in high dimensions and controls overfitting, and the authors derive finite-sample performance guarantees for the Value of the resulting ITR (pp. 162, 163, 180).
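A rough sketch of the idea: expand $x$ in a basis, interact the basis with the treatment, and fit by $l_1$-penalized least squares. The basis, penalty level, data, and the simple proximal-gradient solver below are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lasso(Phi, R, lam, n_iter=5000):
    """l1-penalized least squares via proximal gradient (ISTA):
    minimize (1/2n)||R - Phi @ b||^2 + lam * ||b||_1."""
    n = Phi.shape[0]
    step = n / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant of the gradient
    b = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        b = b - step * Phi.T @ (Phi @ b - R) / n                   # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)   # soft-threshold
    return b

# Illustrative data with a nonlinear treatment effect sin(pi * x).
rng = np.random.default_rng(3)
n = 2000
X = rng.uniform(-1, 1, size=n)
A = rng.integers(0, 2, size=n)
R = 0.5 * X + A * np.sin(np.pi * X) + rng.normal(scale=0.1, size=n)

# Basis expansion: polynomials in x, each also interacted with the treatment.
main = np.column_stack([X**k for k in range(4)])
Phi = np.column_stack([main, A[:, None] * main])
b = lasso(Phi, R, lam=0.01)

def d_hat(x):
    basis = np.column_stack([x**k for k in range(4)])
    return (basis @ b[4:] > 0).astype(int)   # treat where the fitted effect is positive
```

Even though the cubic basis cannot represent $\sin(\pi x)$ exactly, the fitted effect keeps the right sign away from the decision boundary, so the implied rule approximates "treat iff $x > 0$".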