A review of Qian and Murphy (2011) on formulating individualized treatment rules via conditional outcome maximization with performance guarantees.
The data are assumed to come from a randomized trial with covariates $X \in \mathcal{X}$, treatment $A \in \mathcal{A}$, and response $R$; the known randomization distribution of $A$ given $X$ is denoted by $p(\cdot \mid X)$.
The objective is to formulate an individualized treatment rule (ITR), defined as a deterministic decision rule $d$ from $\mathcal{X}$ into the treatment space $\mathcal{A}$, judged by its Value, the expected response when treatments are assigned according to the rule:
\begin{equation*} V(d)=E\left[\frac{1_{A=d(X)}}{p(A \mid X)}R\right] \end{equation*}
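The Value admits a direct plug-in estimate from randomized-trial data: average $1_{A_i=d(X_i)} R_i / p(A_i \mid X_i)$ over the sample. A minimal sketch (the toy data-generating process and function names are illustrative assumptions, not from the paper):

```python
import numpy as np

def estimate_value(d, X, A, R, propensity):
    """Inverse-probability-weighted estimate of V(d):
    average of 1{A_i = d(X_i)} * R_i / p(A_i | X_i)."""
    match = (A == d(X)).astype(float)
    return np.mean(match * R / propensity)

# Toy randomized trial: A is a fair coin flip, and the response is
# R = X if treated (A = 1), R = -X if not, plus a little noise.
rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
R = X * (2 * A - 1) + rng.normal(scale=0.1, size=n)
p = np.full(n, 0.5)  # known randomization probability p(a | x)

v_opt = estimate_value(lambda x: (x > 0).astype(int), X, A, R, p)       # treat iff x > 0
v_none = estimate_value(lambda x: np.zeros_like(x, dtype=int), X, A, R, p)  # never treat
```

Up to Monte Carlo error, `v_opt` estimates $E|X| \approx 0.80$ while never treating has Value $0$, so the estimator cleanly separates the two rules.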
Instead of direct Value maximization, the authors take a conditional outcome approach. Writing $Q_0(x,a) = E[R \mid X = x, A = a]$ for the conditional mean response, they prove that “an optimal ITR satisfies $d_0(X) \in \arg\max_{a\in\mathcal{A}} Q_0(X,a)$ a.s.” (p. 86). Because the optimal policy is characterized as the population conditional outcome maximizer, the rule becomes simple: predict the outcome under each treatment and pick the treatment with the highest prediction (pp. 83, 86).
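As a sketch of this predict-then-maximize recipe, one can fit a linear model for $Q_0$ with a treatment interaction and read the rule off the fitted treatment-effect term (the data-generating process and names here are illustrative assumptions, not the paper's):

```python
import numpy as np

# Illustrative model: R = 1 + 0.5*X + A*(X - 0.2) + noise, so the true
# treatment effect is X - 0.2 and the optimal rule treats iff X > 0.2.
rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(-1, 1, size=n)
A = rng.integers(0, 2, size=n)
R = 1 + 0.5 * X + A * (X - 0.2) + rng.normal(scale=0.1, size=n)

# Fit Q-hat(x, a) by least squares on the basis (1, x, a, a*x).
design = np.column_stack([np.ones(n), X, A, A * X])
beta, *_ = np.linalg.lstsq(design, R, rcond=None)

def d_hat(x):
    # argmax over a: treat iff Q-hat(x,1) - Q-hat(x,0) = beta[2] + beta[3]*x > 0
    return (beta[2] + beta[3] * x > 0).astype(int)
```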
Why does this work? Theorem 3.1 shows that the “estimated ITR will be of high quality (i.e., have high Value) if we can estimate $Q_0$ accurately” (p. 88): it bounds the loss in Value by the estimation error for the conditional mean outcome, with a rate governed by a margin condition (pp. 96, 102). The margin essentially “measures the difference in mean responses between the optimal treatment(s) and the best suboptimal treatment(s) at $x$” (p. 128).
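To see in miniature why prediction accuracy controls Value, here is a standard two-line argument giving a cruder, margin-free bound (the paper's Theorem 3.1 sharpens the rate using the margin condition). For any $Q$ with $d_Q(x) \in \arg\max_a Q(x,a)$, using $V(d) = E[Q_0(X, d(X))]$,
\begin{align*}
V(d_0) - V(d_Q) &= E\left[Q_0(X, d_0(X)) - Q_0(X, d_Q(X))\right] \\
&\le E\left[Q_0(X, d_0(X)) - Q(X, d_0(X)) + Q(X, d_Q(X)) - Q_0(X, d_Q(X))\right] \\
&\le 2\,E\left[\max_{a \in \mathcal{A}}\left|Q(X,a) - Q_0(X,a)\right|\right],
\end{align*}
where the first inequality holds because $Q(X, d_Q(X)) \ge Q(X, d_0(X))$ by definition of $d_Q$.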
There is one major trap. If your approximation space doesn’t contain the true model, “minimizing the prediction error may not result in the ITR… that maximizes the Value” (p. 152). This occurs “when the approximation space $\mathcal{Q}$ does not provide a treatment effect term close to the treatment effect term in $Q_0$” (p. 153).
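A small simulation makes the trap concrete: when the approximation space forces the treatment-effect term to be linear in $x$ but the true effect is not, the prediction-error minimizer can imply a rule with strictly lower Value than another rule available in the same space (this example is constructed for illustration and is not taken from the paper):

```python
import numpy as np

# X takes values -1, 0, 1; the true treatment effect is +2 at x = 0 and
# -1 at x = +/-1, which no linear-in-x effect term can represent.
rng = np.random.default_rng(2)
n = 100_000
X = rng.choice([-1.0, 0.0, 1.0], size=n, p=[0.25, 0.5, 0.25])
A = rng.integers(0, 2, size=n)
tau = np.where(X == 0, 2.0, -1.0)           # true treatment effect
R = tau * A + rng.normal(scale=0.1, size=n)

# Least squares over the misspecified class Q(x, a) = b0 + b1*x + a*(b2 + b3*x).
design = np.column_stack([np.ones(n), X, A, A * X])
b, *_ = np.linalg.lstsq(design, R, rcond=None)

def gain(d):
    # Value gain over "never treat", computed from the known tau.
    return np.mean(tau * d)

# The best-prediction fit projects tau onto {b2 + b3*x}, giving roughly
# b2 = 0.5, b3 = 0, so its rule treats everyone.
d_ls = (b[2] + b[3] * X > 0).astype(float)
# Another rule in the same class (b2 = 0.5, b3 = 1): treat iff x > -0.5.
d_alt = (0.5 + X > 0).astype(float)
```

Up to Monte Carlo error, `gain(d_ls)` $\approx 0.5$ while `gain(d_alt)` $\approx 0.75$: the prediction-error minimizer is not the Value maximizer within the class.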
To “deal with the mismatch between minimizing the prediction error and maximizing the Value”, the authors consider “a large linear approximation space” built from a basis expansion (p. 162), so the space is rich enough to approximate the treatment-effect term well. Fitting it with $l_1$-penalized least squares keeps the regression tractable in high dimensions and controls overfitting, and the authors derive finite-sample performance guarantees for the Value of the resulting ITR (pp. 162, 163, 180).
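A rough sketch of the idea: expand $x$ in a basis, interact the basis with the treatment, and fit by $l_1$-penalized least squares. The basis, penalty level, data, and the simple proximal-gradient solver below are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lasso(Phi, R, lam, n_iter=5000):
    """l1-penalized least squares via proximal gradient (ISTA):
    minimize (1/2n)||R - Phi @ b||^2 + lam * ||b||_1."""
    n = Phi.shape[0]
    step = n / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant of the gradient
    b = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        b = b - step * Phi.T @ (Phi @ b - R) / n                   # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)   # soft-threshold
    return b

# Illustrative data with a nonlinear treatment effect sin(pi * x).
rng = np.random.default_rng(3)
n = 2000
X = rng.uniform(-1, 1, size=n)
A = rng.integers(0, 2, size=n)
R = 0.5 * X + A * np.sin(np.pi * X) + rng.normal(scale=0.1, size=n)

# Basis expansion: polynomials in x, each also interacted with the treatment.
main = np.column_stack([X**k for k in range(4)])
Phi = np.column_stack([main, A[:, None] * main])
b = lasso(Phi, R, lam=0.01)

def d_hat(x):
    basis = np.column_stack([x**k for k in range(4)])
    return (basis @ b[4:] > 0).astype(int)   # treat where the fitted effect is positive
```

Even though the cubic basis cannot represent $\sin(\pi x)$ exactly, the fitted effect keeps the right sign away from the decision boundary, so the implied rule approximates "treat iff $x > 0$".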