May 12, 2025
Missing Completely At Random (MCAR): P(miss) is unrelated to the missing variable or to any other variable
(Conditionally) Missing At Random (MAR): the relationship between P(miss) and the missing variable is explained by other variables
\(P(miss_{test}) = f(income)\)
Missing Not At Random (MNAR): P(miss) is related to the missing variable itself, even after accounting for other model variables
\(P(miss_{test}) = f(test)\)
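The three mechanisms above can be sketched with a small simulation. This is a minimal illustration, not the deck's own analysis: the standardized income variable, the 0.5 slope, the 30% MCAR rate, and the logistic coefficients are all assumed values chosen only to make the contrast visible.

```python
import numpy as np


def simulate_missingness(n=10_000, seed=0):
    """Mean of the observed test scores under MCAR, MAR, and MNAR."""
    rng = np.random.default_rng(seed)
    income = rng.normal(0.0, 1.0, n)               # standardized income (assumed)
    test = 0.5 * income + rng.normal(0.0, 1.0, n)  # assumed slope of 0.5

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # P(miss_test) under each mechanism (all coefficients are assumptions)
    p_miss = {
        "MCAR": np.full(n, 0.3),               # constant, unrelated to anything
        "MAR": sigmoid(-1.0 - 1.5 * income),   # P(miss_test) = f(income)
        "MNAR": sigmoid(-1.0 - 1.5 * test),    # P(miss_test) = f(test)
    }

    results = {"full": test.mean()}
    for name, p in p_miss.items():
        missing = rng.random(n) < p            # draw the missingness indicator
        results[name] = test[~missing].mean()  # mean of what is still observed
    return results


if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings, the observed mean under MCAR stays close to the full-data mean, while under MNAR it drifts upward because low scorers are more likely to be missing.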
\(P(miss_{test}) = f(income)\)
\(P(miss_{income}) = f(test)\)
\(P(miss_{test}) = f(income)\), \(P(miss_{income}) = f(test)\)
\(P(miss_{test}) = f(income, treat)\)
\(P(miss_{test}) = f(income, treat)\), \(P(miss_{income}) = f(test_{pre})\)
\(P(miss_{income}) = f(test_{pre}, treat)\)
Positives
Negatives
Check out this tool for more RCT scenarios
\(P(miss_{income}) = f(test_{pre})\)
\(P(miss_{income}) = f(test_{pre}, treat)\)
| | Mean | Variance | Covar |
|---|---|---|---|
| Income | $84k | $400k | 40 |
| Test Score | 11.5 | 12 | |
Missing data are imputed by randomly sampling values from probability distributions defined by the model parameters, conditional on each individual’s observed data
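A minimal sketch of one such imputation draw, assuming a simple linear model of test score on income with normal residuals. The function name `impute_once` and the toy data are hypothetical. Full multiple imputation would also redraw the model parameters for each imputed dataset; this sketch holds them fixed at their complete-case estimates.

```python
import numpy as np


def impute_once(income, test, rng):
    """Fill missing test scores (NaN) with one random draw per person."""
    obs = ~np.isnan(test)

    # Estimate test ~ income on the complete cases
    X = np.column_stack([np.ones(obs.sum()), income[obs]])
    beta, *_ = np.linalg.lstsq(X, test[obs], rcond=None)
    resid = test[obs] - X @ beta
    sigma = resid.std(ddof=2)  # residual SD, two estimated coefficients

    # Draw each missing value from N(predicted value, sigma),
    # conditional on that person's observed income
    filled = test.copy()
    n_miss = int((~obs).sum())
    predicted = beta[0] + beta[1] * income[~obs]
    filled[~obs] = predicted + rng.normal(0.0, sigma, n_miss)
    return filled


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    income = rng.normal(84.0, 20.0, 200)                    # toy data (assumed)
    test = 3.0 + 0.1 * income + rng.normal(0.0, 3.0, 200)   # toy data (assumed)
    test[:50] = np.nan                                      # delete 50 scores
    print(np.isnan(impute_once(income, test, rng)).sum())
```

Repeating the draw several times and pooling the estimates across the filled-in datasets is what turns this single draw into multiple imputation.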
But…
Method | Ease of Use | When to Use |
---|---|---|
Listwise Deletion | Easy | |
FIML | Medium | |
Multiple Imputation | Hard | |
\[ test_i = \beta_0 + \beta_1 income_i + \epsilon_i \]
\[ \beta_0 = \beta_{0(obs)} \times p_{obs} + \beta_{0(miss)} \times p_{miss} \]
\(p\) = sample proportion
\[ test_i = \beta_{0(obs)} + \delta miss_i + \beta_1 income_i + \epsilon_i \]
\(miss_i\) = 1 if missing test data, 0 if observed test data
\(\delta = \beta_{0(miss)} - \beta_{0(obs)}\)
\(\beta_0 = \beta_{0(obs)} \times p_{obs} + (\beta_{0(obs)} + \delta) \times p_{miss}\)
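One step the substitution leaves implicit, assuming \(p_{obs} + p_{miss} = 1\): the mixture intercept is the observed intercept shifted by \(\delta\), weighted by the share of missing cases.

\[ \beta_0 = \beta_{0(obs)} \times (p_{obs} + p_{miss}) + \delta \times p_{miss} = \beta_{0(obs)} + \delta \times p_{miss} \]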
\[ \delta = d \times \sigma_{test} \]
\(d\) = Cohen’s D effect size
\(\sigma_{test}\) = test score SD
\[ \delta = 0.2 \times 3 = 0.6 \]
\(test_i = \beta_{0(obs)} + 0.6 \times miss_i + \beta_1 income_i + \epsilon_i\)
\[ \beta_0 = \beta_{0(obs)} \times p_{obs} + (\beta_{0(obs)} + \delta) \times p_{miss} \]
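The mixture intercept can be checked numerically. In this sketch, \(d = 0.2\) and \(\sigma_{test} = 3\) come from the slides; the observed intercept of 10 and the 25% missing rate are assumed values for illustration only, and `mixture_intercept` is a hypothetical helper name.

```python
def mixture_intercept(beta0_obs, delta, p_miss):
    """Weighted average of the observed and missing-group intercepts."""
    p_obs = 1.0 - p_miss
    return beta0_obs * p_obs + (beta0_obs + delta) * p_miss


d = 0.2                    # Cohen's d (from the slides)
sigma_test = 3.0           # test score SD (from the slides)
delta = d * sigma_test     # intercept shift for the missing group, 0.6

# Assumed for illustration: observed intercept 10, 25% missing
beta0 = mixture_intercept(beta0_obs=10.0, delta=delta, p_miss=0.25)
print(round(beta0, 2))
```

Rerunning with several plausible values of `d` (including negative ones) gives a simple sensitivity analysis for how much the MNAR assumption moves the estimate.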
\[ test_i = \beta_0 + \beta_1 treat_i + \beta_2 income_i + \epsilon_i \]
Interact the missing indicator with the variable whose slope you want to test
\[\begin{align} test_i = & \textcolor[rgb]{0.20,0.70,0.20}{\beta_{0(obs)} + \delta_0 miss_i} + \textcolor[rgb]{0.00,0.00,1.00}{\beta_{1(obs)} treat_i +} \\ & \textcolor[rgb]{0.00,0.00,1.00}{\delta_1 miss_i \times treat_i} + \beta_2 income_i + \epsilon_i \end{align}\]
\(\textcolor[rgb]{0.20,0.70,0.20}{\delta_0}\) = ctrl group, missing vs observed Y mean diff
\(\textcolor[rgb]{0.00,0.00,1.00}{\delta_1}\) = treat group, missing vs observed ATE diff
| | Observed Mean | Missing Mean |
|---|---|---|
| Ctrl | \(\textcolor[rgb]{0.20,0.70,0.20}{\beta_{0(obs)}}\) | \(\textcolor[rgb]{0.20,0.70,0.20}{\beta_{0(obs)} + \delta_0}\) |
| Treat | \(\textcolor[rgb]{0.00,0.00,1.00}{\beta_{0(obs)} + \beta_{1(obs)}}\) | \(\textcolor[rgb]{0.00,0.00,1.00}{\beta_{0(obs)} + \delta_0 + \beta_{1(obs)} + \delta_1}\) |
\[ \textcolor[rgb]{0.20,0.70,0.20}{mean_{ctrl}} = \textcolor[rgb]{0.20,0.70,0.20}{\beta_0} = \textcolor[rgb]{0.20,0.70,0.20}{\beta_{0(obs)} \times p_{0(obs)}} + \textcolor[rgb]{0.20,0.70,0.20}{(\beta_{0(obs)} + \delta_0) \times p_{0(miss)}} \]
\[\begin{align} \textcolor[rgb]{0.00,0.00,1.00}{mean_{treat}} = & \textcolor[rgb]{0.00,0.00,1.00}{(\beta_{0(obs)} + \beta_{1(obs)}) \times p_{1(obs)}} + \\ & \textcolor[rgb]{0.00,0.00,1.00}{(\beta_{0(obs)} + \delta_0 + \beta_{1(obs)} + \delta_1) \times p_{1(miss)}} \end{align}\]
\[ ATE_{MNAR} = \textcolor[rgb]{0.00,0.00,1.00}{mean_{treat}} - \textcolor[rgb]{0.20,0.70,0.20}{mean_{ctrl}} \]
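The group means and \(ATE_{MNAR}\) above can be sketched as a small function. The name `ate_mnar` and every numeric input below are assumptions for illustration; in practice the observed-data coefficients would come from the fitted model and the \(\delta\) values from the analyst's sensitivity assumptions.

```python
def ate_mnar(beta0_obs, beta1_obs, delta0, delta1, p0_miss, p1_miss):
    """Treatment effect after mixing observed and missing patterns per arm."""
    # Control arm: observed mean and missing mean, weighted by their shares
    mean_ctrl = (beta0_obs * (1.0 - p0_miss)
                 + (beta0_obs + delta0) * p0_miss)

    # Treatment arm: same mixture, with the treatment effect and its shift
    mean_treat = ((beta0_obs + beta1_obs) * (1.0 - p1_miss)
                  + (beta0_obs + delta0 + beta1_obs + delta1) * p1_miss)

    return mean_treat - mean_ctrl


# Assumed inputs: observed ATE of 2, 20% missing in ctrl, 30% in treat
ate = ate_mnar(beta0_obs=10.0, beta1_obs=2.0,
               delta0=0.6, delta1=-0.3,
               p0_miss=0.2, p1_miss=0.3)
print(round(ate, 2))
```

Setting both `delta0` and `delta1` to zero returns the observed-data effect `beta1_obs`, which is a quick sanity check that the MNAR adjustment only moves the estimate when the missing group is assumed to differ.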
…and let me know if you want some help!
van Buuren, S. (2018). 2.7 When not to use multiple imputation. Flexible Imputation of Missing Data. https://stefvanbuuren.name/fimd/sec-when.html
Enders, C. K. (2023). Missing data: An update on the state of the art. Psychological Methods. https://doi.org/10.1037/met0000563