Pooling clubSandwich results across multiple imputations

A colleague recently asked me about how to apply cluster-robust hypothesis tests and confidence intervals, as calculated with the clubSandwich package, when dealing with multiply imputed datasets. Standard methods (i.e., Rubin’s rules) for pooling estimates from multiple imputed datasets are developed under the assumption that the full-data estimates are approximately normally distributed. However, this might not be reasonable when working with test statistics based on cluster-robust variance estimators, which can be imprecise when the number of clusters is small or the design matrix of predictors is unbalanced in certain ways. Barnard and Rubin (1999) proposed a small-sample correction for tests and confidence intervals based on multiple imputed datasets. In this post, I’ll show how to implement their technique using the output of clubSandwich, with multiple imputations generated using the mice package.

Setup

To begin, let me create missingness in a dataset containing multiple clusters of observations:

library(mlmRev)
library(mice)
library(dplyr)

data(bdf)

bdf <- bdf %>%
  select(schoolNR, IQ.verb, IQ.perf, sex, ses, langPRET, aritPRET, aritPOST) %>%
  mutate(
    schoolNR = factor(schoolNR),
    sex = as.numeric(sex)
    ) %>%
  filter(as.numeric(schoolNR) <= 30) %>%
  droplevels()

bdf_missing <- 
  bdf %>% 
  select(-schoolNR) %>%
  ampute(run = TRUE)

bdf_missing <- 
  bdf_missing$amp %>%
  mutate(schoolNR = bdf$schoolNR)

summary(bdf_missing)
##     IQ.verb         IQ.perf            sex             ses       
##  Min.   : 4.00   Min.   : 5.333   Min.   :1.000   Min.   :10.00  
##  1st Qu.:10.50   1st Qu.: 9.333   1st Qu.:1.000   1st Qu.:20.00  
##  Median :11.50   Median :10.667   Median :1.000   Median :28.00  
##  Mean   :11.66   Mean   :10.747   Mean   :1.471   Mean   :28.65  
##  3rd Qu.:13.00   3rd Qu.:12.333   3rd Qu.:2.000   3rd Qu.:38.00  
##  Max.   :18.00   Max.   :16.667   Max.   :2.000   Max.   :50.00  
##  NA's   :38      NA's   :40       NA's   :28      NA's   :47     
##     langPRET        aritPRET        aritPOST        schoolNR  
##  Min.   :15.00   Min.   : 1.00   Min.   : 2.00   40     : 35  
##  1st Qu.:30.00   1st Qu.: 9.00   1st Qu.:12.00   54     : 31  
##  Median :35.00   Median :11.00   Median :18.00   55     : 30  
##  Mean   :33.93   Mean   :11.57   Mean   :17.62   38     : 28  
##  3rd Qu.:39.00   3rd Qu.:14.00   3rd Qu.:23.00   1      : 25  
##  Max.   :48.00   Max.   :20.00   Max.   :30.00   18     : 24  
##  NA's   :51      NA's   :51      NA's   :38      (Other):354

Now I’ll use mice to create 10 multiply imputed datasets:

Impute_bdf <- mice(bdf_missing, m=10, meth="norm.nob", seed=24)

Am I imputing while ignoring the hierarchical structure of the data? Yes, yes I am. Is this is a good way to do imputation? Probably not. But this is a quick and dirty example, so we’re going to have to live with it.

Model

Suppose that the goal of our analysis is to estimate the coefficients of the following regression model:

\[ \text{aritPOST}_{ij} = \beta_0 + \beta_1 \text{aritPRET}_{ij} + \beta_2 \text{langPRET}_{ij} + \beta_3 \text{sex}_{ij} + \beta_4 \text{SES}_{ij} + e_{ij}, \]

where \(i\) indexes students and \(j\) indexes schools, and where we allow for the possibility that errors from the same cluster are correlated in an unspecified way. With complete data, we could estimate the model by ordinary least squares and then use clubSandwich to get standard errors that are robust to within-cluster dependence and heteroskedasticity. The code for this is as follows:

library(clubSandwich)
lm_full <- lm(aritPOST ~ aritPRET + langPRET + sex + ses, data = bdf)
coef_test(lm_full, cluster = bdf$schoolNR, vcov = "CR2")
##          Coef Estimate     SE d.f. p-val (Satt) Sig.
## 1 (Intercept)  -2.1921 1.3484 22.9       0.1177     
## 2    aritPRET   1.0053 0.0833 23.4       <0.001  ***
## 3    langPRET   0.2758 0.0294 24.1       <0.001  ***
## 4         sex  -1.2040 0.4706 23.8       0.0173    *
## 5         ses   0.0233 0.0266 20.5       0.3909

If cluster dependence were no concern, we could simply use the model-based standard errors and test statistics. The mice package provides functions that will fit the model to each imputed dataset and then combine them by Rubin’s rules. The code is simply:

with(data = Impute_bdf, 
     lm(aritPOST ~ aritPRET + langPRET + sex + ses)
     ) %>%
  pool(method = "Rubin") %>%
  summary() %>%
  round(3)
##                est    se      t       df Pr(>|t|)  lo 95  hi 95 nmis   fmi
## (Intercept) -1.718 1.148 -1.497 1759.752    0.135 -3.969  0.533   NA 0.073
## aritPRET     0.975 0.074 13.111  336.195    0.000  0.828  1.121   51 0.169
## langPRET     0.259 0.038  6.757  532.218    0.000  0.184  0.334   51 0.133
## sex         -1.172 0.430 -2.728  494.551    0.007 -2.016 -0.328   28 0.138
## ses          0.040 0.021  1.863  208.718    0.064 -0.002  0.082   47 0.215
##             lambda
## (Intercept)  0.072
## aritPRET     0.164
## langPRET     0.130
## sex          0.135
## ses          0.208

However, this approach ignores the possibility of correlation in the errors of units in the same cluster, which is clearly a concern in this dataset:

# ratio of CRVE to conventional variance estimates
diag(vcovCR(lm_full, cluster = bdf$schoolNR, type = "CR2")) / 
  diag(vcov(lm_full))
## (Intercept)    aritPRET    langPRET         sex         ses 
##   1.5296837   1.5493134   0.6938735   1.4567650   2.0053186

So we need a way to pool results based on the cluster-robust variance estimators, while also accounting for the relatively small number of clusters in this dataset.

Barnard & Rubin (1999)

Barnard and Rubin (1999) proposed a small-sample correction for tests and confidence intervals based on multiple imputed datasets that seems to work in this context. Rather than using large-sample normal approximations for inference, they derive an approximate degrees-of-freedom that combines uncertainty in the standard errors calculated from each imputed dataset with between-imputation uncertainty. The method is as follows.

Suppose that we have \(m\) imputed datasets. Let \(\hat\beta_{(j)}\) be the estimated regression coefficient from imputed dataset \(j\), with (in this case cluster-robust) sampling variance estimate \(V_{(j)}\). Further, let \(\eta_{(j)}\) be the degrees of freedom corresponding to \(V_{(j)}\). To combine these estimates, calculate the averages across multiply imputed datasets:

\[ \bar\beta = \frac{1}{m}\sum_{j=1}^m \hat\beta_{(j)}, \qquad \bar{V} = \frac{1}{m}\sum_{j=1}^m V_{(j)}, \qquad \bar\eta = \frac{1}{m}\sum_{j=1}^m \eta_{(j)}. \]

Also calculate the between-imputation variance

\[ B = \frac{1}{m - 1} \sum_{j=1}^m \left(\hat\beta_{(j)} - \bar\beta\right)^2 \]

And then combine the between- and within- variance estimates using Rubin’s rules:

\[ V_{total} = \bar{V} + \frac{m + 1}{m} B. \]

The degrees of freedom associated with \(V_{total}\) modify the estimated complete-data degrees of freedom \(\bar\eta\) using quantities that depend on the fraction of missing information in a coefficient. The fraction of missing information is given by

\[ \hat\gamma_m = \frac{(m+1)B}{m V_{total}} \]

The degrees of freedom are then given by

\[ \nu_{total} = \left(\frac{1}{\nu_m} + \frac{1}{\nu_{obs}}\right)^{-1}, \]

where

\[ \nu_m = \frac{(m - 1)}{\hat\gamma_m^2}, \quad \text{and} \quad \nu_{obs} = \frac{\bar\eta (\bar\eta + 1) (1 - \hat\gamma)}{\bar\eta + 3}. \]

Hypothesis tests and confidence intervals are based on the approximation

\[ \frac{\bar\beta - \beta_0}{\sqrt{V_{total}}} \ \stackrel{\cdot}{\sim} \ t(\nu_{total}) \]

Implementation

Here is how to carry out these calculations using the results of clubSandwich::coef_test and a bit of dplyr:

# fit results with clubSandwich standard errors

models_robust <- with(data = Impute_bdf, 
                      lm(aritPOST ~ aritPRET + langPRET + sex + ses) %>% 
                         coef_test(cluster=bdf$schoolNR, vcov="CR2")
                      ) 


# pool results with clubSandwich standard errors

robust_pooled <- 
  models_robust$analyses %>%
  
  # add coefficient names as a column
  lapply(function(x) {
    x$coef <- row.names(x)
    x
  }) %>%
  bind_rows() %>%
  as.data.frame() %>%
  
  # summarize by coefficient
  group_by(coef) %>%
  summarise(
    m = n(),
    B = var(beta),
    beta_bar = mean(beta),
    V_bar = mean(SE^2),
    eta_bar = mean(df)
  ) %>%
  
  mutate(
    
    # calculate intermediate quantities to get df
    V_total = V_bar + B * (m + 1) / m,
    gamma = ((m + 1) / m) * B / V_total,
    df_m = (m - 1) / gamma^2,
    df_obs = eta_bar * (eta_bar + 1) * (1 - gamma) / (eta_bar + 3),
    df = 1 / (1 / df_m + 1 / df_obs),
    
    # calculate summary quantities for output
    se = sqrt(V_total),
    t = beta_bar / se,
    p_val = 2 * pt(abs(t), df = df, lower.tail = FALSE),
    crit = qt(0.975, df = df),
    lo95 = beta_bar - se * crit,
    hi95 = beta_bar + se * crit
  )

robust_pooled %>%
  select(coef, est = beta_bar, se, t, df, p_val, lo95, hi95, gamma) %>%
  mutate_at(vars(est:gamma), round, 3)
## # A tibble: 5 x 9
##   coef            est     se      t    df  p_val    lo95   hi95  gamma
##   <chr>         <dbl>  <dbl>  <dbl> <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
## 1 (Intercept) -1.72   1.42   - 1.21  19.6 0.241  -4.69    1.25  0.0470
## 2 aritPRET     0.975  0.0930  10.5   18.5 0       0.779   1.17  0.104 
## 3 langPRET     0.259  0.0360   7.29  17.8 0       0.184   0.333 0.151 
## 4 ses          0.0400 0.0320   1.24  16.8 0.233  -0.0280  0.108 0.0920
## 5 sex         -1.17   0.469  - 2.50  18.9 0.0220 -2.15   -0.191 0.113

It is instructive to compare the calculated df to eta_bar and df_m:

robust_pooled %>%
  select(coef, df, df_m, eta_bar) %>%
  mutate_at(vars(df, df_m, eta_bar), round, 1)
## # A tibble: 5 x 4
##   coef           df  df_m eta_bar
##   <chr>       <dbl> <dbl>   <dbl>
## 1 (Intercept)  19.6  4138    22.4
## 2 aritPRET     18.5   827    22.9
## 3 langPRET     17.8   393    23.8
## 4 ses          16.8  1069    20.5
## 5 sex          18.9   701    23.7

Here, eta_bar is the average of the complete data degrees of freedom, and it can be seen that the total degrees of freedom are somewhat less than the average complete-data degrees of freedom. This is by construction. Further df_m is the conventional degrees of freedom used in multiple-imputation, which assume that the complete-data estimates are normally distributed, and in this example they are way far off.

Further thoughts

How well does this method perform in practice? I’m not entirely sure—I’m just trusting that Barnard and Rubin’s approximation is sound and would work in this setting (I mean, they’re smart people!). Are there other, better approaches? Totally possible. I have done zero literature review beyond the Barnard and Rubin paper. In any case, exploring the performance of this method (and any other alternatives) seems like it would make for a very nice student project.

There’s also the issue of how to do tests of multi-dimensional constraints (i.e., F-tests). The clubSandwich package implements Wald-type tests for multi-dimensional constraints, using a small-sample correction that we developed (Tipton & Pustejovsky, 2015; Pustejovsky & Tipton, 2016). But it would take some further thought to figure out how to handle multiply imputed data with this type of test…

Related

Next
Previous
comments powered by Disqus