The clubSandwich package for meta-analysis with RVE
I’ve recently been working on small-sample correction methods for hypothesis tests in linear regression models with cluster-robust variance estimation. My colleague (and grad-schoolmate) Beth Tipton has developed small-sample adjustments for t-tests (of single regression coefficients) in the context of meta-regression models with robust variance estimation, and together we have developed methods for multiple-contrast hypothesis tests. We have an R package (called clubSandwich
) that implements all this stuff, not only for meta-regression models but also for other models and contexts where cluster-robust variance estimation is often used.
The alpha-version of the package is currently available on Github. See the Github README for instructions on how to install it in R. Below I demonstrate how to use the package to get robust variance estimates, t-tests, and F-tests, all with small-sample corrections. The example uses a dataset of effect sizes from a Campbell Collaboration systematic review of dropout prevention programs, conducted by Sandra Jo Wilson and her colleagues.
The original analysis included a meta-regression with covariates that capture methodological, participant, and program characteristics. I’ll use a regression specification that is similar to Model III from Wilson et al. (2011), but treat the evaluator_independence
and implementation_quality
variables as categorical rather than interval-level; the original analysis clustered at the level of the sample (some studies reported results from multiple samples), whereas I will cluster at the study level.
I fit the model two ways, first using the robumeta
package and then using metafor
.
robumeta model
options(width=150)
library(robumeta)
library(clubSandwich)
## Registered S3 method overwritten by 'clubSandwich':
## method from
## bread.mlm sandwich
data(dropoutPrevention)
m3_robu <- robu(LOR1 ~ study_design + attrition + group_equivalence + adjusted
+ outcome + evaluator_independence
+ male_pct + white_pct + average_age
+ implementation_quality + program_site + duration + service_hrs,
data = dropoutPrevention, studynum = studyID, var.eff.size = varLOR,
modelweights = "HIER")
print(m3_robu)
## RVE: Hierarchical Effects Model with Small-Sample Corrections
##
## Model: LOR1 ~ study_design + attrition + group_equivalence + adjusted + outcome + evaluator_independence + male_pct + white_pct + average_age + implementation_quality + program_site + duration + service_hrs
##
## Number of clusters = 152
## Number of outcomes = 385 (min = 1 , mean = 2.53 , median = 1 , max = 30 )
## Omega.sq = 0.24907
## Tau.sq = 0.1024663
##
## Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
## 1 X.Intercept. 0.016899 0.615399 0.0275 16.9 0.97841541 -1.28228 1.31608
## 2 study_designNon.random..non.matched -0.002626 0.185142 -0.0142 40.5 0.98875129 -0.37667 0.37141
## 3 study_designRandomized -0.086872 0.140044 -0.6203 38.6 0.53869676 -0.37024 0.19650
## 4 attrition 0.118889 0.247228 0.4809 15.5 0.63732597 -0.40666 0.64444
## 5 group_equivalence 0.502463 0.195838 2.5657 28.7 0.01579282 0.10174 0.90318 **
## 6 adjustedadjusted.data -0.322480 0.125413 -2.5713 33.8 0.01470796 -0.57741 -0.06755 **
## 7 outcomeenrolled 0.097059 0.139842 0.6941 16.5 0.49727848 -0.19862 0.39274
## 8 outcomegraduation 0.147643 0.134938 1.0942 30.2 0.28253825 -0.12786 0.42315
## 9 outcomegraduation.ged 0.258034 0.169134 1.5256 16.3 0.14632629 -0.10006 0.61613
## 10 evaluator_independenceIndirect..influential -0.765085 0.399109 -1.9170 6.2 0.10212896 -1.73406 0.20389
## 11 evaluator_independencePlanning -0.920874 0.346536 -2.6574 5.6 0.04027061 -1.78381 -0.05794 **
## 12 evaluator_independenceDelivery -0.916673 0.304303 -3.0124 4.7 0.03212299 -1.71432 -0.11903 **
## 13 male_pct 0.167965 0.181538 0.9252 16.4 0.36824526 -0.21609 0.55202
## 14 white_pct 0.022915 0.149394 0.1534 21.8 0.87950385 -0.28704 0.33287
## 15 average_age 0.037102 0.027053 1.3715 21.2 0.18458247 -0.01913 0.09333
## 16 implementation_qualityPossible.problems 0.411779 0.128898 3.1946 26.7 0.00358205 0.14714 0.67642 ***
## 17 implementation_qualityNo.apparent.problems 0.658570 0.123874 5.3164 34.6 0.00000635 0.40699 0.91015 ***
## 18 program_sitemixed 0.444384 0.172635 2.5741 28.6 0.01550504 0.09109 0.79768 **
## 19 program_siteschool.classroom 0.426658 0.159773 2.6704 37.4 0.01115192 0.10303 0.75028 **
## 20 program_siteschool..outside.of.classroom 0.262517 0.160519 1.6354 30.1 0.11236814 -0.06525 0.59028
## 21 duration 0.000427 0.000873 0.4895 36.7 0.62736846 -0.00134 0.00220
## 22 service_hrs -0.003434 0.005012 -0.6852 36.7 0.49752503 -0.01359 0.00672
## ---
## Signif. codes: < .01 *** < .05 ** < .10 *
## ---
## Note: If df < 4, do not trust the results
Note that robumeta
produces small-sample corrected standard errors and t-tests, and so there is no need to repeat those calculations with clubSandwich
. The evaluator_independence
variable has four levels, and it might be of interest to test whether the average program effects differ by the degree of evaluator independence. The null hypothesis in this case is that the 10th, 11th, and 12th regression coefficients are all equal to zero. A small-sample adjusted F-test for this hypothesis can be obtained as follows.
(The vcov = "CR2"
option means that the standard errors will be corrected using the bias-reduced linearization method proposed by McCaffrey, Bell, and Botts, 2001.)
Wald_test(m3_robu, constraints = 10:12, vcov = "CR2")
## Test F d.f. p.val
## HTZ 2.78 16.8 0.0732
By default, the Wald_test
function provides an F-type test with degrees of freedom estimated using the approximate Hotelling’s \(T^2_Z\) method. The test has less than 17 degrees of freedom, even though there are 152 independent studies in the data, and has a p-value of .07, so not-quite-significant at conventional levels. The low degrees of freedom are a consequence of the fact that one of the levels of evaluator independence
has only a few effect sizes in it:
table(dropoutPrevention$evaluator_independence)
##
## Independent Indirect, influential Planning Delivery
## 6 33 43 303
metafor model
Our package also works with models fit using the metafor
package. Here I re-fit the same regression specification, but use REML to estimate the variance components (robumeta
uses a method-of-moments estimator) and use a somewhat different weighting scheme than that used in robumeta
.
library(metafor)
m3_metafor <- rma.mv(LOR1 ~ study_design + attrition + group_equivalence + adjusted
+ outcome + evaluator_independence
+ male_pct + white_pct + average_age
+ implementation_quality + program_site + duration + service_hrs,
V = varLOR, random = list(~ 1 | studyID, ~ 1 | studySample),
data = dropoutPrevention)
summary(m3_metafor)
##
## Multivariate Meta-Analysis Model (k = 385; method: REML)
##
## logLik Deviance AIC BIC AICc
## -489.0357 978.0714 1026.0714 1119.5371 1029.6217
##
## Variance Components:
##
## estim sqrt nlvls fixed factor
## sigma^2.1 0.2274 0.4769 152 no studyID
## sigma^2.2 0.1145 0.3384 317 no studySample
##
## Test for Residual Heterogeneity:
## QE(df = 363) = 1588.4397, p-val < .0001
##
## Test of Moderators (coefficients 2:22):
## QM(df = 21) = 293.8694, p-val < .0001
##
## Model Results:
##
## estimate se zval pval ci.lb ci.ub
## intrcpt 0.5296 0.7250 0.7304 0.4651 -0.8915 1.9506
## study_designNon-random, non-matched -0.0494 0.1722 -0.2871 0.7741 -0.3870 0.2881
## study_designRandomized 0.0653 0.1628 0.4010 0.6884 -0.2538 0.3843
## attrition -0.1366 0.2429 -0.5623 0.5739 -0.6126 0.3395
## group_equivalence 0.4071 0.1573 2.5877 0.0097 0.0988 0.7155 **
## adjustedadjusted data -0.3581 0.1532 -2.3371 0.0194 -0.6585 -0.0578 *
## outcomeenrolled -0.2831 0.0771 -3.6709 0.0002 -0.4343 -0.1320 ***
## outcomegraduation -0.0913 0.0657 -1.3896 0.1646 -0.2201 0.0375
## outcomegraduation/ged 0.6983 0.0805 8.6750 <.0001 0.5406 0.8561 ***
## evaluator_independenceIndirect, influential -0.7530 0.4949 -1.5214 0.1282 -1.7230 0.2171
## evaluator_independencePlanning -0.7700 0.4869 -1.5814 0.1138 -1.7242 0.1843
## evaluator_independenceDelivery -1.0016 0.4600 -2.1774 0.0294 -1.9033 -0.1000 *
## male_pct 0.1021 0.1715 0.5951 0.5518 -0.2341 0.4382
## white_pct 0.1223 0.1804 0.6777 0.4979 -0.2313 0.4758
## average_age 0.0061 0.0291 0.2091 0.8344 -0.0509 0.0631
## implementation_qualityPossible problems 0.4738 0.1609 2.9445 0.0032 0.1584 0.7892 **
## implementation_qualityNo apparent problems 0.6318 0.1471 4.2965 <.0001 0.3436 0.9201 ***
## program_sitemixed 0.3289 0.2413 1.3631 0.1729 -0.1440 0.8019
## program_siteschool classroom 0.2920 0.1736 1.6821 0.0926 -0.0482 0.6321 .
## program_siteschool, outside of classroom 0.1616 0.1898 0.8515 0.3945 -0.2104 0.5337
## duration 0.0013 0.0009 1.3423 0.1795 -0.0006 0.0031
## service_hrs -0.0003 0.0047 -0.0654 0.9478 -0.0096 0.0090
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
metafor
produces model-based standard errors, t-tests, and confidence intervals. The coef_test
function from clubSandwich
will calculate robust standard errors and robust t-tests for each of the coefficients:
coef_test(m3_metafor, vcov = "CR2")
## Coef. Estimate SE t-stat d.f. p-val (Satt) Sig.
## 1 intrcpt 0.529569 0.724851 0.7306 20.08 0.47347
## 2 study_designNon-random, non-matched -0.049434 0.204152 -0.2421 58.42 0.80952
## 3 study_designRandomized 0.065272 0.149146 0.4376 53.17 0.66342
## 4 attrition -0.136575 0.306429 -0.4457 10.52 0.66485
## 5 group_equivalence 0.407108 0.210917 1.9302 23.10 0.06595 .
## 6 adjustedadjusted data -0.358124 0.136132 -2.6307 43.20 0.01176 *
## 7 outcomeenrolled -0.283124 0.237199 -1.1936 7.08 0.27108
## 8 outcomegraduation -0.091295 0.091465 -0.9981 9.95 0.34188
## 9 outcomegraduation/ged 0.698328 0.364882 1.9138 8.02 0.09188 .
## 10 evaluator_independenceIndirect, influential -0.752994 0.447670 -1.6820 6.56 0.13929
## 11 evaluator_independencePlanning -0.769968 0.403898 -1.9063 6.10 0.10446
## 12 evaluator_independenceDelivery -1.001648 0.355989 -2.8137 4.89 0.03834 *
## 13 male_pct 0.102055 0.148410 0.6877 9.68 0.50782
## 14 white_pct 0.122255 0.141470 0.8642 16.88 0.39961
## 15 average_age 0.006084 0.033387 0.1822 15.79 0.85772
## 16 implementation_qualityPossible problems 0.473789 0.148660 3.1871 22.44 0.00419 **
## 17 implementation_qualityNo apparent problems 0.631842 0.138073 4.5761 28.68 < 0.001 ***
## 18 program_sitemixed 0.328941 0.196848 1.6710 27.47 0.10607
## 19 program_siteschool classroom 0.291952 0.146014 1.9995 42.70 0.05195 .
## 20 program_siteschool, outside of classroom 0.161640 0.171700 0.9414 29.27 0.35420
## 21 duration 0.001270 0.000978 1.2988 31.96 0.20332
## 22 service_hrs -0.000309 0.004828 -0.0641 49.63 0.94915
Note that coef_test
assumed that it should cluster based on studyID
, which is the outer-most random effect in the metafor model. This can also be specified explicitly by including the option cluster = dropoutPrevention$studyID
in the call.
The F-test for degree of evaluator independence uses the same syntax as before:
Wald_test(m3_metafor, constraints = 10:12, vcov = "CR2")
## Test F d.f. p.val
## HTZ 2.71 18.3 0.0753
Despite some differences in weighting schemes, the p-value is very close to the result obtained using robumeta
.