clubSandwich at the Austin R User Group Meetup

Last night I attended a joint meetup between the Austin R User Group and R Ladies Austin, which was great fun. The evening featured several lightning talks on a range of topics, from breaking into data science to network visualization to starting your own blog. I gave a talk about sandwich standard errors and my clubSandwich R package. Here are links to some of the talks: Caitlin Hudon: Getting Plugged into Data Science Claire McWhite: A quick intro to networks Nathaniel Woodward: Blogdown Demo!

Pooling clubSandwich results across multiple imputations

A colleague recently asked me about how to apply cluster-robust hypothesis tests and confidence intervals, as calculated with the clubSandwich package, when dealing with multiply imputed datasets. Standard methods (i.e., Rubin’s rules) for pooling estimates from multiple imputed datasets are developed under the assumption that the full-data estimates are approximately normally distributed. However, this might not be reasonable when working with test statistics based on cluster-robust variance estimators, which can be imprecise when the number of clusters is small or the design matrix of predictors is unbalanced in certain ways.

Imputing covariance matrices for meta-analysis of correlated effects

In many systematic reviews, it is common for eligible studies to contribute effect size estimates from not just one, but multiple relevant outcome measures, for a common sample of participants. If those outcomes are correlated, then so too will be the effect size estimates. To estimate the degree of correlation, you would need the sample correlation among the outcomes—information that is woefully uncommon for primary studies to report (and best of luck to you if you try to follow up with author queries).

Bug in nlme::lme with fixed sigma and REML estimation

About one year ago, the nlme package introduced a feature that allowed the user to specify a fixed value for the residual variance in linear mixed effect models fitted with lme(). This feature is interesting to me because, when used with the varFixed() specification for the residual weights, it allows for estimation of a wide variety of meta-analysis models, including basic random effects models, bivariate models for estimating effects by trial arm, and other sorts of multivariate/multi-level random effects models.

Simulation studies in R (Fall, 2016 version)

In today’s Quant Methods colloquium, I gave an introduction to the logic and purposes of Monte Carlo simulation studies, with examples written in R. Here are the slides from my presentation. You can find the code that generates the slides here. Here is my presentation on the same topic from a couple of years ago. David Robinson’s blog has a much more in-depth discussion of beta-binomial regression. The data I used is from Lahman’s baseball database.

Bug in nlme::getVarCov

I have recently been working to ensure that my clubSandwich package works correctly on fitted lme and gls models from the nlme package, which is one of the main R packages for fitting hierarchical linear models. In the course of digging around in the guts of nlme, I noticed a bug in the getVarCov function. The purpose of the function is to extract the estimated variance-covariance matrix of the errors from a fitted lme or gls model.

Assigning after dplyr

Hadley Wickham’s dplyr and tidyr packages completely changed the way I do data manipulation/munging in R. These packages make it possible to write shorter, faster, more legible, easier-to-intepret code to accomplish the sorts of manipulations that you have to do with practically any real-world data analysis. The legibility and interpretability benefits come from using functions that are simple verbs that do exactly what they say (e.g., filter, summarize, group_by) and chaining multiple operations together, through the pipe operator %>% from the magrittr package.


Simulate systematic direct observation data.


cluster-robust variance estimation.

Estimating average effects in regression discontinuities with covariate interactions

Regression discontinuity designs (RDDs) are now a widely used tool for program evaluation in economics and many other fields. RDDs occur in situations where some treatment/program of interest is assigned on the basis of a numerical score (called the running variable), all units scoring above a certain threshold receiving treatment and all units scoring at or below the threshold having treatment withheld (or vice versa, with treatment assigned to units scoring below the threshold).