At AERA this past weekend, one of the recurring themes was how software availability (and its usability and default features) influences how people conduct meta-analyses. That got me thinking about the R packages that I’ve developed, and about how to understand the extent to which people are using them, how they’re being used, and so on. I’ve had badges on my GitHub repos for a while now:
- clubSandwich
- ARPobservation
- scdhlm
- SingleCaseES

These statistics come from the METACRAN site, which makes available data on daily downloads of all packages on CRAN (one of the main repositories for sharing R packages).
Last night I attended a joint meetup between the Austin R User Group and R Ladies Austin, which was great fun. The evening featured several lightning talks on a range of topics, from breaking into data science to network visualization to starting your own blog. I gave a talk about sandwich standard errors and my clubSandwich R package. Here are links to some of the talks:
- Caitlin Hudon: Getting Plugged into Data Science
- Claire McWhite: A quick intro to networks
- Nathaniel Woodward: Blogdown Demo!
A colleague recently asked me how to apply cluster-robust hypothesis tests and confidence intervals, as calculated with the clubSandwich package, when dealing with multiply imputed datasets. Standard methods (i.e., Rubin’s rules) for pooling estimates across multiply imputed datasets are developed under the assumption that the full-data estimates are approximately normally distributed. However, this assumption might not be reasonable when working with test statistics based on cluster-robust variance estimators, which can be imprecise when the number of clusters is small or the design matrix of predictors is unbalanced in certain ways.
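To fix ideas, here is a minimal base-R sketch of Rubin's rules for a scalar estimate. The function name and the input vectors are my own illustrative choices, not part of the clubSandwich API:

```r
# A minimal sketch of Rubin's rules for pooling a scalar estimate across
# m imputed datasets. `ests` and `vars` are hypothetical vectors holding
# the point estimate and squared standard error from each imputed dataset.
pool_rubin <- function(ests, vars) {
  m <- length(ests)
  est <- mean(ests)                  # pooled point estimate
  W <- mean(vars)                    # within-imputation variance
  B <- var(ests)                     # between-imputation variance
  V <- W + (1 + 1 / m) * B           # total variance
  df <- (m - 1) * (1 + W / ((1 + 1 / m) * B))^2  # classic Rubin degrees of freedom
  c(est = est, se = sqrt(V), df = df)
}

pool_rubin(ests = c(0.42, 0.38, 0.45), vars = c(0.010, 0.012, 0.011))
```

The key point is that both the pooled standard error and the degrees of freedom depend on the normal-approximation logic, which is exactly what becomes questionable with small-sample cluster-robust estimators.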
In many systematic reviews, it is common for eligible studies to contribute effect size estimates from not just one, but multiple relevant outcome measures, for a common sample of participants. If those outcomes are correlated, then so too will be the effect size estimates. To estimate the degree of correlation, you would need the sample correlation among the outcomes—information that is woefully uncommon for primary studies to report (and best of luck to you if you try to follow up with author queries).
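One common workaround (shown here as an illustration, not as the post's own method) is to impute a single assumed correlation and build the covariance matrix of the effect size estimates from it. A minimal base-R sketch, where the variances `vi` and the imputed value `r = 0.8` are assumptions:

```r
# Sketch: construct an assumed covariance matrix for effect size estimates
# from one study that reports several correlated outcomes. `vi` holds the
# sampling variances; `r` is an imputed correlation (an assumption, since
# primary studies rarely report the correlations among outcomes).
impute_cov <- function(vi, r = 0.8) {
  R <- matrix(r, nrow = length(vi), ncol = length(vi))
  diag(R) <- 1
  # Convert correlations to covariances: cov_ij = r * sqrt(vi_i * vi_j)
  sqrt(vi) %o% sqrt(vi) * R
}

impute_cov(vi = c(0.05, 0.08, 0.06), r = 0.8)
```

Sensitivity analysis over a range of plausible `r` values is a natural follow-up, since the imputed value is a guess.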
About one year ago, the nlme package introduced a feature that allows the user to specify a fixed value for the residual variance in linear mixed effects models fitted with lme(). This feature is interesting to me because, when used with the varFixed() specification for the residual weights, it allows for estimation of a wide variety of meta-analysis models, including basic random effects models, bivariate models for estimating effects by trial arm, and other sorts of multivariate/multi-level random effects models.
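As a sketch of the idea (with simulated data; the variable names `yi` and `vi` and all settings are my own assumptions), a basic random-effects meta-analysis can be fitted by combining varFixed() with a residual SD fixed at 1 via lmeControl():

```r
library(nlme)

# Sketch: a basic random-effects meta-analysis via lme(), on simulated
# data. `yi` are effect size estimates, `vi` their known sampling variances.
set.seed(20180412)
k <- 30
dat <- data.frame(study = factor(1:k), vi = runif(k, 0.02, 0.2))
dat$yi <- 0.4 + rnorm(k, sd = sqrt(0.05)) + rnorm(k, sd = sqrt(dat$vi))

fit <- lme(
  yi ~ 1, data = dat,
  random = ~ 1 | study,
  weights = varFixed(~ vi),         # residual variances proportional to vi
  control = lmeControl(sigma = 1)   # fix residual SD at 1, so they equal vi
)
fixef(fit)  # estimate of the average effect
```

With the residual SD pinned at 1, the varFixed() weights make each observation's residual variance exactly its known sampling variance, which is what a meta-analytic model requires.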
In today’s Quant Methods colloquium, I gave an introduction to the logic and purposes of Monte Carlo simulation studies, with examples written in R.
- Here are the slides from my presentation.
- You can find the code that generates the slides here.
- Here is my presentation on the same topic from a couple of years ago.
- David Robinson’s blog has a much more in-depth discussion of beta-binomial regression.
- The data I used is from Lahman’s baseball database.
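To give a flavor of the logic of a simulation study (a toy example of my own, not one from the talk): generate many datasets under known conditions, apply a procedure to each, and summarize performance across repetitions. Here, the coverage rate of the nominal 95% t-interval when the data are skewed:

```r
# Sketch: a minimal Monte Carlo simulation study in base R, estimating the
# coverage rate of the nominal 95% t-interval for a mean when the data are
# skewed (exponential). All settings are illustrative.
set.seed(42)
simulate_coverage <- function(reps = 2000, n = 20, true_mean = 1) {
  covered <- replicate(reps, {
    x <- rexp(n, rate = 1 / true_mean)  # skewed data with mean = true_mean
    ci <- t.test(x)$conf.int
    ci[1] <= true_mean && true_mean <= ci[2]
  })
  mean(covered)  # proportion of intervals that cover the true mean
}

simulate_coverage()
```

Because the truth is known by construction, the simulation directly estimates an operating characteristic (here, coverage) that is analytically intractable.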
I have recently been working to ensure that my clubSandwich package works correctly on fitted lme and gls models from the nlme package, which is one of the main R packages for fitting hierarchical linear models. In the course of digging around in the guts of nlme, I noticed a bug in the getVarCov function. The purpose of the function is to extract the estimated variance-covariance matrix of the errors from a fitted lme or gls model.
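To illustrate what getVarCov is meant to do (this sketch shows intended usage on nlme's built-in Ovary data, not the bug itself):

```r
library(nlme)

# Sketch: extract the estimated error covariance matrix from a fitted gls
# model with an AR(1) within-subject correlation structure, using nlme's
# built-in Ovary data.
fit <- gls(
  follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
  data = Ovary,
  correlation = corAR1(form = ~ 1 | Mare)
)
V <- getVarCov(fit)  # covariance matrix of the errors for the first mare
dim(V)               # square, one row/column per observation on that mare
```

The returned matrix encodes the model-implied variances and covariances of the errors, which is exactly the quantity a cluster-robust variance estimator needs to interact with correctly.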
Hadley Wickham’s dplyr and tidyr packages completely changed the way I do data manipulation/munging in R. These packages make it possible to write shorter, faster, more legible, easier-to-interpret code to accomplish the sorts of manipulations that you have to do with practically any real-world data analysis. The legibility and interpretability benefits come from

- using functions that are simple verbs that do exactly what they say (e.g., filter, summarize, group_by), and
- chaining multiple operations together, through the pipe operator %>% from the magrittr package.
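A tiny illustration of both points together (my own example, using the built-in mtcars data and assuming dplyr is installed):

```r
library(dplyr)

# Sketch: simple verbs chained with the pipe, on the built-in mtcars data.
mtcars %>%
  filter(cyl == 4) %>%              # keep only 4-cylinder cars
  group_by(gear) %>%                # one group per number of gears
  summarize(mean_mpg = mean(mpg))   # average mpg within each group
```

Each verb does one thing, and the pipe reads top-to-bottom in the order the operations actually happen, which is what makes the chained style so legible.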