distribution theory | James E. Pustejovsky

Distribution of the number of significant effect sizes

A while back, I posted the outline of a problem about the number of significant effect size estimates in a study that reports multiple outcomes. This problem interests me because it connects to the issue of selective reporting of study results, which creates problems for meta-analysis.

Approximating the distribution of cluster-robust Wald statistics

\[ \def\Pr{{\text{Pr}}} \def\E{{\text{E}}} \def\Var{{\text{Var}}} \def\Cov{{\text{Cov}}} \def\cor{{\text{cor}}} \def\bm{\mathbf} \def\bs{\boldsymbol} \] In Tipton and Pustejovsky (2015), we examined several different small-sample approximations for cluster-robust Wald test statistics, which are like \(F\) statistics but based on cluster-robust variance estimators.

Implementing Consul's generalized Poisson distribution in Stan

\[ \def\Pr{{\text{Pr}}} \def\E{{\text{E}}} \def\Var{{\text{Var}}} \def\Cov{{\text{Cov}}} \def\bm{\mathbf} \def\bs{\boldsymbol} \] For a project I am working on, we are using Stan to fit generalized random effects location-scale models to a bunch of count data.

Implementing Efron's double Poisson distribution in Stan

Variance component estimates in meta-analysis with mis-specified sampling correlation

\[ \def\Pr{{\text{Pr}}} \def\E{{\text{E}}} \def\Var{{\text{Var}}} \def\Cov{{\text{Cov}}} \] In a recent paper with Beth Tipton, we proposed new working models for meta-analyses involving dependent effect sizes. The central idea of our approach is to use a working model that captures the main features of the effect size data, such as by allowing for both between- and within-study heterogeneity in the true effect sizes (rather than only between-study heterogeneity).

Implications of mean-variance relationships for standardized mean differences

A question came up on the R-SIG-meta-analysis listserv about whether it was reasonable to use the standardized mean difference metric for synthesizing studies where the outcomes are measured as proportions. I think this is an interesting question because, while the SMD could work perfectly fine as an effect size metric for proportions, there are also other alternatives that could be considered, such as odds ratios or response ratios or raw differences in proportions. Further, there are some situations where the SMD has disadvantages for synthesizing contrasts between proportions. Thus, it's a situation where one has to make a choice about the effect size metric, and where the most common metric (the SMD) might not be the right answer. In this post, I want to provide a bit more detail regarding why I think mean-variance relationships in raw data can signal that the standardized mean differences might be less useful as an effect size metric compared to alternatives.

Standardized mean differences in single-group, repeated measures designs

I received a question from a colleague about computing variances and covariances for standardized mean difference effect sizes from a design involving a single group, measured repeatedly over time.

Finding the distribution of significant effect sizes

In basic meta-analysis, where each study contributes just a single effect size estimate, there has been a lot of work devoted to developing models for selective reporting. Most of these models formulate the selection process as a function of the statistical significance of the effect size estimate; some also allow for the possibility that the precision of the study’s effect influences the probability of selection (i.

Simulating correlated standardized mean differences for meta-analysis

As I’ve discussed in previous posts, meta-analyses in psychology, education, and other areas often include studies that contribute multiple, statistically dependent effect size estimates. I’m interested in methods for meta-analyzing and meta-regressing effect sizes from data structures like this, and studying this sort of thing often entails conducting Monte Carlo simulations.

Sampling variance of Pearson r in a two-level design

Consider Pearson’s correlation coefficient, \(r\), calculated from two variables \(X\) and \(Y\) with population correlation \(\rho\). If one calculates \(r\) from a simple random sample of \(N\) observations, then its sampling variance will be approximately