Bug in nlme::getVarCov
I have recently been working to ensure that my clubSandwich
package works correctly on fitted lme
and gls
models from the nlme
package, which is one of the main R packages for fitting hierarchical linear models. In the course of digging around in the guts of nlme
, I noticed a bug in the getVarCov
function. The purpose of the function is to extract the estimated variance-covariance matrix of the errors from a fitted lme
or gls
model.
It seems that this function is sensitive to the order in which the input data are sorted. This bug report noted the problem, but unfortunately their proposed fix doesn’t seem to solve the problem. In this post I’ll demonstrate the bug and a solution. (I’m posting this here because the R project’s bug reporting system is currently closed to people who were not registered as of early July, evidently due to some sort of spamming problem.)
The issue
Here’s a simple demonstration of the problem. I’ll first fit a gls
model with a heteroskedastic variance function and an AR(1) auto-correlation structure (no need to worry about the substance of the specification—we’re just worried about computation here) and then extract the variances for each of the units.
# Demonstrate the problem with gls model
library(nlme)
data(Ovary)
gls_raw <- gls(follicles ~ sin(2*pi*Time) + cos(2*pi*Time), data = Ovary,
correlation = corAR1(form = ~ 1 | Mare),
weights = varPower())
Mares <- levels(gls_raw$groups)
V_raw <- lapply(Mares, function(g) getVarCov(gls_raw, individual = g))
Now I’ll repeat the process using the same data, but sorted in a different order
Ovary_sorted <- Ovary[with(Ovary, order(Mare, Time)),]
gls_sorted <- update(gls_raw, data = Ovary_sorted)
V_sorted <- lapply(Mares, function(g) getVarCov(gls_sorted, individual = g))
The variance component estimates are essentially equal:
all.equal(gls_raw$modelStruct, gls_sorted$modelStruct)
## [1] TRUE
However, the extracted variance-covariance matrices are not:
all.equal(V_raw, V_sorted)
## [1] "Component 1: Mean relative difference: 0.03256"
## [2] "Component 3: Mean relative difference: 0.05830791"
## [3] "Component 4: Mean relative difference: 0.1142209"
## [4] "Component 5: Mean relative difference: 0.03619692"
## [5] "Component 6: Mean relative difference: 0.09260648"
## [6] "Component 8: Mean relative difference: 0.08650327"
## [7] "Component 9: Mean relative difference: 0.07627162"
## [8] "Component 10: Mean relative difference: 0.018103"
## [9] "Component 11: Mean relative difference: 0.1020658"
Here’s the code of the relevant function:
nlme:::getVarCov.gls
## function (obj, individual = 1, ...)
## {
## S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
## if (!is.null(obj$modelStruct$varStruct)) {
## ind <- obj$groups == individual
## vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
## }
## else vw <- rep(1, nrow(S))
## vars <- (obj$sigma * vw)^2
## result <- t(S * sqrt(vars)) * sqrt(vars)
## class(result) <- c("marginal", "VarCov")
## attr(result, "group.levels") <- names(obj$groups)
## result
## }
## <bytecode: 0x000000001bc39d00>
## <environment: namespace:nlme>
The issue is in the 4th line of the body. getVarCov.gls
assumes that varWeights(obj$modelStruct$varStruct)
is sorted in the same order as obj$groups
, which is not necessarily true. Instead, varWeights
seem to return the weights sorted according to the grouping variable. For this example, that means that the varWeights
will not depend on the order in which the groups are sorted.
identical(gls_raw$groups, gls_sorted$groups)
## [1] FALSE
identical(varWeights(gls_raw$modelStruct$varStruct),
varWeights(gls_sorted$modelStruct$varStruct))
## [1] TRUE
Fix for nlme:::getVarCov.gls
I think this can be solved by either
- putting the
varWeights
back into the same order as the raw data or - sorting
obj$groups
before identifying the rows corresponding to the specifiedindividual
.
Here’s a revised function that takes the second approach:
# proposed patch for getVarCov.gls
getVarCov_revised_gls <- function (obj, individual = 1, ...) {
S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
if (!is.null(obj$modelStruct$varStruct)) {
ind <- sort(obj$groups) == individual
vw <- 1 / varWeights(obj$modelStruct$varStruct)[ind]
}
else vw <- rep(1, nrow(S))
vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)
class(result) <- c("marginal", "VarCov")
attr(result, "group.levels") <- names(obj$groups)
result
}
Testing that it works correctly:
V_raw <- lapply(Mares, function(g) getVarCov_revised_gls(gls_raw, individual = g))
V_sorted <- lapply(Mares, function(g) getVarCov_revised_gls(gls_sorted, individual = g))
all.equal(V_raw, V_sorted)
## [1] TRUE
Fix for nlme:::getVarCov.lme
The same issue comes up in getVarCov.lme
. Here’s the fix and verification:
# proposed patch for getVarCov.lme
getVarCov_revised_lme <- function (obj, individuals, type = c("random.effects", "conditional", "marginal"), ...) {
type <- match.arg(type)
if (any("nlme" == class(obj)))
stop("not implemented for \"nlme\" objects")
if (length(obj$group) > 1)
stop("not implemented for multiple levels of nesting")
sigma <- obj$sigma
D <- as.matrix(obj$modelStruct$reStruct[[1]]) * sigma^2
if (type == "random.effects") {
result <- D
}
else {
result <- list()
groups <- sort(obj$groups[[1]])
ugroups <- unique(groups)
if (missing(individuals))
individuals <- as.matrix(ugroups)[1, ]
if (is.numeric(individuals))
individuals <- ugroups[individuals]
for (individ in individuals) {
indx <- which(individ == ugroups)
if (!length(indx))
stop(gettextf("individual %s was not used in the fit",
sQuote(individ)), domain = NA)
if (is.na(indx))
stop(gettextf("individual %s was not used in the fit",
sQuote(individ)), domain = NA)
ind <- groups == individ
if (!is.null(obj$modelStruct$corStruct)) {
V <- corMatrix(obj$modelStruct$corStruct)[[as.character(individ)]]
}
else V <- diag(sum(ind))
if (!is.null(obj$modelStruct$varStruct))
sds <- 1/varWeights(obj$modelStruct$varStruct)[ind]
else sds <- rep(1, sum(ind))
sds <- obj$sigma * sds
cond.var <- t(V * sds) * sds
dimnames(cond.var) <- list(1:nrow(cond.var), 1:ncol(cond.var))
if (type == "conditional")
result[[as.character(individ)]] <- cond.var
else {
Z <- model.matrix(obj$modelStruct$reStruc, getData(obj))[ind,
, drop = FALSE]
result[[as.character(individ)]] <- cond.var +
Z %*% D %*% t(Z)
}
}
}
class(result) <- c(type, "VarCov")
attr(result, "group.levels") <- names(obj$groups)
result
}
lme_raw <- lme(follicles ~ sin(2*pi*Time) + cos(2*pi*Time),
random = ~ 1 | Mare,
correlation = corExp(form = ~ Time),
weights = varPower(),
data=Ovary)
lme_sorted <- update(lme_raw, data = Ovary_sorted)
all.equal(lme_raw$modelStruct, lme_sorted$modelStruct)
## [1] TRUE
# current getVarCov
V_raw <- lapply(Mares, function(g) getVarCov(lme_raw, individual = g, type = "marginal"))
V_sorted <- lapply(Mares, function(g) getVarCov(lme_sorted, individual = g, type = "marginal"))
all.equal(V_raw, V_sorted)
## [1] "Component 1: Component 1: Mean relative difference: 0.003989954"
## [2] "Component 3: Component 1: Mean relative difference: 0.003784181"
## [3] "Component 4: Component 1: Mean relative difference: 0.003028662"
## [4] "Component 5: Component 1: Mean relative difference: 0.0005997944"
## [5] "Component 6: Component 1: Mean relative difference: 0.002350456"
## [6] "Component 7: Component 1: Mean relative difference: 0.007103733"
## [7] "Component 8: Component 1: Mean relative difference: 0.001887638"
## [8] "Component 9: Component 1: Mean relative difference: 0.0009601843"
## [9] "Component 10: Component 1: Mean relative difference: 0.004748783"
## [10] "Component 11: Component 1: Mean relative difference: 0.001521097"
# revised getVarCov
V_raw <- lapply(Mares, function(g) getVarCov_revised_lme(lme_raw, individual = g, type = "marginal"))
V_sorted <- lapply(Mares, function(g) getVarCov_revised_lme(lme_sorted, individual = g, type = "marginal"))
all.equal(V_raw, V_sorted)
## [1] TRUE
Session info
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] nlme_3.1-144
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.4.6 bookdown_0.14 lattice_0.20-38 digest_0.6.25
## [5] grid_3.6.3 magrittr_1.5 evaluate_0.14 blogdown_0.18
## [9] rlang_0.4.5 stringi_1.4.3 rmarkdown_2.1 tools_3.6.3
## [13] stringr_1.4.0 xfun_0.12 yaml_2.2.0 compiler_3.6.3
## [17] htmltools_0.4.0 knitr_1.28