knitr::opts_chunk$set(
  cache = TRUE,
  autodep = TRUE,
  cache.lazy = FALSE,
  cache.comments = FALSE
)
options(scipen = 999)
cbpalette <- multi.utils::cbpalette()
library(tidyverse)

rstan::rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

1 Background

1.1 Model description

Section 5.2 of Rue, Martino, and Chopin (2009) demonstrates the application of R-INLA to a generalised linear mixed model. The data, previously included as an example in the OpenBUGS manual, concern the results of a clinical trial for a new epilepsy drug. Patients \(i = 1, \ldots, 59\) are each assigned either the new drug \(\texttt{Trt}_i = 1\) or placebo \(\texttt{Trt}_i = 0\). Each patient makes four visits to the clinic \(j = 1, \ldots, 4\). The observations \(y_{ij}\) give the number of seizures of the \(i\)th patient in the two weeks preceding their \(j\)th clinic visit. The covariates are age \(\texttt{Age}_i\), baseline seizure count \(\texttt{Base}_i\) and an indicator for the final clinic visit \({\texttt{V}_4}_j\), all of which are centered.

The observations are modelled using a Poisson distribution \(y_{ij} \sim \text{Poisson}(e^{\eta_{ij}})\) with linear predictor \[\begin{align*} \eta_{ij} &= \beta_0 + \beta_\texttt{Base} \log(\texttt{Base}_i / 4) + \beta_\texttt{Trt} \texttt{Trt}_i + \beta_{\texttt{Trt} \times \texttt{Base}} \texttt{Trt}_i \times \log(\texttt{Base}_i / 4) \\ &+ \beta_\texttt{Age} \log(\texttt{Age}_i) + \beta_{\texttt{V}_4} {\texttt{V}_4}_j + \epsilon_i + \nu_{ij}, \quad i=1:59, \quad j=1:4, \end{align*}\] where the prior on each of the regression parameters, including the intercept, is \(\mathcal{N}(0, 100^2)\). The random effects are IID \(\epsilon_i \sim \mathcal{N}(0, 1/\tau_\epsilon)\) and \(\nu_{ij} \sim \mathcal{N}(0, 1/\tau_\nu)\), with precision priors \(\tau_\epsilon, \tau_\nu \sim \Gamma(0.001, 0.001)\).
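
As a quick illustration of the generative structure, here is a minimal simulation sketch. It is illustrative only: covariates are omitted, and the intercept and hyperparameters are fixed at arbitrary values rather than drawn from their diffuse priors.

#' Sketch: simulate from a simplified version of the model above
set.seed(1)
N_sim <- 59; J_sim <- 4
beta_0_sim <- 1.5                     #' Arbitrary intercept
tau_epsilon_sim <- 4; tau_nu_sim <- 4 #' Arbitrary precisions
epsilon_sim <- rnorm(N_sim, 0, sqrt(1 / tau_epsilon_sim))
nu_sim <- rnorm(N_sim * J_sim, 0, sqrt(1 / tau_nu_sim))
eta_sim <- beta_0_sim + rep(epsilon_sim, each = J_sim) + nu_sim
y_sim <- rpois(N_sim * J_sim, exp(eta_sim))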

1.2 Data preparation

The data is available within R-INLA as follows:

data(Epil, package = "INLA")
head(Epil)
Epil %>% ggplot(aes(x =  as.factor(Trt), y = y)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  theme_minimal() +
  labs(x = "Treatment", y = "Number of seizures")

Figure 1.1: It looks like treatment is associated to fewer seizures on average, but of course we haven’t accounted for the different covariates, hence the use of a statistical model!

We modify the Epil dataframe by centering all of the covariates:

center <- function(x) (x - mean(x))

Epil <- Epil %>%
  mutate(CTrt    = center(Trt),
         ClBase4 = center(log(Base/4)),
         CV4     = center(V4),
         ClAge   = center(log(Age)),
         CBT     = center(Trt * log(Base/4)))

We will implement the model based upon a response variable of length \(N \times J\), where \(N\) is the number of patients and \(J\) is the number of visits to the clinic, and a model matrix with \(K\) predictors (including the intercept term):

N <- 59
J <- 4
K <- 6
X <- model.matrix(formula(~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT), data = Epil)
y <- Epil$y

For the individual-specific random effect \(\epsilon_i\) we use a transformation matrix E, which repeats each element of a vector of length \(N\) a total of \(J\) times.

make_epsilon_matrix <- function(N, J) {
  t(outer(1:N, 1:(N * J), function(r, c) as.numeric((J*(r - 1) < c) & (c <= J*r))))
}

For example, with \(N = 3\) and \(J = 2\):

E <- make_epsilon_matrix(N = 3, J = 2)
E
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    1    0    0
## [3,]    0    1    0
## [4,]    0    1    0
## [5,]    0    0    1
## [6,]    0    0    1
t(E %*% c(1, 2, 3)) # Same as rep(1:3, each = 2)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    2    2    3    3

Multiplying this matrix \(E\) by the vector \(\epsilon\) allows it to be added directly to the linear predictor \(\eta = X \beta + \nu + E \epsilon\), where \(\beta\) is the vector of coefficients and \(X\) is the model matrix.
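
As a quick dimension check (a sketch): X is \((N \times J) \times K\) and E is \((N \times J) \times N\), so both \(X \beta\) and \(E \epsilon\) are vectors of length \(N \times J = 236\).

dim(X)                         #' 236 x 6
dim(make_epsilon_matrix(N, J)) #' 236 x 59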

dat <- list(N = N, J = J, K = K, X = X, y = y, E = make_epsilon_matrix(N, J))

2 Inference methods

We consider a range of inference methods, described below.

2.1 Stan

First, Hamiltonian Monte Carlo (HMC) via the No-U-Turn Sampler (NUTS) algorithm, with four Markov chains of iter = 1000 iterations each, the first warmup = 100 of which are discarded as burn-in. Markov chain Monte Carlo (MCMC) methods like NUTS are exact in the limit as the number of samples tends to infinity. We provide the stan function with file = "epil.stan", which refers to a file implementing the log-posterior in the Stan probabilistic programming language:

start_fit1 <- Sys.time()

fit1 <- stan(
  file = "epil.stan",
  data = dat,
  chains = 4,
  warmup = 100,
  iter = 1000,
  control = list(adapt_delta = 0.95),
  refresh = 0
)

end_fit1 <- Sys.time()
time_fit1 <- end_fit1 - start_fit1
time_fit1
## Time difference of 19.43963 secs

It took 19.44 seconds to fit this model. All of the \(\hat R\) values are below 1.05, so it looks like the MCMC has reached convergence:

monitor1 <- rstan::monitor(fit1, print = FALSE)
  
monitor1 %>%
  as.data.frame() %>%
  tibble::rownames_to_column(var = "parameter") %>%
  ggplot(aes(x = parameter, y = Rhat)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1.05, linetype = "dashed") +
  labs(x = "Parameter")  +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    legend.position = "bottom"
  )

2.2 R-INLA

R-INLA is an R package which implements integrated nested Laplace approximations. Rather than specifying the log-posterior via C++, the package uses a formula interface. This makes it easy to specify common models, but does limit the flexibility of the package for more advanced users.

We specify the \(\Gamma(0.001, 0.001)\) prior on the random effect precisions \(\tau_\epsilon\) and \(\tau_\nu\) below. In R-INLA the precision is internally represented as the log-precision, so the prior must be specified as a loggamma on that scale.

tau_prior <- list(prec = list(
  prior = "loggamma",
  param = c(0.001, 0.001),
  initial = 1,
  fixed = FALSE)
)

The variable Epil$rand gives the row number of each entry, and the variable Epil$Ind gives the patient number. These variables can be used to define the IID random effects \(\nu_{ij}\) and \(\epsilon_i\) in R-INLA as follows. The usual R formula notation (e.g. y ~ x1 + x2) is used for the rest of the linear predictor.
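
A quick look at these index variables (a sketch; output omitted):

head(Epil[, c("Ind", "rand")])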

formula <- y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT +
  f(rand, model = "iid", hyper = tau_prior) +  #' Nu random effect
  f(Ind,  model = "iid", hyper = tau_prior)    #' Epsilon random effect

We can use various settings ("gaussian", "simplified.laplace" and "laplace") for the strategy passed to control.inla, which controls the approximation used for the latent field posterior marginals. We create a function epil_inla which takes as input the latent field approximation strategy, as well as the hyperparameter quadrature approach:

epil_inla <- function(strat, int_strat) {
  inla(
    formula,
    control.fixed = list(mean = 0, prec = 1 / 100^2), #' Beta prior
    family = "poisson",
    data = Epil,
    control.inla = list(strategy = strat, int.strategy = int_strat),
    control.predictor = list(compute = TRUE)
  )
}

start_fit2 <- Sys.time()
fit2 <- epil_inla(strat = "gaussian", int_strat = "eb")
end_fit2 <- Sys.time()
(time_fit2 <- end_fit2 - start_fit2)
## Time difference of 4.481125 secs
start_fit3 <- Sys.time()
fit3 <- epil_inla(strat = "simplified.laplace", int_strat = "eb")
end_fit3 <- Sys.time()
(time_fit3 <- end_fit3 - start_fit3)
## Time difference of 4.336194 secs
start_fit4 <- Sys.time()
fit4 <- epil_inla(strat = "laplace", int_strat = "eb")
end_fit4 <- Sys.time()
(time_fit4 <- end_fit4 - start_fit4)
## Time difference of 4.49743 secs
start_fit5 <- Sys.time()
fit5 <- epil_inla(strat = "gaussian", int_strat = "grid")
end_fit5 <- Sys.time()
(time_fit5 <- end_fit5 - start_fit5)
## Time difference of 4.40382 secs
start_fit6 <- Sys.time()
fit6 <- epil_inla(strat = "simplified.laplace", int_strat = "grid")
end_fit6 <- Sys.time()
(time_fit6 <- end_fit6 - start_fit6)
## Time difference of 4.440583 secs
start_fit7 <- Sys.time()
fit7 <- epil_inla(strat = "laplace", int_strat = "grid")
end_fit7 <- Sys.time()
(time_fit7 <- end_fit7 - start_fit7)
## Time difference of 6.758156 secs

2.3 TMB

Template Model Builder, or TMB, is an R package which implements the Laplace approximation using automatic differentiation. The user writes a C++ file, here epil.cpp (see appendix), which contains the negative log-posterior for the model in question:

compile("epil.cpp")
## [1] 0
dyn.load(dynlib("epil"))

We need to specify the initialisation parameters:

param <- list(
  beta = rep(0, K),
  epsilon = rep(0, N),
  nu = rep(0, N * J),
  l_tau_epsilon = 0,
  l_tau_nu = 0
)

We integrate the random effects, random, out with a Laplace approximation. Here these correspond to c("beta", "epsilon", "nu"):

#' Note: Perhaps you could argue about when the "start" and "end" of the model fitting are
#' For example, is compilation of the TMB template part of the model fitting?
start_fit8 <- Sys.time()

obj <- MakeADFun(
  data = dat,
  parameters = param,
  random = c("beta", "epsilon", "nu"),
  DLL = "epil",
  silent = TRUE
)

The objective function obj$fn and its gradient obj$gr are functions of only the parameters, which in this instance are the logarithms of \(\tau_\epsilon\) and \(\tau_\nu\). This can be checked with names(obj$par), as sketched in the snippet below. We then optimise obj using up to 1000 iterations of the nlminb optimiser, passing in the starting values start, objective function objective and its derivative gradient:
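
#' Quick check (sketch): the only remaining parameters should be the two log-precisions
names(obj$par)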

its <- 1000 #' May converge before this

opt <- nlminb(
  start = obj$par,
  objective = obj$fn,
  gradient = obj$gr,
  control = list(iter.max = its, trace = 0)
)

sd_out <- sdreport(
  obj,
  par.fixed = opt$par,
  getJointPrecision = TRUE
)

end_fit8 <- Sys.time()
(time_fit8 <- end_fit8 - start_fit8)
## Time difference of 0.1567409 secs

2.3.1 Check TMB objective is the same as Stan

For completeness, we verify that the objective functions as implemented in TMB and Stan are identical. This is actually quite difficult to do, so it's worth checking, and worth giving yourself a pat on the back if they match. To obtain the TMB negative log-posterior we call MakeADFun as before, but now do not specify any parameters to be integrated out. For Stan, we create an empty model, then use rstan::log_prob:

tmb_nll <- MakeADFun(data = dat, parameters = param, DLL = "epil")
stan_nll <- stan("epil.stan", data = dat, chains = 0)
## the number of chains is less than 1; sampling not done

We can test the NLL of the initialisation parameters param:

c("TMB" = tmb_nll$fn(unlist(param)), "Stan" = -rstan::log_prob(object = stan_nll, unlist(param)))
##      TMB     Stan 
## 4365.854  236.002

But these look different! This is just because the two objective functions differ by a constant (Stan drops additive constants, such as the normalising terms of ~ sampling statements, from its log density), which we can show by evaluating the NLL at some of the other parameters explored during the MCMC:

pars_mat <- as.matrix(fit1)
pars_list <- apply(pars_mat, 1, function(x) relist(flesh = x, skeleton = fit1@inits[[1]]))
#' Note that Stan transforms constrained parameters in the parameters block to unconstrained parameters that can be sampled from
upars_list <- lapply(pars_list, function(x) rstan:::unconstrain_pars(fit1, x))

Using just a few of them, we can see that the difference between the TMB and Stan objectives is a constant:

test_pars <- upars_list[500:510]
tmb_evals <- sapply(test_pars, tmb_nll$fn)
stan_evals <- -sapply(test_pars, FUN = rstan::log_prob, object = stan_nll)

data.frame(
  "TMB" = tmb_evals,
  "Stan" = stan_evals
) %>%
  mutate(Difference = TMB - Stan) %>%
  pull(Difference)
##  [1] 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852

Note: when there are errors in the C++ template code, usually to do with indexing (unlike Stan, TMB does not require variables to be given dimensions when they are defined), calling MakeADFun tends to crash the R session. A workaround (courtesy of Kinh) for debugging without crashing the working R session is to run MakeADFun in a child process via parallel::mcparallel, so that any crash happens there rather than in the working session:

library(parallel)

testrun <- mcparallel({MakeADFun(data = dat,
                                 parameters = param,
                                 DLL = "epil")})

obj <- mccollect(testrun, wait = TRUE, timeout = 0, intermediate = FALSE)

2.4 glmmTMB

glmmTMB is an R package written by Ben Bolker and collaborators which allows fitting generalised linear mixed models in TMB without writing the C++ code manually as we have in epil.cpp. For example, rather than writing epil.cpp, we could have called the following:

start_fit9 <- Sys.time()
formula9 <- y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT + (1 | rand) + (1 | Ind)
fit9 <- glmmTMB(formula9, data = Epil, family = poisson(link = "log"))
end_fit9 <- Sys.time()
(time_fit9 <- end_fit9 - start_fit9)
## Time difference of 0.8262661 secs

This won’t be exactly the same model, in terms of priors and so on, but it may be close. We mention this package here in case it is of interest, but will not focus on it during the remainder of this notebook, as we would like to compare inferential methods on exactly the same model.
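
For instance, the glmmTMB fixed effect estimates can be extracted for an informal comparison (a sketch; this same matrix is used in the comparison table of Section 3):

summary(fit9)$coefficients$cond[, c("Estimate", "Std. Error")]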

2.5 tmbstan

tmbstan (by Cole Monnahan and Kasper Kristensen) is another helpful TMB package which allows you to pass the same C++ template you use in TMB to Stan in order to perform NUTS (if you have standalone C++ code then this can also likely be done via Stan's external C++ interface).

start_fit10 <- Sys.time()
fit10 <- tmbstan(obj = obj, chains = 4, refresh = 0)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
end_fit10 <- Sys.time()
(time_fit10 <- end_fit10 - start_fit10)
## Time difference of 59.5346 secs

The results from tmbstan should, up to stochasticity of the MCMC sampler, be exactly the same as those we might generate from Stan, given that we have confirmed the objective functions they sample from are identical up to a constant. For this reason, we do not focus on these results in the remainder of the notebook, but again highlight the package in case it is of interest.
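
As a spot check (a sketch; in both summaries the first six rows correspond to beta):

cbind(
  "Stan" = summary(fit1)$summary[1:6, "mean"],
  "tmbstan" = summary(fit10)$summary[1:6, "mean"]
)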

2.6 aghq

aghq is an R package written by Alex Stringer which implements adaptive Gauss-Hermite quadrature. It also works smoothly with TMB templates. The parameter k = 3 determines the number of quadrature points per dimension:

start_fit11 <- Sys.time()
fit11 <- aghq::marginal_laplace_tmb(
  obj,
  k = 3,
  startingvalue = c(param$l_tau_epsilon, param$l_tau_nu)
)

fit11_samples <- aghq::sample_marginal(fit11, M = 1000)$samps %>%
  t() %>%
  as.data.frame() %>%
  inf.utils::replace_duplicate_colnames()
end_fit11 <- Sys.time()
(time_fit11 <- end_fit11 - start_fit11)

Another (non-standard) approach is to use glmmTMB to construct the TMB objective function, which can then be passed to aghq:

glmm_model_info <- glmmTMB(formula9, data = Epil, family = poisson(link = "log"), doFit = FALSE)

glmm_ff <- with(glmm_model_info, {
  TMB::MakeADFun(
    data = data.tmb,
    parameters = parameters,
    random = names(parameters)[grep("theta", names(parameters), invert = TRUE)],
    DLL = "glmmTMB",
    silent = TRUE
  )
})
glmm_quad <- aghq::marginal_laplace_tmb(glmm_ff, k = 3, startingvalue = glmm_ff$par)

This could be useful in situations where you don’t have a TMB template written.
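
The returned object is a standard aghq fit, so the usual methods apply (a sketch; output omitted):

#' Posterior summary of the hyperparameters from the glmmTMB-derived template
summary(glmm_quad)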

2.7 Custom Laplace method with aghq

Ok, this next section is going to be a bit more involved.

2.7.1 Empirical Bayes with multiple C++ templates

Let's start by trying to get the Laplace marginals for \(\beta\). To do this, we create the TMB template epil_beta_index.cpp (see appendix), in which the beta parameter is replaced by beta_i and beta_minus_i. There is also an additional DATA_INTEGER(i), allowing us to pass in the index \(i\) for which we would like a full Laplace approximation.

compile("epil_beta_index.cpp")
## [1] 0
dyn.load(dynlib("epil_beta_index"))

We have already obtained the MAP estimate \(\hat \theta\) in the TMB section above that we can use in our empirical Bayes procedure.

opt_theta <- opt

We create a function to get the Laplace marginal for the \(i\)th element of \(\beta\). Some key things to note about this:

  • We use the map argument to fix the hyperparameters to their MAP estimates, which requires passing factor(NA) to map, as well as setting the initialisation parameters for the hyperparameters to be their MAP estimates.
  • Setting random = c("beta_minus_i", "nu", "epsilon") integrates out all of the random effects but beta_i.
  • Quadrature is performed using aghq::aghq to obtain the normalised posterior marginal.

xi_laplace_marginal_beta <- function(i, opt_theta, finegrid) {
  dat$i <- i
  
  theta_names <- unique(names(opt_theta$par))
  map_fixed_theta <- list()

  param_fixed_theta <- param

  for(theta in theta_names) {
    map_fixed_theta[[theta]] <- rep(factor(NA), length(param[[theta]]))
    param_fixed_theta[[theta]] <- opt_theta$par[names(opt_theta$par) == theta]
  }

  #' Prepare the initialisation parameters
  param_fixed_theta[["beta_i"]] <- param_fixed_theta$beta[i]
  param_fixed_theta[["beta_minus_i"]] <- param_fixed_theta$beta[-i]
  param_fixed_theta[["beta"]] <- NULL

  obj_fixed_theta <- MakeADFun(
    data = dat,
    parameters = param_fixed_theta,
    random = c("beta_minus_i", "nu", "epsilon"),
    DLL = "epil_beta_index",
    silent = TRUE,
    map = map_fixed_theta
  )

  quad <- aghq::aghq(
    ff = obj_fixed_theta,
    k = 3,
    startingvalue = 0,
    control = aghq::default_control_tmb()
  )

  pdf_and_cdf <- aghq::compute_pdf_and_cdf(quad, finegrid = finegrid)[[1]]
  
  return(pdf_and_cdf)
}

Now we can compute the Laplace marginals for all 6 indices of \(\beta\).

fine_grid <- seq(-5, 5, length.out = 500)

laplace_beta <- lapply(1:dat$K, xi_laplace_marginal_beta, opt_theta = opt_theta, finegrid = fine_grid) %>%
  bind_rows(.id = "index") %>%
  mutate(index = as.numeric(index))

Here is what the marginals look like:

laplace_beta %>%
  ggplot(aes(x = theta, y = pdf)) +
  geom_line() +
  facet_wrap(~index) +
  theme_minimal() +
  labs(x = "beta", y = "Posterior")

2.7.2 Empirical Bayes with one C++ template

Now that we have the Laplace marginals for beta, we can think about what we would do if we wanted to get them for all of the elements of the latent field. One approach would be to create separate TMB templates for each named random effect (beta, nu and epsilon). To see what this would look like, have a look at epil_beta_index.cpp, epil_nu_index.cpp and epil_epsilon_index.cpp.

What might be better is to do it in one TMB template, epil_index.cpp (see appendix):

compile("epil_index.cpp")
## [1] 0
dyn.load(dynlib("epil_index"))

We’re going to use a data structure to pass to epil_index. A toggle value of 0 corresponds to beta, a toggle value of 1 corresponds to nu, and a toggle value of 2 corresponds to epsilon:

index_controller <- data.frame(
  "toggle" = c(rep(0, dat$K), rep(1, dat$N * dat$J),  rep(2, dat$N)),
  "i" = c(1:dat$K, 1:(dat$N * dat$J), 1:dat$N)
)

We create a function which takes as input the row index i (into index_controller), the results of the optimisation for \(\theta\), and a grid of values at which to evaluate the marginal of \(x_i\):

xi_laplace_marginal <- function(i, opt_theta, finegrid) {
  dat$toggle <- index_controller[i, "toggle"]
  dat$i <- index_controller[i, "i"]

  theta_names <- unique(names(opt_theta$par))
  map_fixed_theta <- list()
  param_fixed_theta <- param

  for(theta in theta_names) {
    map_fixed_theta[[theta]] <- rep(factor(NA), length(param[[theta]]))
    param_fixed_theta[[theta]] <- opt_theta$par[names(opt_theta$par) == theta]
  }

  #' Prepare the initialisation parameters if it's beta that we're toggling
  #' We map the full beta vector to fixed values, as beta_i and beta_minus_i are used instead
  #' Note: we index with dat$i (the within-block index), not the row index i
  if(dat$toggle == 0) {
    param_fixed_theta[["beta_i"]] <- param_fixed_theta$beta[dat$i]
    param_fixed_theta[["beta_minus_i"]] <- param_fixed_theta$beta[-dat$i]
    map_fixed_theta[["beta"]] <- rep(factor(NA), length(param[["beta"]]))
  }

  #' Or if it's nu that we're toggling
  #' We map the full nu vector to fixed values, as nu_i and nu_minus_i are used instead
  if(dat$toggle == 1) {
    param_fixed_theta[["nu_i"]] <- param_fixed_theta$nu[dat$i]
    param_fixed_theta[["nu_minus_i"]] <- param_fixed_theta$nu[-dat$i]
    map_fixed_theta[["nu"]] <- rep(factor(NA), length(param[["nu"]]))
  }
  
  #' Or if it's epsilon that we're toggling
  #' We map the full epsilon vector to fixed values, as epsilon_i and epsilon_minus_i are used instead
  if(dat$toggle == 2) {
    param_fixed_theta[["epsilon_i"]] <- param_fixed_theta$epsilon[dat$i]
    param_fixed_theta[["epsilon_minus_i"]] <- param_fixed_theta$epsilon[-dat$i]
    map_fixed_theta[["epsilon"]] <- rep(factor(NA), length(param[["epsilon"]]))
  }
  
  get_random <- function(dat) {
    if(dat$toggle == 0) return(c("beta_minus_i", "nu", "epsilon"))
    if(dat$toggle == 1) return(c("beta", "nu_minus_i", "epsilon"))
    if(dat$toggle == 2) return(c("beta", "nu", "epsilon_minus_i"))
  }
  
  obj_fixed_theta <- MakeADFun(
   data = dat,
   parameters = param_fixed_theta,
   random = get_random(dat),
   DLL = "epil_index",
   silent = TRUE,
   map = map_fixed_theta
  )

  quad <- aghq::aghq(
    ff = obj_fixed_theta,
    k = 3,
    startingvalue = 0,
    control = aghq::default_control_tmb()
  )

  pdf_and_cdf <- aghq::compute_pdf_and_cdf(quad, finegrid = finegrid)[[1]]
  
  return(list("pdf_and_cdf" = pdf_and_cdf, "quad" = quad))
}

Now we can compute the Laplace marginals for all 301 elements of the latent field \((\beta, \nu, \epsilon)\).

fine_grid <- seq(-5, 5, length.out = 500)

#' It's possible to get errors with the initial value not being finite
safe_xi_laplace_marginal <- function(...) {
  return(tryCatch(xi_laplace_marginal(...), error = function(e) NULL))
}

laplace_x <- lapply(1:nrow(index_controller), safe_xi_laplace_marginal, opt_theta = opt_theta, finegrid = fine_grid)

laplace_beta_summary <- lapply(head(laplace_x, n = dat$K), function(result) summary(result$quad)$summarytable) %>%
  bind_rows()

rownames(laplace_beta_summary) <- paste0("beta[", 1:dat$K, "]")

3 Comparison

3.1 Table

Let’s look at the mean and standard deviation of each of the \(\beta\) parameters:

stan1 <- as.vector(t(summary(fit1)$summary[1:6, c(1, 3)]))
inla2 <- as.vector(t(fit2$summary.fixed[1:6, 1:2]))
inla3 <- as.vector(t(fit3$summary.fixed[1:6, 1:2]))
inla4 <- as.vector(t(fit4$summary.fixed[1:6, 1:2]))
inla5 <- as.vector(t(fit5$summary.fixed[1:6, 1:2]))
inla6 <- as.vector(t(fit6$summary.fixed[1:6, 1:2]))
inla7 <- as.vector(t(fit7$summary.fixed[1:6, 1:2]))
tmb8 <- as.vector(t(data.frame(sd_out$par.random[1:6], sqrt(sd_out$diag.cov.random[1:6]))))
glmmtmb9 <- as.vector(t(summary(fit9)$coefficients$cond[, c("Estimate", "Std. Error")]))
tmbstan10 <- as.vector(t(summary(fit10)$summary[1:6, c(1, 3)]))

fit11_beta_samples <- fit11_samples[, 1:6]
aghq11 <- as.vector(t(data.frame(mean = apply(fit11_beta_samples, 2, mean), sd = apply(fit11_beta_samples, 2, sd))))

adam12 <- as.vector(t(laplace_beta_summary[1:6, 1:2]))

df <- cbind(stan1, inla2, inla3, inla4, inla5, inla6, inla7, tmb8, glmmtmb9, tmbstan10, aghq11, adam12) %>%
  as.data.frame() %>%
  mutate(type = gl(2, 1, 12, labels = c("Mean", "SD")))

beta_i <- function(i) { c(paste0("beta_", i), paste0("sd(beta_", i, ")")) }
rownames(df) <- c(sapply(0:5, beta_i))
method_names <- c("Stan", "EB-G", "EB-SL", "EB-L", "Grid-G", "Grid-SL", "Grid-L", "TMB", "glmmTMB", "tmbstan", "aghq", "adam")
colnames(df) <- c(method_names, "type")

df %>%
  select(-type) %>%
  kableExtra::kable(digits = 3)
              Stan   EB-G  EB-SL   EB-L Grid-G Grid-SL Grid-L    TMB glmmTMB tmbstan   aghq   adam
beta_0       1.572  1.627  1.574  1.574  1.626   1.573  1.573  1.627   1.579   1.572  1.626  1.573
sd(beta_0)   0.078  0.075  0.075  0.075  0.078   0.078  0.078  0.076   0.073   0.078  0.075  0.076
beta_1      -0.955 -0.926 -0.953 -0.954 -0.927  -0.954 -0.956 -0.926  -0.949  -0.953 -0.937 -0.955
sd(beta_1)   0.414  0.409  0.409  0.408  0.419   0.419  0.419  0.414   0.396   0.426  0.427  0.415
beta_2       0.881  0.858  0.880  0.880  0.859   0.880  0.881  0.858   0.880   0.883  0.856  0.880
sd(beta_2)   0.139  0.134  0.134  0.134  0.138   0.138  0.138  0.136   0.129   0.139  0.141  0.136
beta_3      -0.100 -0.100 -0.103 -0.103 -0.101  -0.103 -0.104 -0.101  -0.103  -0.104 -0.096 -0.103
sd(beta_3)   0.088  0.086  0.086  0.086  0.086   0.086  0.086  0.086   0.086   0.087  0.085  0.086
beta_4       0.483  0.472  0.486  0.486  0.471   0.484  0.484  0.470   0.490   0.492  0.469  0.484
sd(beta_4)   0.376  0.355  0.355  0.355  0.365   0.365  0.365  0.360   0.342   0.357  0.371  0.360
beta_5       0.349  0.340  0.350  0.350  0.340   0.350  0.351  0.340   0.349   0.348  0.345  0.350
sd(beta_5)   0.211  0.208  0.208  0.208  0.214   0.214  0.213  0.210   0.200   0.214  0.219  0.211

saveRDS(df, "comparison-results.rds")
time_df <- data.frame(
  time = c(time_fit1, time_fit2, time_fit3, time_fit4, time_fit5, time_fit6, time_fit7, time_fit8, time_fit9, time_fit10, time_fit11),
  method = head(method_names, n = 11)
)

time_df %>% ggplot(aes(x = method, y = time)) +
  geom_col() +
  theme_minimal() +
  labs(x = "Method", y = "Time taken")
## Don't know how to automatically pick scale for object of type <difftime>. Defaulting to continuous.
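
The warning arises because ggplot2 does not know how to scale difftime objects; one fix (a sketch) is to convert to numeric seconds first:

time_df %>%
  mutate(time = as.numeric(time, units = "secs")) %>%
  ggplot(aes(x = method, y = time)) +
  geom_col() +
  theme_minimal() +
  labs(x = "Method", y = "Time taken (s)")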

3.2 Focus on \(\beta_0\)

#' Stan histogram base-plot
plot_pdf_stan <- ggplot(data.frame(x = rstan::extract(fit1, pars = "beta[1]")[[1]]), aes(x = x)) +
    geom_histogram(aes(y = ..density..), alpha = 0.6, fill = cbpalette[7], bins = 40) +
    theme_minimal() +
    labs(x = "beta[1]", y = "Posterior PDF")

beta0_inla_marginals <- bind_rows(
  data.frame(fit2$marginals.fixed$`(Intercept)`) %>%
    mutate(method = "Gaussian"),
  data.frame(fit4$marginals.fixed$`(Intercept)`) %>%
    mutate(method = "Laplace")
)

#' Add R-INLA results
plot_pdf_stan_inla <- plot_pdf_stan +
  geom_line(data = beta0_inla_marginals, aes(x = x, y = y, col = method)) +
  scale_color_manual(values = cbpalette) +
  labs(col = "Method", title = "Inference from R-INLA") 

plot_pdf_stan_aghq <- plot_pdf_stan +
  geom_histogram(data = data.frame(x = fit11_beta_samples$`beta[1]`), aes(y = ..density..), alpha = 0.6, fill = cbpalette[1], bins = 40)

plot_pdf_stan_aghq <- plot_pdf_stan_aghq +
    geom_line(data = laplace_x[[1]]$pdf_and_cdf, aes(x = theta, y = pdf), col = cbpalette[2]) +
    lims(x = c(1.2, 2)) +
    labs(title = "Inference from aghq and custom Laplace method") +
    theme_minimal()

plot_intercept_comparison <- plot_pdf_stan_inla / plot_pdf_stan_aghq

plot_intercept_comparison
## Warning: Removed 460 rows containing missing values (`geom_line()`).

ggsave("intercept-comparison.pdf", plot = plot_intercept_comparison, h = 6, w = 6.25)
## Warning: Removed 460 rows containing missing values (`geom_line()`).

3.3 Scatter plot against tmbstan

Assuming tmbstan to be the ground truth, we can draw two-dimensional scatter plots to check the fit of each method:

ggplot(df) +
  geom_point(aes(x = tmbstan, y = TMB, col = type)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", alpha = 0.5) +
  theme_minimal()

3.4 Symmetric KL divergence

We use the symmetric Kullback-Leibler divergence (SKLD). From the Google group: “… its the symmetric version between the posterior marginal computed using Gaussian approximation and the one use one of the Laplace based ones.” For "gaussian" the kld value is small but non-zero. Why?
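
To make concrete what is being computed, here is a rough numerical sketch of an SKLD between two R-INLA marginals evaluated on a shared grid (the function skld_numeric is our own illustration, not part of R-INLA):

#' Rough numerical SKLD between two R-INLA marginals (illustrative sketch)
skld_numeric <- function(m1, m2) {
  lower <- max(min(m1[, "x"]), min(m2[, "x"]))
  upper <- min(max(m1[, "x"]), max(m2[, "x"]))
  grid <- seq(lower, upper, length.out = 1000)
  p <- pmax(INLA::inla.dmarginal(grid, m1), 1e-12)
  q <- pmax(INLA::inla.dmarginal(grid, m2), 1e-12)
  #' SKLD = KL(p, q) + KL(q, p), approximated by a Riemann sum
  sum((p - q) * (log(p) - log(q))) * (grid[2] - grid[1])
}

#' For example, comparing the Gaussian and Laplace marginals of the intercept:
skld_numeric(fit2$marginals.fixed$`(Intercept)`, fit4$marginals.fixed$`(Intercept)`)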

head(fit2$summary.random$rand)$kld
## [1] 0 0 0 0 0 0

For "laplace" and "simplified.laplace" we can look for the random effects with the highest SKLD:

skld <- function(fit) {
  #' Nu random effect
  id_rand <- which.max(fit$summary.random$rand$kld)
  print(fit$summary.random$rand[id_rand, ])
  #' Epsilon random effect
  id_Ind <- which.max(fit$summary.random$Ind$kld)
  print(fit$summary.random$Ind[id_Ind, ])
  return(list(id_rand = id_rand, id_Ind = id_Ind))
}

skld3 <- skld(fit3)
##      ID     mean        sd 0.025quant  0.5quant 0.975quant mode         kld
## 222 222 0.645863 0.2480358  0.1561084 0.6472347   1.129674   NA 0.001331823
##    ID      mean        sd 0.025quant  0.5quant 0.975quant mode         kld
## 56 56 0.9044046 0.2309653  0.4488055 0.9055538   1.355366   NA 0.006909513
skld4 <- skld(fit4)
##      ID      mean        sd 0.025quant  0.5quant 0.975quant mode          kld
## 222 222 0.6429203 0.2476023  0.1546926 0.6439011   1.125762   NA 0.0007942658
##    ID      mean        sd 0.025quant 0.5quant 0.975quant mode         kld
## 56 56 0.9019214 0.2300221  0.4486929   0.9027   1.350875   NA 0.005744471
plot_marginals <- function(random_effect, index) {
  marginal2 <- fit2$marginals.random[[random_effect]][paste0("index.", index)][[1]] %>%
    as.data.frame() %>%
    mutate(method = "G")

  marginal3 <- fit3$marginals.random[[random_effect]][paste0("index.", index)][[1]] %>%
    as.data.frame() %>%
    mutate(method = "SL")

  marginal4 <- fit4$marginals.random[[random_effect]][paste0("index.", index)][[1]] %>%
    as.data.frame() %>%
    mutate(method = "L")

  marginals <- bind_rows(marginal2, marginal3, marginal4)

  ggplot(marginals, aes(x = x, y = y, group = method, col = method)) +
    geom_line() +
    scale_color_manual(values = cbpalette) +
    theme_minimal()
}

plot_marginals("rand", index = skld3$id_rand)

plot_marginals("Ind", index = skld3$id_Ind)

4 Appendix

4.1 epil.cpp

// epil.cpp

#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  // Data block
  DATA_INTEGER(N);
  DATA_INTEGER(J);
  DATA_INTEGER(K);
  DATA_MATRIX(X);
  DATA_VECTOR(y);
  DATA_MATRIX(E); // Epsilon matrix
  
  // Parameter block
  PARAMETER_VECTOR(beta);
  PARAMETER_VECTOR(epsilon);
  PARAMETER_VECTOR(nu);
  PARAMETER(l_tau_epsilon);
  PARAMETER(l_tau_nu);
  
  // Transformed parameters block
  Type tau_epsilon = exp(l_tau_epsilon);
  Type tau_nu = exp(l_tau_nu);
  Type sigma_epsilon = sqrt(1 / tau_epsilon);
  Type sigma_nu = sqrt(1 / tau_nu);
  vector<Type> eta(X * beta + nu + E * epsilon); // Linear predictor
  vector<Type> lambda(exp(eta));
  
  // Initialise negative log-likelihood
  Type nll;
  nll = Type(0.0);
  
  // Priors
  // Note: dgamma() is parameterised as (shape, scale); INLA parameterised as (shape, rate)
  nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
  nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
  nll -= dnorm(beta, Type(0), Type(100), true).sum();
  
  // Likelihood
  nll -= dpois(y, lambda, true).sum();
  
  // ADREPORT
  ADREPORT(tau_epsilon);
  ADREPORT(tau_nu);
  
  return(nll);
}

4.2 epil.stan

// epil.stan

data {
  int<lower=0> N;              // Number of patients
  int<lower=0> J;              // Number of clinic visits
  int<lower=0> K;              // Number of predictors (inc. intercept)
  matrix[N * J, K] X;          // Design matrix
  int<lower=0> y[N * J];       // Outcome variable
  matrix[N * J, N] E;          // Epsilon matrix
}

parameters {
  vector[K] beta;              // Vector of coefficients
  vector[N] epsilon;           // Patient specific errors
  vector[N * J] nu;            // Patient-visit errors
  real<lower=0> tau_epsilon;   // Precision of epsilon
  real<lower=0> tau_nu;        // Precision of nu
}

transformed parameters {
  vector[N * J] eta = X * beta + nu + E * epsilon;  // Linear predictor
}

model {
  beta ~ normal(0, 100);
  tau_epsilon ~ gamma(0.001, 0.001);
  tau_nu ~ gamma(0.001, 0.001);
  epsilon ~ normal(0, sqrt(1 / tau_epsilon));
  nu ~ normal(0, sqrt(1 / tau_nu));
  y ~ poisson_log(eta);
}

4.3 epil_beta_index.cpp

// epil_beta_index.cpp

#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  // Data block
  DATA_INTEGER(i); // Index i
  DATA_INTEGER(N);
  DATA_INTEGER(J);
  DATA_INTEGER(K);
  DATA_MATRIX(X);
  DATA_VECTOR(y);
  DATA_MATRIX(E); // Epsilon matrix

  // Parameter block
  PARAMETER_VECTOR(nu);
  PARAMETER_VECTOR(epsilon);

  PARAMETER(beta_i);
  PARAMETER_VECTOR(beta_minus_i);

  vector<Type> beta(K);
  int k = 0;
  for (int j = 0; j < K; j++) {
    if (j + 1 == i) { // +1 because C++ does zero-indexing
      beta(j) = beta_i;
    } else {
      beta(j) = beta_minus_i(k);
      k++;
    }
  }

  PARAMETER(l_tau_epsilon);
  PARAMETER(l_tau_nu);

  // Transformed parameters block
  Type tau_epsilon = exp(l_tau_epsilon);
  Type tau_nu = exp(l_tau_nu);
  Type sigma_epsilon = sqrt(1 / tau_epsilon);
  Type sigma_nu = sqrt(1 / tau_nu);
  vector<Type> eta(X * beta + nu + E * epsilon); // Linear predictor
  vector<Type> lambda(exp(eta));

  // Initialise negative log-likelihood
  Type nll;
  nll = Type(0.0);

  // Priors
  // Note: dgamma() is parameterised as (shape, scale); INLA parameterised as (shape, rate)
  nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
  nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
  nll -= dnorm(beta, Type(0), Type(100), true).sum();

  // Likelihood
  nll -= dpois(y, lambda, true).sum();

  // ADREPORT
  ADREPORT(tau_epsilon);
  ADREPORT(tau_nu);

  return(nll);
}

4.4 epil_index.cpp

// epil_index.cpp

#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  // Data block
  DATA_INTEGER(toggle); // 0 corresponds to beta, 1 to nu, 2 to epsilon
  DATA_INTEGER(i); // Index i
  DATA_INTEGER(N);
  DATA_INTEGER(J);
  DATA_INTEGER(K);
  DATA_MATRIX(X);
  DATA_VECTOR(y);
  DATA_MATRIX(E); // Epsilon matrix

  // Parameter block
  PARAMETER_VECTOR(beta);
  PARAMETER_VECTOR(nu);
  PARAMETER_VECTOR(epsilon);

  // If toggle == 0 then set-up minus i for beta
  if(toggle == 0) {

    PARAMETER(beta_i);
    PARAMETER_VECTOR(beta_minus_i);

    // vector<Type> beta(K);
    int k = 0;
    for (int j = 0; j < K; j++) {
      if (j + 1 == i) {
        beta(j) = beta_i;
      } else {
        beta(j) = beta_minus_i(k);
        k++;
      }
    }
  }

  // If toggle == 1 then set-up minus i for nu
  if(toggle == 1) {

    PARAMETER(nu_i);
    PARAMETER_VECTOR(nu_minus_i);

    // vector<Type> nu(N * J);
    int k = 0;
    for (int j = 0; j < (N * J); j++) {
      if (j + 1 == i) {
        nu(j) = nu_i;
      } else {
        nu(j) = nu_minus_i(k);
        k++;
      }
    }
  }

  // If toggle == 2 then set-up minus i for epsilon
  if(toggle == 2) {

    PARAMETER(epsilon_i);
    PARAMETER_VECTOR(epsilon_minus_i);

    // vector<Type> epsilon(N);
    int k = 0;
    for (int j = 0; j < N; j++) {
      if (j + 1 == i) {
        epsilon(j) = epsilon_i;
      } else {
        epsilon(j) = epsilon_minus_i(k);
        k++;
      }
    }
  }

  PARAMETER(l_tau_epsilon);
  PARAMETER(l_tau_nu);

  // Transformed parameters block
  Type tau_epsilon = exp(l_tau_epsilon);
  Type tau_nu = exp(l_tau_nu);
  Type sigma_epsilon = sqrt(1 / tau_epsilon);
  Type sigma_nu = sqrt(1 / tau_nu);
  vector<Type> eta(X * beta + nu + E * epsilon); // Linear predictor
  vector<Type> lambda(exp(eta));

  // Initialise negative log-likelihood
  Type nll;
  nll = Type(0.0);

  // Priors
  // Note: dgamma() is parameterised as (shape, scale); INLA parameterised as (shape, rate)
  nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
  nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
  nll -= dnorm(beta, Type(0), Type(100), true).sum();

  // Likelihood
  nll -= dpois(y, lambda, true).sum();

  // ADREPORT
  ADREPORT(tau_epsilon);
  ADREPORT(tau_nu);

  return(nll);
}

Original computing environment

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 13.1
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] inf.utils_0.1.0      multi.utils_0.1.0    patchwork_1.1.2      tibble_3.1.8         tidyverse_1.3.1      aghq_0.4.0           tmbstan_1.0.4        rstan_2.21.5        
##  [9] StanHeaders_2.21.0-7 glmmTMB_1.1.4        TMB_1.9.1            stringr_1.5.0        purrr_0.3.5          tidyr_1.2.1          readr_2.1.3          INLA_22.05.07       
## [17] sp_1.5-1             foreach_1.5.2        Matrix_1.5-1         ggplot2_3.4.0        forcats_0.5.2        dplyr_1.0.10        
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4         colorspace_2.0-3    ellipsis_0.3.2      rprojroot_2.0.3     fs_1.5.2            rstudioapi_0.13     farver_2.1.1        MatrixModels_0.5-0  bit64_4.0.5        
##  [10] fansi_1.0.3         lubridate_1.8.0     xml2_1.3.3          codetools_0.2-18    splines_4.2.0       cachem_1.0.6        knitr_1.41          polynom_1.4-1       jsonlite_1.8.4     
##  [19] nloptr_2.0.3        broom_0.8.0         dbplyr_2.1.1        compiler_4.2.0      httr_1.4.4          backports_1.4.1     assertthat_0.2.1    fastmap_1.1.0       cli_3.4.1          
##  [28] htmltools_0.5.3     prettyunits_1.1.1   tools_4.2.0         gtable_0.3.1        glue_1.6.2          Rcpp_1.0.9          jquerylib_0.1.4     cellranger_1.1.0    vctrs_0.5.1        
##  [37] mvQuad_1.0-6        svglite_2.1.0       nlme_3.1-157        iterators_1.0.14    xfun_0.35           ps_1.7.0            lme4_1.1-30         rvest_1.0.2         lifecycle_1.0.3    
##  [46] statmod_1.4.36      orderly_1.4.3       ids_1.0.1           MASS_7.3-56         scales_1.2.1        ragg_1.2.2          hms_1.1.2           inline_0.3.19       yaml_2.3.6         
##  [55] memoise_2.0.1       gridExtra_2.3       sass_0.4.4          loo_2.5.1           RSQLite_2.2.14      stringi_1.7.8       highr_0.9           boot_1.3-28         pkgbuild_1.3.1     
##  [64] rlang_1.0.6         pkgconfig_2.0.3     systemfonts_1.0.4   matrixStats_0.62.0  evaluate_0.18       lattice_0.20-45     labeling_0.4.2      cowplot_1.1.1       bit_4.0.5          
##  [73] processx_3.5.3      tidyselect_1.2.0    bookdown_0.26       magrittr_2.0.3      R6_2.5.1            generics_0.1.3      DBI_1.1.3           pillar_1.8.1        haven_2.5.0        
##  [82] withr_2.5.0         modelr_0.1.8        crayon_1.5.2        uuid_1.1-0          utf8_1.2.2          tzdb_0.3.0          rmarkdown_2.18      grid_4.2.0          readxl_1.4.0       
##  [91] data.table_1.14.6   blob_1.2.3          callr_3.7.0         reprex_2.0.1        digest_0.6.30       webshot_0.5.3       numDeriv_2016.8-1.1 textshaping_0.3.6   openssl_2.0.5      
## [100] RcppParallel_5.1.5  stats4_4.2.0        munsell_0.5.0       viridisLite_0.4.1   kableExtra_1.3.4    bslib_0.4.1         askpass_1.1

Bibliography

Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92.