Abstract
Background Rue, Martino, and Chopin (2009) use the epilepsy example from the OpenBUGS manual as a demonstration of the integrated nested Laplace approximation (INLA) method. The model is a Poisson GLMM including both fixed and random effects.
Task We implement the epilepsy example in Stan, R-INLA, TMB, glmmTMB, tmbstan, aghq, and a custom Laplace marginal version of aghq. We then compare the results of the inference methods. This serves both to compare the methods and to check that new methods are behaving as expected.
knitr::opts_chunk$set(
cache = TRUE,
autodep = TRUE,
cache.lazy = FALSE,
cache.comments = FALSE
)
options(scipen = 999)
cbpalette <- multi.utils::cbpalette()
library(tidyverse)
#' Additional packages used throughout this notebook
library(INLA)
library(rstan)
library(TMB)
library(glmmTMB)
library(tmbstan)
library(patchwork)
rstan::rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Section 5.2 of Rue, Martino, and Chopin (2009) demonstrates the application of R-INLA to a generalised linear mixed model.
The data, previously included as an example in the OpenBUGs manual, concern the results of a clinical trial for a new epilepsy drug.
Patients \(i = 1, \ldots, 59\) are each either assigned the new drug \(\texttt{Trt}_i = 1\) or placebo \(\texttt{Trt}_i = 0\).
Each patient makes four visits to the clinic \(j = 1, \ldots, 4\).
The observations \(y_{ij}\) give the number of seizures of the \(i\)th person in the two weeks preceding their \(j\)th visit to the clinic.
The covariates are age \(\texttt{Age}_i\), baseline seizure counts \(\texttt{Base}_i\) and an indicator for the final clinic visit \(\texttt{V}_4\), which are all centered.
The observations are modelled using a Poisson distribution \(y_{ij} \sim \text{Poisson}(e^{\eta_{ij}})\) with linear predictor \[\begin{align*} \eta_{ij} &= \beta_0 + \beta_\texttt{Base} \log(\texttt{Base}_i / 4) + \beta_\texttt{Trt} \texttt{Trt}_i + \beta_{\texttt{Trt} \times \texttt{Base}} \texttt{Trt}_i \times \log(\texttt{Base}_i / 4) \\ &+ \beta_\texttt{Age} \log(\texttt{Age}_i) + \beta_{\texttt{V}_4} {\texttt{V}_4}_j + \epsilon_i + \nu_{ij}, \quad i=1, \ldots, 59, \quad j=1, \ldots, 4, \end{align*}\] where the prior on each of the regression parameters, including the intercept, is \(\mathcal{N}(0, 100^2)\). The random effects are IID \(\epsilon_i \sim \mathcal{N}(0, 1/\tau_\epsilon)\) and \(\nu_{ij} \sim \mathcal{N}(0, 1/\tau_\nu)\) with precision priors \(\tau_\epsilon, \tau_\nu \sim \Gamma(0.001, 0.001)\).
The data is available within R-INLA as follows:
data(Epil, package = "INLA")
head(Epil)
Epil %>% ggplot(aes(x = as.factor(Trt), y = y)) +
geom_boxplot(alpha = 0.5) +
geom_jitter(width = 0.1, alpha = 0.5) +
theme_minimal() +
labs(x = "Treatment", y = "Number of seizures")
We modify the Epil dataframe by centering all of the covariates:
center <- function(x) (x - mean(x))
Epil <- Epil %>%
mutate(CTrt = center(Trt),
ClBase4 = center(log(Base/4)),
CV4 = center(V4),
ClAge = center(log(Age)),
CBT = center(Trt * log(Base/4)))
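As a quick sanity check, each centered covariate should now have mean (numerically) zero:
sapply(Epil[, c("CTrt", "ClBase4", "CV4", "ClAge", "CBT")], mean)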
We will implement the model based upon a response variable of length \(N \times J\), where \(N\) is the number of patients and \(J\) is the number of visits to the clinic, and a model matrix with \(K\) predictors (including the intercept term):
N <- 59
J <- 4
K <- 6
X <- model.matrix(formula(~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT), data = Epil)
y <- Epil$y
For the individual-specific random effect \(\epsilon_i\) we use a transformation matrix E which repeats each element of a vector of length N a total of J times.
make_epsilon_matrix <- function(N, J) {
t(outer(1:N, 1:(N * J), function(r, c) as.numeric((J*(r - 1) < c) & (c <= J*r))))
}
For example, with \(N = 3\) and \(J = 2\):
E <- make_epsilon_matrix(N = 3, J = 2)
E
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 0 0
## [3,] 0 1 0
## [4,] 0 1 0
## [5,] 0 0 1
## [6,] 0 0 1
t(E %*% c(1, 2, 3)) #' Same as rep(1:3, each = 2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 1 2 2 3 3
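As an aside, the same matrix can be constructed with a Kronecker product; a quick check of the equivalence:
all.equal(make_epsilon_matrix(3, 2), kronecker(diag(3), matrix(1, 2, 1)))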
Multiplying this matrix \(E\) by the vector \(\epsilon\) allows it to be directly added to the linear predictor \(\eta = X \beta + \nu + E \epsilon\), where \(\beta\) is the vector of coefficients and \(X\) is the model matrix.
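As a small sketch of how the pieces fit together (with arbitrary values standing in for the random effects):
beta_test <- rep(0, K)
epsilon_test <- rnorm(N)
nu_test <- rnorm(N * J)
eta_test <- as.vector(X %*% beta_test + nu_test + make_epsilon_matrix(N, J) %*% epsilon_test)
length(eta_test) #' N * J = 236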
dat <- list(N = N, J = J, K = K, X = X, y = y, E = make_epsilon_matrix(N, J))
We consider a range of inference methods, described below.
First, Hamiltonian Monte Carlo (HMC) via the No-U-Turn Sampler (NUTS) algorithm, with four Markov chains of iter = 1000 iterations, of which warmup = 100 are discarded as burn-in.
Markov chain Monte Carlo (MCMC) methods like NUTS are exact in the limit as the number of samples tends to infinity.
We provide the stan function with file = "epil.stan", which refers to a file implementing the log-posterior in the Stan probabilistic programming language:
start_fit1 <- Sys.time()
fit1 <- stan(
file = "epil.stan",
data = dat,
chains = 4,
warmup = 100,
iter = 1000,
control = list(adapt_delta = 0.95),
refresh = 0
)
end_fit1 <- Sys.time()
time_fit1 <- end_fit1 - start_fit1
time_fit1
## Time difference of 19.43963 secs
It took 19.44 seconds to fit this model. All of the \(\hat R\) values are below 1.05, so it looks like the MCMC has reached convergence:
monitor1 <- rstan::monitor(fit1, print = FALSE)
monitor1 %>%
as.data.frame() %>%
tibble::rownames_to_column(var = "parameter") %>%
ggplot(aes(x = parameter, y = Rhat)) +
geom_point(alpha = 0.5) +
geom_hline(yintercept = 1.05, linetype = "dashed") +
labs(x = "Parameter") +
theme_minimal() +
theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
legend.position = "bottom"
)
R-INLA
R-INLA is an R package which implements integrated nested Laplace approximations.
Rather than specifying the log-posterior via C++, the package uses a formula interface.
This makes it easy to specify common models, but does limit the flexibility of the package for more advanced users.
We specify the \(\Gamma(0.001, 0.001)\) prior on the random effect precisions \(\tau_\epsilon\) and \(\tau_\nu\) below.
In R-INLA, the precision is internally represented as a log-precision, so we must set a loggamma prior.
tau_prior <- list(prec = list(
prior = "loggamma",
param = c(0.001, 0.001),
initial = 1,
fixed = FALSE)
)
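To see why this works, note that if \(\tau \sim \Gamma(a, b)\) (shape-rate parameterisation) then \(\theta = \log \tau\) has density \(p(\theta) = p_\Gamma(e^\theta) e^\theta\) by change of variables. A minimal sketch of this transformed density:
a <- 0.001
b <- 0.001
theta_grid <- seq(-10, 10, by = 0.1)
#' Change of variables: p(theta) = p_gamma(exp(theta)) * exp(theta)
log_dens <- dgamma(exp(theta_grid), shape = a, rate = b, log = TRUE) + theta_grid
plot(theta_grid, log_dens, type = "l", xlab = "log(tau)", ylab = "Log-density")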
The variable Epil$rand gives the row number for each entry and the variable Epil$Ind gives the patient number.
These variables can be used to define the IID random effects \(\nu_{ij}\) and \(\epsilon_i\) in R-INLA as follows.
The usual R formula notation (e.g. y ~ x1 + x2) is used for the rest of the linear predictor.
formula <- y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT +
f(rand, model = "iid", hyper = tau_prior) + #' Nu random effect
f(Ind, model = "iid", hyper = tau_prior) #' Epsilon random effect
We can use various settings ("gaussian", "simplified.laplace" and "laplace") for the strategy passed to control.inla, which controls the approximation used for the latent field posterior marginals.
We create a function epil_inla which takes as input the latent field approximation strategy, as well as the hyperparameter quadrature approach:
epil_inla <- function(strat, int_strat) {
inla(
formula,
control.fixed = list(mean = 0, prec = 1 / 100^2), #' Beta prior
family = "poisson",
data = Epil,
control.inla = list(strategy = strat, int.strategy = int_strat),
control.predictor = list(compute = TRUE)
)
}
start_fit2 <- Sys.time()
fit2 <- epil_inla(strat = "gaussian", int_strat = "eb")
end_fit2 <- Sys.time()
(time_fit2 <- end_fit2 - start_fit2)
## Time difference of 4.481125 secs
start_fit3 <- Sys.time()
fit3 <- epil_inla(strat = "simplified.laplace", int_strat = "eb")
end_fit3 <- Sys.time()
(time_fit3 <- end_fit3 - start_fit3)
## Time difference of 4.336194 secs
start_fit4 <- Sys.time()
fit4 <- epil_inla(strat = "laplace", int_strat = "eb")
end_fit4 <- Sys.time()
(time_fit4 <- end_fit4 - start_fit4)
## Time difference of 4.49743 secs
start_fit5 <- Sys.time()
fit5 <- epil_inla(strat = "gaussian", int_strat = "grid")
end_fit5 <- Sys.time()
(time_fit5 <- end_fit5 - start_fit5)
## Time difference of 4.40382 secs
start_fit6 <- Sys.time()
fit6 <- epil_inla(strat = "simplified.laplace", int_strat = "grid")
end_fit6 <- Sys.time()
(time_fit6 <- end_fit6 - start_fit6)
## Time difference of 4.440583 secs
start_fit7 <- Sys.time()
fit7 <- epil_inla(strat = "laplace", int_strat = "grid")
end_fit7 <- Sys.time()
(time_fit7 <- end_fit7 - start_fit7)
## Time difference of 6.758156 secs
TMB
Template Model Builder, or TMB, is an R package which implements the Laplace approximation using automatic differentiation.
The user writes a C++ file, here epil.cpp (see appendix), which contains the negative log-posterior for the model in question:
compile("epil.cpp")
## [1] 0
dyn.load(dynlib("epil"))
We need to specify the initialisation parameters:
param <- list(
beta = rep(0, K),
epsilon = rep(0, N),
nu = rep(0, N * J),
l_tau_epsilon = 0,
l_tau_nu = 0
)
We integrate the random effects, random, out with a Laplace approximation.
Here these correspond to c("beta", "epsilon", "nu"):
#' Note: perhaps you could argue about what counts as the "start" and "end" of model fitting
#' For example, is compilation of the TMB template part of the model fitting?
start_fit8 <- Sys.time()
obj <- MakeADFun(
data = dat,
parameters = param,
random = c("beta", "epsilon", "nu"),
DLL = "epil",
silent = TRUE
)
The objective function obj$fn and its gradient obj$gr are functions of only the parameters, which in this instance are the logarithms of \(\tau_\epsilon\) and \(\tau_\nu\).
This can be checked with names(obj$par).
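For instance, the following should return only the two log-precisions:
names(obj$par) #' Expect: "l_tau_epsilon" and "l_tau_nu"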
We optimise obj using up to 1000 iterations of the nlminb optimiser, passing in the starting values start, objective function objective and its derivative gradient:
its <- 1000 #' May converge before this
opt <- nlminb(
start = obj$par,
objective = obj$fn,
gradient = obj$gr,
control = list(iter.max = its, trace = 0)
)
sd_out <- sdreport(
obj,
par.fixed = opt$par,
getJointPrecision = TRUE
)
end_fit8 <- Sys.time()
(time_fit8 <- end_fit8 - start_fit8)
## Time difference of 0.1567409 secs
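The ADREPORTed precisions, together with their delta-method standard errors, can be inspected with TMB's summary method for sdreport objects:
summary(sd_out, select = "report") #' tau_epsilon and tau_nu with standard errors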
TMB objective is the same as Stan
For completeness, we verify that the objective functions as implemented in TMB and Stan are identical.
This is actually quite difficult to do, so it's worth checking, and giving yourself a pat on the back if they are.
To obtain the TMB negative log-posterior we call MakeADFun as before, but now do not specify any parameters to be integrated out.
For Stan, we create an empty model, then use rstan::log_prob:
tmb_nll <- MakeADFun(data = dat, parameters = param, DLL = "epil")
stan_nll <- stan("epil.stan", data = dat, chains = 0)
## the number of chains is less than 1; sampling not done
We can test the NLL of the initialisation parameters param:
c("TMB" = tmb_nll$fn(unlist(param)), "Stan" = -rstan::log_prob(object = stan_nll, unlist(param)))
## TMB Stan
## 4365.854 236.002
But these look different! This is just because the two functions differ by a constant, which we can show by evaluating the NLL at some of the other parameters explored during the MCMC:
pars_mat <- as.matrix(fit1)
pars_list <- apply(pars_mat, 1, function(x) relist(flesh = x, skeleton = fit1@inits[[1]]))
#' Note that Stan transforms constrained parameters in the parameters block to unconstrained parameters that can be sampled from
upars_list <- lapply(pars_list, function(x) rstan:::unconstrain_pars(fit1, x))
Using just a few of them, we can see that the difference between the TMB and Stan objectives is a constant:
test_pars <- upars_list[500:510]
tmb_evals <- sapply(test_pars, tmb_nll$fn)
stan_evals <- -sapply(test_pars, FUN = rstan::log_prob, object = stan_nll)
data.frame(
"TMB" = tmb_evals,
"Stan" = stan_evals
) %>%
mutate(Difference = TMB - Stan) %>%
pull(Difference)
## [1] 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852 4129.852
Note: when there are errors in the C++ template code, usually to do with indexing (unlike Stan, TMB does not require variables to be given dimensions when they are defined), calling the MakeADFun function tends to crash the R session.
A workaround (courtesy of Kinh) for debugging without crashing the working R session is the following, which creates a new R session that crashes in preference to the working one:
library(parallel)
testrun <- mcparallel({MakeADFun(data = dat,
parameters = param,
DLL = "epil")})
obj <- mccollect(testrun, wait = TRUE, timeout = 0, intermediate = FALSE)
glmmTMB
glmmTMB is an R package written by Ben Bolker and collaborators which allows fitting generalised linear mixed models in TMB without manually writing the C++ code as we have in epil.cpp.
For example, rather than writing epil.cpp, we could have called the following:
start_fit9 <- Sys.time()
formula9 <- y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT + (1 | rand) + (1 | Ind)
fit9 <- glmmTMB(formula9, data = Epil, family = poisson(link = "log"))
end_fit9 <- Sys.time()
(time_fit9 <- end_fit9 - start_fit9)
## Time difference of 0.8262661 secs
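The estimated fixed effects of the conditional model can be extracted with fixef:
fixef(fit9)$cond #' glmmTMB fixed effect estimates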
This won’t be exactly the same model (glmmTMB fits by maximum likelihood, so our priors are not included), but it may be close. We mention this package here in case it is of interest, but will not focus on it during the remainder of this notebook, as we would like to compare inferential methods on exactly the same model.
tmbstan
tmbstan (Cole Monnahan and Kasper Kristensen) is another helpful TMB package which allows you to pass the same C++ template you use in TMB to Stan in order to perform NUTS (if you have standard C++ code then this can also likely be done using stanc).
start_fit10 <- Sys.time()
fit10 <- tmbstan(obj = obj, chains = 4, refresh = 0)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
end_fit10 <- Sys.time()
(time_fit10 <- end_fit10 - start_fit10)
## Time difference of 59.5346 secs
The results from tmbstan should, up to stochasticity of the MCMC sampler, be exactly the same as those we might generate from Stan, given that we have confirmed the objective functions they sample from are identical up to a constant.
For this reason, we do not focus on these results in the remainder of the notebook, but again highlight this package in case it is of interest.
aghq
aghq is an R package written by Alex Stringer which implements adaptive Gauss-Hermite quadrature.
It also works smoothly with TMB templates.
The parameter k = 3 determines the number of quadrature points per dimension:
start_fit11 <- Sys.time()
fit11 <- aghq::marginal_laplace_tmb(
obj,
k = 3,
startingvalue = c(param$l_tau_epsilon, param$l_tau_nu)
)
fit11_samples <- aghq::sample_marginal(fit11, M = 1000)$samps %>%
t() %>%
as.data.frame() %>%
inf.utils::replace_duplicate_colnames()
end_fit11 <- Sys.time()
(time_fit11 <- end_fit11 - start_fit11)
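Posterior summaries for the hyperparameters can then be obtained from the quadrature object; a sketch, assuming the standard aghq summary method applies to marginal_laplace_tmb output:
summary(fit11)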
Another (non-standard) approach is to use glmmTMB to get the TMB template, which can then be passed to aghq:
glmm_model_info <- glmmTMB(formula9, data = Epil, family = poisson(link = "log"), doFit = FALSE)
glmm_ff <- with(glmm_model_info, {
TMB::MakeADFun(
data = data.tmb,
parameters = parameters,
random = names(parameters)[grep("theta", names(parameters), invert = TRUE)],
DLL = "glmmTMB",
silent = TRUE
)
})
glmm_quad <- aghq::marginal_laplace_tmb(glmm_ff, k = 3, startingvalue = glmm_ff$par)
This could be useful in situations where you don't have a TMB template written.
Laplace marginals with aghq
Ok, this next section is going to be a bit more involved.
Let's start by trying to get the Laplace marginals for \(\beta\).
To do this, we create the TMB template epil_beta_index.cpp (see appendix), which has the beta parameter replaced by beta_i and beta_minus_i.
There is also an additional DATA_INTEGER(i), allowing us to pass in the index i for which we would like to obtain a full Laplace approximation.
compile("epil_beta_index.cpp")
## [1] 0
dyn.load(dynlib("epil_beta_index"))
We have already obtained the MAP estimate \(\hat \theta\) in the TMB section above, which we can use in our empirical Bayes procedure.
opt_theta <- opt
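For reference, exponentiating the optimised values gives the MAP estimates of the precisions themselves:
exp(opt_theta$par) #' MAP estimates of tau_epsilon and tau_nu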
We create a function to get the Laplace marginal for the \(i\)th element of \(\beta\). Some key things to note about this:

- We use the map argument to fix the hyperparameters to their MAP estimates. This requires passing factor(NA) to map, as well as setting the initialisation parameters for the hyperparameters to their MAP estimates.
- Setting random = c("beta_minus_i", "nu", "epsilon") integrates out all of the random effects apart from beta_i.
- We use aghq::aghq to obtain the normalised posterior marginal.

xi_laplace_marginal_beta <- function(i, opt_theta, finegrid) {
dat$i <- i
theta_names <- unique(names(opt_theta$par))
map_fixed_theta <- list()
param_fixed_theta <- param
for(theta in theta_names) {
map_fixed_theta[[theta]] <- rep(factor(NA), length(param[[theta]]))
param_fixed_theta[[theta]] <- opt_theta$par[names(opt_theta$par) == theta]
}
#' Prepare the initialisation parameters
param_fixed_theta[["beta_i"]] <- param_fixed_theta$beta[i]
param_fixed_theta[["beta_minus_i"]] <- param_fixed_theta$beta[-i]
param_fixed_theta[["beta"]] <- NULL
obj_fixed_theta <- MakeADFun(
data = dat,
parameters = param_fixed_theta,
random = c("beta_minus_i", "nu", "epsilon"),
DLL = "epil_beta_index",
silent = TRUE,
map = map_fixed_theta
)
quad <- aghq::aghq(
ff = obj_fixed_theta,
k = 3,
startingvalue = 0,
control = aghq::default_control_tmb()
)
pdf_and_cdf <- aghq::compute_pdf_and_cdf(quad, finegrid = finegrid)[[1]] #' Use the function argument, not the global fine_grid
return(pdf_and_cdf)
}
Now we can compute the Laplace marginal for all 6 indices of \(\beta\).
fine_grid <- seq(-5, 5, length.out = 500)
laplace_beta <- lapply(1:dat$K, xi_laplace_marginal_beta, opt_theta = opt_theta, finegrid = fine_grid) %>%
bind_rows(.id = "index") %>%
mutate(index = as.numeric(index))
Here is what the marginals look like:
laplace_beta %>%
ggplot(aes(x = theta, y = pdf)) +
geom_line() +
facet_wrap(~index) +
theme_minimal() +
labs(x = "beta", y = "Posterior")
Now that we have the Laplace marginals for beta, we can think about what we would do if we wanted to get them for all of the elements of the latent field.
One approach would be to create separate TMB templates for each named random effect (beta, nu and epsilon).
To see what this would look like, have a look at epil_beta_index.cpp, epil_nu_index.cpp and epil_epsilon_index.cpp.
What might be better is to do it in one TMB template, epil_index.cpp (see appendix):
compile("epil_index.cpp")
## [1] 0
dyn.load(dynlib("epil_index"))
We’re going to use a data structure to pass to epil_index.
A toggle value of 0 corresponds to beta, a toggle value of 1 corresponds to nu, and a toggle value of 2 corresponds to epsilon:
index_controller <- data.frame(
"toggle" = c(rep(0, dat$K), rep(1, dat$N * dat$J), rep(2, dat$N)),
"i" = c(1:dat$K, 1:(dat$N * dat$J), 1:dat$N)
)
We create a function which takes as input the index i (from index_controller), the results of optimisation for \(\theta\), and a grid of values for \(x_i\) to evaluate at:
xi_laplace_marginal <- function(i, opt_theta, finegrid) {
dat$toggle <- index_controller[i, "toggle"]
dat$i <- index_controller[i, "i"]
theta_names <- unique(names(opt_theta$par))
map_fixed_theta <- list()
param_fixed_theta <- param
for(theta in theta_names) {
map_fixed_theta[[theta]] <- rep(factor(NA), length(param[[theta]]))
param_fixed_theta[[theta]] <- opt_theta$par[names(opt_theta$par) == theta]
}
#' Prepare the initialisation parameters if it's beta that we're toggling
#' The full beta vector is fixed (at zero) via map, as we're using beta_i and beta_minus_i
#' Note: we index by dat$i (the position within the vector), not the controller row i
if(dat$toggle == 0) {
param_fixed_theta[["beta_i"]] <- param_fixed_theta$beta[dat$i]
param_fixed_theta[["beta_minus_i"]] <- param_fixed_theta$beta[-dat$i]
map_fixed_theta[["beta"]] <- rep(factor(NA), length(param[["beta"]]))
}
#' Or if it's nu that we're toggling
#' The full nu vector is fixed (at zero) via map, as we're using nu_i and nu_minus_i
if(dat$toggle == 1) {
param_fixed_theta[["nu_i"]] <- param_fixed_theta$nu[dat$i]
param_fixed_theta[["nu_minus_i"]] <- param_fixed_theta$nu[-dat$i]
map_fixed_theta[["nu"]] <- rep(factor(NA), length(param[["nu"]]))
}
#' Or if it's epsilon that we're toggling
#' The full epsilon vector is fixed (at zero) via map, as we're using epsilon_i and epsilon_minus_i
if(dat$toggle == 2) {
param_fixed_theta[["epsilon_i"]] <- param_fixed_theta$epsilon[dat$i]
param_fixed_theta[["epsilon_minus_i"]] <- param_fixed_theta$epsilon[-dat$i]
map_fixed_theta[["epsilon"]] <- rep(factor(NA), length(param[["epsilon"]]))
}
get_random <- function(dat) {
if(dat$toggle == 0) return(c("beta_minus_i", "nu", "epsilon"))
if(dat$toggle == 1) return(c("beta", "nu_minus_i", "epsilon"))
if(dat$toggle == 2) return(c("beta", "nu", "epsilon_minus_i"))
}
obj_fixed_theta <- MakeADFun(
data = dat,
parameters = param_fixed_theta,
random = get_random(dat),
DLL = "epil_index",
silent = TRUE,
map = map_fixed_theta
)
quad <- aghq::aghq(
ff = obj_fixed_theta,
k = 3,
startingvalue = 0,
control = aghq::default_control_tmb()
)
pdf_and_cdf <- aghq::compute_pdf_and_cdf(quad, finegrid = finegrid)[[1]] #' Use the function argument, not the global fine_grid
return(list("pdf_and_cdf" = pdf_and_cdf, "quad" = quad))
}
Now we can compute the Laplace marginal for all 301 random effects of \((\beta, \nu, \epsilon)\).
fine_grid <- seq(-5, 5, length.out = 500)
#' It's possible to get errors with the initial value not being finite
safe_xi_laplace_marginal <- function(...) {
return(tryCatch(xi_laplace_marginal(...), error = function(e) NULL))
}
laplace_x <- lapply(1:nrow(index_controller), safe_xi_laplace_marginal, opt_theta = opt_theta, finegrid = fine_grid)
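We can check how many of the 301 marginals, if any, failed and were returned as NULL:
sum(sapply(laplace_x, is.null))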
laplace_beta_summary <- lapply(head(laplace_x, n = dat$K), function(result) summary(result$quad)$summarytable) %>%
bind_rows()
rownames(laplace_beta_summary) <- paste0("beta[", 1:dat$K, "]")
Let’s look at the mean and standard deviation of each of the \(\beta\) parameters:
stan1 <- as.vector(t(summary(fit1)$summary[1:6, c(1, 3)]))
inla2 <- as.vector(t(fit2$summary.fixed[1:6, 1:2]))
inla3 <- as.vector(t(fit3$summary.fixed[1:6, 1:2]))
inla4 <- as.vector(t(fit4$summary.fixed[1:6, 1:2]))
inla5 <- as.vector(t(fit5$summary.fixed[1:6, 1:2]))
inla6 <- as.vector(t(fit6$summary.fixed[1:6, 1:2]))
inla7 <- as.vector(t(fit7$summary.fixed[1:6, 1:2]))
tmb8 <- as.vector(t(data.frame(sd_out$par.random[1:6], sqrt(sd_out$diag.cov.random[1:6]))))
glmmtmb9 <- as.vector(t(summary(fit9)$coefficients$cond[, c("Estimate", "Std. Error")]))
tmbstan10 <- as.vector(t(summary(fit10)$summary[1:6, c(1, 3)]))
fit11_beta_samples <- fit11_samples[, 1:6]
aghq11 <- as.vector(t(data.frame(mean = apply(fit11_beta_samples, 2, mean), sd = apply(fit11_beta_samples, 2, sd))))
adam12 <- as.vector(t(laplace_beta_summary[1:6, 1:2]))
df <- cbind(stan1, inla2, inla3, inla4, inla5, inla6, inla7, tmb8, glmmtmb9, tmbstan10, aghq11, adam12) %>%
as.data.frame() %>%
mutate(type = gl(2, 1, 12, labels = c("Mean", "SD")))
beta_i <- function(i) { c(paste0("beta_", i), paste0("sd(beta_", i, ")")) }
rownames(df) <- c(sapply(0:5, beta_i))
method_names <- c("Stan", "EB-G", "EB-SL", "EB-L", "Grid-G", "Grid-SL", "Grid-L", "TMB", "glmmTMB", "tmbstan", "aghq", "adam")
colnames(df) <- c(method_names, "type")
df %>%
select(-type) %>%
kableExtra::kable(digits = 3)
| | Stan | EB-G | EB-SL | EB-L | Grid-G | Grid-SL | Grid-L | TMB | glmmTMB | tmbstan | aghq | adam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| beta_0 | 1.572 | 1.627 | 1.574 | 1.574 | 1.626 | 1.573 | 1.573 | 1.627 | 1.579 | 1.572 | 1.626 | 1.573 |
| sd(beta_0) | 0.078 | 0.075 | 0.075 | 0.075 | 0.078 | 0.078 | 0.078 | 0.076 | 0.073 | 0.078 | 0.075 | 0.076 |
| beta_1 | -0.955 | -0.926 | -0.953 | -0.954 | -0.927 | -0.954 | -0.956 | -0.926 | -0.949 | -0.953 | -0.937 | -0.955 |
| sd(beta_1) | 0.414 | 0.409 | 0.409 | 0.408 | 0.419 | 0.419 | 0.419 | 0.414 | 0.396 | 0.426 | 0.427 | 0.415 |
| beta_2 | 0.881 | 0.858 | 0.880 | 0.880 | 0.859 | 0.880 | 0.881 | 0.858 | 0.880 | 0.883 | 0.856 | 0.880 |
| sd(beta_2) | 0.139 | 0.134 | 0.134 | 0.134 | 0.138 | 0.138 | 0.138 | 0.136 | 0.129 | 0.139 | 0.141 | 0.136 |
| beta_3 | -0.100 | -0.100 | -0.103 | -0.103 | -0.101 | -0.103 | -0.104 | -0.101 | -0.103 | -0.104 | -0.096 | -0.103 |
| sd(beta_3) | 0.088 | 0.086 | 0.086 | 0.086 | 0.086 | 0.086 | 0.086 | 0.086 | 0.086 | 0.087 | 0.085 | 0.086 |
| beta_4 | 0.483 | 0.472 | 0.486 | 0.486 | 0.471 | 0.484 | 0.484 | 0.470 | 0.490 | 0.492 | 0.469 | 0.484 |
| sd(beta_4) | 0.376 | 0.355 | 0.355 | 0.355 | 0.365 | 0.365 | 0.365 | 0.360 | 0.342 | 0.357 | 0.371 | 0.360 |
| beta_5 | 0.349 | 0.340 | 0.350 | 0.350 | 0.340 | 0.350 | 0.351 | 0.340 | 0.349 | 0.348 | 0.345 | 0.350 |
| sd(beta_5) | 0.211 | 0.208 | 0.208 | 0.208 | 0.214 | 0.214 | 0.213 | 0.210 | 0.200 | 0.214 | 0.219 | 0.211 |
saveRDS(df, "comparison-results.rds")
time_df <- data.frame(
time = c(time_fit1, time_fit2, time_fit3, time_fit4, time_fit5, time_fit6, time_fit7, time_fit8, time_fit9, time_fit10, time_fit11),
method = head(method_names, n = 11)
)
time_df %>% ggplot(aes(x = method, y = time)) +
geom_col() +
theme_minimal() +
labs(x = "Method", y = "Time taken")
## Don't know how to automatically pick scale for object of type <difftime>. Defaulting to continuous.
#' Stan histogram base-plot
plot_pdf_stan <- ggplot(data.frame(x = rstan::extract(fit1, pars = "beta[1]")[[1]]), aes(x = x)) +
geom_histogram(aes(y = ..density..), alpha = 0.6, fill = cbpalette[7], bins = 40) +
theme_minimal() +
labs(x = "beta[1]", y = "Posterior PDF")
beta0_inla_marginals <- bind_rows(
data.frame(fit2$marginals.fixed$`(Intercept)`) %>%
mutate(method = "Gaussian"),
data.frame(fit4$marginals.fixed$`(Intercept)`) %>%
mutate(method = "Laplace")
)
#' Add R-INLA results
plot_pdf_stan_inla <- plot_pdf_stan +
geom_line(data = beta0_inla_marginals, aes(x = x, y = y, col = method)) +
scale_color_manual(values = cbpalette) +
labs(col = "Method", title = "Inference from R-INLA")
plot_pdf_stan_aghq <- plot_pdf_stan +
geom_histogram(data = data.frame(x = fit11_beta_samples$`beta[1]`), aes(y = ..density..), alpha = 0.6, fill = cbpalette[1], bins = 40)
plot_pdf_stan_aghq <- plot_pdf_stan_aghq +
geom_line(data = laplace_x[[1]]$pdf_and_cdf, aes(x = theta, y = pdf), col = cbpalette[2]) +
lims(x = c(1.2, 2)) +
labs(title = "Inference from aghq and custom Laplace method") +
theme_minimal()
plot_intercept_comparison <- plot_pdf_stan_inla / plot_pdf_stan_aghq
plot_intercept_comparison
## Warning: Removed 460 rows containing missing values (`geom_line()`).
ggsave("intercept-comparison.pdf", plot = plot_intercept_comparison, h = 6, w = 6.25)
## Warning: Removed 460 rows containing missing values (`geom_line()`).
Comparison to tmbstan
Assuming tmbstan to be the ground truth, we can produce two-dimensional plots to check the fit of each method:
ggplot(df) +
geom_point(aes(x = tmbstan, y = TMB, col = type)) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", alpha = 0.5) +
theme_minimal()
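The same comparison can also be summarised numerically; a sketch computing each method's maximum absolute deviation from tmbstan across the rows of df:
df %>%
select(-type) %>%
summarise(across(everything(), ~ max(abs(. - tmbstan))))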
We use the symmetric Kullback-Leibler divergence (SKLD). From the Google group: “… its the symmetric version between the posterior marginal computed using Gaussian approximation and the one use one of the Laplace based ones.”
For "gaussian" the reported kld values are zero, since the Gaussian approximation is being compared with itself:
head(fit2$summary.random$rand)$kld
## [1] 0 0 0 0 0 0
For "laplace" and "simplified.laplace" we can look for the random effect with the highest SKLD:
skld <- function(fit) {
#' Nu random effect
id_rand <- which.max(fit$summary.random$rand$kld)
print(fit$summary.random$rand[id_rand, ])
#' Epsilon random effect
id_Ind <- which.max(fit$summary.random$Ind$kld)
print(fit$summary.random$Ind[id_Ind, ])
return(list(id_rand = id_rand, id_Ind = id_Ind))
}
skld3 <- skld(fit3)
## ID mean sd 0.025quant 0.5quant 0.975quant mode kld
## 222 222 0.645863 0.2480358 0.1561084 0.6472347 1.129674 NA 0.001331823
## ID mean sd 0.025quant 0.5quant 0.975quant mode kld
## 56 56 0.9044046 0.2309653 0.4488055 0.9055538 1.355366 NA 0.006909513
skld4 <- skld(fit4)
## ID mean sd 0.025quant 0.5quant 0.975quant mode kld
## 222 222 0.6429203 0.2476023 0.1546926 0.6439011 1.125762 NA 0.0007942658
## ID mean sd 0.025quant 0.5quant 0.975quant mode kld
## 56 56 0.9019214 0.2300221 0.4486929 0.9027 1.350875 NA 0.005744471
plot_marginals <- function(random_effect, index) {
marginal2 <- fit2$marginals.random[[random_effect]][paste0("index.", index)][[1]] %>%
as.data.frame() %>%
mutate(method = "G")
marginal3 <- fit3$marginals.random[[random_effect]][paste0("index.", index)][[1]] %>%
as.data.frame() %>%
mutate(method = "SL")
marginal4 <- fit4$marginals.random[[random_effect]][paste0("index.", index)][[1]] %>%
as.data.frame() %>%
mutate(method = "L")
marginals <- bind_rows(marginal2, marginal3, marginal4)
ggplot(marginals, aes(x = x, y = y, group = method, col = method)) +
geom_line() +
scale_color_manual(values = cbpalette) +
theme_minimal()
}
plot_marginals("rand", index = skld3$id_rand)
plot_marginals("Ind", index = skld3$id_Ind)
epil.cpp
// epil.cpp
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
// Data block
DATA_INTEGER(N);
DATA_INTEGER(J);
DATA_INTEGER(K);
DATA_MATRIX(X);
DATA_VECTOR(y);
DATA_MATRIX(E); // Epsilon matrix
// Parameter block
PARAMETER_VECTOR(beta);
PARAMETER_VECTOR(epsilon);
PARAMETER_VECTOR(nu);
PARAMETER(l_tau_epsilon);
PARAMETER(l_tau_nu);
// Transformed parameters block
Type tau_epsilon = exp(l_tau_epsilon);
Type tau_nu = exp(l_tau_nu);
Type sigma_epsilon = sqrt(1 / tau_epsilon);
Type sigma_nu = sqrt(1 / tau_nu);
vector<Type> eta(X * beta + nu + E * epsilon); // Linear predictor
vector<Type> lambda(exp(eta));
// Initialise negative log-likelihood
Type nll;
nll = Type(0.0);
// Priors
// Note: dgamma() is parameterised as (shape, scale); INLA parameterised as (shape, rate)
nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
nll -= dnorm(beta, Type(0), Type(100), true).sum();
// Likelihood
nll -= dpois(y, lambda, true).sum();
// ADREPORT
ADREPORT(tau_epsilon);
ADREPORT(tau_nu);
return(nll);
}
epil.stan
// epil.stan
data {
int<lower=0> N; // Number of patients
int<lower=0> J; // Number of clinic visits
int<lower=0> K; // Number of predictors (inc. intercept)
matrix[N * J, K] X; // Design matrix
int<lower=0> y[N * J]; // Outcome variable
matrix[N * J, N] E; // Epsilon matrix
}
parameters {
vector[K] beta; // Vector of coefficients
vector[N] epsilon; // Patient specific errors
vector[N * J] nu; // Patient-visit errors
real<lower=0> tau_epsilon; // Precision of epsilon
real<lower=0> tau_nu; // Precision of nu
}
transformed parameters {
vector[N * J] eta = X * beta + nu + E * epsilon; // Linear predictor
}
model {
beta ~ normal(0, 100);
tau_epsilon ~ gamma(0.001, 0.001);
tau_nu ~ gamma(0.001, 0.001);
epsilon ~ normal(0, sqrt(1 / tau_epsilon));
nu ~ normal(0, sqrt(1 / tau_nu));
y ~ poisson_log(eta);
}
epil_beta_index.cpp
// epil_beta_index.cpp
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
// Data block
DATA_INTEGER(i); // Index i
DATA_INTEGER(N);
DATA_INTEGER(J);
DATA_INTEGER(K);
DATA_MATRIX(X);
DATA_VECTOR(y);
DATA_MATRIX(E); // Epsilon matrix
// Parameter block
PARAMETER_VECTOR(nu);
PARAMETER_VECTOR(epsilon);
PARAMETER(beta_i);
PARAMETER_VECTOR(beta_minus_i);
vector<Type> beta(K);
int k = 0;
for (int j = 0; j < K; j++) {
if (j + 1 == i) { // +1 because C++ does zero-indexing
beta(j) = beta_i;
} else {
beta(j) = beta_minus_i(k);
k++;
}
}
PARAMETER(l_tau_epsilon);
PARAMETER(l_tau_nu);
// Transformed parameters block
Type tau_epsilon = exp(l_tau_epsilon);
Type tau_nu = exp(l_tau_nu);
Type sigma_epsilon = sqrt(1 / tau_epsilon);
Type sigma_nu = sqrt(1 / tau_nu);
vector<Type> eta(X * beta + nu + E * epsilon); // Linear predictor
vector<Type> lambda(exp(eta));
// Initialise negative log-likelihood
Type nll;
nll = Type(0.0);
// Priors
// Note: dgamma() is parameterised as (shape, scale); INLA parameterised as (shape, rate)
nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
nll -= dnorm(beta, Type(0), Type(100), true).sum();
// Likelihood
nll -= dpois(y, lambda, true).sum();
// ADREPORT
ADREPORT(tau_epsilon);
ADREPORT(tau_nu);
return(nll);
}
epil_index.cpp
// epil_index.cpp
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
// Data block
DATA_INTEGER(toggle); // 0 corresponds to beta, 1 to nu, 2 to epsilon
DATA_INTEGER(i); // Index i
DATA_INTEGER(N);
DATA_INTEGER(J);
DATA_INTEGER(K);
DATA_MATRIX(X);
DATA_VECTOR(y);
DATA_MATRIX(E); // Epsilon matrix
// Parameter block
PARAMETER_VECTOR(beta);
PARAMETER_VECTOR(nu);
PARAMETER_VECTOR(epsilon);
// If toggle == 0 then set-up minus i for beta
if(toggle == 0) {
PARAMETER(beta_i);
PARAMETER_VECTOR(beta_minus_i);
// vector<Type> beta(K);
int k = 0;
for (int j = 0; j < K; j++) {
if (j + 1 == i) {
beta(j) = beta_i;
} else {
beta(j) = beta_minus_i(k);
k++;
}
}
}
// If toggle == 1 then set-up minus i for nu
if(toggle == 1) {
PARAMETER(nu_i);
PARAMETER_VECTOR(nu_minus_i);
// vector<Type> nu(N * J);
int k = 0;
for (int j = 0; j < (N * J); j++) {
if (j + 1 == i) {
nu(j) = nu_i;
} else {
nu(j) = nu_minus_i(k);
k++;
}
}
}
// If toggle == 2 then set-up minus i for epsilon
if(toggle == 2) {
PARAMETER(epsilon_i);
PARAMETER_VECTOR(epsilon_minus_i);
// vector<Type> epsilon(N);
int k = 0;
for (int j = 0; j < N; j++) {
if (j + 1 == i) {
epsilon(j) = epsilon_i;
} else {
epsilon(j) = epsilon_minus_i(k);
k++;
}
}
}
PARAMETER(l_tau_epsilon);
PARAMETER(l_tau_nu);
// Transformed parameters block
Type tau_epsilon = exp(l_tau_epsilon);
Type tau_nu = exp(l_tau_nu);
Type sigma_epsilon = sqrt(1 / tau_epsilon);
Type sigma_nu = sqrt(1 / tau_nu);
vector<Type> eta(X * beta + nu + E * epsilon); // Linear predictor
vector<Type> lambda(exp(eta));
// Initialise negative log-likelihood
Type nll;
nll = Type(0.0);
// Priors
// Note: dgamma() is parameterised as (shape, scale); INLA parameterised as (shape, rate)
nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
nll -= dnorm(beta, Type(0), Type(100), true).sum();
// Likelihood
nll -= dpois(y, lambda, true).sum();
// ADREPORT
ADREPORT(tau_epsilon);
ADREPORT(tau_nu);
return(nll);
}
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 13.1
##
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] inf.utils_0.1.0 multi.utils_0.1.0 patchwork_1.1.2 tibble_3.1.8 tidyverse_1.3.1 aghq_0.4.0 tmbstan_1.0.4 rstan_2.21.5
## [9] StanHeaders_2.21.0-7 glmmTMB_1.1.4 TMB_1.9.1 stringr_1.5.0 purrr_0.3.5 tidyr_1.2.1 readr_2.1.3 INLA_22.05.07
## [17] sp_1.5-1 foreach_1.5.2 Matrix_1.5-1 ggplot2_3.4.0 forcats_0.5.2 dplyr_1.0.10
##
## loaded via a namespace (and not attached):
## [1] minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.3 fs_1.5.2 rstudioapi_0.13 farver_2.1.1 MatrixModels_0.5-0 bit64_4.0.5
## [10] fansi_1.0.3 lubridate_1.8.0 xml2_1.3.3 codetools_0.2-18 splines_4.2.0 cachem_1.0.6 knitr_1.41 polynom_1.4-1 jsonlite_1.8.4
## [19] nloptr_2.0.3 broom_0.8.0 dbplyr_2.1.1 compiler_4.2.0 httr_1.4.4 backports_1.4.1 assertthat_0.2.1 fastmap_1.1.0 cli_3.4.1
## [28] htmltools_0.5.3 prettyunits_1.1.1 tools_4.2.0 gtable_0.3.1 glue_1.6.2 Rcpp_1.0.9 jquerylib_0.1.4 cellranger_1.1.0 vctrs_0.5.1
## [37] mvQuad_1.0-6 svglite_2.1.0 nlme_3.1-157 iterators_1.0.14 xfun_0.35 ps_1.7.0 lme4_1.1-30 rvest_1.0.2 lifecycle_1.0.3
## [46] statmod_1.4.36 orderly_1.4.3 ids_1.0.1 MASS_7.3-56 scales_1.2.1 ragg_1.2.2 hms_1.1.2 inline_0.3.19 yaml_2.3.6
## [55] memoise_2.0.1 gridExtra_2.3 sass_0.4.4 loo_2.5.1 RSQLite_2.2.14 stringi_1.7.8 highr_0.9 boot_1.3-28 pkgbuild_1.3.1
## [64] rlang_1.0.6 pkgconfig_2.0.3 systemfonts_1.0.4 matrixStats_0.62.0 evaluate_0.18 lattice_0.20-45 labeling_0.4.2 cowplot_1.1.1 bit_4.0.5
## [73] processx_3.5.3 tidyselect_1.2.0 bookdown_0.26 magrittr_2.0.3 R6_2.5.1 generics_0.1.3 DBI_1.1.3 pillar_1.8.1 haven_2.5.0
## [82] withr_2.5.0 modelr_0.1.8 crayon_1.5.2 uuid_1.1-0 utf8_1.2.2 tzdb_0.3.0 rmarkdown_2.18 grid_4.2.0 readxl_1.4.0
## [91] data.table_1.14.6 blob_1.2.3 callr_3.7.0 reprex_2.0.1 digest_0.6.30 webshot_0.5.3 numDeriv_2016.8-1.1 textshaping_0.3.6 openssl_2.0.5
## [100] RcppParallel_5.1.5 stats4_4.2.0 munsell_0.5.0 viridisLite_0.4.1 kableExtra_1.3.4 bslib_0.4.1 askpass_1.1