R-INLA
sparsity preserving BYM2 implementation in TMB
Abstract
This document shows how to efficiently implement the BYM2 model in Template Model Builder (TMB
) using a sparse representation.
R-INLA
parameterisation for the BYM2 modelThe BYM2 is a popular choice of model for spatial random effects. The Besag-York-Mollie (BYM) model decomposes a spatial random effect \(\mathbf{x}\) into a spatially structured component \(\mathbf{u}\) and an IID component \(\mathbf{v}\). The BYM2 model reparameterises the BYM in terms of a marginal standard deviation \(\sigma > 0\) and proportion \(\phi \in [0,1]\) specifying the weight to each component: \[ \begin{align} \mathbf{x} &= \sigma \cdot (\sqrt{\phi}\cdot \mathbf{u} + \sqrt{1-\phi}\cdot \mathbf{v}) \\ \mathbf{u} &\sim N(0, \mathbf{Q}^{-1}) \\ \mathbf{v} &\sim N(0, \mathbf{I}) \\ \end{align} \] where the matrix \(\mathbf{Q}\) is the scaled ICAR structure matrix such that \(\mathrm{Var}(u_i) \approx 1\).
In most TMB
implementations, we parameterise the BYM2 model in terms of the component random effects \(\mathbf{u}\) and \(\mathbf{v}\), as illustrated in the tutorial on intrinsic area models in Stan by Mitzi Morris.
The R-INLA
implementation of the BYM2 model directly parameterises the spatial random effect \(\mathbf{x}\), which has covariance matrix
\[\mathrm{Var}(\mathbf{x} | \sigma, \phi) = \sigma^2\cdot\left(\phi\cdot\mathbf{Q}^{-1} + (1-\phi)\cdot\mathbf{I}\right).\]
The inverse of \(\mathrm{Var}(\mathbf{x} | \sigma, \phi)\), the precision matrix, is no longer sparse.
To retain sparsity of the precision matrix, R-INLA
parameterises the joint distribution of \((\mathbf{x}, \mathbf{u})\), described in Section 3.4 of Reibler et al..
Conditioning \(\pi(\mathbf{x}, \mathbf{u}) = \pi(\mathbf{x} | \mathbf{u}) \pi(\mathbf{u})\) is used to derive that \(\pi(\mathbf{x}, \mathbf{u})\) is a multivariate normal distribution with mean \(\mathbf{0}\) and precision matrix
\[
\left(\begin{array}{cc}
\frac{1}{\sigma^2 (1-\phi)} \mathbf{I} & -\frac{\sqrt{\phi}}{\sigma (1-\phi)} \mathbf{I} \\
-\frac{\sqrt{\phi}}{\sigma (1-\phi)} \mathbf{I} & \mathbf{Q} + \frac{\phi}{1-\phi} \mathbf{I}
\end{array}\right).
\]
The normalising constant is apparent from the product of the normalising constants for the terms \(\pi(\mathbf{x} | \mathbf{u})\) and \(\pi(\mathbf{u})\): \[ (2\pi)^{-(n + \mathrm{rank}(\mathbf{Q}))/2} \cdot \left(\sigma \sqrt{1-\phi}\right)^{-n} \cdot |\mathbf{Q}|^{1/2}. \]
The terms \((2\pi)^{-(n + \mathrm{rank}(\mathbf{Q}))/2}\) and \(|\mathbf{Q}|^{1/2}\) typically may be omitted.
Further details of the derivation are on page 3 of the notes accompanying the R-INLA
documentation.
TMB
, like R-INLA
, implements a two step optimisation process.
The “outer” optimisation step maximises the marginal posterior of the hyperparameters (referred to as “fixed” parameters in TMB
).
For each “outer” optimisation step, an “inner” optimisation maximises the latent field parameters (“random” parameters in TMB
parlance) conditional on the hyperparameters, which is used to obtain a Gaussian approximation to the latent field posterior and thereby a Laplace approximation to the marginal posterior of the hyperparameters.
The starting values for each inner optimisation are the final optimised values from the previous iteration. Using the conventional parameterisation from the Morris tutorial, the linear predictor for an observation depends on the values of the hyperparameters, for example \[ \mu_i = \beta_0 + \sigma\cdot\left(\sqrt{\phi}\cdot u_i + \sqrt{1-\phi}\cdot v_i\right). \] Thus a large step for the hyperparameters will affect the the value for the linear predictor \(\mu_i\), potentially moving it far away from the optimal parameters.
If the random effect \(x_i\) is directly parameterised as in the R-INLA
implementation, for example \(\mu_i = \beta_0 + x_i,\) the starting value for \(\mu_i\) is unchanged by the updated hyperparameters.
This change in parameterisation does not change the model (i.e. the inference for hyperparameters does not change), but the inner optimisation step may be more efficient by starting at parameter values closer to the final values.
TMB
R-INLA
implements the BYM2 model parameterisation as a single vector \(\mathbf{w} = (\mathbf{x}, \mathbf{u})\) of length \(2n\), and constructs the full \(2n \times 2n\) sparse precision matrix described above.
Coding the sparse precision matrix and parsing the vector \(\mathbf{w}\) in the TMB
model template is an unnecessary hassle.
Instead, it is more convenient to decompose the block matrix multiplication: \[ \begin{align*} \log \pi(\mathbf{x}, \mathbf{u} | \sigma,\phi) &= -\frac{n+\mathrm{rank}(\mathbf{Q})}{2} \log(2\pi) - n\log(\sigma) - \frac{n}{2} \log(1-\phi) + \frac{1}{2}\log|\mathbf{Q}| - \frac{1}{2} \left(\begin{array}{cc}\mathbf{x}' & \mathbf{u}'\end{array}\right) \left(\begin{array}{cc} \frac{1}{\sigma^2 (1-\phi)} \mathbf{I} & -\frac{\sqrt{\phi}}{\sigma (1-\phi)} \mathbf{I} \\ -\frac{\sqrt{\phi}}{\sigma (1-\phi)} \mathbf{I} & \mathbf{Q} + \frac{\phi}{1-\phi} \mathbf{I} \end{array}\right) \left(\begin{array}{c} \mathbf{x} \\ \mathbf{u}\end{array}\right) \\ &= C - \frac{1}{2}\left[\mathbf{x}' \left(\frac{1}{\sigma^2 (1-\phi)} \mathbf{I}\right)\mathbf{x} - 2\mathbf{x}'\left(\frac{\sqrt{\phi}}{\sigma (1-\phi)} \mathbf{I}\right)\mathbf{u} + \mathbf{u}'\left(\mathbf{Q} + \frac{\phi}{1-\phi} \mathbf{I}\right) \mathbf{u}\right] \\ &= -\frac{n+\mathrm{rank}(\mathbf{Q})}{2} \log(2\pi) - n\log(\sigma) - \frac{n}{2} \log(1-\phi) + \frac{1}{2}\log|\mathbf{Q}| - \frac{\mathbf{x}'\mathbf{x}}{2\sigma^2(1-\phi)} + \frac{\sqrt{\phi}\mathbf{x}'\mathbf{u}}{\sigma (1-\phi)} - \frac{\phi \mathbf{u}'\mathbf{u}}{2\cdot(1-\phi)} - \frac{1}{2}\mathbf{u}'\mathbf{Q}\mathbf{u} \end{align*} \] The constant terms \(-\frac{n+\mathrm{rank}(\mathbf{Q})}{2}\log(2\pi)\) and \(\frac{1}{2}\log|\mathbf{Q}|\) typically need not be computed.
Load libraries and utility functions.
library(tidyverse)
library(INLA)
library(TMB)
The function `TMB
_compile_and_load()is a utility function that accepts a
TMBmodel as a string
code`, compiles and loads the model, and return a path to the DLL.
tmb_compile_and_load <- function(code) {
f <- tempfile(fileext = ".cpp")
writeLines(code, f)
TMB::compile(f)
dyn.load(TMB::dynlib(tools::file_path_sans_ext(f)))
basename(tools::file_path_sans_ext(f))
}
Use the version of Scottish lip cancer dataset used by Mitzi Morris for the tutorial on ICAR models in Stan.
source("https://raw.githubusercontent.com/stan-dev/example-models/master/knitr/car-iar-poisson/scotland_data.R")
scotlip <- data
The list scotlip
consists of items:
N
: number of regions,y
: observed counts of lip cancer cases per county,E
: the expected number of cases, used as an offset,x
: continuous covariate representing the proportion of the population employed in agriculture, fishing, or forestry (AFF),adj
: a vector of region ids for adjacent regions,weights
: weight for each edge (all 1
s),num
: the number of neighbors for each region, used to split adj
.Scale the covariate x
as in Morris tutorial.
scotlip$x_scaled <- 0.1 * scotlip$x
Parse the adjacency list into an adjacency matrix.
nblist <- cbind(rep(seq_len(scotlip$N), times = scotlip$num),
scotlip$adj)
adj <- matrix(0, nrow = scotlip$N, ncol = scotlip$N)
adj[nblist] <- 1
Construct the scaled structure matrix for ICAR model precision.
Q <- diag(rowSums(adj)) - adj
Q_scaled <- inla.scale.model(Q, constr = list(A = matrix(1, ncol = ncol(Q)), e = 0))
Prepare a list of TMB
model inputs.
tmbdata <- list(region = 1:scotlip$N,
y = scotlip$y,
x = scotlip$x_scaled,
E = scotlip$E,
Q = Q_scaled)
The TMB
model template mod1
below implements the model
\[
\begin{align}
y_i &\sim \mathrm{Poisson}(\eta_i) \\
\log \eta_i &= \beta_0 + \beta_1 \cdot x + b_i + \log E_i \\
\mathbf{b} &= \sigma \cdot \left(\sqrt{\phi}\cdot \mathbf{u} + \sqrt{1-\phi}\cdot \mathbf{v}\right) \\
\mathbf{u} &\sim N(0, \mathbf{Q}^{-1}) \\
\mathbf{v} &\sim N(0, \mathbf{I}) \\
\sigma &\sim N^{+}(0, 1) \\
\phi &\sim \mathrm{Beta}(0.5, 0.5)
\end{align}
\]
mod1 <- '
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
DATA_VECTOR(y);
DATA_VECTOR(x);
DATA_VECTOR(E);
DATA_SPARSE_MATRIX(Q); // Structure matrix for ICAR area model
Type val(0);
PARAMETER(beta0); // intercept
PARAMETER(beta1); // slope
// beta0 ~ 1
// beta1 ~ 1
PARAMETER(log_sigma); // marginal standard deviation
Type sigma(exp(log_sigma));
val -= dnorm(sigma, Type(0.0), Type(1.0), true) + log_sigma;
PARAMETER(logit_phi);
Type phi(invlogit(logit_phi));
val -= log(phi) + log(1 - phi); // change of variables: logit_phi -> phi
val -= dbeta(phi, Type(0.5), Type(0.5), true);
PARAMETER_VECTOR(u); // spatially correlated component
val -= Type(-0.5) * (u * (Q * u)).sum();
// soft sum-to-zero constraint
val -= dnorm(u.sum(), Type(0.0), Type(0.001) * u.size(), true);
PARAMETER_VECTOR(v); // unstructured component
val -= dnorm(v, Type(0.0), Type(1.0), true).sum();
// combined spatial effect
vector<Type> b(sigma * (sqrt(phi) * u + sqrt(1 - phi) * v));
vector<Type> mu(beta0 + beta1 * x + b + log(E));
val -= dpois(y, exp(mu), true).sum();
ADREPORT(sigma);
ADREPORT(phi);
ADREPORT(b);
ADREPORT(mu);
return val;
}
'
Compile and fit the TMB
model.
dll1 <- tmb_compile_and_load(mod1)
Initial values for parameters.
tmbpar1 <- list(beta0 = 0,
beta1 = 0,
log_sigma = 0,
logit_phi = 0,
u = numeric(scotlip$N),
v = numeric(scotlip$N))
Create TMB
object and optimise parameters.
obj1 <- TMB::MakeADFun(data = tmbdata,
parameters = tmbpar1,
random = c("beta0", "beta1", "u", "v"),
DLL = dll1)
tmbfit1 <- nlminb(obj1$par, obj1$fn, obj1$gr)
sdr1 <- TMB::sdreport(obj1)
Estimates for intercept and slope and hyperparameters.
summary(sdr1, "all") %>%
.[rownames(.) %in% c("log_sigma", "logit_phi", "beta0", "beta1"), ]
## Estimate Std. Error
## log_sigma -0.6863323 0.1647692
## logit_phi 1.8638959 1.4347334
## beta0 -0.1912772 0.1253730
## beta1 0.3771592 0.1305491
R-INLA
parameterisation: \(\mathbf{b}\) and \(\mathbf{u}\)The TMB
model template mod2
below implements the R-INLA
parameterisation
of the same model described in Section 3.4 of Reibler et al.:
\[
\begin{align}
y_i &\sim \mathrm{Poisson}(\eta_i) \\
\log \eta_i &= \beta_0 + \beta_1 \cdot x + b_i + \log E_i \\
(\mathbf{b}, \mathbf{u}) &\sim N(0, \mathbf{\Sigma}(\sigma, \phi)) \\
\sigma &\sim N^{+}(0, 1) \\
\phi &\sim \mathrm{Beta}(0.5, 0.5)
\end{align}
\]
where the precision matrix \(\mathbf{\Sigma}^{-1}(\sigma, \phi)\) is as defined above.
mod2 <- '
#include <TMB.hpp>
template<class Type>
Type objective_function<Type>::operator() ()
{
DATA_VECTOR(y);
DATA_VECTOR(x);
DATA_VECTOR(E);
DATA_SPARSE_MATRIX(Q); // Structure matrix for ICAR area model
Type val(0);
PARAMETER(beta0); // intercept
PARAMETER(beta1); // intercept
// beta0 ~ 1
// beta1 ~ 1
PARAMETER(log_sigma); // marginal standard deviation
Type sigma(exp(log_sigma));
val -= dnorm(sigma, Type(0.0), Type(1.0), true) + log_sigma;
PARAMETER(logit_phi);
Type phi(invlogit(logit_phi));
val -= log(phi) + log(1 - phi); // change of variables: logit_phi -> phi
val -= dbeta(phi, Type(0.5), Type(0.5), true);
PARAMETER_VECTOR(b); // combined spatial effect
PARAMETER_VECTOR(u); // spatially correlated component
val -= dnorm(u.sum(), Type(0.0), Type(0.001) * u.size(), true); // soft sum-to-zero constraint
// -(n + rank(Q)) / 2 * log(2*pi) and log|Q| / 2 terms omitted
val -= -0.5 * b.size() * (2 * log_sigma + log(1 - phi)); // normalising constant
val -= -0.5 * (b * b).sum() / (sigma * sigma * (1 - phi));
val -= (b * u).sum() * sqrt(phi) / (sigma * (1 - phi));
val -= -0.5 * (u * u).sum() * phi / (1 - phi);
val -= -0.5 * (u * (Q * u)).sum();
vector<Type> mu(beta0 + beta1 * x + b + log(E));
val -= dpois(y, exp(mu), true).sum();
ADREPORT(sigma);
ADREPORT(phi);
ADREPORT(mu);
return val;
}
'
Compile and fit model 2.
dll2 <- tmb_compile_and_load(mod2)
tmbpar2 <- list(beta0 = 0,
beta1 = 0,
log_sigma = 0,
logit_phi = 0,
b = numeric(scotlip$N),
u = numeric(scotlip$N))
obj2 <- TMB::MakeADFun(data = tmbdata,
parameters = tmbpar2,
random = c("beta0", "beta1", "b", "u"),
DLL = dll2)
tmbfit2 <- nlminb(obj2$par, obj2$fn, obj2$gr)
sdr2 <- TMB::sdreport(obj2)
Estimates for intercept and slope and hyperparameters match exactly with the Morris parameterisation.
summary(sdr1, "all") %>%
.[rownames(.) %in% c("log_sigma", "logit_phi", "beta0", "beta1"), ]
## Estimate Std. Error
## log_sigma -0.6863323 0.1647692
## logit_phi 1.8638959 1.4347334
## beta0 -0.1912772 0.1253730
## beta1 0.3771592 0.1305491
summary(sdr2, "all") %>%
.[rownames(.) %in% c("log_sigma", "logit_phi", "beta0", "beta1"), ]
## Estimate Std. Error
## log_sigma -0.6863323 0.1647692
## logit_phi 1.8638959 1.4347334
## beta0 -0.1912772 0.1253730
## beta1 0.3771592 0.1305491
The scatter plot of point estimates and posterior standard deviation for the random effects \(b_i\) and \(u_i\) show these are also identical.
TMB
results with R-INLA
The TMB
models above conduct the same inference as R-INLA
using the options control.inla = list(strategy = "gaussian", int.strategy = "eb")
.
priors <- list(phi = list(prior = "logitbeta", params = c(0.5, 0.5)),
prec = list(prior = "logtnormal", params = c(0, 1.0)))
inla_formula <- y ~ 1 + x +
f(region, model = "bym2", graph = adj, hyper = priors, constr = TRUE)
inla_eb <- inla(inla_formula, family = "poisson", E = E,
data = tmbdata[c("region", "y", "x", "E")],
control.fixed = list(prec = 1/25, prec.intercept = 1/25),
control.inla = list(strategy = "gaussian", int.strategy = "eb"))
Estimates and standard errors for the fixed effects and hyperparameters from TMB
are close to the mode and sd reported by R-INLA
. The log precision is -2 times the log_sigma
parameter in the TMB
models.
inla_eb$internal.summary.hyperpar
inla_eb$summary.fixed
Point estimates for random effect estimates are exactly the same for TMB
and the R-INLA
empirical Bayes estimates.
The posterior standard deviations are highly correlated but slightly larger for the TMB
estimates than R-INLA
.
## Warning: Removed 112 rows containing missing values (geom_point).
The point estimates from the TMB
and R-INLA
models above are slightly different from those reported in the Morris tutorial using full Bayesian inference in Stan or R-INLA
.
These are reproduced by changing the strategy to control.inla = list(strategy="laplace")
.
inla_full <- inla(inla_formula, family = "poisson", E = E,
data = tmbdata[c("region", "y", "x", "E")],
control.fixed = list(prec = 1/25, prec.intercept = 1/25),
control.inla = list(strategy="laplace"))
inla_full$summary.fixed
inla_full$internal.summary.hyperpar
The package tmbstan
can be used to sample from the posterior distribution with the TMB
model objects created above.
library(tmbstan)
stanfit1 <- tmbstan(obj1, refresh = 0)
stanfit2 <- tmbstan(obj2, refresh = 0)
print(stanfit1, digits = 3, par = c("beta0", "beta1", "log_sigma", "logit_phi", "lp__"))
## Inference for Stan model: file18022163e345.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5%
## beta0 -0.220 0.003 0.124 -0.468 -0.301 -0.219 -0.137 0.024
## beta1 0.368 0.004 0.129 0.108 0.283 0.369 0.454 0.631
## log_sigma -0.678 0.007 0.160 -0.996 -0.783 -0.678 -0.574 -0.358
## logit_phi 3.110 0.094 2.267 -0.165 1.536 2.706 4.224 8.825
## lp__ -243.431 0.281 8.889 -261.710 -249.285 -243.222 -237.306 -226.838
## n_eff Rhat
## beta0 1283 1.001
## beta1 1121 1.002
## log_sigma 489 1.013
## logit_phi 581 1.009
## lp__ 1003 1.000
##
## Samples were drawn using NUTS(diag_e) at Wed Oct 19 13:48:51 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
print(stanfit2, digits = 3, par = c("beta0", "beta1", "log_sigma", "logit_phi", "lp__"))
## Inference for Stan model: file180224432a09.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5%
## beta0 -0.223 0.003 0.128 -0.474 -0.310 -0.224 -0.138 0.023
## beta1 0.373 0.004 0.133 0.103 0.286 0.378 0.460 0.627
## log_sigma -0.678 0.004 0.162 -1.003 -0.786 -0.675 -0.573 -0.359
## logit_phi 2.410 0.165 1.338 -0.085 1.467 2.363 3.299 5.017
## lp__ -81.471 4.016 31.951 -135.900 -105.626 -83.772 -60.819 -14.913
## n_eff Rhat
## beta0 1547 1.002
## beta1 1378 1.003
## log_sigma 1519 1.001
## logit_phi 66 1.063
## lp__ 63 1.069
##
## Samples were drawn using NUTS(diag_e) at Wed Oct 19 13:49:46 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The estimates from stanfit1
, using the Morris tutorial parameterisation, match closely to the full R-INLA
results.
For the second parameterisation, the \(\hat R\) estimate for the logit_phi
parameter is large and the number of effective samples is low, indicating poor convergence for this parameter.
This highlights that direct parameterisation of \(\mathbf{b}\) is not optimal for HMC inference, at least in this case.