Abstract
knitr::opts_chunk$set(
echo = TRUE,
cache = TRUE,
autodep = TRUE,
cache.comments = FALSE
)
library(tidyverse)
library(TMB)
library(INLA)
library(rmarkdown)
midblue <- "#3D9BD0"
midgreen <- "#00855A"
midpink <- "#B3608E"
Consider a three-stage model \[\begin{alignat}{2} &\text{(Observations)} & y_i &\sim p(y_i \, | \, x_i, \theta), \quad i \in \mathcal{I}, \\ &\text{(Latent field)} & x &\sim \mathcal{N}(x \, | \, 0, Q(\theta)^{-1}), \\ &\text{(Hyperparameters)} & \qquad \theta &\sim p(\theta), \label{eq:hyper} \end{alignat}\] with posterior distribution \[\begin{equation} p(x, \theta \, | \, y) \propto p(y, x, \theta) = p(y \, | \, x, \theta) p(x \, | \, \theta) p(\theta). \end{equation}\]
After writing the negative log-target in a C++ file model.cpp, it can be compiled and loaded into R:
compile("model.cpp")
dyn.load(dynlib("model"))
Let \(f(x, \theta) \equiv - \log p(y, x, \theta)\), where \(y\) is assumed to be fixed. This objective (f$fn) and its gradient (f$gr) can be created (as a sketch) in TMB using:
f <- MakeADFun(data = dat, parameters = param, DLL = "model")
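Both can then be evaluated, for example at the initial parameter values:
f$fn(f$par) # Negative log-posterior at the initial values
f$gr(f$par) # Its gradient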
There are four stages to replicating the INLA method.
Stage 1. Approximate the posterior marginal \(p(\theta \, | \, y) \propto L(\theta)\) using the Laplace approximation \(L(\theta) \approx L^\star(\theta)\): \[\begin{equation} \tilde p(\theta \, | \, y) \propto \frac{p(y \, | \, x, \theta) p(x \, | \, \theta) p(\theta)}{\tilde p_G(x \, | \, \theta, y)} \Big\rvert_{x = \mu(\theta)} \equiv L^\star(\theta) \end{equation}\]
Using TMB, if we integrate out a Gaussian approximation to the latent field \(x\) using the argument random = "x" in MakeADFun, then the result is the negative log of the Laplace approximation to the marginal likelihood \(h(\theta) \equiv - \log L^\star(\theta)\):
h <- MakeADFun(data = dat, parameters = param, DLL = "model", random = "x")
Therefore, exp(-h$fn(theta)) is proportional to \(\tilde p(\theta \, | \, y)\).
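For instance, this unnormalised density could be tabulated over a grid (a sketch only: the grid endpoints here are arbitrary, and theta is on whatever scale model.cpp uses):
theta_grid <- seq(-4, 0, by = 0.1)
post_unnorm <- sapply(theta_grid, function(theta) exp(-h$fn(theta)))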
Stage 2. Choose integration points \(\{\theta^{(k)}\}\) with weights \(\{\Delta^{(k)}\}\) satisfying \(\sum_{k = 1}^K \Delta^{(k)} = 1\).
The simplest choice is \(K = 1\), with the single integration point \(\hat \theta = \text{argmin}_{\theta} (-\log L^\star(\theta))\). This can be found by minimising h using optimisers like nlminb or optim. Methods like this are sometimes referred to as “Empirical Bayes”, in that the hyperparameters are fixed at a point estimate.
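A minimal sketch (the same optimisation is carried out concretely in the worked example below):
opt <- nlminb(start = h$par, objective = h$fn, gradient = h$gr)
theta_hat <- opt$par # The single integration point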
R-INLA grid method
Instead of just taking a single point, we might want to explore the posterior approximation to \(p(\theta \, | \, y)\) more fully. In R-INLA a few choices are available as to how to do this, including the grid integration strategy: starting from the mode of \(\tilde p(\theta \, | \, y)\), take steps outwards along a grid, keeping points until the log-density falls sufficiently far below its value at the mode (in TMB, the Hessian used to scale these steps can be accessed via h$env$spHess). Under this approach, all weights would be equal, \(\Delta^{(k)} = 1 / K\). For a more detailed run-through of the grid integration method, see the INLA two-dimensional grid replication notebook.
We could also use an adaptive Gauss-Hermite quadrature (AGHQ) rule; see the paper Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package. AGHQ is attractive because it is automatic and works well for integrating densities which are close to Gaussian.
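As a sketch, assuming the aghq package is installed, a TMB objective built with a random argument (like h above) can be passed to marginal_laplace_tmb:
library(aghq)
quad <- aghq::marginal_laplace_tmb(h, k = 3, startingvalue = h$par) # k quadrature points per dimension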
Stage 3. Approximate the latent field posterior marginals \(p(x_i \, | \, \theta, y)\). The simplest approach would be to directly obtain the marginals of \(\tilde p_{\text{G}}(x \, | \, \theta, y)\) from above. A more accurate alternative is to compute a Laplace approximation \(\tilde p_{\text{LA}}(x_i \, | \, \theta, y)\) for each marginal; this is the key innovation in the 2009 INLA paper.
Stage 4. Combine the Laplace approximation to the hyperparameter posterior from Stage 1, the quadrature nodes and weights from Stage 2, and the chosen posterior latent field marginal approximation from Stage 3 together in one equation: \[\begin{equation} \tilde p(x_i \, | \, y) = \sum_{k = 1}^K \tilde p(x_i \, | \, \theta^{(k)}, y) \times \tilde p(\theta^{(k)} \, | \, y) \times \Delta^{(k)} \end{equation}\]
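In code, this is just a weighted sum. As a sketch with hypothetical objects, where post_x_given_theta is a length(x_grid) by \(K\) matrix with entries \(\tilde p(x_i \, | \, \theta^{(k)}, y)\), post_theta is a length-\(K\) vector of \(\tilde p(\theta^{(k)} \, | \, y)\), and delta is the length-\(K\) vector of weights:
post_x_marginal <- post_x_given_theta %*% (post_theta * delta)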
This is the example from the book Spatial and Spatio-temporal Bayesian Models with R-INLA.
Consider the following simple model for i.i.d. observations \(y = (y_1, \ldots, y_n)\) \[\begin{align}
y_i \, | \, x, \theta &\sim \mathcal{N}(x, 1/\theta), \\
x &\sim \mathcal{N}(x_0 = -3, 1/\tau_0 = 4), \\
\theta &\sim \text{Gamma}(a = 1.6, b = 0.4).
\end{align}\]
We start by loading the data manually:
y <- c(
1.2697, 7.7637, 2.2532, 3.4557, 4.1776, 6.4320, -3.6623, 7.7567, 5.9032, 7.2671,
-2.3447, 8.0160, 3.5013, 2.8495, 0.6467, 3.2371, 5.8573, -3.3749, 4.1507, 4.3092,
11.7327, 2.6174, 9.4942, -2.7639, -1.5859, 3.6986, 2.4544, -0.3294, 0.2329, 5.2846
)
n <- length(y)
y_bar <- mean(y)
The observations (ordered by index, though the ordering has no interpretation) and the prior (its mean shown as a dashed line, notably far from the data):
data.frame(index = 1:30, y = y) %>%
ggplot(aes(x = index, y = y)) +
geom_point() +
geom_hline(yintercept = -3, col = "#666666", lty = "dashed") +
theme_minimal()
We use the following parameters of the prior distribution:
x_0 <- -3
tau_0 <- 1/4
a <- 1.6
b <- 0.4
R-INLA
This model is simple to fit directly using the R-INLA software:
formula <- y ~ 1
dat <- list(y = y)
theta_prior <- list(prec = list(prior = "loggamma", param = c(a, b)))
fit <- inla(
formula,
data = dat,
control.family = list(hyper = theta_prior),
control.fixed = list(mean.intercept = x_0, prec.intercept = tau_0)
)
By construction (the model is conjugate given \(\theta\)), the distribution \(p(x \, | \, \theta, y)\) is exact:
# Exact distribution of x given theta
inner_loop <- function(theta) {
tau_n <- n * theta + tau_0
x_n <- (theta * n * y_bar + tau_0 * x_0) / tau_n
return(list(x_n = x_n, tau_n = tau_n))
}
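As a sanity check, the conditional mean x_n interpolates between the prior mean and the data mean as the observation precision \(\theta\) grows:
inner_loop(1e-4)$x_n # Near the prior mean x_0 = -3
inner_loop(100)$x_n # Near the data mean y_bar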
We write functions which either take \(\theta\) directly (subject to the constraint \(\theta > 0\)) or work on the \(\log(\theta)\) scale, which avoids the constraint:
nl_full_conditional_x <- function(x, theta, log_input = FALSE) {
if(log_input == TRUE) theta <- exp(theta) # Convert from log(theta) to theta
par <- inner_loop(theta)
return(-dnorm(x, par$x_n, sqrt(1 / par$tau_n), log = TRUE))
}
nl_post_marginal_theta <- function(theta, log_input = FALSE) {
target <- 0
if(log_input == TRUE) {
target <- target + theta # Increment by Jacobian correction (theta is l_theta here)
theta <- exp(theta) # Convert from log(theta) to theta
}
par <- inner_loop(theta)
# (^)
target <- target + -0.5 * log(par$tau_n) +
dgamma(theta, shape = a, rate = b, log = TRUE) + # theta prior
dnorm(par$x_n, x_0, sqrt(1 / tau_0), log = TRUE) + # x prior
sum(dnorm(y, par$x_n, sqrt(1 / theta), log = TRUE)) # y likelihood
return(-target)
}
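As a quick check on the Jacobian handling, the two parametrisations should differ by exactly the log-Jacobian \(-\log(\theta)\):
nl_post_marginal_theta(0.1)
nl_post_marginal_theta(log(0.1), log_input = TRUE) # Larger by -log(0.1), about 2.3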
Note that the code above in the section marked (^) is very similar to that in blangiardo.cpp. Recall that the Laplace approximation \(\tilde p(\theta \, | \, y)\), which in this instance coincides with the exact posterior, is given by \[\begin{equation} \tilde p(\theta \, | \, y) \propto \frac{p(y, \mu(\theta), \theta)}{\det(Q(\theta))^{1/2}}. \end{equation}\] This form can be seen in (^): the numerator corresponds to the dgamma, dnorm and sum of dnorm terms evaluated at \(\mu(\theta)\), which here is simply par$x_n (for the Normal distribution the mean is the same as the mode), and the denominator corresponds to the term -0.5 * log(par$tau_n). It should be the case that inner_loop produces parameter values for the latent field which are those found by TMB using the optimisation inner loop (hence the name).
de_nl <- function(f, ...) exp(-f(...)) # Versions which are not negative logarithms
full_conditional_x <- function(x, theta, log_input = FALSE) {
  de_nl(nl_full_conditional_x, x, theta, log_input = log_input) # Forward log_input
}
post_marginal_theta <- function(theta, log_input = FALSE) {
  de_nl(nl_post_marginal_theta, theta, log_input = log_input) # Forward log_input
}
Simple grids (one dimensional):
trapezoid_rule <- function(x, spacing) {
  0.5 * spacing * (2 * sum(x) - x[1] - x[length(x)]) # Half-weight the two endpoints
}
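A quick check of the rule on a known integral (the integral of \(\sin\) over \([0, \pi]\) is exactly 2):
xs <- seq(0, pi, length.out = 101)
trapezoid_rule(sin(xs), spacing = xs[2] - xs[1]) # Approximately 2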
eval_grid <- function(grid = NULL, spacing = NULL, uniform = FALSE, K = NULL, min = NULL, max = NULL, f) {
if(uniform) {
grid <- seq(min, max, length.out = K)
spacing <- (max - min) / (K - 1) # seq(min, max, length.out = K) has K - 1 gaps
}
df <- data.frame(input = grid, output = sapply(grid, f))
df <- mutate(df, norm_output = output / trapezoid_rule(output, spacing))
return(df)
}
blangiardo_theta <- eval_grid(
uniform = TRUE,
K = 25, min = 0.001, max = 0.3,
f = post_marginal_theta
)
dense_theta <- eval_grid(
uniform = TRUE,
K = 500, min = 0.001, max = 0.3,
f = post_marginal_theta
)
ggplot(dense_theta) +
  geom_line(aes(x = input, y = output), col = midblue) +
  geom_point(data = blangiardo_theta, aes(x = input, y = output), shape = 4) +
  theme_minimal() +
  theme(axis.text.y = element_blank()) + # Apply after theme_minimal() so it is not overridden
  labs(title = "Blangiardo integration points for the hyperparameters")
Optimisation on the log scale (to avoid constrained optimisation):
its <- 1000
r_nlminb <- nlminb(start = 0,
objective = nl_post_marginal_theta,
log_input = TRUE,
control = list(iter.max = its, trace = 0))
r_optim <- optim(par = 0,
fn = nl_post_marginal_theta,
log_input = TRUE,
method = "Brent",
lower = -100, # Have to specify lower and upper bounds
upper = 100, # when using method = "Brent" (1D optimisation)
control = list(maxit = its, trace = 0))
c(r_nlminb$par, r_optim$par) # The same
## [1] -2.640129 -2.640129
r_opt <- r_nlminb
For INLA’s grid strategy, starting from the mode, we take steps of size \(\delta_z\), checking that each point meets the criterion \[\begin{equation} \log \tilde p(\theta(0) \, | \, y) - \log \tilde p(\theta(z) \, | \, y) < \delta_\pi. \label{eq:criteria} \end{equation}\] In the one dimensional case there is no need to do the \(z\)-parametrisation. We choose \(\delta_z\) and \(\delta_\pi\) based upon manual tuning¹:
delta_z <- 0.05
delta_pi <- 3
The following is very inefficient R programming, but illustrates how it could be done (a more compact equivalent follows the loops):
# Increasing
points <- c(r_opt$par) # On the log theta scale
i <- 0
condition <- TRUE
while(condition) {
i <- i + 1
proposal <- r_opt$par + i * delta_z
statistic <- nl_post_marginal_theta(theta = proposal, log_input = TRUE) -
nl_post_marginal_theta(theta = r_opt$par, log_input = TRUE)
condition <- (statistic < delta_pi)
if(condition){
points <- c(points, proposal)
}
}
# Decreasing
i <- 0
condition <- TRUE
while(condition) {
i <- i + 1
proposal <- r_opt$par - i * delta_z
statistic <- nl_post_marginal_theta(theta = proposal, log_input = TRUE) -
nl_post_marginal_theta(theta = r_opt$par, log_input = TRUE)
condition <- (statistic < delta_pi)
if(condition){
points <- c(points, proposal)
}
}
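A more compact equivalent (points_alt is a hypothetical name; for a unimodal posterior it selects the same set as the two loops above) evaluates a symmetric band of candidate steps at once:
candidates <- r_opt$par + delta_z * setdiff(-100:100, 0)
keep <- sapply(candidates, nl_post_marginal_theta, log_input = TRUE) -
  nl_post_marginal_theta(r_opt$par, log_input = TRUE) < delta_pi
points_alt <- c(r_opt$par, candidates[keep])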
inla_theta <- eval_grid(exp(points), f = post_marginal_theta, spacing = delta_z)
ggplot(dense_theta) +
  geom_line(aes(x = input, y = output), col = midblue) +
  geom_point(data = inla_theta, aes(x = input, y = output), shape = 4) +
  theme_minimal() +
  theme(axis.text.y = element_blank()) + # Apply after theme_minimal() so it is not overridden
  labs(title = "INLA integration points for the hyperparameters") +
  annotate(
    "text", x = 0.175, y = 5e-41,
    label = "Now the points are concentrated \n where there is higher density"
  )
One of the INLA vignettes discusses how the user can set their own integration points; see browseVignettes(package = "INLA"). The integration points that R-INLA uses can be found using fit$joint.hyper:
internals_inla_theta <- data.frame(
input = exp(fit$joint.hyper$`Log precision for the Gaussian observations`),
output = exp(fit$joint.hyper$`Log posterior density`)
)
internals_inla_theta$input
## [1] 0.12662611 0.07130574 0.04015372 0.05888323 0.03315835 0.08634902 0.02738167 0.10456596 0.15334025 0.04862490
We take the set of points defined by inla_theta to be our \(\{\theta^{(k)}\}\), so \(K\) is given by length(points), which equals 26.
Define a function nl_joint_post by taking the negative log of the joint posterior \(p(x, \theta \, | \, y) = p(x \, | \, \theta, y) p(\theta \, | \, y)\) to give \[\begin{equation} - \log p(x, \theta \, | \, y) = - \log p(x \, | \, \theta, y) - \log p(\theta \, | \, y). \end{equation}\]
nl_joint_post <- function(x, theta, log_input = FALSE) {
nl_full_conditional_x(x, theta, log_input) +
nl_post_marginal_theta(theta, log_input)
}
For any given input \(x\) you can do quadrature according to \[\begin{equation} \tilde p(x \, | \, y) = \sum_{k = 1}^K p(x \, | \, \theta^{(k)}, y) \times p(\theta^{(k)} \, | \, y) \times \Delta^{(k)}, \end{equation}\] but how should the range of \(x\) be chosen? Blangiardo and Cameletti use a naive grid for demonstration:
x_start <- -8
x_end <- 5
x_out <- 50
blangiardo_x <- seq(x_start, x_end, length.out = x_out)
Wrap nl_joint_post to accept vector arguments using the base R function Vectorize, then apply the outer product, storing the result as a matrix rather than overwriting the function:
v_nl_joint_post <- Vectorize(nl_joint_post, vectorize.args = c("x", "theta"))
nl_joint_post_mat <- outer(blangiardo_x, points, v_nl_joint_post, log_input = TRUE)
50 rows (the number of \(x\) integration points) and 26 columns (the number of \(\theta\) integration points):
dim(nl_joint_post_mat)
## [1] 50 26
The \(x\) and \(y\)-axis do not correspond to values here, just indices:
reshape2::melt(nl_joint_post_mat) %>%
ggplot(aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
theme_minimal() +
labs(fill = "", x = "x", y = "theta") +
theme(legend.position = "none")
Calculating the log posterior by integrating over the hyperparameters for each value of \(x\):
K <- length(points)
# Equal weights 1/K: average the conditional densities over theta, working on
# the log scale via log-sum-exp for numerical stability
log_sum_exp <- function(l) max(l) + log(sum(exp(l - max(l))))
df <- data.frame(
  x = blangiardo_x,
  log_post = apply(-nl_joint_post_mat, 1, log_sum_exp) - log(K)
)
ggplot(df, aes(x = x, y = log_post)) +
geom_line() +
theme_minimal() +
labs(x = "x", y = "log(Posterior)")
Try normalising:
x_spacing <- (x_end - x_start) / (x_out - 1) # Spacing of the length-50 grid blangiardo_x
safe_exp <- function(x) exp(x - max(x))
df$post <- safe_exp(df$log_post)
norm <- trapezoid_rule(df$post, x_spacing)
df$norm_post <- df$post / norm
ggplot(df, aes(x = x, y = norm_post)) +
geom_line() +
theme_minimal() +
labs(x = "x", y = "Posterior")
TMB
C++ for the negative log joint posterior is given by:
// blangiardo.cpp
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
// Data block
DATA_VECTOR(y);
// Parameter block
PARAMETER(x);
PARAMETER(l_theta);
// Transformed parameters block
Type sigma = sqrt(1 / exp(l_theta));
Type nll;
nll = Type(0.0);
// Priors
nll -= dlgamma(l_theta, Type(1.6), Type(1 / 0.4), true);
nll -= dnorm(x, Type(-3.0), Type(2.0), true);
// Likelihood
nll -= dnorm(y, x, sigma, true).sum();
// ADREPORT
ADREPORT(x);
ADREPORT(sigma);
return(nll);
}
TMB::compile("blangiardo.cpp")
dyn.load(dynlib("blangiardo"))
param <- list(x = 0, l_theta = 0)
h, with the Laplace approximation, is a function of \(\dim(\theta)\) inputs, integrating out \(x\) by specifying random = c("x"):
h <- MakeADFun(
data = dat,
parameters = param,
random = c("x"),
DLL = "blangiardo",
hessian = TRUE,
silent = TRUE
)
f, without the Laplace approximation, is a function of \(\dim(\theta) + \dim(x)\) inputs:
f <- MakeADFun(
data = dat,
parameters = param,
DLL = "blangiardo",
hessian = TRUE,
silent = TRUE
)
Optimisation using nlminb and optim gives the same results:
its <- 1000
tmb_nlminb <- nlminb(
start = h$par,
objective = h$fn,
gradient = h$gr,
control = list(iter.max = its, trace = 0)
)
tmb_optim <- optim(
par = h$par,
fn = h$fn,
gr = h$gr,
method = "Brent",
lower = -100, # Have to specify lower and upper bounds
upper = 100, # when using method = "Brent" (1D optimisation)
control = list(maxit = its, trace = 0)
)
c(tmb_optim$par, tmb_nlminb$par) # The same
## l_theta
## -2.640129 -2.640129
tmb_opt <- tmb_nlminb
sd_out <- sdreport(h, par.fixed = tmb_opt$par, getJointPrecision = TRUE)
Comparison of the posterior mode and standard deviation of \(\log(\theta)\):
#' For some reason the mode in the INLA result below is NA, how to fix this?
#' According to Havard, this is a feature not a bug, and he hasn't found a good way to compute the mode yet:
#' https://groups.google.com/g/r-inla-discussion-group/c/mbSsNO_bydo
kable_data <- data.frame(
"Mode" = c(NA, r_opt$par, tmb_opt$par),
"Variance" = c(fit$internal.summary.hyperpar$sd^2, sd_out$cov.fixed, NA)
)
rownames(kable_data) <- c("$\\texttt{R-INLA}$", "R", "$\\texttt{TMB}$")
kableExtra::kable(kable_data, booktabs = TRUE, escape = FALSE, align = "c")
| | Mode | Variance |
|---|---|---|
| \(\texttt{R-INLA}\) | NA | 0.0665250 |
| R | -2.640129 | 0.0651227 |
| \(\texttt{TMB}\) | -2.640129 | NA |
What about for \(x\):
sd_out$par.random # TMB
## x
## 2.679973
fit$summary.fixed # INLA
plot(fit$marginals.fixed$`(Intercept)`, type = "l")
abline(v = sd_out$par.random)
We could make a version of h$fn on the same scale as post_marginal_theta for a plot:
tmb_post_marginal_theta <- function(theta) {
de_nl(h$fn, x = log(theta))
} # (Have tested that this function does as it should)
tmb_dense_theta <- eval_grid(
uniform = TRUE,
K = 500, min = 0.001, max = 0.3,
f = tmb_post_marginal_theta
)
# Normalised dense_theta and tmb_dense_theta since they include different constants
ggplot(dense_theta) +
  geom_line(aes(x = input, y = norm_output), col = midblue) +
  geom_line(data = tmb_dense_theta, aes(x = input, y = norm_output), col = midpink) +
  theme_minimal() +
  theme(axis.text.y = element_blank()) + # Apply after theme_minimal() so it is not overridden
  annotate("text", x = 0.1, y = 0.01, label = "TMB", col = midpink) +
  annotate("text", x = 0.045, y = 0.012, label = "R", col = midblue)
h$fn the same as in R
Testing to see that h$fn is the same as nl_post_marginal_theta with log_input = TRUE, up to a constant:
vals <- seq(-5, 0, by = 1)
tmb_vals <- sapply(vals, h$fn) # Replications show it not to be stochastic
r_vals <- sapply(vals, nl_post_marginal_theta, log_input = TRUE)
rbind(tmb_vals, r_vals, tmb_vals - r_vals)
## [,1] [,2] [,3] [,4] [,5] [,6]
## tmb_vals 115.9473778 103.3850776 94.9432186 98.0263651 133.1950528 256.0811622
## r_vals 116.8663164 104.3040161 95.8621572 98.9453036 134.1139914 257.0001008
## -0.9189385 -0.9189385 -0.9189385 -0.9189385 -0.9189385 -0.9189385
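The constant offset is exactly \(-\log(2\pi)/2 \approx -0.9189\): TMB's Laplace approximation includes the Gaussian normalising constant, which nl_post_marginal_theta omits.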
get_tmb_inner_optimised_value <- function(l_theta) {
  sdr <- sdreport(h, par.fixed = l_theta) # Local name avoids shadowing sd_out above
  return(sdr$par.random) # The mode of x given l_theta from TMB's inner optimisation
}
tmb_inner <- sapply(vals, get_tmb_inner_optimised_value)
r_inner <- sapply(exp(vals), function(theta) inner_loop(theta)$x_n)
rbind(tmb_inner, r_inner)
## x x x x x x
## tmb_inner -0.1640675 1.359735 2.433834 2.975408 3.202838 3.290922
## r_inner -0.1640675 1.359735 2.433834 2.975408 3.202838 3.290922
Looks like this part is correct.
¹ It would be good to know how INLA selects these numbers in general.