Warning: this report was created during the process of learning, and for that reason is not concise or well-structured. I would not recommend reading it, but if you do, proceed with caution!

1 Background

An AGHQ approximation to the joint posterior of the latent field is given by \[ \tilde p(x \, | \, y) = \sum_z \tilde p_\text{G}(x \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z), \tag{1.1} \] where \(\theta(z) \in \Theta\) and \(\omega(z) \in \mathbb{R}\) are nodes and weights generated by adaptation to \(\tilde p_\text{LA}(\theta \, | \, y)\). In particular, let \[ \mathcal{Q}(1, k) = \{z_d \in \mathbb{R}: H_k(z_d) = 0\} \] be the set of nodes in one dimension, exactly the zeros of the Hermite polynomial \(H_k(z_d)\) for \(k \in \mathbb{N}\), and \[ \omega(z_d) = \frac{k!}{H_{k + 1}(z_d)^2 \times \phi(z_d)}, \] be the associated weights. Define \(\mathcal{Q}(d, k) = \mathcal{Q}(1, k)^d\) to be the set of nodes \(z\) in \(d\) dimensions, of which there will be \(k^d\). These points are then adapted via \(\theta(z) = \hat \theta + Lz\), where \(\hat \theta\) is the mode and \(L\) is the lower Cholesky triangle of the inverse curvature at the mode, such that \(LL^\top = H^{-1}\).
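
For intuition, here is a minimal sketch of how nodes and weights of this kind can be constructed with mvQuad (which is also used later in this notebook). The mode theta_hat and Hessian H below are placeholder values, and this is not the aghq internals.

theta_hat <- c(0, 0) #' Placeholder mode
H <- diag(2)         #' Placeholder Hessian at the mode

#' Gauss-Hermite rule with k = 3 nodes per dimension, so k^d = 9 nodes in total
grid <- mvQuad::createNIGrid(dim = 2, type = "GHe", level = 3)

#' Adapt to the mode and inverse curvature, theta(z) = theta_hat + Lz
mvQuad::rescale(grid, m = theta_hat, C = solve(H))

mvQuad::getNodes(grid)   #' The adapted nodes theta(z)
mvQuad::getWeights(grid) #' The associated weights omega(z)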

This can be evaluated at any point, though in practice we work with it by drawing samples \(x \sim p_\text{G}(x \, | \, \theta, y)\) using a method for sampling from a Gaussian given its mode and precision due to Rue (in particular, you solve a linear system involving the lower Cholesky triangle of the curvature at a randomly chosen quadrature point). For the \(i\)th marginal, we just take the appropriate components of the samples from the joint. The approximation is therefore \[ \tilde p(x_i \, | \, y) = \sum_z \tilde p_\text{G}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z), \tag{1.2} \] where \(p_\text{G}(x_i \, | \, \theta, y)\) takes the \(i\)th element of the mode vector and the \((i, i)\)th element of the covariance matrix.
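
As a hedged illustration of the sampling step (a sketch of the idea, not the aghq::sample_marginal internals), a draw from a Gaussian specified by its mode mu and precision matrix Q can be obtained by solving a triangular system involving the Cholesky factor of Q:

#' Draw one sample from N(mu, Q^{-1}) given the mode mu and precision matrix Q
sample_gaussian <- function(mu, Q) {
  R <- chol(Q)           #' Upper Cholesky factor, Q = t(R) %*% R
  z <- rnorm(length(mu)) #' Standard normal draws
  mu + backsolve(R, z)   #' Solve R v = z, so that v ~ N(0, Q^{-1})
}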

To understand how this is implemented in aghq, the two most important functions are aghq::marginal_laplace_tmb and aghq::sample_marginal. aghq::marginal_laplace_tmb calculates an object representing \(\tilde p_\text{G}(x \, | \, \theta, y)\). aghq::sample_marginal draws samples from this Gaussian, given values of \(z\) drawn from a multinomial distribution, in order to evaluate Equation (1.2). For a line-by-line walkthrough of how these functions work, see the notebook Code walkthrough (which uses the model from the notebook Epilepsy example).
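
As a rough usage sketch (argument order assumed from the package documentation; see the Code walkthrough notebook for the details), where obj is a TMB objective with the latent field declared in the random argument of MakeADFun:

quad <- aghq::marginal_laplace_tmb(obj, k = 3, startingvalue = obj$par)
samples <- aghq::sample_marginal(quad, 1000) #' Draws used to evaluate Equation (1.2)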

The goal of this notebook (Implementing simplified INLA into aghq) is to create a new approximation swapping out \(\tilde p_\text{G}(x_i \, | \, \theta, y)\) from the above for the simplified INLA approximation \(\tilde p_{\text{SINLA}}(x_i \, | \, \theta, y)\) as follows \[ \tilde p(x_i \, | \, y) = \sum_z \tilde p_{\text{SINLA}}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z). \] We want to do this because the new approximation should be more accurate than taking marginals of a joint Gaussian approximation. At the extreme, you could just compute a Laplace approximation for each marginal, integrating out all other elements of the latent field, \(x_{-i}\), together with the fixed effects \(\theta\). This would be an accurate approach, but not computationally feasible for large models. What we hope to achieve is an approximation which gets to most of the accuracy benefits of the Laplace approximation approach, but without the high computational cost.

2 Ground truth

We will use Model 1 from the Prevalence, ANC, ART model inference case-study as a running example. To provide a ground truth against which to compare the methods that follow, we generate “gold-standard” inferences using a relatively long run of MCMC as implemented by tmbstan.

We begin by loading in testing data, as generated by prev-anc-art_sim. sim_data contains many replicates of simulated data. Here we only need one, so we will just take the first, as given by:

sim_data <- readRDS("depends/sim_data.rds")
data <- sim_data[[1]]
dat <- list(n = data$n, y_prev = data$y_prev, m_prev = data$m_prev)
dat
## $n
## [1] 36
## 
## $y_prev
##  [1] 35  7 34 19 41 20 32 12 22 17 13 53 16 13 35 12 24 17 20 34 25 19 35 13 22 15 14 21 55 52  8 45 11 12 21 24
## 
## $m_prev
##  [1] 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250

Compile and load the template (see Appendix 6 for the TMB code):

compile("model1.cpp")
## [1] 0
dyn.load(dynlib("model1"))

We create an objective function h integrating out the random effects \(x\) (which here correspond to the spatial random effects phi_prev). The values of param initialise the objective function:

param <- list(
  beta_prev = -2,
  phi_prev = rep(0, data$n),
  log_sigma_phi_prev = -1
)

h <- MakeADFun(
  data = dat,
  parameters = param,
  random = "phi_prev",
  DLL = "model1",
  silent = TRUE
)

It is simple to fit the model using MCMC via the tmbstan package. We use 4 chains, each of 5000 iterations. Note that although h is passed to tmbstan::tmbstan, the Laplace approximation is not used, since laplace = FALSE by default (useful to know in case it's ever of interest to sample from the Laplace approximation).

mcmc <- tmbstan::tmbstan(h, chains = 4, iter = 5000, refresh = 0)

The posterior marginal \(p(\phi_1 \, | \, y)\) can be examined by extracting the corresponding Markov chains, and plotting a histogram of the draws:

samples_tmbstan <- extract(mcmc, pars = "phi_prev[1]")[[1]]

plot_pdf <- ggplot(data.frame(x = samples_tmbstan), aes(x = x)) +
    geom_histogram(aes(y = ..density..), alpha = 0.8, fill = cbpalette[7]) +
    theme_minimal() +
    labs(x = "phi_prev[1]", y = "Posterior PDF")

ecdf_tmbstan <- ecdf(samples_tmbstan)
grid <- seq(min(samples_tmbstan), max(samples_tmbstan), by = 0.1)

plot_cdf <- data.frame(x = grid, y = ecdf_tmbstan(grid)) %>%
  ggplot(aes(x = x, y = y)) +
    geom_line(col = cbpalette[7]) +
    theme_minimal() +
    labs(x = "phi_prev[1]", y = "Posterior CDF")

cowplot::plot_grid(plot_cdf, plot_pdf)

3 Marginal of joint Gaussian approximation

Now we will compute Equation (1.1) taking the marginals of a joint Gaussian approximation \(p_\text{G}(x \, | \, \theta, y)\). This is the approach currently used in the aghq package (done implicitly, via sampling). This approach is also used by method = "gaussian" in R-INLA.

3.1 \(\tilde p(\theta \, | \, y)\)

The Gaussian approximation \(p_\text{G}(x \, | \, \theta, y)\) is already used in computing the Laplace approximation to \(p(\theta \, | \, y)\) \[ \begin{equation} \tilde p(\theta \, | \, y) \propto \frac{p(x, \theta, y)}{\tilde p_\text{G}(x \, | \, \theta, y)} \Big\rvert_{x = \hat x(\theta)} = \frac{p(y, \hat x(\theta), \theta)}{\det \left(\frac{1}{2 \pi} H(\theta) \right)^{1/2}}, \tag{3.1} \end{equation} \] so one easy way to calculate it is to declare the latent field as random in the TMB template (here random = "phi_prev") so that it is integrated out, then recover the Gaussian approximation from there, as we have done in the function h above. Let's optimise h to obtain the estimate \(\hat \theta\) which maximises \(\tilde p(\theta \, | \, y)\):

its <- 1000

opt_theta <- nlminb(
  start = h$par,
  objective = h$fn,
  gradient = h$gr,
  control = list(iter.max = its, trace = 0)
)

The values opt_theta$par below correspond to \(\hat \theta\):

opt_theta$par
##          beta_prev log_sigma_phi_prev 
##         -2.3486271         -0.6141847

We can also look at the parameters – that is, the mode vector and precision matrix – of the Gaussian approximation at the mode opt_theta$par above. Note that in aghq, what we would do is create a quadrature grid, then get the mode vector and precision matrix at each value of the quadrature grid over \(\theta\). By just sticking to the mode of \(\theta\), we are effectively doing aghq with \(k = 1\) in each dimension, i.e. one grid point, corresponding to an empirical Bayes type of inference.

#' Notice that last.par contains opt_theta$par as the fixed effects
#' It's last.par because the last thing we did with the template was the optimisation
#' This seems a pretty shaky way to go about things, but here we are...
mm <- h$env$last.par

#' These are the indices of the random effects that we want
h$env$random
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
x_hat <- mm[h$env$random]
x_hat <- rlang::duplicate(x_hat) #' Insurance...
length(x_hat) #' The length is n = 36
## [1] 36
#' Hessian evaluated at the mode
x_H <- h$env$spHess(mm, random = TRUE)
x_H <- rlang::duplicate(x_H)
dim(x_H) #' With dimension [36 x 36]
## [1] 36 36

3.2 Additional checks

Description

These tabs contain additional sanity checks used to build understanding. Feel free to skip them.

Reproducing output from h

We would like to reproduce h using the right hand side of Equation (3.1). To do this, we start by creating an objective function f (see Appendix 6 for the TMB code) to evaluate \(- \log p(x, \theta, y)\) (without any Laplace approximations!). This is what we will be using for the numerator of Equation (3.1):

f <- MakeADFun(
  data = dat,
  parameters = param,
  DLL = "model1",
  silent = TRUE
)

Given \(\theta\), we compute the mode and Hessian H of \(x\) (corresponding to phi_prev) using h. We then concatenate and evaluate using f to get the numerator, multiplying by -1 because f$fn gives the negative log of the joint density, and we want the log. For the denominator, we use determinant on the log-scale as it is more numerically stable than e.g. log(det(...)).

calculate_rhs <- function(theta) {
  #' Calculate mode and H for those theta
  #' TMB template remembers what you've called -- spooky
  h$fn(theta)
  mm <- h$env$last.par
  mode <- mm[h$env$random]
  H <- h$env$spHess(mm, random = TRUE)
  
  #' Append the calculated latent field mode
  param <- theta
  param$phi_prev <- as.numeric(mode)
  param <- param[c("beta_prev", "phi_prev", "log_sigma_phi_prev")]
  
  #' Evaluate the log-numerator of the RHS
  numer <- -1 * f$fn(unlist(param))
  
  #' Evaluate the log-denominator of the RHS
  denom <- 0.5 * (determinant(H, logarithm = TRUE)$modulus - (nrow(H) * log(2 * pi)))
  
  #' Return their difference
  numer - denom
}

theta <- param
theta$phi_prev <- NULL #' Remove the latent field x, leaving only theta

calculate_rhs(theta)
## [1] -155.5523
## attr(,"logarithm")
## [1] TRUE

As compared with what we got from h:

-1 * h$fn(unlist(theta))
## [1] -155.5523
## attr(,"logarithm")
## [1] TRUE

3.3 \(\tilde p(x_i \, | \, y)\)

Now let’s have a go at computing Equation (1.2) so that we can compare to the posterior distribution from tmbstan we obtained in Section 2. As described above, we simplify matters by considering just one quadrature point, \(\hat \theta\), such that the sum over \(z\) collapses to: \[ \begin{align} \tilde p(x_i \, | \, y) &= \tilde p_\text{G}(x_i \, | \, \hat \theta, y) \tilde p_\text{LA}(\hat \theta \, | \, y) \\ &\propto \tilde p_\text{G}(x_i \, | \, \hat \theta, y) \end{align} \]

We can evaluate this for \(i = 1\) via:

i <- 1
mode_i <- as.numeric(x_hat[i])
var_i <- solve(x_H)[i, i]

Because this is a Gaussian distribution, we do not have to normalise it (as we usually would), since we know it is already a density. We can compare it to our MCMC results as follows:

plot_pdf2 <- plot_pdf +
  stat_function(
    data = data.frame(x = c(0.5, 3.5)), aes(x, col = "Joint Gaussian marginal"), fun = dnorm, n = 101, 
    args = list(mean = mode_i, sd = sqrt(var_i))
  ) +
  labs(x = "prev_phi[1]", y = "Posterior", col = "Approximation") +
  scale_colour_manual(values = cbpalette) +
  theme_minimal()

M <- 1000
samples_gaussian <- rnorm(M, mode_i, sqrt(var_i)) 

ecdf_gaussian <- ecdf(samples_gaussian)
grid <- seq(min(samples_gaussian), max(samples_gaussian), by = 0.1)

plot_cdf2 <- plot_cdf +
  geom_line(
    data = data.frame(x = grid, y = ecdf_gaussian(grid)),
    aes(x = x, y = y), col = cbpalette[1]
  )

cowplot::plot_grid(plot_cdf2, plot_pdf2, rel_widths = c(1, 1.5))

ks_gaussian_tmbstan <- inf.utils::ks_test(samples_tmbstan, samples_gaussian)
ks_gaussian_tmbstan
## $D
## [1] 0.0617
## 
## $l
## [1] 0.6635031

The maximum difference between the two ECDFs is 0.0617, which occurs at the point 0.6635031.
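
As a sanity check on what these numbers represent (inf.utils::ks_test is a helper package, and may differ in details such as tie handling), the same quantities can be computed in base R:

pooled <- sort(c(samples_tmbstan, samples_gaussian))
diffs <- abs(ecdf(samples_tmbstan)(pooled) - ecdf(samples_gaussian)(pooled))
max(diffs)               #' The KS distance D
pooled[which.max(diffs)] #' The location l at which it occurs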

4 Full Laplace approximation

In this section we will continue to use the approximation (3.1) for \(\tilde p(\theta \, | \, y)\).

4.1 For one index \(i\)

4.1.1 \(\tilde p(x_i \, | \, \theta, y)\)

For the approximation to the latent field marginal posteriors, another approach we could take is to calculate the full Laplace approximation \[ \begin{align*} \tilde p(x_i \, | \, \theta, y) &\propto \frac{p(x, \theta, y)}{p_\text{G}(x_{-i} \, | \, x_i, \theta, y)} \Big\rvert_{x_{-i} = \hat x_{-i}(x_i, \theta)} \tag{4.1} \\ &= \frac{p(y, x_i, \hat x_{-i}(x_i, \theta), \theta)}{\det(\frac{1}{2 \pi} H_{-i}(x_i, \theta))^{1/2}} \end{align*} \] integrating out \(x_{-i}\) with a Gaussian approximation. This is not practical in some settings, as it involves recomputing the Gaussian approximation \(p_\text{G}(x_{-i} \, | \, x_i, \theta, y)\) for each value of \((x_i, \theta)\), which can be too computationally expensive. We will still go through the steps of implementing it here for demonstration. With a bit of effort, it can be implemented in TMB by passing in data where elements \(i\) and \(-i\) of the latent field are explicitly named (see Appendix 6 for the TMB code):

compile("model1_index.cpp")
## [1] 0
dyn.load(dynlib("model1_index"))

The prepare_dat function takes dat and makes it compatible with model1_index for a particular choice of marginal i.

prepare_dat <- function(dat, i) {
  dat[["y_prev_i"]] <- dat$y_prev[i]
  dat[["y_prev_minus_i"]] <- dat$y_prev[-i]
  dat[["m_prev_i"]] <- dat$m_prev[i]
  dat[["m_prev_minus_i"]] <- dat$m_prev[-i]
  dat[c("n", "y_prev_i", "y_prev_minus_i", "m_prev_i", "m_prev_minus_i")]
}

Let's consider the first marginal, that is \(\tilde p_\text{LA}(x_1 \, | \, \theta, y)\). In the call to MakeADFun we set random = "phi_prev_minus_i" to integrate out all of the random effects except that of the particular index.

i <- 1

param_i <- list(
  beta_prev = 0,
  phi_prev_i = 0,
  phi_prev_minus_i = rep(0, data$n - 1),
  log_sigma_phi_prev = 0
)

dat_i <- prepare_dat(dat, i)

#' random are integrated out with a Laplace approximation
obj <- MakeADFun(
  data = dat_i,
  parameters = param_i,
  random = "phi_prev_minus_i",
  DLL = "model1_index",
  silent = TRUE
)

opt <- nlminb(
  start = obj$par,
  objective = obj$fn,
  gradient = obj$gr,
  control = list(iter.max = its, trace = 0)
)

The mode and precision matrix for the Gaussian approximation \(\tilde p_\text{G}(x_{-i} \, | \, x_i, \theta, y)\) are:

get_mode_and_H <- function(obj) {
  mm <- obj$env$last.par
  mode <- mm[obj$env$random]
  mode <- rlang::duplicate(mode)
  H <- obj$env$spHess(mm, random = TRUE)
  H <- rlang::duplicate(H)
  return(list(mode = mode, H = H))
}

xminusi <- get_mode_and_H(obj)

Check here that obj$fn indeed takes as input \((x_i, \theta)\):

transform_param <- function(param) {
  list(
    beta_prev = param$beta_prev,
    phi_prev_i = param$phi_prev[i],
    phi_prev_minus_i = param$phi_prev[-i],
    log_sigma_phi_prev = param$log_sigma_phi_prev
  )
}

transformed_param <- transform_param(param)
transformed_param$phi_prev_minus_i <- NULL #' Remove the integrated out effects

#' transformed_param contains certain values for (x_i, theta) suitable for input to obj$fn
obj$fn(transformed_param)
## [1] 156.06
## attr(,"logarithm")
## [1] TRUE

4.1.2 \(\tilde p(x_i \, | \, y)\)

Let's have a go at computing Equation (1.2) so that we can compare to the posterior distribution from tmbstan. We simplify matters by considering just one quadrature point, \(\hat \theta\), such that the sum over \(z\) collapses to¹: \[ \begin{align} \tilde p(x_i \, | \, y) &= \tilde p_\text{LA}(x_i \, | \, \hat \theta, y) \tilde p_\text{LA}(\hat \theta \, | \, y) \\ &\propto \tilde p_\text{LA}(x_i \, | \, \hat \theta, y) \end{align} \]

A basic way we could approach this is to define a grid of points that we would like to evaluate the posterior at. We can then create a dataframe where each row contains the values opt_theta$par for \(\theta\) and a grid value for \(x_i\). Each row can then be passed into -obj$fn() to calculate the log-posterior. Note that it’s crucial the arguments are in the right order when we pass them to obj$fn() – this has caused headaches!

spacing <- 0.05
x_grid <- seq(from = -5, to = 5, by = spacing)
x_grid_length <- length(x_grid)

df <- t(opt_theta$par) %>%
  as.data.frame() %>%
  slice(rep(1:n(), each = x_grid_length)) %>%
  mutate(phi_prev_i = x_grid) %>%
  rowwise() %>%
  mutate(logpost = -obj$fn(c(beta_prev, phi_prev_i, log_sigma_phi_prev)))

We've computed the unnormalised log-posterior – how can we go from that to a normalised posterior? Given this grid of points, we can apply the trapezoid rule.

Safe exponential trapezoid rule

The safe exponential function computes \(\exp(x - \text{max}(x))\).

safe_exp <- function(x) exp(x - max(x))

A simple way to write the trapezoid rule is as follows:

trapezoid_rule <- function(x, spacing) {
  0.5 * spacing * (2 * sum(x) - x[1] - x[length(x)])
}

It can also be defined like this, which is more convenient for thinking about it like other quadrature rules based on weights:

trapezoid_rule <- function(x, spacing) {
  w <- rep(spacing, length(x))
  w[1] <- w[1] / 2
  w[length(x)] <- w[length(x)] / 2
  sum(w * x)
}
df$post <- safe_exp(df$logpost)
norm_trapezoid <- trapezoid_rule(df$post, spacing = spacing)
df$post_norm <- df$post / norm_trapezoid

Check that this method correctly computes the normalising constant for a density (which is known to integrate to one):

abs(trapezoid_rule(dnorm(seq(-10, 10, by = 0.1)), spacing = 0.1) - 1) < 10E-6
## [1] TRUE

logSumExp trapezoid rule

What about with another method? We can calculate the logarithm of the normalising constant directly, then just subtract it from the log-posterior:

\[ \begin{align} \tilde f(x) &= \frac{f(x)}{\int f(x) \text{d}x} = \frac{f(x)}{C} \implies \\ \log\tilde f(x) &= \log f(x) - \log C \end{align} \]

The matrixStats::logSumExp function computes \(\text{LSE}(x) = \log(\exp(x_1) + \cdots + \exp(x_n))\) in a computationally stable way. Another way to compute the trapezoid rule, taking input on the log-scale, is as follows:

trapezoid_rule_log <- function(x, spacing) {
  w <- rep(spacing, length(x))
  w[1] <- w[1] / 2
  w[length(x)] <- w[length(x)] / 2
  matrixStats::logSumExp(log(w) + x)
}

norm_trapezoid_log <- trapezoid_rule_log(df$logpost, spacing)
df$post_norm_alt <- exp(df$logpost - norm_trapezoid_log)

Check that this method correctly computes the normalising constant for a density (which is known to integrate to one):

abs(trapezoid_rule_log(dnorm(seq(-10, 10, by = 0.1), log = TRUE), spacing = 0.1) - log(1)) < 10E-6
## [1] TRUE

4.1.3 Normalisation with aghq

The two ways of calculating the normalised posterior above are the same up to numerical error:

max(abs(df$post_norm - df$post_norm_alt)) < 10E-6
## [1] TRUE

So how might we approach this in a more sophisticated way? We could do the normalisation directly in aghq!

We could start by normalising \(\tilde p_\text{LA}(\theta, y)\). Note that sometimes we have been calling this \(\tilde p_\text{LA}(\theta \, | \, y)\) because they are equivalent up to a constant of proportionality. Define \[ \tilde p_\text{AQ}(\theta \, | \, y) = \frac{\tilde p_\text{LA}(\theta, y)}{\tilde p_\text{AQ}(y)} \] or on a log-scale \[ \log \tilde p_\text{AQ}(\theta \, | \, y) = \log \tilde p_\text{LA}(\theta, y) - \log \tilde p_\text{AQ}(y) \]
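
For intuition, the log normalising constant that aghq computes below is a log-sum-exp over the quadrature rule. In the sketch that follows, theta_z (a matrix of adapted nodes, one per row) and omega_z (the weights) are hypothetical placeholders, and h$fn returns the negative of \(\log \tilde p_\text{LA}(\theta, y)\):

#' Hedged sketch: theta_z and omega_z are placeholder nodes and weights
log_p_AQ_y <- matrixStats::logSumExp(
  apply(theta_z, 1, function(theta) -h$fn(theta)) + log(omega_z)
)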

First we have to add a numerical Hessian that’s missing from h:

h$he <- function(theta) numDeriv::jacobian(h$gr, theta, method = "Richardson")

Now we can obtain \(\log \tilde p_\text{AQ}(y)\):

quad <- aghq::aghq(ff = h, k = 3, startingvalue = c(0, 0), control = aghq::default_control_tmb())
quad$normalized_posterior$lognormconst
## [1] -143.7878

Remember that h is the negative log-likelihood, so we want to add the normalising constant rather than subtract it:

h_norm <- function(theta) h$fn(theta) + quad$normalized_posterior$lognormconst

Now we can put this into \[ \tilde p(x_i, \hat \theta \, | \, y) = \tilde p_\text{LA}(x_i \, | \, \hat \theta, y) \tilde p_\text{AQ}(\hat \theta \, | \, y) \]

But perhaps there is no reason to, because we will just have to normalise it again, so why do it twice? Let's just try to normalise it directly. What we need to do is create a function in TMB where we fix some of the inputs. This can be done using the map argument of MakeADFun. Each parameter you want fixed is set to factor(NA), and it will then retain the value given in parameters:

obj$par %>% names()
## [1] "beta_prev"          "phi_prev_i"         "log_sigma_phi_prev"
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))

param_i_fixed_theta <- param_i

theta_names <- names(opt_theta$par)
for(theta in theta_names) {
  param_i_fixed_theta[[theta]] <- opt_theta$par[[theta]]
}

dat_i <- prepare_dat(dat, i)

obj_fixed_theta <- MakeADFun(
  data = dat_i,
  parameters = param_i_fixed_theta,
  random = "phi_prev_minus_i",
  DLL = "model1_index",
  silent = TRUE,
  map = map_fixed_theta
)

Now we can normalise this function \(p_\text{LA}(x_i, \theta = \hat \theta, y)\) using aghq:

quad <- aghq::aghq(ff = obj_fixed_theta, k = 3, startingvalue = 0, control = aghq::default_control_tmb())

The value of the normalising constant is (approximately) the same as we get using our trapezoid rule:

abs(quad$normalized_posterior$lognormconst - norm_trapezoid_log) < 10E-3
## [1] TRUE

4.1.4 Comparison

We now add our Laplace approximation to the existing joint Gaussian marginal and MCMC results:

plot_pdf3 <- plot_pdf2 +
  geom_line(data = df, aes(x = phi_prev_i, y = post_norm, col = "Laplace"))

df$ecdf <- cumsum(df$post_norm) * c(0, diff(df$phi_prev_i))

plot_cdf3 <- plot_cdf2 +
  geom_line(
    data = df, aes(x = phi_prev_i, y = ecdf), col = cbpalette[2]
  )

cowplot::plot_grid(plot_cdf3, plot_pdf3, rel_widths = c(1, 1.5))

We use the inverse CDF method to sample from the Laplace marginal, then we can calculate the KS statistic:

samples_laplace <- approxfun(df$ecdf, df$phi_prev_i)(runif(M))
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm): collapsing to unique 'x' values
ks_laplace_tmbstan <- inf.utils::ks_test(samples_tmbstan, samples_laplace)

kable_data <- data.frame(
  "Distance" = c(ks_gaussian_tmbstan$D, ks_laplace_tmbstan$D),
  "Location" = c(ks_gaussian_tmbstan$l, ks_laplace_tmbstan$l)
)

rownames(kable_data) <- c("$\\text{KS}(\\texttt{tmbstan}, \\texttt{gaussian})$", "$\\text{KS}(\\texttt{tmbstan}, \\texttt{laplace})$")

kableExtra::kable(kable_data, booktabs = TRUE, escape = FALSE, align = "c")
Distance Location
\(\text{KS}(\texttt{tmbstan}, \texttt{gaussian})\) 0.0617 0.6635031
\(\text{KS}(\texttt{tmbstan}, \texttt{laplace})\) 0.0402 1.0389550

4.2 Looping over index \(i\)

So we’ve done the approximation for the first index \(i = 1\)! Now let’s try computing \(\tilde p(x_i \, | \, y)\) for every index \(i\), using a for-loop to generate the required Laplace approximations, and tidying up and condensing the code used:

xi_laplace_marginal <- function(i) {
  dat_i <- prepare_dat(dat, i)
  map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
  
  param_i_fixed_theta <- list(
    beta_prev = 0, 
    phi_prev_i = 0, 
    phi_prev_minus_i = rep(0, data$n - 1), 
    log_sigma_phi_prev = 0
  )
  
  for(theta in names(opt_theta$par)) param_i_fixed_theta[[theta]] <- opt_theta$par[[theta]]
  
  obj_fixed_theta <- MakeADFun(
    data = dat_i,
    parameters = param_i_fixed_theta,
    random = "phi_prev_minus_i",
    DLL = "model1_index",
    silent = TRUE,
    map = map_fixed_theta
  )
  
  quad <- aghq::aghq(ff = obj_fixed_theta, k = 3, startingvalue = 0, control = aghq::default_control_tmb())
  pdf_and_cdf <- aghq::compute_pdf_and_cdf(quad)[[1]]
  
  return(pdf_and_cdf)
}

laplace_df <- lapply(1:dat$n, xi_laplace_marginal) %>%
  bind_rows(.id = "index") %>%
  mutate(index = as.numeric(index))

M <- 1000

samples_laplace <- lapply(1:dat$n, function(i) {
  laplace_df_i <- filter(laplace_df, index == i)
  samples <- approxfun(x = laplace_df_i$cdf, y = laplace_df_i$theta, ties = "ordered")(runif(M))
  data.frame("value" = samples)
}) %>%
  bind_rows(.id = "index")

Let’s also get everything we need for the joint Gaussian marginal for each index:

x_Sigma <- solve(x_H)

gaussian_df <- lapply(1:dat$n, function(i) {
  x <- seq(-5, 5, length.out = 1000)
  mode_i <- as.numeric(x_hat[i])
  sd_i <- as.numeric(sqrt(x_Sigma[i, i]))
  data.frame(index = i, x = x, pdf = dnorm(x, mean = mode_i, sd = sd_i))
}) %>%
  bind_rows()

samples_gaussian <- lapply(1:dat$n, function(i) {
  mode_i <- as.numeric(x_hat[i])
  sd_i <- as.numeric(sqrt(x_Sigma[i, i]))
  samples <- rnorm(M, mode_i, sd_i)
  data.frame("value" = samples)
}) %>%
  bind_rows(.id = "index")

4.3 Comparison

samples_tmbstan <- extract(mcmc, pars = "phi_prev") %>%
  as.data.frame() %>%
  pivot_longer(
    cols = everything(),
    names_to = "index",
    names_prefix = "phi_prev.",
    names_transform = as.integer
  )

plot <- ggplot(samples_tmbstan, aes(x = value)) +
  geom_histogram(aes(y = ..density..), alpha = 0.8, fill = cbpalette[7]) +
  facet_wrap(~index) +
  theme_minimal() +
  labs(x = "phi_prev", y = "Posterior PDF")

plot2 <- plot +
  geom_line(data = gaussian_df, aes(x = x, y = pdf), col = cbpalette[1])
plot3 <- plot2 +
  geom_line(data = laplace_df, aes(x = theta, y = pdf), col = cbpalette[2])

plot3
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

4.3.1 KS test

ks_gaussian_tmbstan <- lapply(1:length(x_hat), function(i) {
  samples_gaussian_i <- filter(samples_gaussian, index == i)$value
  samples_tmbstan_i <- filter(samples_tmbstan, index == i)$value
  inf.utils::ks_test(samples_gaussian_i, samples_tmbstan_i)
})

ks_laplace_tmbstan <- lapply(1:length(x_hat), function(i) {
  samples_laplace_i <- filter(samples_laplace, index == i)$value
  samples_tmbstan_i <- filter(samples_tmbstan, index == i)$value
  inf.utils::ks_test(samples_laplace_i, samples_tmbstan_i)
})

ks_results <- bind_rows(
  bind_rows(ks_gaussian_tmbstan, .id = "index") %>%
    mutate(type = "gaussian"),
  bind_rows(ks_laplace_tmbstan, .id = "index") %>%
    mutate(type = "laplace")
)

ks_results %>%
  select(index, D, type) %>%
  pivot_wider(
    names_from = type,
    values_from = D
  ) %>%
  ggplot(aes(x = laplace, y = gaussian)) +
    geom_point(alpha = 0.5) +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
    lims(x = c(0, 1), y = c(0, 1)) +
    labs(x = "KS(laplace, tmbstan)", y = "KS(gaussian, tmbstan)") +
    theme_minimal()

#' Table of average KS distance by method
ks_results %>%
  group_by(type) %>%
  summarise(Distance = mean(D)) %>%
  select(Distance) %>%
  mutate(
    Method = c("$\\text{KS}(\\texttt{tmbstan}, \\texttt{gaussian})$",
               "$\\text{KS}(\\texttt{tmbstan}, \\texttt{laplace})$"),
    .before = Distance
  ) %>%
  kableExtra::kable(booktabs = TRUE, escape = FALSE, align = "c")
Method Distance
\(\text{KS}(\texttt{tmbstan}, \texttt{gaussian})\) 0.0511306
\(\text{KS}(\texttt{tmbstan}, \texttt{laplace})\) 0.0519250

5 Extension beyond empirical Bayes

In Sections 3 and 4 when calculating \(\tilde p(x_i \, | \, y)\) we fixed \(\theta = \hat \theta\) (\(K = 1\), empirical Bayes) rather than integrating \(\theta\) out (\(K > 1\)). In this section we will use ahgq to integrate over \(\theta\).

5.1 Marginal of joint Gaussian approximation

We’re going to start by normalising \(\tilde p_\text{LA}(\theta, y)\). We did this above in the empirical Bayes case, but found that it wasn’t strictly necessary.

h$he <- function(theta) numDeriv::jacobian(h$gr, theta, method = "Richardson")
quad <- aghq::aghq(ff = h, k = 3, startingvalue = c(0, 0), control = aghq::default_control_tmb())
quad$normalized_posterior$lognormconst
## [1] -143.7878
h_norm <- function(theta) h$fn(theta) + quad$normalized_posterior$lognormconst

We can get the mode and Hessian of the random effects at each quadrature point as follows:

distinctthetas <- quad$normalized_posterior$nodesandweights[, grep('theta', colnames(quad$normalized_posterior$nodesandweights))]
modesandhessians <- distinctthetas
thetanames <- make.unique(names(h$par), sep = "")
colnames(modesandhessians) <- thetanames

modesandhessians$mode <- vector(mode = "list", length = nrow(distinctthetas))
modesandhessians$H <- vector(mode = "list", length = nrow(distinctthetas))

for(i in 1:nrow(distinctthetas)) {
  theta <- as.numeric(modesandhessians[i, thetanames])
  h$fn(theta)
  mm <- h$env$last.par
  modesandhessians[i, "mode"] <- list(list(mm[h$env$random]))
  H <- h$env$spHess(mm, random = TRUE)
  H <- rlang::duplicate(H)
  modesandhessians[i, "H"] <- list(list(H))
}

modesandhessians$weights <- quad$normalized_posterior$nodesandweights$weights

Now we want to evaluate \[ \tilde p(x_i \, | \, y) = \sum_z \tilde p_\text{G}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z). \]

To start with, let’s evaluate this on the log-scale for i = 1, x = x_test_value and the first value of z:

  • The weight \(\omega(z)\) will be modesandhessians$weights[z]
  • The normalised Laplace approximation \(\tilde p_\text{LA}(\theta(z) \, | \, y)\) will be - h_norm(as.numeric(modesandhessians[z, thetanames]))
  • For the normalised Gaussian approximation \(p_\text{G}(x_i \, | \, \theta(z), y)\) on the log-scale, we take the relevant subcomponents of the joint Gaussian and then use dnorm with log = TRUE

x_test_value <- 0

#' Note that in some sense z actually runs over nodes, not indices
z <- 1 
i <- 1

mode_i <- as.numeric(modesandhessians$mode[[z]][i])
Sigma <- solve(modesandhessians$H[[z]])
sd_i <- as.numeric(sqrt(Sigma[i, i]))

#' Note theta here is whatever is left in scope from the loop above (the last node);
#' the gaussian_marginal function below sets theta to the z-th node explicitly
dnorm(x_test_value, mean = mode_i, sd = sd_i, log = TRUE) + (- h_norm(theta)) + log(modesandhessians$weights[z])
## [1] -7.731226
## attr(,"logarithm")
## [1] TRUE

Now let’s sum over z so that we can obtain the log-probability for our x_test_value in marginal i:

gaussian_marginal <- function(x) {
  lp <- vector(mode = "numeric", length = nrow(modesandhessians))

  for(z in 1:nrow(modesandhessians)) {
    theta <- as.numeric(modesandhessians[z, thetanames])
    mode_i <- as.numeric(modesandhessians$mode[[z]][i])
    Sigma <- solve(modesandhessians$H[[z]])
    sd_i <- as.numeric(sqrt(Sigma[i, i]))
    lp[z] <- dnorm(x, mean = mode_i, sd = sd_i, log = TRUE) + (- h_norm(theta)) + log(modesandhessians$weights[z])
  }

  matrixStats::logSumExp(lp)
}

gaussian_marginal(x_test_value)
## [1] -1.803582

To pick a set of \(x_i\) to evaluate the marginal at, we will use an approximation to the AGHQ nodes. We will take the hyperparameters to be fixed at their mode, then choose the appropriate marginals of the joint Gaussian:

#' Create a function that takes nodesandhessians and outputs the spline nodes for a particular marginal
spline_nodes <- function(modesandhessians, i, k = 7) {
  #' Get the row number of modesandhessians which contains that theta mode
  #' This is bad code (relying on objects outside function)
  theta_mode_location <- which.max(quad$normalized_posterior$nodesandweights$logpost_normalized)
  
  #' These are the mode and standard deviation from the Gaussian approximation at the node which is the mode of the Laplace approximation
  #' (Say that three times fast...)
  mode_i <- modesandhessians[[theta_mode_location, "mode"]][i]
  #' Note this uses 1 / H[i, i] (a conditional variance) rather than solve(H)[i, i]
  #' (the marginal variance) -- a cheap approximation which is fine for placing nodes
  sd_i <- sqrt(1 / modesandhessians[[theta_mode_location, "H"]][i, i])

  #' Create Gauss-Hermite quadrature
  gg <- mvQuad::createNIGrid(dim = 1, type = "GHe", level = k)

  #' Adapt to mode_i and sd_i
  mvQuad::rescale(gg, m = mode_i, C = sd_i^2)

  #' These are our set of x input values
  gg$nodes
}

#' The log-probabilities at the set of input values
nodes <- spline_nodes(modesandhessians, i = 1, k = 7)
lps <- sapply(nodes, gaussian_marginal)

plot_marginal_spline <- function(nodes, lps) {
  #' Build a spline interpolant of the log marginal posterior
  ss <- splines::interpSpline(nodes, lps, bSpline = TRUE, sparse = TRUE)
  interpolant <- function(x) { as.numeric(stats::predict(ss, x)$y) }
  finegrid <- seq(-5, 5, by = 0.1)
  df <- data.frame(x = finegrid, y = exp(interpolant(finegrid)))

  ggplot(df, aes(x = x, y = y)) +
    geom_line() +
    theme_minimal() +
    labs(x = "phi_prev[1]", y = "Posterior")
}

plot_marginal_spline(nodes, lps)

5.2 Full Laplace approximation (old way)

As before, but with the full Laplace approximation in place of the Gaussian. Much of what we computed above in Section 5.1 is still relevant. The only thing that has changed is that we’re now using \(\tilde p_\text{LA}(x_i \, | \, \theta(z), y)\) rather than \(\tilde p_\text{G}(x_i \, | \, \theta(z), y)\) in \[ \tilde p(x_i \, | \, y) = \sum_z \tilde p_\text{LA}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z). \]

x_test_value <- 0
i <- 1
z <- 1

As before, the weight \(\omega(z)\) is modesandhessians$weights[z] and the normalised Laplace approximation \(\tilde p_\text{LA}(\theta(z) \, | \, y)\) is - h_norm(as.numeric(modesandhessians[z, thetanames])). Getting to the difference, the normalised Laplace approximation \(p_\text{LA}(x_i \, | \, \theta(z), y)\) on the log-scale is:

dat_i <- prepare_dat(dat, i)
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
  
param_i_fixed_theta <- list(
  beta_prev = 0, 
  phi_prev_i = 0, 
  phi_prev_minus_i = rep(0, data$n - 1),
  log_sigma_phi_prev = 0
)

for(theta_string in thetanames) param_i_fixed_theta[[theta_string]] <- modesandhessians[z, thetanames][[theta_string]]

obj_fixed_theta <- MakeADFun(
  data = dat_i,
  parameters = param_i_fixed_theta,
  random = "phi_prev_minus_i",
  DLL = "model1_index",
  silent = TRUE,
  map = map_fixed_theta
)

#' Note that this is only a one-dimensional marginal, so perhaps we can afford to put k > 3 here
quad <- aghq::aghq(ff = obj_fixed_theta, k = 7, startingvalue = 0, control = aghq::default_control_tmb())

obj_fixed_theta_norm <- function(x) obj_fixed_theta$fn(x) + quad$normalized_posterior$lognormconst

- obj_fixed_theta_norm(x_test_value)
## [1] -5.16969
## attr(,"logarithm")
## [1] TRUE

Putting it together on the log-scale:

as.numeric(- obj_fixed_theta_norm(x_test_value)) + (- h_norm(theta)) + log(modesandhessians$weights[z])
## [1] -6.640522
## attr(,"logarithm")
## [1] TRUE

Sum over z:

old_laplace_marginal <- function(x) {
  lp <- vector(mode = "numeric", length = nrow(modesandhessians))

  for(z in 1:nrow(modesandhessians)) {
    theta <- as.numeric(modesandhessians[z, thetanames])
  
    dat_i <- prepare_dat(dat, i)
    map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
  
    param_i_fixed_theta <- list(
      beta_prev = 0, 
      phi_prev_i = 0, 
      phi_prev_minus_i = rep(0, data$n - 1),
      log_sigma_phi_prev = 0
    )

    for(theta_string in thetanames) param_i_fixed_theta[[theta_string]] <- modesandhessians[z, thetanames][[theta_string]]

    obj_fixed_theta <- MakeADFun(
      data = dat_i,
      parameters = param_i_fixed_theta,
      random = "phi_prev_minus_i",
      DLL = "model1_index",
      silent = TRUE,
      map = map_fixed_theta
    )
    
    quad <- aghq::aghq(ff = obj_fixed_theta, k = 7, startingvalue = 0, control = aghq::default_control_tmb())
    obj_fixed_theta_norm <- function(xx) obj_fixed_theta$fn(xx) + quad$normalized_posterior$lognormconst
    
    lp[z] <- as.numeric(- obj_fixed_theta_norm(x)) + (- h_norm(theta)) + log(modesandhessians$weights[z])
  }

  matrixStats::logSumExp(lp)
}

old_laplace_marginal(x_test_value)
## [1] -1.155867
lps <- sapply(nodes, old_laplace_marginal)
plot_marginal_spline(nodes, lps)

5.3 Full Laplace approximation (new way)

The method above should work, but it requires a lot of computation. Instead, we might be able to do this more cheaply by integrating \(\theta\) out before normalising, rather than normalising \(\tilde p_\text{LA}(x_i \, | \, \theta(z), y)\) separately for each \(z\). This would look like first obtaining \[ \tilde p(x_i, y) = \sum_z \tilde p_\text{LA}(x_i, \theta(z), y) \omega(z), \] then normalising by the previously obtained \(\tilde p_\text{AQ}(y)\) as follows \[ \tilde p(x_i \, | \, y) = \frac{\tilde p(x_i, y)}{\tilde p_\text{AQ}(y)}. \]

#' Make sure the log normalising constant is available
quad <- aghq::aghq(ff = h, k = 3, startingvalue = c(0, 0), control = aghq::default_control_tmb())
quad$normalized_posterior$lognormconst
## [1] -143.7878
laplace_marginal <- function(x) {
  lp <- vector(mode = "numeric", length = nrow(modesandhessians))

  for(z in 1:nrow(modesandhessians)) {
    theta <- as.numeric(modesandhessians[z, thetanames])
  
    dat_i <- prepare_dat(dat, i)
    map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
  
    param_i_fixed_theta <- list(
      beta_prev = 0, 
      phi_prev_i = 0, 
      phi_prev_minus_i = rep(0, data$n - 1),
      log_sigma_phi_prev = 0
    )

    for(theta_string in thetanames) param_i_fixed_theta[[theta_string]] <- modesandhessians[z, thetanames][[theta_string]]

    obj_fixed_theta <- MakeADFun(
      data = dat_i,
      parameters = param_i_fixed_theta,
      random = "phi_prev_minus_i",
      DLL = "model1_index",
      silent = TRUE,
      map = map_fixed_theta
    )

    lp[z] <- as.numeric(- obj_fixed_theta$fn(x)) + log(modesandhessians$weights[z])
  }

  matrixStats::logSumExp(lp) - quad$normalized_posterior$lognormconst
}

laplace_marginal(x_test_value)
## [1] -2.61789
lps <- sapply(nodes, laplace_marginal)
plot_marginal_spline(nodes, lps)

5.4 Full Laplace approximation (new new way)

Try doing this without a new MakeADFun for each \(z\):

new_laplace_marginal <- function(x, i) {
  dat_i <- prepare_dat(dat, i)
  
  param_i <- list(
    beta_prev = 0, 
    phi_prev_i = 0, 
    phi_prev_minus_i = rep(0, data$n - 1),
    log_sigma_phi_prev = 0
  )

  obj <- MakeADFun(
    data = dat_i,
    parameters = param_i,
    random = "phi_prev_minus_i",
    DLL = "model1_index",
    silent = TRUE
  )

  lp <- vector(mode = "numeric", length = nrow(modesandhessians))

  for(z in 1:nrow(modesandhessians)) {
    theta <- as.numeric(modesandhessians[z, thetanames])
    #' Argument order must match obj$par: (beta_prev, phi_prev_i, log_sigma_phi_prev)
    lp[z] <- as.numeric(- obj$fn(c(theta[1], x, theta[2]))) + log(modesandhessians$weights[z])
  }

  matrixStats::logSumExp(lp) - quad$normalized_posterior$lognormconst
}

We get the same value as we did with laplace_marginal!

new_laplace_marginal(x_test_value, i = 1)
## [1] -2.61789
lps <- sapply(nodes, new_laplace_marginal, i = 1)
plot_marginal_spline(nodes, lps)

Now let’s loop this over \(i\) so we can compare to the other methods. The hypothesis is that this will be better than the empirical Bayes methods.

spline_nodes(modesandhessians, i = 1)
##                           [,1]
## [1,] -3.7504397177257429163433
## [2,] -2.3667594107345415466170
## [3,] -1.1544053947399681714359
## [4,] -0.0000000000000001498764
## [5,]  1.1544053947399681714359
## [6,]  2.3667594107345415466170
## [7,]  3.7504397177257429163433
new_laplace_marginal_spline_aghq <- function(i) {
  x <- spline_nodes(modesandhessians, i = i)
  lps <- sapply(x, new_laplace_marginal, i = i)
  ss <- splines::interpSpline(x, lps, bSpline = TRUE, sparse = TRUE)
  interpolant <- function(x) { as.numeric(stats::predict(ss, x)$y) }
  finegrid <- seq(-5, 5, by = 0.1)
  df <- data.frame(x = finegrid, y = exp(interpolant(finegrid)))
  return(df)
}

aghq_laplace_df <- lapply(1:dat$n, new_laplace_marginal_spline_aghq) %>%
  bind_rows(.id = "index") %>%
  mutate(index = as.numeric(index))

plot3 +
  geom_line(data = aghq_laplace_df, aes(x = x, y = y), col = cbpalette[3])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#' Need to edit this so that there are cdf and theta columns in aghq_laplace_df
# samples_aghq_laplace <- lapply(1:dat$n, function(i) {
#   aghq_laplace_df_i <- filter(aghq_laplace_df, index == i)
#   samples <- approxfun(x = aghq_laplace_df_i$cdf, y = aghq_laplace_df_i$theta, ties = "ordered")(runif(M))
#   data.frame("value" = samples)
# }) %>%
#   bind_rows(.id = "index")
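
One way the TODO above could be resolved (a sketch, not run here) is to add theta and cdf columns to aghq_laplace_df by applying the trapezoid rule to each (x, y) grid of pdf values, after which the inverse CDF sampling used earlier applies unchanged:

#' Sketch: the pdf values in y are already approximately normalised, so a
#' cumulative trapezoid rule over x gives an approximate CDF for each index
aghq_laplace_df <- aghq_laplace_df %>%
  group_by(index) %>%
  mutate(
    theta = x,
    cdf = cumsum(c(0, 0.5 * diff(x) * (head(y, -1) + tail(y, -1))))
  ) %>%
  ungroup()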

6 Appendix

6.2 Comparison of notation

Description | Here | Wood (2019) notation | Notes
Random effects | \(x\) | \(\beta\) | Stringer (2021) uses \(w\)
Random effects which maximise (the / some) objective for given \(\theta\) | \(\hat x (\theta)\) | \(\hat \beta\) | Wood omits dependence on \(\theta\)
Random effects which maximise (the / some) objective for given \(\theta\), subject to \(x_i\) being fixed | \((x_i, \hat x_{-i}(x_i, \theta))\) | \(\tilde \beta\) | \(\hat x_{-i}(x_i, \theta)\) has dimension \(n - 1\)
Hessian / precision matrix of the negative log-likelihood with respect to \(x\) at \(\hat x(\theta)\) | \(H(\theta)\) | \(H\) | Unsure if it's preferable to use Hessian or precision notation here

6.3 model1.cpp

// model1.cpp

#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  // Data block
  DATA_INTEGER(n);      // Number of regions
  DATA_VECTOR(y_prev);  // Vector of survey responses
  DATA_VECTOR(m_prev);  // Vector of survey sample sizes

  // Parameter block
  PARAMETER(beta_prev);            // Survey intercept
  PARAMETER_VECTOR(phi_prev);      // Survey spatial effect
  PARAMETER(log_sigma_phi_prev);   // Survey log standard deviation of spatial effects

  // Transformed parameters block
  Type sigma_phi_prev = exp(log_sigma_phi_prev);
  vector<Type> eta_prev(beta_prev + sigma_phi_prev * phi_prev);

  // Initialise negative log-likelihood
  Type nll(0.0);

  // Priors
  nll -= dnorm(sigma_phi_prev, Type(0), Type(2.5), true) + log_sigma_phi_prev;
  nll -= dnorm(beta_prev, Type(-2), Type(1), true);
  nll -= dnorm(phi_prev, Type(0), Type(1), true).sum();

  // Likelihood
  nll -= dbinom_robust(y_prev, m_prev, eta_prev, true).sum();

  // Generated quantities block
  vector<Type> rho_prev(invlogit(eta_prev));    // Posterior prevalence estimates
  Type tau_phi_prev = 1/pow(sigma_phi_prev, 2); // Precision of spatial effects

  return(nll);
}

6.4 model1_index.cpp

// model1_index.cpp

#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  // Data block
  DATA_INTEGER(n);             // Number of regions
  DATA_VECTOR(y_prev_i);       // Survey response for index i
  DATA_VECTOR(y_prev_minus_i); // Survey response for index -i
  DATA_VECTOR(m_prev_i);       // Survey sample size for index i
  DATA_VECTOR(m_prev_minus_i); // Survey sample size for index -i

  // Parameter block
  PARAMETER(beta_prev);               // Survey intercept
  PARAMETER(phi_prev_i);              // Survey spatial effect for index i
  PARAMETER_VECTOR(phi_prev_minus_i); // Survey spatial effect for index -i
  PARAMETER(log_sigma_phi_prev);      // Survey log standard deviation of spatial effects

  // Transformed parameters block
  Type sigma_phi_prev = exp(log_sigma_phi_prev);
  Type eta_prev_i = beta_prev + sigma_phi_prev * phi_prev_i;
  vector<Type> eta_prev_minus_i(beta_prev + sigma_phi_prev * phi_prev_minus_i);

  // Initialise negative log-likelihood
  Type nll(0.0);

  // Priors
  nll -= dnorm(sigma_phi_prev, Type(0), Type(2.5), true) + log_sigma_phi_prev;
  nll -= dnorm(beta_prev, Type(-2), Type(1), true);
  nll -= dnorm(phi_prev_i, Type(0), Type(1), true);
  nll -= dnorm(phi_prev_minus_i, Type(0), Type(1), true).sum();

  // Likelihood
  nll -= dbinom_robust(y_prev_i, m_prev_i, eta_prev_i, true).sum();
  nll -= dbinom_robust(y_prev_minus_i, m_prev_minus_i, eta_prev_minus_i, true).sum();

  // Generated quantities block
  Type tau_phi_prev = 1/pow(sigma_phi_prev, 2); // Precision of spatial effects

  return(nll);
}

7 Original computing environment

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 13.1
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] inf.utils_0.1.0      multi.utils_0.1.0    patchwork_1.1.2      tibble_3.1.8         tidyverse_1.3.1      aghq_0.4.0           tmbstan_1.0.4        rstan_2.21.5        
##  [9] StanHeaders_2.21.0-7 glmmTMB_1.1.4        TMB_1.9.1            stringr_1.5.0        purrr_0.3.5          tidyr_1.2.1          readr_2.1.3          INLA_22.05.07       
## [17] sp_1.5-1             foreach_1.5.2        Matrix_1.5-1         ggplot2_3.4.0        forcats_0.5.2        dplyr_1.0.10        
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4         colorspace_2.0-3    ellipsis_0.3.2      rprojroot_2.0.3     fs_1.5.2            rstudioapi_0.13     farver_2.1.1        MatrixModels_0.5-0  bit64_4.0.5        
##  [10] fansi_1.0.3         lubridate_1.8.0     xml2_1.3.3          codetools_0.2-18    splines_4.2.0       cachem_1.0.6        knitr_1.41          polynom_1.4-1       jsonlite_1.8.4     
##  [19] nloptr_2.0.3        broom_0.8.0         dbplyr_2.1.1        compiler_4.2.0      httr_1.4.4          backports_1.4.1     assertthat_0.2.1    fastmap_1.1.0       cli_3.4.1          
##  [28] htmltools_0.5.3     prettyunits_1.1.1   tools_4.2.0         gtable_0.3.1        glue_1.6.2          Rcpp_1.0.9          jquerylib_0.1.4     cellranger_1.1.0    vctrs_0.5.1        
##  [37] mvQuad_1.0-6        svglite_2.1.0       nlme_3.1-157        iterators_1.0.14    xfun_0.35           ps_1.7.0            lme4_1.1-30         rvest_1.0.2         lifecycle_1.0.3    
##  [46] statmod_1.4.36      orderly_1.4.3       ids_1.0.1           MASS_7.3-56         scales_1.2.1        ragg_1.2.2          hms_1.1.2           inline_0.3.19       yaml_2.3.6         
##  [55] memoise_2.0.1       gridExtra_2.3       sass_0.4.4          loo_2.5.1           RSQLite_2.2.14      stringi_1.7.8       highr_0.9           boot_1.3-28         pkgbuild_1.3.1     
##  [64] rlang_1.0.6         pkgconfig_2.0.3     systemfonts_1.0.4   matrixStats_0.62.0  evaluate_0.18       lattice_0.20-45     labeling_0.4.2      cowplot_1.1.1       bit_4.0.5          
##  [73] processx_3.5.3      tidyselect_1.2.0    bookdown_0.26       magrittr_2.0.3      R6_2.5.1            generics_0.1.3      DBI_1.1.3           pillar_1.8.1        haven_2.5.0        
##  [82] withr_2.5.0         modelr_0.1.8        crayon_1.5.2        uuid_1.1-0          utf8_1.2.2          tzdb_0.3.0          rmarkdown_2.18      grid_4.2.0          readxl_1.4.0       
##  [91] data.table_1.14.6   blob_1.2.3          callr_3.7.0         reprex_2.0.1        digest_0.6.30       webshot_0.5.3       numDeriv_2016.8-1.1 textshaping_0.3.6   openssl_2.0.5      
## [100] RcppParallel_5.1.5  stats4_4.2.0        munsell_0.5.0       viridisLite_0.4.1   kableExtra_1.3.4    bslib_0.4.1         askpass_1.1

  1. Note that I'm uncertain if \(\hat \theta\) (a constant) should still be included in the LHS of this.