aghq
Abstract
Background Wood (2020) simplifies the INLA method by approximating the latent field posterior marginals using a reduced-cost Laplace approximation which does not rely on sparsity assumptions. In spatio-temporal statistics, the latent field usually corresponds to spatio-temporal locations, so these marginal posteriors are of central scientific interest.
Task We implement the method in R making use of Template Model Builder (TMB) for Gaussian approximations, in a way compatible with the aghq package for adaptive Gauss-Hermite quadrature. We are interested in efficiently approximating latent field posterior marginals \(p(x_i \, | \, \theta, y) \approx \tilde p(x_i \, | \, \theta, y)\) for each index \(i\). To build up to implementing the simplified INLA approach, we will start with two simpler methods: (1) taking the marginal of a joint Gaussian approximation, and (2) doing a full Laplace approximation.
Warning: this report was created during the process of learning, and for that reason is not concise or well-structured. I would not recommend reading it, but if you do, proceed at your own risk!
An AGHQ approximation to the joint posterior of the latent field is given by \[ \tilde p(x \, | \, y) = \sum_z \tilde p_\text{G}(x \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z), \tag{1.1} \] where \(\theta(z) \in \Theta\) and \(\omega(z) \in \mathbb{R}\) are nodes and weights generated by adaption to \(\tilde p_\text{LA}(\theta \, | \, y)\). In particular, let \[ \mathcal{Q}(1, k) = \{z_d \in \mathbb{R}: H_k(z_d) = 0\} \] be the set of nodes in one dimension, exactly the zeros of the Hermite polynomial \(H_k(z_d)\) for \(k \in \mathbb{N}\), and \[ \omega(z_d) = \frac{k!}{H_{k + 1}(z_d)^2 \times \phi(z_d)}, \] be the associated weights. Define \(\mathcal{Q}(d, k) = \mathcal{Q}(1, k)^d\) to be the set of nodes \(z\) in \(d\) dimensions, of which there will be \(k^d\). These points are then adapted via \(\theta(z) = \hat \theta + Lz\), where \(\hat \theta\) is the mode and \(L\) is the lower Cholesky triangle of the inverse curvature at the mode, such that \(LL^\top = H^{-1}\).
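To make the node and weight construction concrete, here is a minimal sketch using the mvQuad package (which appears later in this notebook); theta_hat_demo and H_demo are placeholder values standing in for the mode and the curvature at the mode:
theta_hat_demo <- c(0, 0)
H_demo <- diag(2)
gg <- mvQuad::createNIGrid(dim = 2, type = "GHe", level = 3) #' the k^d nodes z in Q(d, k), here k = 3 and d = 2
#' Adaption theta(z) = theta_hat + Lz; dec.type = 2 requests a Cholesky-type rescaling
#' (which is, I believe, what the aghq package uses internally)
mvQuad::rescale(gg, m = theta_hat_demo, C = solve(H_demo), dec.type = 2)
gg$nodes #' the adapted nodes theta(z)
mvQuad::getWeights(gg) #' the associated weights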
This can be evaluated at any point, though in practice we work with it by drawing samples \(x \sim p_\text{G}(x \, | \, \theta, y)\), using the method of Rue for sampling from a Gaussian given its mode and precision (in particular, you solve a linear system involving the lower Cholesky triangle of the curvature at a randomly chosen quadrature point). For the \(i\)th marginal, we just take the appropriate components of the joint samples. The approximation is therefore \[ \begin{equation} \tilde p(x_i \, | \, y) = \sum_z \tilde p_\text{G}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z), \tag{1.2} \end{equation} \] where \(p_\text{G}(x_i \, | \, \theta, y)\) just takes the \(i\)th element of the mode vector and the \((i, i)\)th element of the covariance matrix.
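As an aside, here is a minimal dense-matrix sketch of that sampling step (not the aghq internals): with precision \(Q = LL^\top\), draw \(z \sim \mathcal{N}(0, I)\), solve \(L^\top v = z\), and set \(x = \mu + v\). The commented usage line refers to x_hat and x_H, which are only constructed further down this notebook.
sample_gaussian_precision <- function(mu, Q, M = 1) {
  R <- chol(as.matrix(Q)) #' upper-triangular R with t(R) %*% R = Q, i.e. R = t(L)
  z <- matrix(rnorm(length(mu) * M), nrow = length(mu), ncol = M)
  v <- backsolve(R, z) #' solves R v = z, so v has covariance Q^{-1}
  sweep(v, 1, mu, "+") #' add the mode to each column; columns are samples
}
#' For example, once x_hat and x_H exist:
#' samples <- sample_gaussian_precision(x_hat, x_H, M = 1000)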
To understand how this is implemented in aghq, the two most important functions are aghq::marginal_laplace_tmb and aghq::sample_marginal. aghq::marginal_laplace_tmb calculates an object representing \(\tilde p(x \, | \, \theta, y)\). aghq::sample_marginal draws samples from this Gaussian, given values of \(z\) drawn from a multinomial distribution, in order to evaluate Equation (1.2).
For a line-by-line walkthrough of how these functions work, see the notebook Code walkthrough (which uses the model from the notebook Epilepsy example).
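For orientation, the basic usage of these two functions looks roughly as follows (a sketch only, not run here; the values of k, startingvalue and M are placeholders, and h is the TMB objective function constructed later in this notebook):
# quad <- aghq::marginal_laplace_tmb(h, k = 3, startingvalue = h$par)
# samples <- aghq::sample_marginal(quad, M = 1000)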
The goal of this notebook (Implementing simplified INLA into aghq) is to create a new approximation, swapping out \(\tilde p_\text{G}(x_i \, | \, \theta, y)\) from the above for the simplified INLA approximation \(\tilde p_{\text{SINLA}}(x_i \, | \, \theta, y)\) as follows
\[
\tilde p(x_i \, | \, y) = \sum_z \tilde p_{\text{SINLA}}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z).
\]
We want to do this because the new approximation should be more accurate than taking marginals of a joint Gaussian approximation.
At the extreme, you could just compute a Laplace approximation for each marginal, integrating out all other elements of the latent field, \(x_{-i}\), together with the fixed effects \(\theta\).
This would be an accurate approach, but not computationally feasible for large models.
What we hope to achieve is an approximation which gets to most of the accuracy benefits of the Laplace approximation approach, but without the high computational cost.
We will use Model 1 from the Prevalence, ANC, ART model inference case-study as a running example.
To provide a ground truth to compare the methods that follow against, we generate “gold-standard” inferences using a relatively long run of MCMC as implemented by tmbstan.
We begin by loading in testing data, as generated by prev-anc-art_sim. sim_data contains many replicates of simulated data. Here we only need one, so we will just take the first, as given by:
sim_data <- readRDS("depends/sim_data.rds")
data <- sim_data[[1]]
dat <- list(n = data$n, y_prev = data$y_prev, m_prev = data$m_prev)
dat
## $n
## [1] 36
##
## $y_prev
## [1] 35 7 34 19 41 20 32 12 22 17 13 53 16 13 35 12 24 17 20 34 25 19 35 13 22 15 14 21 55 52 8 45 11 12 21 24
##
## $m_prev
## [1] 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
Compile and load the template (see Appendix 6 for the TMB code):
compile("model1.cpp")
## [1] 0
dyn.load(dynlib("model1"))
We create an objective function h integrating out the random effects \(x\) (which here correspond to the spatial random effects phi_prev). The values of param initialise the objective function:
param <- list(
beta_prev = -2,
phi_prev = rep(0, data$n),
log_sigma_phi_prev = -1
)
h <- MakeADFun(
data = dat,
parameters = param,
random = "phi_prev",
DLL = "model1",
silent = TRUE
)
It is simple to fit the model using MCMC via the tmbstan package. We use 4 chains, each of 5000 iter. Note that although h is passed to tmbstan::tmbstan, the Laplace approximation is not used because laplace = FALSE by default (useful to know in case it's ever of interest to sample from the Laplace approximation).
mcmc <- tmbstan::tmbstan(h, chains = 4, iter = 5000, refresh = 0)
The posterior marginal \(p(\phi_1 \, | \, y)\) can be examined by extracting the corresponding Markov chains, and plotting a histogram of the draws:
samples_tmbstan <- extract(mcmc, pars = "phi_prev[1]")[[1]]
plot_pdf <- ggplot(data.frame(x = samples_tmbstan), aes(x = x)) +
geom_histogram(aes(y = ..density..), alpha = 0.8, fill = cbpalette[7]) +
theme_minimal() +
labs(x = "phi_prev[1]", y = "Posterior PDF")
ecdf_tmbstan <- ecdf(samples_tmbstan)
grid <- seq(min(samples_tmbstan), max(samples_tmbstan), by = 0.1)
plot_cdf <- data.frame(x = grid, y = ecdf_tmbstan(grid)) %>%
ggplot(aes(x = x, y = y)) +
geom_line(col = cbpalette[7]) +
theme_minimal() +
labs(x = "phi_prev[1]", y = "Posterior CDF")
cowplot::plot_grid(plot_cdf, plot_pdf)
Now we will compute Equation (1.1) taking the marginals of a joint Gaussian approximation \(p_\text{G}(x \, | \, \theta, y)\). This is the approach currently used in the aghq package (done implicitly, via sampling). This approach is also used by method = "gaussian" in R-INLA.
The Gaussian approximation \(p_\text{G}(x \, | \, \theta, y)\) already appears in computing the Laplace approximation to \(p(\theta \, | \, y)\)
\[
\begin{equation}
\tilde p(\theta \, | \, y) \propto \frac{p(x, \theta, y)}{\tilde p_\text{G}(x \, | \, \theta, y)} \Big\rvert_{x = \hat x(\theta)} = \frac{p(y, \hat x(\theta), \theta)}{\det \left(\frac{1}{2 \pi} H(\theta) \right)^{1/2}}, \tag{3.1}
\end{equation}
\]
such that one easy way to calculate it is to set random = "x" in the TMB template to integrate the latent field out, then recover the Gaussian approximation from there, as we have done in the function h above.
Let’s optimise h to obtain estimates \(\hat \theta\) which maximise \(\tilde p(\theta \, | \, y)\):
its <- 1000
opt_theta <- nlminb(
start = h$par,
objective = h$fn,
gradient = h$gr,
control = list(iter.max = its, trace = 0)
)
The values opt_theta$par below correspond to \(\hat \theta\):
opt_theta$par
## beta_prev log_sigma_phi_prev
## -2.3486271 -0.6141847
We can also look at the parameters – that is, the mode vector and precision matrix – of the Gaussian approximation at the mode opt_theta$par above. Note that in aghq, what we would do is create a quadrature grid, then get the mode vector and precision matrix for each value of the quadrature grid over \(\theta\). By just sticking to the mode of \(\theta\), we are effectively doing aghq with \(k = 1\) in each dimension, i.e. one grid point, corresponding to an empirical Bayes type of inference.
#' Notice that last.par contains opt_theta$par as the fixed effects
#' It's the last.par because the last thing we've done with the template is the optimisation
#' This seems a pretty shaky way to go about things, but here we are...
mm <- h$env$last.par
#' These are the indices of the random effects that we want
h$env$random
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
x_hat <- mm[h$env$random]
x_hat <- rlang::duplicate(x_hat) #' Insurance...
length(x_hat) #' The length is 36
## [1] 36
#' Hessian evaluated at the mode
x_H <- h$env$spHess(mm, random = TRUE)
x_H <- rlang::duplicate(x_H)
dim(x_H) #' With dimension [36 x 36]
## [1] 36 36
These tabs contain additional sanity checks used to build understanding. Feel free to skip them.
h
We would like to reproduce h using the right hand side of Equation (3.1).
To do this, we start by creating an objective function f (see Appendix 6 for the TMB code) to evaluate \(- \log p(x, \theta, y)\) (without any Laplace approximations!). This is what we will be using for the numerator of Equation (3.1):
f <- MakeADFun(
data = dat,
parameters = param,
DLL = "model1",
silent = TRUE
)
Given \(\theta\), we compute the mode and H of \(x\) (corresponding to phi_prev) using h. We then concatenate and evaluate using f to get the numerator, multiplying by -1 because f$fn gives the negative log joint density, and we want the log joint density. In the denominator, we use determinant on the log-scale as it is more numerically stable than e.g. log(det(...)).
calculate_rhs <- function(theta) {
#' Calculate mode and H for those theta
#' TMB template remembers what you've called -- spooky
h$fn(theta)
mm <- h$env$last.par
mode <- mm[h$env$random]
H <- h$env$spHess(mm, random = TRUE)
#' Append the calculated latent field mode
param <- theta
param$phi_prev <- as.numeric(mode)
param <- param[c("beta_prev", "phi_prev", "log_sigma_phi_prev")]
#' Evaluate the log-numerator of the RHS
numer <- -1 * f$fn(unlist(param))
#' Evaluate the log-denominator of the RHS
denom <- 0.5 * (determinant(H, logarithm = TRUE)$modulus - (nrow(H) * log(2 * pi)))
#' Return their difference
numer - denom
}
theta <- param
theta$phi_prev <- NULL #' Remove phi_prev so that theta contains only the hyperparameters
calculate_rhs(theta)
## [1] -155.5523
## attr(,"logarithm")
## [1] TRUE
As compared with what we got from h:
-1 * h$fn(unlist(theta))
## [1] -155.5523
## attr(,"logarithm")
## [1] TRUE
Now let’s have a go at computing Equation (1.2) so that we can compare to the posterior distribution from tmbstan we obtained in Section 2.
As described above, we simplify matters by considering just one quadrature point, \(\hat \theta\), such that the sum over \(z\) collapses to:
\[
\begin{align}
\tilde p(x_i \, | \, y) &= \tilde p_\text{G}(x_i \, | \, \hat \theta, y) \tilde p_\text{LA}(\hat \theta \, | \, y) \\
&\propto \tilde p_\text{G}(x_i \, | \, \hat \theta, y)
\end{align}
\]
We can evaluate this for \(i = 1\) via:
i <- 1
mode_i <- as.numeric(x_hat[i])
var_i <- solve(x_H)[i, i]
Because this is a Gaussian distribution, we do not have to normalise it (as would usually be the case) because we know it's a density. We can compare it to our MCMC results as follows:
plot_pdf2 <- plot_pdf +
stat_function(
data = data.frame(x = c(0.5, 3.5)), aes(x, col = "Joint Gaussian marginal"), fun = dnorm, n = 101,
args = list(mean = mode_i, sd = sqrt(var_i))
) +
labs(x = "prev_phi[1]", y = "Posterior", col = "Approximation") +
scale_colour_manual(values = cbpalette) +
theme_minimal()
M <- 1000
samples_gaussian <- rnorm(M, mode_i, sqrt(var_i))
ecdf_gaussian <- ecdf(samples_gaussian)
grid <- seq(min(samples_gaussian), max(samples_gaussian), by = 0.1)
plot_cdf2 <- plot_cdf +
geom_line(
data = data.frame(x = grid, y = ecdf_gaussian(grid)),
aes(x = x, y = y), col = cbpalette[1]
)
cowplot::plot_grid(plot_cdf2, plot_pdf2, rel_widths = c(1, 1.5))
ks_gaussian_tmbstan <- inf.utils::ks_test(samples_tmbstan, samples_gaussian)
ks_gaussian_tmbstan
## $D
## [1] 0.0617
##
## $l
## [1] 0.6635031
The maximum difference between the two ECDFs is 0.0617, which occurs at the point 0.6635031.
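The inf.utils::ks_test function comes from a companion utilities package; a rough base R equivalent of the distance D (the maximum ECDF difference) and the location l at which it is attained might look like this:
ks_manual <- function(a, b) {
  grid <- sort(unique(c(a, b))) #' evaluate both ECDFs at all observed points
  diffs <- abs(ecdf(a)(grid) - ecdf(b)(grid))
  list(D = max(diffs), l = grid[which.max(diffs)])
}
ks_manual(samples_tmbstan, samples_gaussian)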
In this section we will continue to use the approximation (3.1) for \(\tilde p(\theta \, | \, y)\).
For the approximation to the latent field marginal posteriors, another approach we could take is to calculate the full Laplace approximation
\[
\begin{align*}
\tilde p(x_i \, | \, \theta, y) &\propto \frac{p(x, \theta, y)}{p_\text{G}(x_{-i} \, | \, x_i, \theta, y)} \Big\rvert_{x_{-i} = \hat x_{-i}(x_i, \theta)} \tag{4.1} \\
&= \frac{p(y, x_i, \hat x_{-i}(x_i, \theta), \theta)}{\det(\frac{1}{2 \pi} H_{-i}(x_i, \theta))^{1/2}}
\end{align*}
\]
integrating out \(x_{-i}\) with a Gaussian approximation.
This is not practical in some settings as it involves recomputing the Gaussian approximation \(p_\text{G}(x_{-i} \, | \, x_i, \theta, y)\) for each value of \((x_i, \theta)\) which can be too computationally expensive.
We will still go through the steps for implementing it here though for demonstration.
With a bit of effort, it can be implemented in TMB by passing in data where elements \(i\) and \(-i\) of the latent field are explicitly named (see Appendix 6 for the TMB code):
compile("model1_index.cpp")
## [1] 0
dyn.load(dynlib("model1_index"))
The prepare_dat function takes dat and makes it compatible with model1_index for a particular choice of marginal i.
prepare_dat <- function(dat, i) {
dat[["y_prev_i"]] <- dat$y_prev[i]
dat[["y_prev_minus_i"]] <- dat$y_prev[-i]
dat[["m_prev_i"]] <- dat$m_prev[i]
dat[["m_prev_minus_i"]] <- dat$m_prev[-i]
dat[c("n", "y_prev_i", "y_prev_minus_i", "m_prev_i", "m_prev_minus_i")]
}
Let’s consider the first marginal, that is \(\tilde p_\text{LA}(x_1 \, | \, \theta, y)\). In the call to MakeADFun we set random = "phi_prev_minus_i" to integrate out all of the random effects except that of the particular index.
i <- 1
param_i <- list(
beta_prev = 0,
phi_prev_i = 0,
phi_prev_minus_i = rep(0, data$n - 1),
log_sigma_phi_prev = 0
)
dat_i <- prepare_dat(dat, i)
#' random are integrated out with a Laplace approximation
obj <- MakeADFun(
data = dat_i,
parameters = param_i,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE
)
opt <- nlminb(
start = obj$par,
objective = obj$fn,
gradient = obj$gr,
control = list(iter.max = its, trace = 0)
)
The mode and precision matrix for the Gaussian approximation \(\tilde p_\text{G}(x_{-i} \, | \, x_i, \theta, y)\) are:
get_mode_and_H <- function(obj) {
mm <- obj$env$last.par
mode <- mm[obj$env$random]
mode <- rlang::duplicate(mode)
H <- obj$env$spHess(mm, random = TRUE)
H <- rlang::duplicate(H)
return(list(mode = mode, H = H))
}
xminusi <- get_mode_and_H(obj)
Check here that obj$fn indeed takes as input \((x_i, \theta)\):
transform_param <- function(param) {
list(
beta_prev = param$beta_prev,
phi_prev_i = param$phi_prev[i],
phi_prev_minus_i = param$phi_prev[-i],
log_sigma_phi_prev = param$log_sigma_phi_prev
)
}
transformed_param <- transform_param(param)
transformed_param$phi_prev_minus_i <- NULL #' Remove the integrated out effects
#' transformed_param contains certain values for (x_i, theta) suitable for input to obj$fn
obj$fn(transformed_param)
## [1] 156.06
## attr(,"logarithm")
## [1] TRUE
Let’s have a go at computing Equation (1.2) so that we can compare to the posterior distribution from tmbstan. We simplify matters by considering just one quadrature point, \(\hat \theta\), such that the sum over \(z\) collapses to:
\[
\begin{align}
\tilde p(x_i \, | \, y) &= \tilde p_\text{LA}(x_i \, | \, \hat \theta, y) \tilde p_\text{LA}(\hat \theta \, | \, y) \\
&\propto \tilde p_\text{LA}(x_i \, | \, \hat \theta, y)
\end{align}
\]
A basic way we could approach this is to define a grid of points at which we would like to evaluate the posterior. We can then create a dataframe where each row contains the values opt_theta$par for \(\theta\) and a grid value for \(x_i\). Each row can then be passed into -obj$fn() to calculate the log-posterior. Note that it's crucial the arguments are in the right order when we pass them to obj$fn() – this has caused headaches!
spacing <- 0.05
x_grid <- seq(from = -5, to = 5, by = spacing)
x_grid_length <- length(x_grid)
df <- t(opt_theta$par) %>%
as.data.frame() %>%
slice(rep(1:n(), each = x_grid_length)) %>%
mutate(phi_prev_i = x_grid) %>%
rowwise() %>%
mutate(logpost = -obj$fn(c(beta_prev, phi_prev_i, log_sigma_phi_prev)))
We’ve computed the unnormalised log-posterior – how can we go from that to a normalised posterior? Given this grid of points, we can use the trapezoid rule.
The safe exponential function computes \(\exp(x - \text{max}(x))\).
safe_exp <- function(x) exp(x - max(x))
A simple way to write the trapezoid rule is as follows:
trapezoid_rule <- function(x, spacing) {
  0.5 * spacing * (2 * sum(x) - x[1] - x[length(x)])
}
It can also be defined like this, which is more convenient for thinking about it like other quadrature rules based on weights:
trapezoid_rule <- function(x, spacing) {
w <- rep(spacing, length(x))
w[1] <- w[1] / 2
w[length(x)] <- w[length(x)] / 2
sum(w * x)
}
df$post <- safe_exp(df$logpost)
norm_trapezoid <- trapezoid_rule(df$post, spacing = spacing)
df$post_norm <- df$post / norm_trapezoid
Check that this method correctly computes the normalising constant for a density (which is known to integrate to one):
abs(trapezoid_rule(dnorm(seq(-10, 10, by = 0.1)), spacing = 0.1) - 1) < 10E-6
## [1] TRUE
logSumExp trapezoid rule
What about another method? We can calculate the logarithm of the normalising constant directly, then just subtract it from the log-posterior:
\[ \begin{align} \tilde f(x) &= \frac{f(x)}{\int f(x) \text{d}x} = \frac{f(x)}{C} \implies \\ \log\tilde f(x) &= \log f(x) - \log C \end{align} \]
The matrixStats::logSumExp function computes \(\text{LSE}(x) = \log(\exp(x_1) + \cdots + \exp(x_n))\) in a computationally stable way.
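As a quick toy illustration of why the stable version matters:
log(sum(exp(c(-1000, -1001)))) #' underflows to -Inf
matrixStats::logSumExp(c(-1000, -1001)) #' approximately -999.69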
Another way to compute the trapezoid rule, taking input on the log-scale, is as follows:
trapezoid_rule_log <- function(x, spacing) {
w <- rep(spacing, length(x))
w[1] <- w[1] / 2
w[length(x)] <- w[length(x)] / 2
matrixStats::logSumExp(log(w) + x)
}
norm_trapezoid_log <- trapezoid_rule_log(df$logpost, spacing)
df$post_norm_alt <- exp(df$logpost - norm_trapezoid_log)
Check that this method correctly computes the normalising constant for a density (which is known to integrate to one):
abs(trapezoid_rule_log(dnorm(seq(-10, 10, by = 0.1), log = TRUE), spacing = 0.1) - log(1)) < 10E-6
## [1] TRUE
aghq
The two ways of calculating the normalised posterior above are the same up to numerical error:
max(abs(df$post_norm - df$post_norm_alt)) < 10E-6
## [1] TRUE
So how might we approach this in a more sophisticated way? We could do the normalisation directly in aghq!
We could start by normalising \(\tilde p_\text{LA}(\theta, y)\). Note that sometimes we have been calling this \(\tilde p_\text{LA}(\theta \, | \, y)\) because they are equivalent up to a constant of proportionality. Define \[ \tilde p_\text{AQ}(\theta \, | \, y) = \frac{\tilde p_\text{LA}(\theta, y)}{\tilde p_\text{AQ}(y)} \] or on a log-scale \[ \log \tilde p_\text{AQ}(\theta \, | \, y) = \log \tilde p_\text{LA}(\theta, y) - \log \tilde p_\text{AQ}(y) \]
First we have to add a numerical Hessian that’s missing from h:
h$he <- function(theta) numDeriv::jacobian(h$gr, theta, method = "Richardson")
Now we can obtain \(\log \tilde p_\text{AQ}(y)\):
quad <- aghq::aghq(ff = h, k = 3, startingvalue = c(0, 0), control = aghq::default_control_tmb())
quad$normalized_posterior$lognormconst
## [1] -143.7878
Remember that h$fn gives the negative of \(\log \tilde p_\text{LA}(\theta, y)\), so we want to add the log normalising constant rather than subtract it:
h_norm <- function(theta) h$fn(theta) + quad$normalized_posterior$lognormconst
Now we can put this into \[ \tilde p(x_i, \hat \theta \, | \, y) = \tilde p_\text{LA}(x_i \, | \, \hat \theta, y) \tilde p_\text{AQ}(\hat \theta \, | \, y) \]
But perhaps there is no reason to, because we will just have to normalise it again, so why do it twice?
Let’s just try to normalise it directly.
What we need to do is create a function in TMB where we fix some of the inputs. This can be done using the map argument of MakeADFun. You set each parameter that you want to be fixed to factor(NA); it will then retain the value given in parameters:
obj$par %>% names()
## [1] "beta_prev" "phi_prev_i" "log_sigma_phi_prev"
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
param_i_fixed_theta <- param_i
theta_names <- names(opt_theta$par)
for(theta in theta_names) {
param_i_fixed_theta[[theta]] <- opt_theta$par[[theta]]
}
dat_i <- prepare_dat(dat, i)
obj_fixed_theta <- MakeADFun(
data = dat_i,
parameters = param_i_fixed_theta,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE,
map = map_fixed_theta
)
Now we can normalise this function \(p_\text{LA}(x_i, \theta = \hat \theta, y)\) using aghq:
quad <- aghq::aghq(ff = obj_fixed_theta, k = 3, startingvalue = 0, control = aghq::default_control_tmb())
The value of the normalising constant is (approximately) the same as we get using our trapezoid rule:
abs(quad$normalized_posterior$lognormconst - norm_trapezoid_log) < 10E-3
## [1] TRUE
We now add our Laplace approximation to the existing joint Gaussian marginal and MCMC results:
plot_pdf3 <- plot_pdf2 +
geom_line(data = df, aes(x = phi_prev_i, y = post_norm, col = "Laplace"))
df$ecdf <- cumsum(df$post_norm) * c(0, diff(df$phi_prev_i))
plot_cdf3 <- plot_cdf2 +
geom_line(
data = df, aes(x = phi_prev_i, y = ecdf), col = cbpalette[2]
)
cowplot::plot_grid(plot_cdf3, plot_pdf3, rel_widths = c(1, 1.5))
We use the inverse CDF method to sample from the Laplace marginal, then we can calculate the KS statistic:
samples_laplace <- approxfun(df$ecdf, df$phi_prev_i)(runif(M))
## Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm): collapsing to unique 'x' values
ks_laplace_tmbstan <- inf.utils::ks_test(samples_tmbstan, samples_laplace)
kable_data <- data.frame(
"Distance" = c(ks_gaussian_tmbstan$D, ks_laplace_tmbstan$D),
"Location" = c(ks_gaussian_tmbstan$l, ks_laplace_tmbstan$l)
)
rownames(kable_data) <- c("$\\text{KS}(\\texttt{tmbstan}, \\texttt{gaussian})$", "$\\text{KS}(\\texttt{tmbstan}, \\texttt{laplace})$")
kableExtra::kable(kable_data, booktabs = TRUE, escape = FALSE, align = "c")
| | Distance | Location |
|---|---|---|
| \(\text{KS}(\texttt{tmbstan}, \texttt{gaussian})\) | 0.0617 | 0.6635031 |
| \(\text{KS}(\texttt{tmbstan}, \texttt{laplace})\) | 0.0402 | 1.0389550 |
So we’ve done the approximation for the first index \(i = 1\)! Now let’s try computing \(\tilde p(x_i \, | \, y)\) for every index \(i\), using a for-loop to generate the required Laplace approximations, and tidying up and condensing the code used:
xi_laplace_marginal <- function(i) {
dat_i <- prepare_dat(dat, i)
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
param_i_fixed_theta <- list(
beta_prev = 0,
phi_prev_i = 0,
phi_prev_minus_i = rep(0, data$n - 1),
log_sigma_phi_prev = 0
)
for(theta in names(opt_theta$par)) param_i_fixed_theta[[theta]] <- opt_theta$par[[theta]]
obj_fixed_theta <- MakeADFun(
data = dat_i,
parameters = param_i_fixed_theta,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE,
map = map_fixed_theta
)
quad <- aghq::aghq(ff = obj_fixed_theta, k = 3, startingvalue = 0, control = aghq::default_control_tmb())
pdf_and_cdf <- aghq::compute_pdf_and_cdf(quad)[[1]]
return(pdf_and_cdf)
}
laplace_df <- lapply(1:dat$n, xi_laplace_marginal) %>%
bind_rows(.id = "index") %>%
mutate(index = as.numeric(index))
M <- 1000
samples_laplace <- lapply(1:dat$n, function(i) {
laplace_df_i <- filter(laplace_df, index == i)
samples <- approxfun(x = laplace_df_i$cdf, y = laplace_df_i$theta, ties = "ordered")(runif(M))
data.frame("value" = samples)
}) %>%
bind_rows(.id = "index")
Let’s also get everything we need for the joint Gaussian marginal for each index:
x_Sigma <- solve(x_H)
gaussian_df <- lapply(1:dat$n, function(i) {
x <- seq(-5, 5, length.out = 1000)
mode_i <- as.numeric(x_hat[i])
sd_i <- as.numeric(sqrt(x_Sigma[i, i]))
data.frame(index = i, x = x, pdf = dnorm(x, mean = mode_i, sd = sd_i))
}) %>%
bind_rows()
samples_gaussian <- lapply(1:dat$n, function(i) {
mode_i <- as.numeric(x_hat[i])
sd_i <- as.numeric(sqrt(x_Sigma[i, i]))
samples <- rnorm(M, mode_i, sd_i)
data.frame("value" = samples)
}) %>%
bind_rows(.id = "index")
samples_tmbstan <- extract(mcmc, pars = "phi_prev") %>%
as.data.frame() %>%
pivot_longer(
cols = everything(),
names_to = "index",
names_prefix = "phi_prev.",
names_transform = as.integer
)
plot <- ggplot(samples_tmbstan, aes(x = value)) +
geom_histogram(aes(y = ..density..), alpha = 0.8, fill = cbpalette[7]) +
facet_wrap(~index) +
theme_minimal() +
labs(x = "phi_prev", y = "Posterior PDF")
plot2 <- plot +
geom_line(data = gaussian_df, aes(x = x, y = pdf), col = cbpalette[1])
plot3 <- plot2 +
geom_line(data = laplace_df, aes(x = theta, y = pdf), col = cbpalette[2])
plot3
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ks_gaussian_tmbstan <- lapply(1:length(x_hat), function(i) {
samples_gaussian_i <- filter(samples_gaussian, index == i)$value
samples_tmbstan_i <- filter(samples_tmbstan, index == i)$value
inf.utils::ks_test(samples_gaussian_i, samples_tmbstan_i)
})
ks_laplace_tmbstan <- lapply(1:length(x_hat), function(i) {
samples_laplace_i <- filter(samples_laplace, index == i)$value
samples_tmbstan_i <- filter(samples_tmbstan, index == i)$value
inf.utils::ks_test(samples_laplace_i, samples_tmbstan_i)
})
ks_results <- bind_rows(
bind_rows(ks_gaussian_tmbstan, .id = "index") %>%
mutate(type = "gaussian"),
bind_rows(ks_laplace_tmbstan, .id = "index") %>%
mutate(type = "laplace")
)
ks_results %>%
select(index, D, type) %>%
pivot_wider(
names_from = type,
values_from = D
) %>%
ggplot(aes(x = laplace, y = gaussian)) +
geom_point(alpha = 0.5) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
lims(x = c(0, 1), y = c(0, 1)) +
labs(x = "KS(laplace, tmbstan)", y = "KS(gaussian, tmbstan)") +
theme_minimal()
#' Table of average KS distance by method
ks_results %>%
group_by(type) %>%
summarise(Distance = mean(D)) %>%
select(Distance) %>%
mutate(
Method = c("$\\text{KS}(\\texttt{tmbstan}, \\texttt{gaussian})$",
"$\\text{KS}(\\texttt{tmbstan}, \\texttt{laplace})$"),
.before = Distance
) %>%
kableExtra::kable(booktabs = TRUE, escape = FALSE, align = "c")
| Method | Distance |
|---|---|
| \(\text{KS}(\texttt{tmbstan}, \texttt{gaussian})\) | 0.0511306 |
| \(\text{KS}(\texttt{tmbstan}, \texttt{laplace})\) | 0.0519250 |
In Sections 3 and 4, when calculating \(\tilde p(x_i \, | \, y)\), we fixed \(\theta = \hat \theta\) (\(k = 1\), empirical Bayes) rather than integrating \(\theta\) out (\(k > 1\)). In this section we will use aghq to integrate over \(\theta\).
We’re going to start by normalising \(\tilde p_\text{LA}(\theta, y)\). We did this above in the empirical Bayes case, but found that it wasn’t strictly necessary.
h$he <- function(theta) numDeriv::jacobian(h$gr, theta, method = "Richardson")
quad <- aghq::aghq(ff = h, k = 3, startingvalue = c(0, 0), control = aghq::default_control_tmb())
quad$normalized_posterior$lognormconst
## [1] -143.7878
h_norm <- function(theta) h$fn(theta) + quad$normalized_posterior$lognormconst
We can get the mode and Hessian of the random effects at each quadrature point as follows:
distinctthetas <- quad$normalized_posterior$nodesandweights[, grep('theta', colnames(quad$normalized_posterior$nodesandweights))]
modesandhessians <- distinctthetas
thetanames <- make.unique(names(h$par), sep = "")
colnames(modesandhessians) <- thetanames
modesandhessians$mode <- vector(mode = "list", length = nrow(distinctthetas))
modesandhessians$H <- vector(mode = "list", length = nrow(distinctthetas))
for(i in 1:nrow(distinctthetas)) {
theta <- as.numeric(modesandhessians[i, thetanames])
h$fn(theta)
mm <- h$env$last.par
modesandhessians[i, "mode"] <- list(list(mm[h$env$random]))
H <- h$env$spHess(mm, random = TRUE)
H <- rlang::duplicate(H)
modesandhessians[i, "H"] <- list(list(H))
}
modesandhessians$weights <- quad$normalized_posterior$nodesandweights$weights
Now we want to evaluate \[ \tilde p(x_i \, | \, y) = \sum_z \tilde p_\text{G}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z). \]
To start with, let’s evaluate this on the log-scale for i = 1, x = x_test_value, and the first value of z. The weight \(\omega(z)\) is modesandhessians$weights[z], the normalised Laplace approximation \(\tilde p_\text{LA}(\theta(z) \, | \, y)\) is - h_norm(as.numeric(modesandhessians[z, thetanames])), and the Gaussian \(\tilde p_\text{G}(x_i \, | \, \theta(z), y)\) is evaluated using dnorm with log = TRUE:
x_test_value <- 0
#' Note that in some sense z actually runs over nodes, not indices
z <- 1
i <- 1
mode_i <- as.numeric(modesandhessians$mode[[z]][i])
Sigma <- solve(modesandhessians$H[[z]])
sd_i <- as.numeric(sqrt(Sigma[i, i]))
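#' Note: theta below is whatever it was last assigned above (the final row of the loop
#' over quadrature nodes); the function defined next sets theta per node instead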
dnorm(x_test_value, mean = mode_i, sd = sd_i, log = TRUE) + (- h_norm(theta)) + log(modesandhessians$weights[z])
## [1] -7.731226
## attr(,"logarithm")
## [1] TRUE
Now let’s sum over z so that we can obtain the log-probability for our x_test_value in marginal i:
gaussian_marginal <- function(x) {
lp <- vector(mode = "numeric", length = nrow(modesandhessians))
for(z in 1:nrow(modesandhessians)) {
theta <- as.numeric(modesandhessians[z, thetanames])
mode_i <- as.numeric(modesandhessians$mode[[z]][i])
Sigma <- solve(modesandhessians$H[[z]])
sd_i <- as.numeric(sqrt(Sigma[i, i]))
lp[z] <- dnorm(x, mean = mode_i, sd = sd_i, log = TRUE) + (- h_norm(theta)) + log(modesandhessians$weights[z])
}
matrixStats::logSumExp(lp)
}
gaussian_marginal(x_test_value)
## [1] -1.803582
To pick a set of \(x_i\) to evaluate the marginal at, we will use an approximation to the AGHQ nodes. We will take the hyperparameters to be fixed at their mode, then choose the appropriate marginals of the joint Gaussian:
#' Create a function that takes nodesandhessians and outputs the spline nodes for a particular marginal
spline_nodes <- function(modesandhessians, i, k = 7) {
#' Get the row number of modesandhessians which contains that theta mode
#' This is bad code (relying on objects outside function)
theta_mode_location <- which.max(quad$normalized_posterior$nodesandweights$logpost_normalized)
#' These are the mode and standard deviation from the Gaussian approximation at the node which is the mode of the Laplace approximation
#' (Say that three times fast...)
mode_i <- modesandhessians[[theta_mode_location, "mode"]][i]
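  #' Note: 1 / H[i, i] below is the reciprocal of a precision diagonal entry (a conditional
  #' variance) rather than the (i, i) element of solve(H); it is only used to place nodes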
sd_i <- sqrt(1 / modesandhessians[[theta_mode_location, "H"]][i, i])
#' Create Gauss-Hermite quadrature
gg <- mvQuad::createNIGrid(dim = 1, type = "GHe", level = k)
#' Adapt to mode_i and sd_i
mvQuad::rescale(gg, m = mode_i, C = sd_i^2)
#' These are our set of x input values
gg$nodes
}
#' The log-probabilities at the set of input values
nodes <- spline_nodes(modesandhessians, i = 1, k = 7)
lps <- sapply(nodes, gaussian_marginal)
plot_marginal_spline <- function(nodes, lps) {
  #' Build a cubic spline interpolant of the log marginal posterior
ss <- splines::interpSpline(nodes, lps, bSpline = TRUE, sparse = TRUE)
interpolant <- function(x) { as.numeric(stats::predict(ss, x)$y) }
finegrid <- seq(-5, 5, by = 0.1)
df <- data.frame(x = finegrid, y = exp(interpolant(finegrid)))
ggplot(df, aes(x = x, y = y)) +
geom_line() +
theme_minimal() +
labs(x = "phi_prev[1]", y = "Posterior")
}
plot_marginal_spline(nodes, lps)
As before, but with the full Laplace approximation in place of the Gaussian. Much of what we computed above in Section 5.1 is still relevant. The only thing that has changed is that we’re now using \(\tilde p_\text{LA}(x_i \, | \, \theta(z), y)\) rather than \(\tilde p_\text{G}(x_i \, | \, \theta(z), y)\) in \[ \tilde p(x_i \, | \, y) = \sum_z \tilde p_\text{LA}(x_i \, | \, \theta(z), y) \tilde p_\text{LA}(\theta(z) \, | \, y) \omega(z). \]
x_test_value <- 0
i <- 1
z <- 1
As before, the weight \(\omega(z)\) is modesandhessians$weights[z] and the normalised Laplace approximation \(\tilde p_\text{LA}(\theta(z) \, | \, y)\) is - h_norm(as.numeric(modesandhessians[z, thetanames])). Getting to the difference, the normalised Laplace approximation \(p_\text{LA}(x_i \, | \, \theta(z), y)\) on the log-scale is:
dat_i <- prepare_dat(dat, i)
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
param_i_fixed_theta <- list(
beta_prev = 0,
phi_prev_i = 0,
phi_prev_minus_i = rep(0, data$n - 1),
log_sigma_phi_prev = 0
)
for(theta_string in thetanames) param_i_fixed_theta[[theta_string]] <- modesandhessians[z, thetanames][[theta_string]]
obj_fixed_theta <- MakeADFun(
data = dat_i,
parameters = param_i_fixed_theta,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE,
map = map_fixed_theta
)
#' Note that this is only a one-dimensional marginal, so perhaps we can afford to put k > 3 here
quad <- aghq::aghq(ff = obj_fixed_theta, k = 7, startingvalue = 0, control = aghq::default_control_tmb())
obj_fixed_theta_norm <- function(x) obj_fixed_theta$fn(x) + quad$normalized_posterior$lognormconst
- obj_fixed_theta_norm(x_test_value)
## [1] -5.16969
## attr(,"logarithm")
## [1] TRUE
Putting it together on the log-scale:
as.numeric(- obj_fixed_theta_norm(x_test_value)) + (- h_norm(theta)) + log(modesandhessians$weights[z])
## [1] -6.640522
## attr(,"logarithm")
## [1] TRUE
Sum over z:
old_laplace_marginal <- function(x) {
lp <- vector(mode = "numeric", length = nrow(modesandhessians))
for(z in 1:nrow(modesandhessians)) {
theta <- as.numeric(modesandhessians[z, thetanames])
dat_i <- prepare_dat(dat, i)
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
param_i_fixed_theta <- list(
beta_prev = 0,
phi_prev_i = 0,
phi_prev_minus_i = rep(0, data$n - 1),
log_sigma_phi_prev = 0
)
for(theta_string in thetanames) param_i_fixed_theta[[theta_string]] <- modesandhessians[z, thetanames][[theta_string]]
obj_fixed_theta <- MakeADFun(
data = dat_i,
parameters = param_i_fixed_theta,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE,
map = map_fixed_theta
)
quad <- aghq::aghq(ff = obj_fixed_theta, k = 7, startingvalue = 0, control = aghq::default_control_tmb())
obj_fixed_theta_norm <- function(xx) obj_fixed_theta$fn(xx) + quad$normalized_posterior$lognormconst
lp[z] <- as.numeric(- obj_fixed_theta_norm(x)) + (- h_norm(theta)) + log(modesandhessians$weights[z])
}
matrixStats::logSumExp(lp)
}
old_laplace_marginal(x_test_value)
## [1] -1.155867
lps <- sapply(nodes, old_laplace_marginal)
plot_marginal_spline(nodes, lps)
The method above should work, but it requires a lot of computation. Instead, we might be able to do this more cost effectively by integrating \(\theta\) out before normalising \(\tilde p_\text{LA}(x_i, \theta(z), y)\). This would look like first obtaining \[ \tilde p(x_i, y) = \sum_z \tilde p_\text{LA}(x_i, \theta(z), y) \omega(z), \] then normalising by the previously obtained \(\tilde p_\text{AQ}(y)\) as follows \[ \tilde p(x_i \, | \, y) = \frac{\tilde p(x_i, y)}{\tilde p_\text{AQ}(y)}. \]
#' Make sure the log normalising constant is available
quad <- aghq::aghq(ff = h, k = 3, startingvalue = c(0, 0), control = aghq::default_control_tmb())
quad$normalized_posterior$lognormconst
## [1] -143.7878
laplace_marginal <- function(x) {
lp <- vector(mode = "numeric", length = nrow(modesandhessians))
for(z in 1:nrow(modesandhessians)) {
theta <- as.numeric(modesandhessians[z, thetanames])
dat_i <- prepare_dat(dat, i)
map_fixed_theta <- list(beta_prev = factor(NA), log_sigma_phi_prev = factor(NA))
param_i_fixed_theta <- list(
beta_prev = 0,
phi_prev_i = 0,
phi_prev_minus_i = rep(0, data$n - 1),
log_sigma_phi_prev = 0
)
for(theta_string in thetanames) param_i_fixed_theta[[theta_string]] <- modesandhessians[z, thetanames][[theta_string]]
obj_fixed_theta <- MakeADFun(
data = dat_i,
parameters = param_i_fixed_theta,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE,
map = map_fixed_theta
)
lp[z] <- as.numeric(- obj_fixed_theta$fn(x)) + log(modesandhessians$weights[z])
}
matrixStats::logSumExp(lp) - quad$normalized_posterior$lognormconst
}
laplace_marginal(x_test_value)
## [1] -2.61789
lps <- sapply(nodes, laplace_marginal)
plot_marginal_spline(nodes, lps)
Try doing this without a new MakeADFun for each \(z\):
new_laplace_marginal <- function(x, i) {
dat_i <- prepare_dat(dat, i)
param_i <- list(
beta_prev = 0,
phi_prev_i = 0,
phi_prev_minus_i = rep(0, data$n - 1),
log_sigma_phi_prev = 0
)
obj <- MakeADFun(
data = dat_i,
parameters = param_i,
random = "phi_prev_minus_i",
DLL = "model1_index",
silent = TRUE
)
lp <- vector(mode = "numeric", length = nrow(modesandhessians))
for(z in 1:nrow(modesandhessians)) {
theta <- as.numeric(modesandhessians[z, thetanames])
lp[z] <- as.numeric(- obj$fn(c(theta[1], x, theta[2]))) + log(modesandhessians$weights[z])
}
matrixStats::logSumExp(lp) - quad$normalized_posterior$lognormconst
}
We get the same value as we did with laplace_marginal!
new_laplace_marginal(x_test_value, i = 1)
## [1] -2.61789
lps <- sapply(nodes, new_laplace_marginal, i = 1)
plot_marginal_spline(nodes, lps)
Now let’s loop this over \(i\) so we can compare to the other methods. The hypothesis is that this will be better than the empirical Bayes methods.
spline_nodes(modesandhessians, i = 1)
## [,1]
## [1,] -3.7504397177257429163433
## [2,] -2.3667594107345415466170
## [3,] -1.1544053947399681714359
## [4,] -0.0000000000000001498764
## [5,] 1.1544053947399681714359
## [6,] 2.3667594107345415466170
## [7,] 3.7504397177257429163433
new_laplace_marginal_spline_aghq <- function(i) {
x <- spline_nodes(modesandhessians, i = i)
lps <- sapply(x, new_laplace_marginal, i = i)
ss <- splines::interpSpline(x, lps, bSpline = TRUE, sparse = TRUE)
interpolant <- function(x) { as.numeric(stats::predict(ss, x)$y) }
finegrid <- seq(-5, 5, by = 0.1)
df <- data.frame(x = finegrid, y = exp(interpolant(finegrid)))
return(df)
}
aghq_laplace_df <- lapply(1:dat$n, new_laplace_marginal_spline_aghq) %>%
bind_rows(.id = "index") %>%
mutate(index = as.numeric(index))
plot3 +
geom_line(data = aghq_laplace_df, aes(x = x, y = y), col = cbpalette[3])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#' Need to edit this so that there are cdf and theta columns in aghq_laplace_df
# samples_aghq_laplace <- lapply(1:dat$n, function(i) {
# aghq_laplace_df_i <- filter(aghq_laplace_df, index == i)
# samples <- approxfun(x = aghq_laplace_df_i$cdf, y = aghq_laplace_df_i$theta, ties = "ordered")(runif(M))
# data.frame("value" = samples)
# }) %>%
# bind_rows(.id = "index")
| Description | Here | Wood (2020) notation | Notes |
|---|---|---|---|
| Random effects | \(x\) | \(\beta\) | Stringer (2021) uses \(w\) |
| Random effects which maximise (the / some) objective for given \(\theta\) | \(\hat x (\theta)\) | \(\hat \beta\) | Wood omits dependence on \(\theta\) |
| Random effects which maximise (the / some) objective for given \(\theta\), subject to \(x_i\) being fixed | \((x_i, \hat x_{-i}(x_i, \theta))\) | \(\tilde \beta\) | We have \(\hat x_{-i}(x_i, \theta)\) having dimension \(n - 1\) |
| Hessian / precision matrix of the negative log-likelihood with respect to \(x\) at \(\hat x(\theta)\) | \(H(\theta)\) | \(H\) | Unsure if it’s preferable to use Hessian or precision notation here |
model1.cpp
// model1.cpp
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
// Data block
DATA_INTEGER(n); // Number of regions
DATA_VECTOR(y_prev); // Vector of survey responses
DATA_VECTOR(m_prev); // Vector of survey sample sizes
// Parameter block
PARAMETER(beta_prev); // Survey intercept
PARAMETER_VECTOR(phi_prev); // Survey spatial effect
PARAMETER(log_sigma_phi_prev); // Survey log standard deviation of spatial effects
// Transformed parameters block
Type sigma_phi_prev = exp(log_sigma_phi_prev);
vector<Type> eta_prev(beta_prev + sigma_phi_prev * phi_prev);
// Initialise negative log-likelihood
Type nll(0.0);
// Priors
nll -= dnorm(sigma_phi_prev, Type(0), Type(2.5), true) + log_sigma_phi_prev;
nll -= dnorm(beta_prev, Type(-2), Type(1), true);
nll -= dnorm(phi_prev, Type(0), Type(1), true).sum();
// Likelihood
nll -= dbinom_robust(y_prev, m_prev, eta_prev, true).sum();
// Generated quantities block
vector<Type> rho_prev(invlogit(eta_prev)); // Posterior prevalence estimates
Type tau_phi_prev = 1/pow(sigma_phi_prev, 2); // Precision of spatial effects
return(nll);
}
model1_index.cpp
// model1index.cpp
#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator()()
{
// Data block
DATA_INTEGER(n); // Number of regions
DATA_VECTOR(y_prev_i); // Survey response for index i
DATA_VECTOR(y_prev_minus_i); // Survey response for index -i
DATA_VECTOR(m_prev_i); // Survey sample size for index i
DATA_VECTOR(m_prev_minus_i); // Survey sample size for index -i
// Parameter block
PARAMETER(beta_prev); // Survey intercept
PARAMETER(phi_prev_i); // Survey spatial effect for index i
PARAMETER_VECTOR(phi_prev_minus_i); // Survey spatial effect for index -i
PARAMETER(log_sigma_phi_prev); // Survey log standard deviation of spatial effects
// Transformed parameters block
Type sigma_phi_prev = exp(log_sigma_phi_prev);
Type eta_prev_i = beta_prev + sigma_phi_prev * phi_prev_i;
vector<Type> eta_prev_minus_i(beta_prev + sigma_phi_prev * phi_prev_minus_i);
// Initialise negative log-likelihood
Type nll(0.0);
// Priors
nll -= dnorm(sigma_phi_prev, Type(0), Type(2.5), true) + log_sigma_phi_prev;
nll -= dnorm(beta_prev, Type(-2), Type(1), true);
nll -= dnorm(phi_prev_i, Type(0), Type(1), true);
nll -= dnorm(phi_prev_minus_i, Type(0), Type(1), true).sum();
// Likelihood
nll -= dbinom_robust(y_prev_i, m_prev_i, eta_prev_i, true).sum();
nll -= dbinom_robust(y_prev_minus_i, m_prev_minus_i, eta_prev_minus_i, true).sum();
// Generated quantities block
Type tau_phi_prev = 1/pow(sigma_phi_prev, 2); // Precision of spatial effects
return(nll);
}
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 13.1
##
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] inf.utils_0.1.0 multi.utils_0.1.0 patchwork_1.1.2 tibble_3.1.8 tidyverse_1.3.1 aghq_0.4.0 tmbstan_1.0.4 rstan_2.21.5
## [9] StanHeaders_2.21.0-7 glmmTMB_1.1.4 TMB_1.9.1 stringr_1.5.0 purrr_0.3.5 tidyr_1.2.1 readr_2.1.3 INLA_22.05.07
## [17] sp_1.5-1 foreach_1.5.2 Matrix_1.5-1 ggplot2_3.4.0 forcats_0.5.2 dplyr_1.0.10
##
## loaded via a namespace (and not attached):
## [1] minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.3 fs_1.5.2 rstudioapi_0.13 farver_2.1.1 MatrixModels_0.5-0 bit64_4.0.5
## [10] fansi_1.0.3 lubridate_1.8.0 xml2_1.3.3 codetools_0.2-18 splines_4.2.0 cachem_1.0.6 knitr_1.41 polynom_1.4-1 jsonlite_1.8.4
## [19] nloptr_2.0.3 broom_0.8.0 dbplyr_2.1.1 compiler_4.2.0 httr_1.4.4 backports_1.4.1 assertthat_0.2.1 fastmap_1.1.0 cli_3.4.1
## [28] htmltools_0.5.3 prettyunits_1.1.1 tools_4.2.0 gtable_0.3.1 glue_1.6.2 Rcpp_1.0.9 jquerylib_0.1.4 cellranger_1.1.0 vctrs_0.5.1
## [37] mvQuad_1.0-6 svglite_2.1.0 nlme_3.1-157 iterators_1.0.14 xfun_0.35 ps_1.7.0 lme4_1.1-30 rvest_1.0.2 lifecycle_1.0.3
## [46] statmod_1.4.36 orderly_1.4.3 ids_1.0.1 MASS_7.3-56 scales_1.2.1 ragg_1.2.2 hms_1.1.2 inline_0.3.19 yaml_2.3.6
## [55] memoise_2.0.1 gridExtra_2.3 sass_0.4.4 loo_2.5.1 RSQLite_2.2.14 stringi_1.7.8 highr_0.9 boot_1.3-28 pkgbuild_1.3.1
## [64] rlang_1.0.6 pkgconfig_2.0.3 systemfonts_1.0.4 matrixStats_0.62.0 evaluate_0.18 lattice_0.20-45 labeling_0.4.2 cowplot_1.1.1 bit_4.0.5
## [73] processx_3.5.3 tidyselect_1.2.0 bookdown_0.26 magrittr_2.0.3 R6_2.5.1 generics_0.1.3 DBI_1.1.3 pillar_1.8.1 haven_2.5.0
## [82] withr_2.5.0 modelr_0.1.8 crayon_1.5.2 uuid_1.1-0 utf8_1.2.2 tzdb_0.3.0 rmarkdown_2.18 grid_4.2.0 readxl_1.4.0
## [91] data.table_1.14.6 blob_1.2.3 callr_3.7.0 reprex_2.0.1 digest_0.6.30 webshot_0.5.3 numDeriv_2016.8-1.1 textshaping_0.3.6 openssl_2.0.5
## [100] RcppParallel_5.1.5 stats4_4.2.0 munsell_0.5.0 viridisLite_0.4.1 kableExtra_1.3.4 bslib_0.4.1 askpass_1.1
Note that I’m uncertain if \(\hat \theta\) (a constant) should still be included in the LHS of this.