Resources

  • Miller, Glennie, and Seaton (2020) (recommended by Samir)
  • Chapter 9 of Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny by Paula Moraga
  • Most of Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA by Elias T. Krainski and coauthors
  • Comment on Google group by Finn
  • Markdown document by Zhe Sha

Background

The solution to certain stochastic partial differential equations (SPDEs) is a Gaussian field with Matérn covariance (Whittle 1954). Lindgren, Lindström, and Rue (2010) propose representing a Gaussian field with Matérn covariance as a GMRF by approximating the solution of the SPDE with the finite element method. The Gaussian field \(f(x)\) with the Matérn covariance function is a solution to the linear fractional (when \(\alpha/2\) is not an integer) SPDE \[ (\kappa^2 - \Delta)^{\alpha/2} f(x) = \epsilon(x), \quad x \in \mathbb{R}^d, \quad \alpha = \nu + d/2, \quad \kappa > 0, \quad \nu > 0, \] where \((\kappa^2 - \Delta)^{\alpha/2}\) is a pseudo-differential operator, the innovation process \(\epsilon\) is spatial Gaussian white noise with unit variance, and \(\Delta\) is the Laplacian \[ \Delta = \sum_{i=1}^d \frac{\partial^2}{\partial x_i^2}. \] The marginal variance is \[ \sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\nu + d/2)(4\pi)^{d/2}\kappa^{2\nu}}. \]
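As a quick illustration of the covariance function itself, here is a minimal R sketch of the Matérn, assuming the common parameterisation \(C(h) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} (\kappa h)^\nu K_\nu(\kappa h)\) (matern_cov is a hypothetical helper, not an INLA function):

matern_cov <- function(h, kappa = 10, nu = 1, sigma2 = 1) {
  h <- pmax(h, 1e-12) # avoid 0 * Inf at h = 0; C(0) = sigma2
  sigma2 * 2^(1 - nu) / gamma(nu) * (kappa * h)^nu * besselK(kappa * h, nu)
}
curve(matern_cov(x), from = 0, to = 1, xlab = "distance", ylab = "covariance")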

Miller, Glennie, and Seaton (2020) explanation

SPDE equation:

\[Df = \epsilon\]

where \(D\) is a differential operator, \(f\) is a function and \(\epsilon\) is a stochastic process (often white noise). This corresponds to the integral form:

\[Df = \epsilon \iff \int Df(x) \phi(x) \text{d}x = \int \epsilon(x) \phi(x) \text{d}x \, \forall \phi\]

where \(\phi\) is known as a test function. Using the inner product notation \(\langle f, g \rangle = \int f(x) g(x) \text{d}x\) this is equivalent to:

\[\langle Df, \phi \rangle = \langle \epsilon, \phi \rangle\]

Solve at the nodes of a mesh \(j = 1, \ldots, M\) by approximating \(f(x) = \sum_{j = 1}^M \beta_j \psi_j (x)\), where the \(\psi_j\) are basis functions; then:

\[ \langle D \sum_{j = 1}^M \beta_j \psi_j, \phi \rangle = \sum_{j = 1}^M \beta_j \langle D \psi_j, \phi \rangle = \langle \epsilon, \phi \rangle \]

Test only functions \(\phi\) spanned by the basis, i.e. of the form \(\phi(x) = \sum_{i = 1}^M \gamma_i \psi_i (x)\). Since equality against each basis function implies equality against any linear combination of them, this is equivalent to showing that:

\[ \sum_{j = 1}^M \beta_j \langle D \psi_j, \psi_i \rangle = \langle \epsilon, \psi_i \rangle, \quad i = 1, \ldots, M \]

This can be written in matrix form as \(\mathbf{P} \boldsymbol{\beta} = \mathbf{e}\), where \(P_{i, j} = \langle D \psi_j, \psi_i \rangle\) and \(e_i = \langle \epsilon, \psi_i \rangle\). The random vector \(\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_e ^{-1})\), where the covariance \(\mathbf{Q}_e^{-1}\) has \((i, j)\)th entry \(\langle \psi_i, \psi_j \rangle\) (the covariance of white noise integrated against the basis functions). Since \(\boldsymbol{\beta} = \mathbf{P}^{-1} \mathbf{e}\), we have \(\boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}^{-1})\) with \(\mathbf{Q}^{-1} = \mathbf{P}^{-1} \mathbf{Q}_e^{-1} \mathbf{P}^{-\top}\), so the precision is \(\mathbf{Q} = \mathbf{P}^\top \mathbf{Q}_e \mathbf{P}\) (no orthogonality of \(\mathbf{P}\) is required; the inverses cancel directly).

The steps are: use the finite element method to compute \(\mathbf{Q}\), then simulate \(\tilde{\boldsymbol{\beta}}\) from the Gaussian; \(\tilde f = \sum_j \tilde\beta_j \psi_j\) is then a sample from the stochastic process which solves the SPDE.
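A minimal sketch of the simulation step, assuming a sparse precision matrix \(\mathbf{Q}\) has already been assembled (a made-up tridiagonal matrix stands in for \(\mathbf{P}^\top \mathbf{Q}_e \mathbf{P}\) here):

M <- 50
Q <- diag(2.1, M)
Q[abs(row(Q) - col(Q)) == 1] <- -1   # tridiagonal stand-in for the FEM precision
R <- chol(Q)                         # upper triangular, Q = t(R) %*% R
beta_tilde <- backsolve(R, rnorm(M)) # beta_tilde ~ N(0, Q^{-1})
plot(beta_tilde, type = "l")         # the sample of f at the mesh points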

Doing the FEM requires choosing a mesh (a grid or triangulation). R-INLA uses a regular grid in 1D, or a constrained Delaunay triangulation in 2D, to produce the mesh, together with piecewise linear basis functions (linear B-splines), visualised in the sketch below.
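For example, the piecewise linear basis functions on a 1D mesh can be visualised by evaluating the projector matrix on a fine grid (a sketch; I'm assuming inla.spde.make.A accepts a 1D mesh here):

mesh1d <- inla.mesh.1d(seq(0, 1, by = 0.1))
xg <- seq(0, 1, by = 0.01)
A1 <- inla.spde.make.A(mesh1d, loc = xg) # row i holds the basis functions evaluated at xg[i]
matplot(xg, as.matrix(A1), type = "l", xlab = "x", ylab = "basis") # "hat" functions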

Connection to basis-penalty smoothing

Assume \(f\) has the form \(f(x) = \sum_{j = 1}^M \beta_j \psi_j(x)\). Observations give a log-likelihood \(\ell(\beta)\), and a point estimate \(\hat \beta\) leads to a function \(\hat f\). Adding a smoothing penalty \(J(\beta, \lambda)\) gives the penalised log-likelihood \(\ell_p(\beta, \lambda) = \ell(\beta) - J(\beta, \lambda)\). Both \(\lambda\) and \(\beta\) can be estimated using REML, as in the sketch below.
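As a concrete instance, mgcv estimates the smoothing parameter by REML (a sketch on simulated data):

library(mgcv)

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
fit <- gam(y ~ s(x), method = "REML") # lambda and beta estimated jointly
fit$sp                                # the estimated smoothing parameter lambda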

Lots of choices for the penalty!

  • Second derivative
  • Thin-plate spline (total variation in the gradient of \(f\), including the interaction between coordinates). This ends up being \(J(\beta, \lambda) = \lambda \int (Df)^2 \text{d} x\) for some differential operator \(D\) (equivalently \(\lambda \langle Df, Df \rangle\) in inner product notation). In matrix form this is \(\lambda \beta^\top S \beta\) with \(S_{i, j} = \langle D \psi_i, D \psi_j \rangle\).

Penalised likelihood can be interpreted as Bayesian inference: the penalty corresponds to the following (improper) prior:

\[ p(\beta) \propto \exp( - \lambda \beta^\top S \beta) \]

This is a Gaussian with precision \(2 \lambda S\) (the factor of two is conventionally absorbed into \(\lambda\), giving precision \(\lambda S\)).
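The penalty matrix \(S\) can be inspected directly; for example, mgcv exposes it via smoothCon (a sketch using a cubic regression spline basis, which carries a second-derivative penalty):

library(mgcv)

sm <- smoothCon(s(x, bs = "cr", k = 10), data = data.frame(x = runif(100)))[[1]]
S <- sm$S[[1]] # J(beta, lambda) = lambda * t(beta) %*% S %*% beta
dim(S)         # one row and column per basis coefficient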

The approximate precision matrix for the SPDE \(Df = \epsilon\) is the same as \(\lambda S\) computed using the smoothing penalty \(\langle Df, Df \rangle\). This means that:

  1. Any SPDE of the form \(Df = \epsilon\) can be interpreted instead as a smoothing penalty, and vice versa
  2. Differences in the approaches are differences in implementation only

What smoothing penalty does the Matérn correspond to? This one:

\[\tau \int (\kappa^2 f - \Delta f)^2 \text{d} x\]

There is a trade-off between the value of \(f\) and the second derivative in each direction. Higher \(\kappa\) means that \(\Delta f\) can be larger without incurring penalty, while \(\tau\) controls the overall amount of smoothing. A similar question arises as to how distinguishable these parameters are as with the lengthscale and noise of a GP (probably not very).
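For \(\alpha = 2\), expanding this penalty over the basis gives a precision matrix that can be assembled from the FEM matrices R-INLA exposes. A sketch, assuming inla.mesh.fem returns the mass matrix \(C\) as c0, the stiffness matrix \(G\) as g1, and \(G C^{-1} G\) as g2 (conventions for where \(\tau\) enters vary between references):

mesh1 <- inla.mesh.1d(seq(0, 1, length.out = 50))
fem <- inla.mesh.fem(mesh1, order = 2)
kappa <- 10
tau <- 1
Q <- tau^2 * (kappa^4 * fem$c0 + 2 * kappa^2 * fem$g1 + fem$g2) # sparse precision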

A short introduction on how to fit an SPDE model with INLA (Krainski and Rue)

library(INLA)

Generate fake data:

n <- 200
coo <- matrix(runif(2 * n), n) # n random locations in the unit square

plot(coo[,1], coo[,2])

s2u <- 0.5 # marginal variance ("sigma squared u")
k <- 10    # kappa, the scale parameter
r <- 2/k   # practical range, sqrt(8 * nu) / kappa with nu = 1/2

R <- s2u * exp(-k * as.matrix(dist(coo))) # exponential (Matern, nu = 1/2) covariance

image(R)

Sample one multivariate Gaussian observation (note that rnorm(n) %*% chol(R) is equivalent in distribution to MASS::mvrnorm):

u <- drop(rnorm(n) %*% chol(R))
u2 <- MASS::mvrnorm(1, rep(0, n), Sigma = R)

plot(u)
points(u2, col = "red")

Add a covariate effect and noise (nugget) terms:

x <- runif(n)
beta <- 1:2
s2e <- 0.2 # "sigma squared e"
lin.pred <- beta[1] + beta[2] * x + u # "linear predictor"
e <- rnorm(n, 0, sqrt(s2e)) # "noise"
y <- lin.pred + e

Mesh:

mesh <- inla.mesh.2d(
  coo, 
  cutoff = r/10,          # minimum allowed distance between mesh points
  max.edge = c(r/4, r/2), # maximum triangle edge length (inner domain, outer extension)
  offset = c(r/2, r)      # how far the mesh extends beyond the points (inner, outer)
)

plot(mesh, asp = 1)
points(coo, col = "red")

Do all of the points lie at vertices? It doesn't look like it, but most of them do.
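One way to check is to compute the distance from each observation to its nearest mesh vertex (the vertex coordinates are in mesh$loc):

d <- apply(coo, 1, function(p)
  min(sqrt((mesh$loc[, 1] - p[1])^2 + (mesh$loc[, 2] - p[2])^2)))
mean(d < 1e-9) # proportion of observations that coincide with a vertex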

Create the projection matrix A:

A <- inla.spde.make.A(mesh = mesh, loc = coo)
class(A)
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"

A is a \(200 \times 1740\) sparse Matrix. Where do the dimensions come from? 200 is the number of data points, and the other dimension is the number of mesh nodes, mesh$n. (The figure quoted doesn't exactly match the output below, presumably because no random seed is set, so the locations, and hence the mesh, change between runs:)

mesh$n
## [1] 1737

Set-up the SPDE:

spde <- inla.spde2.pcmatern(
  mesh = mesh,
  alpha = 1.5,               # alpha = nu + d/2 = 1/2 + 1, matching the exponential covariance
  prior.range = c(0.2, 0.5), # P(range < 0.2) = 0.5
  prior.sigma = c(1, 0.5)    # P(sigma > 1) = 0.5
)
stk.e <- inla.stack(
  tag = "est",
  data = list(y = y),
  A = list(1, A),              # one projector per element of effects
  effects = list(
    data.frame(b0 = 1, x = x), # fixed effects, projected by the identity
    idx.u = 1:spde$n.spde)     # spatial field, projected by A
)

pcprec <- list(
  hyper = list(theta = list(prior = "pc.prec", param = c(1, 0.1))) # P(sigma_e > 1) = 0.1
)

mf <- y ~ 0 + b0 + x + f(idx.u, model = spde)

res <- inla(
  mf, 
  control.family = pcprec,
  data = inla.stack.data(stk.e),
  control.predictor = list(compute = TRUE, A = inla.stack.A(stk.e))
)

round(res$summary.fixed, 4)
pmd.s2e <- inla.tmarginal(
  function(x) sqrt(1/x), # transform the marginal from precision to standard deviation
  res$marginals.hyperpar$`Precision for the Gaussian observations`
)

plot(pmd.s2e, type = "l", ylab = "Density", xlab = expression(sigma[e]))

Krainski et al. (2018) notes

First result

Consider a regular 2D lattice with the number of sites tending to infinity; then \[ \mathbb{E}(u_{i, j} \, | \, u_{-i, j}) = \frac{1}{a}(u_{i-1, j} + u_{i + 1, j} + u_{i, j - 1} + u_{i, j + 1}) \] and \(\mathbb{V}(u_{i, j} \, | \, u_{-i, j}) = 1/a\) with \(|a| > 4\). For \(\nu = 1\) and \(\nu = 2\) the GMRF representation is a convolution of processes with... (missing diagram here, but it's a Besag precision). As the smoothness \(\nu\) increases, the precision matrix in the GMRF representation becomes less sparse and the conditional distributions depend on a wider neighbourhood. The same effect is visible in 1D below: crossprod(q1) has a wider band than q1 and matches the RW2 precision q2 away from the boundary.

q1 <- INLA:::inla.rw1(n = 5)
q1
## 5 x 5 sparse Matrix of class "dgTMatrix"
##                    
## [1,]  1 -1  .  .  .
## [2,] -1  2 -1  .  .
## [3,]  . -1  2 -1  .
## [4,]  .  . -1  2 -1
## [5,]  .  .  . -1  1
crossprod(q1)
## 5 x 5 sparse Matrix of class "dsCMatrix"
##                    
## [1,]  2 -3  1  .  .
## [2,] -3  6 -4  1  .
## [3,]  1 -4  6 -4  1
## [4,]  .  1 -4  6 -3
## [5,]  .  .  1 -3  2
q2 <- INLA:::inla.rw2(n = 5)
q2
## 5 x 5 sparse Matrix of class "dgTMatrix"
##                    
## [1,]  1 -2  1  .  .
## [2,] -2  5 -4  1  .
## [3,]  1 -4  6 -4  1
## [4,]  .  1 -4  5 -2
## [5,]  .  .  1 -2  1

Second result

\[ u(s) = \sum_{k = 1}^m \psi_k(s) w_k \] where \(\psi_k\) are basis functions and \(w_k\) are Gaussian weights, \(m\) the number of points in the triangulation. The distribution of weights determines the full distribution in the continuous domain.

The projector matrix \(A\) works using the barycentric coordinates of each point with respect to the coordinates of the vertices of the triangle containing it, as in the sketch below.
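A hand-rolled sketch of the idea (a hypothetical helper, not the R-INLA internals): the barycentric coordinates of a point are the three non-zero weights in its row of \(A\).

barycentric <- function(p, v1, v2, v3) {
  # Solve p = v1 + lambda2 * (v2 - v1) + lambda3 * (v3 - v1)
  lambda <- solve(cbind(v2 - v1, v3 - v1), p - v1)
  c(1 - sum(lambda), lambda)
}
barycentric(c(0.3, 0.3), c(0, 0), c(1, 0), c(0, 1))
## [1] 0.4 0.3 0.3

The weights are non-negative for points inside the triangle and sum to one, matching the row sums of \(A\) checked in the toy example below.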

Toy example

data(SPDEtoy)
str(SPDEtoy)
## 'data.frame':    200 obs. of  3 variables:
##  $ s1: num  0.0827 0.6123 0.162 0.7526 0.851 ...
##  $ s2: num  0.0564 0.9168 0.357 0.2576 0.1541 ...
##  $ y : num  11.52 5.28 6.9 13.18 14.6 ...
plot(x = SPDEtoy$s1, y = SPDEtoy$s2, cex = SPDEtoy$y / 10)

pl.dom <- cbind(c(0, 1, 1, 0.7, 0), c(0, 0, 0.7, 1, 1))
mesh5 <- inla.mesh.2d(loc.domain = pl.dom, max.edge = c(0.092, 0.2))
plot(mesh5)

spde5 <- inla.spde2.pcmatern(
  mesh = mesh5, alpha = 2,     # Mesh and smoothness parameter
  prior.range = c(0.3, 0.5),   # P(practic.range < 0.3) = 0.5
  prior.sigma = c(10, 0.01)    # P(sigma > 10) = 0.01
) 

The projector matrix is sparse (at most three elements in each row are non-zero, and each row sums to one).

coords <- as.matrix(SPDEtoy[, 1:2])
A5 <- inla.spde.make.A(mesh5, loc = coords)
dim(A5)
## [1] 200 489
table(rowSums(A5 > 0)) # Three non-zero elements
## 
##   3 
## 200
table(rowSums(A5))     # Which add to one
## 
##   1 
## 200
table(colSums(A5) > 0) # Columns which correspond to triangles with no points inside can be dropped
## 
## FALSE  TRUE 
##   237   252
# This is done automatically by inla.stack()

inla.stack() organises the data, covariates, indices and projector matrices:

stk5 <- inla.stack(
  data = list(resp = SPDEtoy$y),
  A = list(A5, 1), 
  effects = list(i = 1:spde5$n.spde, beta0 = rep(1, nrow(SPDEtoy))),
  tag = 'est'
)
dim(inla.stack.A(stk5)) # Number of columns in A5 which have non-zero entries, plus one column for the intercept
## [1] 200 253
res5 <- inla(
  resp ~ 0 + beta0 + f(i, model = spde5),
  data = inla.stack.data(stk5),
  control.predictor = list(A = inla.stack.A(stk5))
)

res5$summary.fixed

Now for predicting at new points.

pts3 <- rbind(c(0.1, 0.1), c(0.5, 0.55), c(0.7, 0.9))
A5pts3 <- inla.spde.make.A(mesh5, loc = pts3)
dim(A5pts3)
## [1]   3 489
jj3 <- which(colSums(A5pts3) > 0)
A5pts3[, jj3]
## 3 x 9 sparse Matrix of class "dgCMatrix"
##                                                                                                
## [1,] .         .         0.05440876 .         .         .         0.5020259 0.4435654 .        
## [2,] 0.2187568 .         .          0.3236197 .         0.4576234 .         .         .        
## [3,] .         0.1194153 .          .         0.2682321 .         .         .         0.6123526

Interpolation of the posterior mean:

drop(A5pts3 %*% res5$summary.random$i$mean)
## [1]  0.3112111  3.1013403 -2.7231595

Projection onto a grid:

pgrid0 <- inla.mesh.projector(mesh5, xlim = 0:1, ylim = 0:1, dims = c(101, 101))

prd0.m <- inla.mesh.project(pgrid0,  res5$summary.random$i$mean)
prd0.s <- inla.mesh.project(pgrid0,  res5$summary.random$i$sd)

image(prd0.m)

image(prd0.s)

Prediction of new data:

# Create a new stack
stk5.pmu <- inla.stack(
  data = list(resp = NA),
  A = list(A5pts3, 1), 
  effects = list(i = 1:spde5$n.spde, beta0 = rep(1, 3)), 
  tag = 'prd5.mu'
)

# Append it
stk5.full <- inla.stack(stk5, stk5.pmu)

r5pmu <- inla(
  resp ~ 0 + beta0 + f(i, model = spde5), 
  data = inla.stack.data(stk5.full), 
  control.mode = list(theta = res5$mode$theta, restart = FALSE), 
  control.predictor = list(A = inla.stack.A(stk5.full), compute = TRUE)
)
indd3r <- inla.stack.index(stk5.full, 'prd5.mu')$data
indd3r
## [1] 201 202 203
r5pmu$summary.fitted.values[indd3r, c(1:3, 5)]

Original computing environment

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Monterey 12.6
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] INLA_22.05.07        foreach_1.5.2        Matrix_1.5-1         knitr_1.39           rgdal_1.5-30        
##  [6] raster_3.5-15        sp_1.4-7             rstan_2.21.5         StanHeaders_2.21.0-7 sf_1.0-7            
## [11] bsae_0.2.7           multi.utils_0.1.0    stringr_1.4.0        purrr_0.3.4          readr_2.1.2         
## [16] tidyr_1.2.0          tibble_3.1.7         tidyverse_1.3.1      forcats_0.5.1        ggplot2_3.3.6       
## [21] dplyr_1.0.9          rmarkdown_2.14      
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_2.0-3   ellipsis_0.3.2     class_7.3-20       rprojroot_2.0.3    fs_1.5.2          
##  [6] rstudioapi_0.13    proxy_0.4-26       Deriv_4.1.3        MatrixModels_0.5-0 bit64_4.0.5       
## [11] fansi_1.0.3        lubridate_1.8.0    xml2_1.3.3         codetools_0.2-18   splines_4.2.0     
## [16] cachem_1.0.6       jsonlite_1.8.0     broom_0.8.0        dbplyr_2.1.1       compiler_4.2.0    
## [21] httr_1.4.3         backports_1.4.1    assertthat_0.2.1   fastmap_1.1.0      cli_3.3.0         
## [26] htmltools_0.5.2    prettyunits_1.1.1  tools_4.2.0        gtable_0.3.0       glue_1.6.2        
## [31] Rcpp_1.0.8.3       cellranger_1.1.0   jquerylib_0.1.4    vctrs_0.4.1        iterators_1.0.14  
## [36] xfun_0.31          ps_1.7.0           rvest_1.0.2        lifecycle_1.0.1    orderly_1.4.3     
## [41] terra_1.5-21       ids_1.0.1          MASS_7.3-56        scales_1.2.0       hms_1.1.1         
## [46] inline_0.3.19      rticles_0.23.6     yaml_2.3.5         memoise_2.0.1      gridExtra_2.3     
## [51] loo_2.5.1          sass_0.4.1         stringi_1.7.6      RSQLite_2.2.14     highr_0.9         
## [56] e1071_1.7-9        pkgbuild_1.3.1     rlang_1.0.2        pkgconfig_2.0.3    matrixStats_0.62.0
## [61] evaluate_0.15      lattice_0.20-45    bit_4.0.4          processx_3.5.3     tidyselect_1.1.2  
## [66] magrittr_2.0.3     R6_2.5.1           generics_0.1.2     DBI_1.1.2          pillar_1.7.0      
## [71] haven_2.5.0        withr_2.5.0        units_0.8-0        modelr_0.1.8       crayon_1.5.1      
## [76] uuid_1.1-0         KernSmooth_2.23-20 utf8_1.2.2         tzdb_0.3.0         viridis_0.6.2     
## [81] grid_4.2.0         readxl_1.4.0       blob_1.2.3         callr_3.7.0        reprex_2.0.1      
## [86] digest_0.6.29      classInt_0.4-3     openssl_2.0.1      RcppParallel_5.1.5 stats4_4.2.0      
## [91] munsell_0.5.0      viridisLite_0.4.0  bslib_0.3.1        askpass_1.1

Bibliography

Krainski, Elias T, Virgilio Gómez-Rubio, Haakon Bakka, Amanda Lenzi, Daniela Castro-Camilo, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2018. Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. CRC Press.
Lindgren, Finn, Johan Lindström, and Håvard Rue. 2010. An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The SPDE Approach. Mathematical Statistics, Centre for Mathematical Sciences, Faculty of ….
Miller, David L, Richard Glennie, and Andrew E Seaton. 2020. “Understanding the Stochastic Partial Differential Equation Approach to Smoothing.” Journal of Agricultural, Biological and Environmental Statistics 25 (1): 1–16.
Whittle, Peter. 1954. “On Stationary Processes in the Plane.” Biometrika, 434–49.