Welcome

This is the e-book version of my PhD thesis, submitted to Imperial College London in accordance with the requirements of the degree of Doctor of Philosophy in Modern Statistics and Statistical Machine Learning. If you would prefer, you can view the PDF version. The associated GitHub repository for this thesis is athowes/thesis. A concise introduction to the work is available via my thesis defense slides, or (slightly less concise) longer slides for a lab group meeting. The corrections for this thesis are also available online. If you notice any typos or other issues with the work, feel free to open an issue on GitHub, or submit a pull request.

Acknowledgments

I would first like to express my gratitude to Seth Flaxman and Jeff Imai-Eaton for their mentorship. Their guidance has been crucial in shaping this thesis, and my development as a scientist. Thanks to the HIV Inference Group at Imperial for exposing me to impact-driven research, helping me to learn to present my work, and tolerating a statistician. I am grateful to have been a part of the Modern Statistics and Statistical Machine Learning Centre for Doctoral Training at Imperial and Oxford, and the Machine Learning and Global Health Network. Thanks to Antoine, Chris, Enrico, Phil, Yanni, Tim, Liza, and Theo for conversations, some of which were about research. This work was made possible by funding provided by the EPSRC and Bill & Melinda Gates Foundation. There are many worse ways to spend billions of dollars than fighting poverty and disease.

Thanks to Mike McLaren, Kevin Esvelt, the Nucleic Acid Observatory team, and the Sculpting Evolution lab for hosting my visit to the MIT Media Lab. I left Cambridge with appropriately raised aspirations, Google document templates, and only a little terrified about the future. Thanks to Trenton, Lenni, Lenny, Geetha, Janika, Simon, Phil, Frances, Leilani and Tammy.

Thanks to Alex Stringer, and the Department of Statistics and Actuarial Science, for hosting my visit to the University of Waterloo. Without Alex, Chapter 6 would not have been possible, and I’d still be waiting for the Markov chains begun in Chapter 4 to converge. Tim Lucas and Patrick Brown put me in touch with Alex, and Håvard Rue and Finn Lindgren gave helpful answers on the R-INLA discussion group. Thanks also to Kate, my tour guide in Waterloo, and Midtown Yoga for helping me stay balanced.

My sense for what matters has been shaped, and arguably improved, by the Effective Altruism community. Thank you to the Meridian, Trajan, and LEAH offices for hosting me this final year. Thanks to my housemates in Hackney: August, Dewi, Henry, Jerome, Johnny, and Tamara. Not to be all Bay Area, but I’m proud of the community we’ve built. Pınar believed in me and my research at times when I didn’t. Thanks to Mr Sam, and attendees of the Manshead grit salt, for conferring upon me the status of stats man. No thanks to Simon Marshall, he didn’t help, if anything he held me back. I extend my deepest thanks to my parents, Deborah and Karl, and my grandparents, Kath and Tony, whose love and support have granted me the privilege to pursue my interests.

Abbreviations

Abbreviation Definition
AIDS Acquired ImmunoDeficiency Syndrome
AIS AIDS Indicator Survey
ANC Antenatal Clinic
AGHQ Adaptive Gauss-Hermite Quadrature
ART Antiretroviral Therapy
BIC Bayesian Information Criterion
BF Bayes Factor
CAR Conditionally Auto-regressive
CCD Central Composite Design
CDC Centers for Disease Control and Prevention
CPO Conditional Predictive Ordinate
CRPS Continuous Ranked Probability Score
DALY Disability Adjusted Life Year
DDC Data Defect Correlation
DHS Demographic and Health Surveys
DIC Deviance Information Criterion
EB Empirical Bayes
ECDF Empirical Cumulative Distribution Function
ELGM Extended Latent Gaussian Model
ESS Effective Sample Size
FSW Female Sex Worker(s)
GP Gaussian Process
GLM Generalised Linear Model
GLMM Generalised Linear Mixed effects Model
GMRF Gaussian Markov Random Field
Global Fund Global Fund to Fight AIDS, Tuberculosis, and Malaria
HMC Hamiltonian Monte Carlo
HIV Human Immunodeficiency Virus
ICAR Intrinsic Conditionally Auto-regressive
IID Independent and Identically Distributed
INLA Integrated Nested Laplace Approximation
LM Linear Model
LGM Latent Gaussian Model
LS Log Score
MCMC Markov Chain Monte Carlo
MSM Men who have Sex with Men
NUTS No-U-Turn Sampler
PEP Post-Exposure Prophylaxis
PEPFAR President’s Emergency Plan for AIDS Relief
PHIA Population-based HIV Impact Assessment
PIT Probability Integral Transform
PLHIV People Living with HIV
PPL Probabilistic Programming Language
PrEP Pre-Exposure Prophylaxis
PMTCT Prevention of Mother-to-Child Transmission
PWID People Who Inject Drugs
SAE Small-Area Estimation
SR Scoring Rule
SPSR Strictly Proper Scoring Rule
SSA Sub-Saharan Africa
STI Sexually Transmitted Infection
TGP Transgender People
TaSP Treatment as Prevention
UNAIDS The Joint United Nations Programme on HIV/AIDS
VI Variational Inference
VMMC Voluntary Medical Male Circumcision
WAIC Watanabe-Akaike Information Criterion

Notations

Notation Definition
\(\propto\) Proportional to.
\(\mathbb{R}\) The set of real numbers.
\(\mathbb{Z}\) The set of integers.
\(\mathbb{Z}^+\) The set of positive integers.
\(\rho\) HIV prevalence.
\(\lambda\) HIV incidence.
\(\alpha\) ART coverage.
\(\mathcal{S}\) Spatial study region \(\mathcal{S} \subseteq \mathbb{R}^2\).
\(s \in \mathcal{S}\) Point location.
\(\mathcal{T}\) Temporal study period \(\mathcal{T} \subseteq \mathbb{R}\).
\(t \in \mathcal{T}\) Time.
\(\mathbf{y}\) Data, an \(n\)-vector \((y_1, \ldots, y_n)\).
\(\boldsymbol{\phi}\) Parameters, a \(d\)-vector \((\phi_1, \ldots, \phi_d)\).
\(\mathbf{x}\) Latent field, an \(N\)-vector \((x_1, \ldots, x_N)\).
\(\boldsymbol{\theta}\) Hyperparameters, an \(m\)-vector \((\theta_1, \ldots, \theta_m)\).
\(x \sim p(x)\) \(x\) has the probability distribution \(p(x)\).
\(A_i\) Areal unit.
\(A_i \sim A_j\) Adjacency between areal units.
\(\mathbf{u}\) Random effects, often spatial.
\(\mathbf{H}\) Hessian matrix.
\(\mathbf{R}\) Structure matrix.
\(\mathbf{Q}\) Precision matrix.
\(\boldsymbol{\Sigma}\) Covariance matrix.
\(\mathbf{M}^{-}\) The generalised inverse of a (potentially rank-deficient) matrix \(\mathbf{M}\).
\(\mathcal{N}\) Gaussian distribution.
\(k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) Kernel function on the space \(\mathcal{X}\).
\(\mathcal{Q}\) A set of quadrature nodes.
\(\omega: \mathcal{Q} \to \mathbb{R}\) A quadrature weighting function.
\(\mathcal{Q}(m, k)\) Gauss-Hermite quadrature points in \(m\) dimensions with \(k\) nodes per dimension, constructed according to a product rule.
\(\varphi\) A standard (multivariate) Gaussian density.
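
To make the product-rule notation \(\mathcal{Q}(m, k)\) concrete, the following is a minimal sketch of how such a grid might be built, not the construction used in the thesis. It assumes the one-dimensional rule is the Gauss-Hermite rule whose weights are normalised to integrate against the standard Gaussian density \(\varphi\); the thesis may use a different weight convention. The names `RULES_1D`, `product_rule`, and `expectation` are hypothetical, and only the rules for \(k = 1, 2, 3\) are hard-coded.

```python
import itertools
import math

# One-dimensional Gauss-Hermite rules for k = 1, 2, 3, hard-coded for illustration.
# Assumption: weights are normalised so that they sum to one, i.e. the rule
# approximates expectations under a standard Gaussian density.
RULES_1D = {
    1: ([0.0], [1.0]),
    2: ([-1.0, 1.0], [0.5, 0.5]),
    3: ([-math.sqrt(3.0), 0.0, math.sqrt(3.0)], [1 / 6, 2 / 3, 1 / 6]),
}


def product_rule(m, k):
    """Return the k^m nodes and weights of the m-dimensional product rule."""
    nodes_1d, weights_1d = RULES_1D[k]
    # Each multivariate node is an m-tuple of one-dimensional nodes.
    nodes = list(itertools.product(nodes_1d, repeat=m))
    # Each multivariate weight is the product of the matching 1D weights.
    weights = [math.prod(w) for w in itertools.product(weights_1d, repeat=m)]
    return nodes, weights


def expectation(f, m, k):
    """Approximate E[f(z)] for z standard Gaussian in m dimensions."""
    nodes, weights = product_rule(m, k)
    return sum(w * f(z) for z, w in zip(nodes, weights))


nodes, weights = product_rule(m=2, k=3)
print(len(nodes))  # 9, since the grid has k^m points
```

The grid grows as \(k^m\), which is why a product rule of this kind is only practical when the dimension \(m\) is small.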