Bayesian spatio-temporal methods for small-area estimation of HIV indicators
Abstract
Progress towards ending AIDS as a public health threat by 2030 is not being made fast enough. Effective public health response requires accurate, timely, high-resolution estimates of epidemic and demographic indicators. Limitations of available data and statistical methodology make obtaining these estimates difficult. I developed and applied Bayesian spatio-temporal methods to meet this challenge. First, I used scoring rules to compare models for area-level spatial structure with both simulated and real data. Second, I estimated district-level HIV risk group proportions, enabling behavioural prioritisation of prevention services, as put forward in the UNAIDS Global AIDS Strategy. Third, I developed a novel deterministic Bayesian inference method, combining adaptive Gauss-Hermite quadrature with principal component analysis, motivated by the Naomi district-level model of HIV indicators. In developing this method, I implemented integrated nested Laplace approximations using automatic differentiation, enabling use of this algorithm for a wider class of models. Together, the contributions in this thesis help to guide precision HIV policy in sub-Saharan Africa, as well as advancing Bayesian methods for spatio-temporal data.
Welcome
This is the e-book version of my PhD thesis, submitted to Imperial College London in accordance with the requirements of the degree of Doctor of Philosophy in Modern Statistics and Statistical Machine Learning.
If you would prefer, you can view the PDF version.
The associated GitHub repository for this thesis is athowes/thesis.
A concise introduction to the work is available via my thesis defence slides or, in slightly less concise form, via longer slides prepared for a lab group meeting.
The corrections for this thesis are also available online.
If you notice any typos or other issues with the work, feel free to open an issue on GitHub, or submit a pull request.
Acknowledgements
I would first like to express my gratitude to Seth Flaxman and Jeff Imai-Eaton for their mentorship. Their guidance has been crucial in shaping this thesis, and my development as a scientist. Thanks to the HIV Inference Group at Imperial for exposing me to impact-driven research, helping me to learn to present my work, and tolerating a statistician. I am grateful to have been a part of the Modern Statistics and Statistical Machine Learning Centre for Doctoral Training at Imperial and Oxford, and the Machine Learning and Global Health Network. Thanks to Antoine, Chris, Enrico, Phil, Yanni, Tim, Liza, and Theo for conversations, some of which were about research. This work was made possible by funding provided by the EPSRC and the Bill & Melinda Gates Foundation. There are many worse ways to spend billions of dollars than fighting poverty and disease.
Thanks to Mike McLaren, Kevin Esvelt, the Nucleic Acid Observatory team, and the Sculpting Evolution lab for hosting my visit to the MIT Media Lab. I left Cambridge with appropriately raised aspirations, Google document templates, and only a little terrified about the future. Thanks to Trenton, Lenni, Lenny, Geetha, Janika, Simon, Phil, Frances, Leilani and Tammy.
Thanks to Alex Stringer, and the Department of Statistics and Actuarial Science, for hosting my visit to the University of Waterloo.
Without Alex, Chapter 6 would not have been possible, and I’d still be waiting for the Markov chains I started in Chapter 4 to converge.
Tim Lucas and Patrick Brown put me in touch with Alex, and Håvard Rue and Finn Lindgren gave helpful answers on the R-INLA discussion group.
Thanks also to Kate, my tour guide in Waterloo, and Midtown Yoga for helping me stay balanced.
My sense for what matters has been shaped, and arguably improved, by the Effective Altruism community. Thank you to the Meridian, Trajan, and LEAH offices for hosting me during this final year. Thanks to my housemates in Hackney: August, Dewi, Henry, Jerome, Johnny, and Tamara. Not to be all Bay Area, but I’m proud of the community we’ve built. Pınar believed in me and my research at times when I didn’t. Thanks to Mr Sam, and attendees of the Manshead grit salt, for conferring upon me the status of stats man. No thanks to Simon Marshall: he didn’t help; if anything, he held me back. I extend my deepest thanks to my parents, Deborah and Karl, and my grandparents, Kath and Tony, whose love and support have granted me the privilege to pursue my interests.
Abbreviations
Abbreviation | Definition |
---|---|
AIDS | Acquired ImmunoDeficiency Syndrome |
AIS | AIDS Indicator Survey |
ANC | Antenatal Clinic |
AGHQ | Adaptive Gauss-Hermite Quadrature |
ART | Antiretroviral Therapy |
BIC | Bayesian Information Criterion |
BF | Bayes Factor |
CAR | Conditionally Auto-regressive |
CCD | Central Composite Design |
CDC | Centers for Disease Control and Prevention |
CPO | Conditional Predictive Ordinate |
CRPS | Continuous Ranked Probability Score |
DALY | Disability Adjusted Life Year |
DDC | Data Defect Correlation |
DHS | Demographic and Health Surveys |
DIC | Deviance Information Criterion |
EB | Empirical Bayes |
ECDF | Empirical Cumulative Distribution Function |
ELGM | Extended Latent Gaussian Model |
ESS | Effective Sample Size |
FSW | Female Sex Worker(s) |
GP | Gaussian Process |
GLM | Generalised Linear Model |
GLMM | Generalised Linear Mixed effects Model |
GMRF | Gaussian Markov Random Field |
Global Fund | Global Fund to Fight AIDS, Tuberculosis, and Malaria |
HMC | Hamiltonian Monte Carlo |
HIV | Human Immunodeficiency Virus |
ICAR | Intrinsic Conditionally Auto-regressive |
IID | Independent and Identically Distributed |
INLA | Integrated Nested Laplace Approximation |
LM | Linear Model |
LGM | Latent Gaussian Model |
LS | Log Score |
MCMC | Markov Chain Monte Carlo |
MSM | Men who have Sex with Men |
NUTS | No-U-Turn Sampler |
PEP | Post-Exposure Prophylaxis |
PEPFAR | President’s Emergency Plan for AIDS Relief |
PHIA | Population-based HIV Impact Assessment |
PIT | Probability Integral Transform |
PLHIV | People Living with HIV |
PPL | Probabilistic Programming Language |
PrEP | Pre-Exposure Prophylaxis |
PMTCT | Prevention of Mother-to-Child Transmission |
PWID | People Who Inject Drugs |
SAE | Small-Area Estimation |
SR | Scoring Rule |
SPSR | Strictly Proper Scoring Rule |
SSA | Sub-Saharan Africa |
STI | Sexually Transmitted Infection |
TGP | Transgender People |
TaSP | Treatment as Prevention |
UNAIDS | Joint United Nations Programme on HIV/AIDS |
VI | Variational Inference |
VMMC | Voluntary Medical Male Circumcision |
WAIC | Watanabe-Akaike Information Criterion |
Notations
Notation | Definition |
---|---|
\(\propto\) | Proportional to. |
\(\mathbb{R}\) | The set of real numbers. |
\(\mathbb{Z}\) | The set of integers. |
\(\mathbb{Z}^+\) | The set of positive integers. |
\(\rho\) | HIV prevalence. |
\(\lambda\) | HIV incidence. |
\(\alpha\) | ART coverage. |
\(\mathcal{S}\) | Spatial study region \(\mathcal{S} \subseteq \mathbb{R}^2\). |
\(s \in \mathcal{S}\) | Point location. |
\(\mathcal{T}\) | Temporal study period \(\mathcal{T} \subseteq \mathbb{R}\). |
\(t \in \mathcal{T}\) | Time. |
\(\mathbf{y}\) | Data, an \(n\)-vector \((y_1, \ldots, y_n)\). |
\(\boldsymbol{\phi}\) | Parameters, a \(d\)-vector \((\phi_1, \ldots, \phi_d)\). |
\(\mathbf{x}\) | Latent field, an \(N\)-vector \((x_1, \ldots, x_N)\). |
\(\boldsymbol{\theta}\) | Hyperparameters, an \(m\)-vector \((\theta_1, \ldots, \theta_m)\). |
\(x \sim p(x)\) | \(x\) has the probability distribution \(p(x)\). |
\(A_i\) | Areal unit. |
\(A_i \sim A_j\) | Adjacency between areal units. |
\(\mathbf{u}\) | Random effects, often spatial. |
\(\mathbf{H}\) | Hessian matrix. |
\(\mathbf{R}\) | Structure matrix. |
\(\mathbf{Q}\) | Precision matrix. |
\(\boldsymbol{\Sigma}\) | Covariance matrix. |
\(\mathbf{M}^{-}\) | The generalised inverse of a (potentially rank-deficient) matrix \(\mathbf{M}\). |
\(\mathcal{N}\) | Gaussian distribution. |
\(k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) | Kernel function on the space \(\mathcal{X}\). |
\(\mathcal{Q}\) | A set of quadrature nodes. |
\(\omega: \mathcal{Q} \to \mathbb{R}\) | A quadrature weighting function. |
\(\mathcal{Q}(m, k)\) | Gauss-Hermite quadrature points in \(m\) dimensions with \(k\) nodes per dimension, constructed according to a product rule. |
\(\varphi\) | A standard (multivariate) Gaussian density. |
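The product rule \(\mathcal{Q}(m, k)\), the weighting function \(\omega\), and the standard Gaussian density \(\varphi\) can be illustrated with the minimal Python sketch below. It assumes NumPy and uses its probabilists' Hermite rule `numpy.polynomial.hermite_e.hermegauss`. It takes \(\omega(\mathbf{z})\) to be the normalised Gauss-Hermite weight divided by \(\varphi(\mathbf{z})\), and applies the usual adaptive shift and scale \(\boldsymbol{\theta}(\mathbf{z}) = \hat{\boldsymbol{\theta}} + \mathbf{L}\mathbf{z}\), with \(\mathbf{L}\mathbf{L}^\top = \mathbf{H}^{-1}\) and \(\mathbf{H}\) the Hessian of the negative log integrand at its mode. The function names are hypothetical. This is a generic product-rule adaptive Gauss-Hermite quadrature, not the PCA-based method or code developed in the thesis.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_hermite_product(m, k):
    """Hypothetical helper: product-rule Gauss-Hermite nodes Q(m, k) and weights.

    Returns nodes of shape (k**m, m) and weights of shape (k**m,), normalised so
    that sum(weights * f(nodes)) approximates E[f(Z)] for Z ~ N(0, I_m).
    """
    z1, w1 = hermegauss(k)              # 1-D rule for the weight exp(-z^2 / 2)
    w1 = w1 / np.sqrt(2.0 * np.pi)      # rescale so the 1-D weights sum to one
    nodes = np.array(list(itertools.product(z1, repeat=m)))
    # Product weights, enumerated in the same order as the nodes
    weights = np.prod(np.array(list(itertools.product(w1, repeat=m))), axis=1)
    return nodes, weights


def std_normal_density(z):
    """Standard multivariate Gaussian density (varphi), evaluated row-wise."""
    m = z.shape[1]
    return np.exp(-0.5 * np.sum(z**2, axis=1)) / (2.0 * np.pi) ** (m / 2)


def aghq_integral(log_g, mode, hessian, k):
    """Hypothetical helper: adaptive Gauss-Hermite estimate of int exp(log_g).

    `mode` maximises log_g; `hessian` is the Hessian of -log_g at the mode and
    is assumed positive definite. Nodes are mapped to theta(z) = mode + L z,
    where L L^T = hessian^{-1}.
    """
    mode = np.asarray(mode, dtype=float)
    m = mode.size
    L = np.linalg.cholesky(np.linalg.inv(np.atleast_2d(hessian)))
    z, w = gauss_hermite_product(m, k)
    omega = w / std_normal_density(z)   # assumed weighting function omega(z)
    theta = mode + z @ L.T              # each row is mode + L z
    g = np.exp(np.array([log_g(t) for t in theta]))
    # |det L| accounts for the change of variables from theta to z
    return np.prod(np.diag(L)) * np.sum(omega * g)


if __name__ == "__main__":
    # Gaussian integrand: the estimate equals 2 * pi / sqrt(det A) for any k
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    est = aghq_integral(lambda t: -0.5 * t @ A @ t, [0.0, 0.0], A, k=3)
    print(est, 2 * np.pi / np.sqrt(np.linalg.det(A)))
```

With \(k = 1\), \(\mathcal{Q}(m, 1)\) contains only the origin, so the estimate reduces to the Laplace approximation. Increasing \(k\) adds \(k^m\) nodes in total, which is why a dense product rule quickly becomes expensive as \(m\) grows.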