1 Introduction

This thesis is about applied and methodological Bayesian statistics. It is applied and methodological in that the primary concern is real-world questions and the means to answer them. The statistical approach is Bayesian because probability theory is used to arrive at conclusions based on models for observed data.

The applied focus of this thesis is on obtaining the strategic information needed to plan the response to the HIV (human immunodeficiency virus) epidemic in sub-Saharan Africa (SSA). More than 40 years after the beginning of the epidemic, HIV is the largest annual cause of disability-adjusted life years (DALYs) among non-infants in SSA [Global Burden of Disease Collaborative Network (2019); Figure 1.1]. Quantification of the epidemic using statistics is a crucial part of the public health response. Effective implementation of HIV prevention and treatment requires strategic information. However, producing suitable estimates of relevant indicators is made difficult by a range of statistical challenges.

Figure 1.1: HIV is the largest cause of annual DALYs among individuals aged >1 year in SSA (Global Burden of Disease Collaborative Network 2019). One DALY represents the loss of the equivalent of one year of full health, and is calculated as the sum of years of life lost and years lost due to disability. Weights used to account for disability vary between 0 (full health) and 1 (death) depending on the severity of the condition.
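
As a rough illustration of the DALY definition given in the caption, the calculation can be sketched as below. The symbols are introduced here for illustration only and are not notation from the GBD study or from this thesis. The decomposition of each term follows the usual prevalence-based convention and should be read as an assumption about how the components are typically computed.

```latex
\begin{align}
\text{DALY} &= \text{YLL} + \text{YLD}, \\
\text{YLL} &= N_{\text{deaths}} \times L, \\
\text{YLD} &= N_{\text{cases}} \times w, \qquad w \in [0, 1].
\end{align}
```

Here $N_{\text{deaths}}$ is the number of deaths, $L$ is a standard remaining life expectancy at the age of death, $N_{\text{cases}}$ is the number of prevalent cases, and $w$ is the disability weight, with 0 corresponding to full health and 1 to death. In practice each term would be summed over age groups, sexes, and health states.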

The data used were gathered in national household surveys or routinely collected from healthcare facilities providing HIV services. An important feature of these data is the location and time at which observations were recorded. Spatio-temporal data have important recurring commonalities across a diverse range of application settings. The work conducted in this thesis uses, and aspires to contribute to, techniques from spatio-temporal statistics.

Computation is an essential part of modern statistical practice. Each project in this thesis, and the thesis itself, is accompanied by R (R Core Team 2022) code, hosted on GitHub at https://github.com/athowes. To facilitate reproducible research, the R package orderly (FitzJohn et al. 2023) was used to structure code repositories.

1.1 Chapter overview

This thesis is structured as follows:

  • Chapter 2 provides an overview of the HIV/AIDS epidemic, and describes the challenges faced by surveillance efforts.
  • Chapter 3 introduces the statistical concepts and notation used throughout the thesis, focusing on Bayesian modelling and computation, spatio-temporal statistics, and survey methods.
  • Chapter 4: The prevailing model for spatial structure used in small-area estimation (Besag, York, and Mollié 1991) was intended to analyse a grid of pixels. In disease mapping, areas correspond to the administrative divisions of a country, which are typically not a grid. I used simulation and survey data studies to evaluate the practical consequences of this concern.
  • Chapter 5: Adolescent girls and young women are a demographic group at disproportionate risk of HIV infection. The Global AIDS Strategy recommends prioritising interventions on the basis of behaviour to prevent the most new infections using the limited available resources. I estimated the size of behavioural risk groups across priority countries to enable implementation of this strategy. Additionally, I assessed the potential benefits of the strategy in terms of numbers of new infections prevented. This work (Howes et al. 2023) was included in the UNAIDS (Joint United Nations Programme on HIV/AIDS) Global AIDS Update 2022 and 2023.
  • Chapter 6: The Naomi small-area estimation model (Eaton et al. 2021) is used by countries to estimate district-level HIV indicators. First, to allow for compatibility with Naomi, I implemented the integrated nested Laplace approximations using automatic differentiation, opening the door to a new class of fast, flexible, and accurate Bayesian inference algorithms. The implementation was demonstrated using models for a clinical trial of an epilepsy drug, and for the prevalence of the parasitic worm Loa loa. Second, I developed an approximate Bayesian inference method combining adaptive Gauss-Hermite quadrature with principal components analysis. I applied these methods to data from Malawi, and analysed the consequences of the inference method choice for policy-relevant outcomes.
  • Chapter 7: Finally, I discuss contributions of the research, avenues for future work, and some broader reflections.

Though chronological order is recommended, Chapters 4, 5 and 6 may be read in any order, or as stand-alone studies, if preferred.

References

Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20.
Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788.
FitzJohn, Rich, Robert Ashton, Alex Hill, Martin Eden, Wes Hinsley, Emma Russell, and James Thompson. 2023. Orderly: Lightweight Reproducible Reporting.
Global Burden of Disease Collaborative Network. 2019. Global Burden of Disease Study 2019 (GBD 2019) Results. Institute for Health Metrics and Evaluation (IHME). https://vizhub.healthdata.org/gbd-results/.
Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org.