Chapter 1 Introduction

Tuberculosis (TB) is one of the oldest human diseases, with recorded cases in ancient Egypt, renaissance Europe, and in the modern day across the globe.[1] It is thought that roughly one-third of the world’s population has been infected with TB, with 1% of the world’s population being infected annually. However, the vast majority of these cases will never develop active disease. This reservoir of disease presents a challenge for control and eradication as, even if transmission can be halted, new cases will still occur for many years to come. While many people might consider TB to be a problem of the past in England, in 2017 there were 5,900 notified cases, the majority of which occurred in vulnerable populations; where incidence rates can be as much as 15 times higher than in the general population.[2] Globally, TB remains the leading cause of death from infectious disease.[3]

The Bacillus Calmette–Guérin (BCG) vaccine was developed in 1921 and was introduced to the UK in 1953. Globally, it has been shown to offer variable protection that may reduce over time.[3] However, there is strong evidence that BCG offers high levels of protection for children, and more generally within the UK born population.[4] It remains the only licensed TB vaccine with over 100 million doses given globally each year. Serious side effects are rare but scarring commonly occurs at the site of injection. In 2005, the UK withdrew the universal BCG program for those at school age and introduced a targeted program of vaccination for babies that were deemed to be at high risk.[5] This was motivated by several years of declining transmission, the evidence of high levels of protection in children and a belief that other control measures would be more cost-effective.[5] Since this change in policy, declining incidence appears to support this decision.[2] However, due to TB’s complex dynamics, the long-term effects are difficult to predict.

The availability of data is revolutionising the way we view the world; in few other areas has this revolution been felt more than in public health. In 2000, Public Health England (PHE) launched a routine surveillance system for TB, which records demographic, clinical, and microbiological information on all notified cases.[2] This dataset allows us to study the details of TB epidemiology in England more easily than ever before. Whilst this information would present much of interest by itself, by combining it with other datasets we can adjust for the changing demographics of the English population to study the trends in TB over time.

This thesis explores the impact of changing BCG policy in England, with the aim of informing global efforts to control TB. As a first step I outline some of the key background information and discuss the tooling that I developed in order to explore this information more fully. I then make use of the detailed PHE routine TB surveillance data to explore the current epidemiology of TB in England. Next, I use statistical models that make use of this data to explore the impact from the 2005 change in BCG policy. Finally I develop, and fit, a detailed, semi-sthochastic, mechanistic model of TB and BCG vaccination in England in order to forecast the ongoing impact of the change in policy versus multiple alternative scenarios. This thesis is also available as a website1, a pdf2, and a reproducible Rmarkdown document3

The remainder of this chapter outlines: the theoretical framework used in this thesis; the aims and objectives that were used to motivate this thesis; the chapter structure of this thesis; and the output from this thesis.

1.1 Theoretical framework

This thesis uses three main techniques to explore the impact of BCG vaccination on TB in England. These are: data exploration and visualisation, statistical modelling, and mechanistic modelling. Each of these techniques is outlined in the following sections.

1.1.1 Data exploration and visualisation

Data visualisation is often discounted in favour of more complex statistical or mechanistic approaches. However, as an exploratory tool it can be used to generate hypotheses that can then be evaluated using more complex techniques. It can also be used to visualise results from more complex methods that can then be used as a form of validation.

In this thesis, data visualisation is used in Chapter 2 to explore the epidemiology of TB globally and in Chapter 4 to explore the epidemiology of TB in England. Chapter 4 also uses visualisation to generate many of the hypotheses that are then explored in further detail throughout the rest of this thesis. The remaining thesis chapters use data visualisation to explore data and results.

1.1.2 Statistical modelling

At the most basic level a statistical model is a set of assumptions that outline the generative process of some sample data.[6] These assumptions describe a set of probability distributions, that approximate the population distribution from which the data has been sampled. Statistical models are usually specified using mathematical equations that relate one or more random variables to non-random variables. An example of this is a linear regression which maps a series of variables, using a linear relationship, to generate a numeric outcome variable. Statistical models can be used to represent complex multivariate relationships that would not be possible to visualise. They can also be used to test alternative scenarios without altering the underlying data, see Chapter 7 for an example of this.

In this thesis, a variety of statistical models are used to explore complex multivariate relationships. Uses cases include: adjusting for confounding variables when estimating the relationship between BCG vaccination and TB outcomes (Chapter 6); and estimating the impact on incidence rates from ending the BCG schools scheme after accounting for various confounders (Chapter 7).

1.1.3 Mechanistic modelling

Mechanistic mathematical models provide an assumption based framework for understanding the transmission of infectious diseases.[7] Mechanistic models can be used to simplify complex real-world systems, whilst retaining a linkage to real-world processes.[8] They are unlike statistical models, which instead focus on modelling the underlying structure of the data generally without reference to the real-world processes. There are multiple mechanistic modelling approaches, the most common being compartmental based models and individual based models.[7,8] Both of these approaches can be represented as deterministic, semi-stochastic (deterministic with some stochastic elements),[9] or fully stochastic - both of the latter include a random component whilst the former does not. Recently, mechanistic models have been combined with statistical models to account for the fact that events may be only partially observed.[10] Mechanistic models have an advantage over statistical models in that they can more easily be used to compare alternative scenarios over long periods of time for which observed data does not exist.

In this thesis, a partially observed semi-stochastic compartmental model of TB is developed that models demographic processes such as ageing, births and deaths, as well as vaccination and TB treatment (Chapter 8). Compartmental infectious disease models generally operate by separating a given population into a series of groups, most commonly susceptible, infectious and recovered populations.[7,8] Movements between these groups are then modeled using a series of differential equations. Transmission is modeled using mass action,[7,8] where infected cases are assumed to randomly interact with susceptible individuals at a rate dictated by the concentration of susceptibles in the population. Additional detail can be added to this model by stratifying the population further and adding additional parameters to modify the degree of mixing between populations. Transition between compartments is assumed to be exponential. See [7,8] for a theoretical introduction to infectious disease models and [10] for implementation details using R.

1.2 Aims and objectives of the thesis

1.2.1 Aim

To understand the impact of BCG vaccination on the epidemiology of TB in England, and to use this understanding to forecast the future effects of current and historic BCG vaccination policy.

1.2.2 Objectives

  • To describe the current epidemiology of TB in England, in the context of global TB epidemiology.
  • To assess some of the statistical modelling evidence used to justify the 2005 change in BCG vaccination policy in the UK.
  • To assess whether there is evidence in routinely-collected surveillance data that BCG vaccination impacts outcomes for TB cases in England.
  • To assess the effects of the 2005 change in vaccination policy on those eligible for vaccination.
  • To develop a parsimonious transmission dynamic model of TB that captures current, and historic, vaccination policy and reflects our current understanding of TB epidemiology in England.
  • To fit this model using all available data sources.
  • To investigate the effectiveness of universal school-age vs. universal neonatal vs. no vaccination using the previously developed transmission dynamic model.

1.3 Chapter overview

  • Chapter 2: Background information is given on TB and the BCG vaccine. This information helps motivate future chapters and may be useful for non-subject area experts.

  • Chapter 3: getTBinR, an R package that facilitates downloading TB relevant data from the World Health Organization and provides functionality for visualising the downloaded data, is introduced. The motivation and context for this package as part of the wider thesis is also outlined.

  • Chapter 4: This chapter describes the epidemiology of TB in England, using routine surveillance datasets. Focusing on: the impact of missing data; the mechanisms underlying that missing date; seasonal trends; the role of age; UK birth status; BCG status; trends in TB incidence rates over time; and TB outcomes in England using case rates. These data are used in all subsequent chapters in this thesis.

  • Chapter 5: This chapter recreates a simulation based statistical model that was used as part of the decision making process that led to the 2005 change in BCG vaccination policy. It extends the previously implemented model by capturing parameter and model uncertainty, and updating the underlying data. It then estimates the impact in real-terms of the change in policy using this updated model.

  • Chapter 6: This chapter uses regression analysis to explore the evidence that BCG vaccination is associated with positive outcomes for active TB cases in England. Any evidence that this is the case may strengthen the case for extending BCG vaccination coverage.

  • Chapter 7: This chapter uses a series of multilevel statistical models to assess the effects of the 2005 change in BCG vaccination policy on the populations targeted by each vaccination scheme.

  • Chapter 8: In this chapter a mechanistic model of TB and BCG vaccination in England is developed. The model structure is justified based on the known epidemiology of TB in England. Model parameters are given prior distributions based on routine surveillance data (Chapter 4), the published literature, and assumptions based on expert knowledge where no other source exists.

  • Chapter 9: In this chapter the model developed in the previous chapter is fitted to the routine surveillance data (Chapter 4) using Bayesian methods. Multiple scenarios are considered.

  • Chapter 10: In this chapter the model, developed and fitted in the previous chapters, is used to forecast the impact of universal BCG vaccination at school-age vs. universal vaccination of neonates vs. no vaccination from 2005 on-wards. The ongoing impact of each policy is then discussed through to 2040.

  • Chapter 11: Results from all previous Chapters are summarised and discussed as a whole. The strengths and weaknesses of the analysis in this thesis are outlined. Further work is outlined.

1.4 Thesis output

This thesis has produced: peer reviewed papers; preprints; talks at academic conferences; open source research software; open source software for improving the academic workflow; dashboards for exposing relevant data; dashboards for exploring the modelling methods used in this thesis; and an educational dashboard for teaching some the benefits of vaccination. These outputs are detailed in the following section.

1.4.1 Peer reviewed papers

1.4.2 Papers under review

  • Abbott S., Christensen H., Brooks-Pollock E. Reassessing the evidence for universal school-age BCG vaccination in England and Wales, doi:

1.4.3 Software Packages

  • getTBinR: The getTBinR R package facilitates downloading the most up-to-date version of multiple TB relevant data sources from the World Health Organization, along with the accompanying data dictionaries. It also contains functions to allow easy exploration of the data via searching data dictionaries, summarising key metrics on a regional and global level, and visualising the data in a variety of customisable ways. See Chapter 3 for further details. Install from CRAN with install.packages("getTBinR") or install the development version from GitHub with devtools::install_github("seabbs/getTBinR"). Link:

  • tbinenglanddataclean: An R package that contains the functions and documentation required to reproduce all data import and munging used in this thesis. This package provides a workflow to facilitate reproducing all analyses in this thesis and to expedite the work of others using data from the Enhanced Surveillance System (ETS) (Chapter 4). Available from GitHub using devtools::install_github("seabbs/tbinenglanddataclean"). Link:

  • idmodelr: An R package that contains a library of infectious disease models as well as modelling utilities. It provides tooling that includes: example SEI/SEIR/SHLIR/SHLITR model code, a model solving wrapper; a model summary function; and a scenario analysis function. Used by the explore infectious disease model dashboard ( for all functionality. Available from CRAN using install.packages("idmodelr") or install the development version from GitHub with devtools::install_github("seabbs/idmodelr"). Link:

  • prettypublisher: An R package that improves the R based reproducible research workflow. It provides tooling that includes: paper and figure referencing; effect size reporting; percentage reporting; p-value reporting; and produces a table ready for further word processing. Used throughout this thesis. Available from GitHub using devtools::install_github("seabbs/prettypublisher"). Link: Interactive tools

  • Explore global Tuberculosis: Developed to showcase geTBinR ( package functionality. This dashboard allows the interactive exploration of WHO TB data. It can also be used to generate a static, country level, report on TB epidemiology. Link:

  • Explore Tuberculosis in England and Wales: Developed to allow public Public Health England TB Notification data to be explored interactively. Key interventions are highlighted and link to trends in TB notifications. This app is used in its static form in Chapter 2. Link:

  • Explore infectious disease models: Developed to be used within a modelling short course at the University of Bristol ( This dashboard allows the user to simulate and compare a variety of compartmental infectious disease models. All model code is surfaced in an easily view-able format to allow for users to develop their own models. Link:

  • Introduction to Tuberculosis models: Developed to allow simple TB models to be explored in an interactive session. Inspired by practicals from the Introduction to TB, run by TB MAC ( at the 2017 Union conference. Link:

  • The pebble game: Developed as a learning aid to help a general audience understand the impact of vaccination on infectious disease dynamics. Used at Green Man 2016 as part of a week of outreach work and subsequently developed further. Link:

1.4.4 Talks

  • Assessing the Evidence for Universal BCG Vaccination in England - Research and Applied Epidemiology Scientific Conference 2016, Warwick, United Kingdom. Received best abstract from an early career researcher. Link:

  • Beneficial effects of BCG vaccination in outcomes for patients with active TB: observational study using the Enhanced Tuberculosis surveillance system 2000-2014 - Research and Applied Epidemiology Scientific Conference 2017, Warwick, United Kingdom. Received best PhD student abstract. Link:

  • Beneficial effects of BCG vaccination in outcomes for patients diagnosed with TB: observational study using the Enhanced Tuberculosis surveillance system 2009-2015 - 48th Union World Conference on Lung Health. Link:

  • Estimating the effect of the 2005 UK BCG vaccination policy change: A retrospective cohort study using the Enhanced Tuberculosis Surveillance system, 2000-2015 - Research and Applied Epidemiology Scientific Conference 2018, Warwick, United Kingdom. Link:

  • What do we really know about BCG? - UK Clinical Vaccine Network Conference 2019, Oxford, United Kingdom. Link:

1.5 Summary

  • This chapter provides an introduction to TB and the BCG vaccine. It then motivates the remainder of this thesis.
  • An outline of the theoretical framework used throughout this thesis is given.
  • The aims and objectives of this thesis are detailed.
  • An overview of the chapters is provided.
  • Finally the dissemination of this work so far is summarised, broken down into peer reviewed output, preprints, software output, and talks given at academic conferences.


1 Stone AC, Wilbur AK, Buikstra JE et al. Tuberculosis and leprosy in perspective. American Journal of Physical Anthropology 2009;140:66–94. doi:10.1002/ajpa.21185

2 Public Health England. Tuberculosis in England 2017 report ( presenting data to end of 2016 ) About Public Health England. 2017.

3 The World Health Organization. BCG vaccine:WHO position paper. Weekly epidemiological record 2018;1–24.

4 Roy A, Eisenhut M, Harris RJ et al. Effect of BCG vaccination against Mycobacterium tuberculosis infection in children: systematic review and meta-analysis. BMJ (Clinical research ed) 2014;349:g4643–3.

5 Zwerling A, Behr MA, Verma A et al. The BCG world atlas: A database of global BCG vaccination policies and practices. PLoS medicine 2011;8:e1001012.

6 McElreath, Richard. Statistical Rethinking. 1st ed. Chapman; Hall/CRC 2018.

7 Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control (Oxford Univ. Press, Oxford 1991.

8 Keeling MJ, Rohani P. Modeling Infectious Diseases in Humans and Animals. Epidemiology Department, Ben-Gurion University of the Negev, Beer-Sheva, Israel. [email protected]: 2007.

9 Funk S, Camacho A, Kucharski AJ et al. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics 2016;22:56–61.

10 King AA, Nguyen D, Ionides EL. Statistical Inference for Partially Observed Markov Processes via the R Package pomp. Journal Of Statistical Software 2016;69:1–43.