Exploring Estimates of the Tuberculosis Case Fatality Ratio - with getTBinR

This is a quick post exploring estimates of the case fatality ratio for Tuberculosis (TB) from data published by the World Health Organisation (WHO). It makes use of getTBinR (which is now on CRAN), pacman for package management, hrbrthemes for plot themes, and pathwork for combining multiple plots into a storyboard. For an introduction to using getTBinR to explore the WHO TB data see this post.

It is estimated that in 2016 there was more than 10 million cases of active TB, with 1.3 million deaths (source). This makes TB the leading cause of death from an infectious disease. Despite the high numbers of cases globally, and high levels of mortality, TB is often considered a historic disease in developed countries. However, in recent years previously declining trends have flattened out and there is increasing concern surrounding multi-drug resistant TB, which has a greatly reduced the rate of successful treatment (source).

A key component of TB control is reducing the case fatality rate from active TB, this post explores WHO estimates for the TB case fatality rate in 2016 and then estimates the case fatality rates in previous years. The first step is to download and install the required packages, pacman is used to manage this process.

if (!require(pacman)) install.packages("pacman"); library(pacman)
p_load("getTBinR")
p_load("tidyverse")
p_load("viridis")
p_load("hrbrthemes")
p_load("scales")
p_load_gh("thomasp85/patchwork", dependencies = TRUE)

Now we use getTBinR to download the TB burden data and its accompanying data dictionary.

tb_burden <- get_tb_burden(verbose = FALSE)
dict <- get_data_dict(verbose = FALSE)

tb_burden
## # A tibble: 3,651 x 71
##    country  iso2  iso3  iso_numeric g_whoregion  year e_pop_num e_inc_100k
##    <chr>    <chr> <chr>       <int> <chr>       <int>     <int>      <dbl>
##  1 Afghani… AF    AFG             4 Eastern Me…  2000  20093756       190.
##  2 Afghani… AF    AFG             4 Eastern Me…  2001  20966463       189.
##  3 Afghani… AF    AFG             4 Eastern Me…  2002  21979923       189.
##  4 Afghani… AF    AFG             4 Eastern Me…  2003  23064851       189.
##  5 Afghani… AF    AFG             4 Eastern Me…  2004  24118979       189.
##  6 Afghani… AF    AFG             4 Eastern Me…  2005  25070798       189.
##  7 Afghani… AF    AFG             4 Eastern Me…  2006  25893450       189.
##  8 Afghani… AF    AFG             4 Eastern Me…  2007  26616792       189.
##  9 Afghani… AF    AFG             4 Eastern Me…  2008  27294031       189.
## 10 Afghani… AF    AFG             4 Eastern Me…  2009  28004331       189.
## # ... with 3,641 more rows, and 63 more variables: e_inc_100k_lo <dbl>,
## #   e_inc_100k_hi <dbl>, e_inc_num <int>, e_inc_num_lo <int>,
## #   e_inc_num_hi <int>, e_inc_num_f014 <int>, e_inc_num_f014_lo <int>,
## #   e_inc_num_f014_hi <int>, e_inc_num_f15plus <int>,
## #   e_inc_num_f15plus_lo <int>, e_inc_num_f15plus_hi <int>,
## #   e_inc_num_f <int>, e_inc_num_f_lo <int>, e_inc_num_f_hi <int>,
## #   e_inc_num_m014 <int>, e_inc_num_m014_lo <int>,
## #   e_inc_num_m014_hi <int>, e_inc_num_m15plus <int>,
## #   e_inc_num_m15plus_lo <int>, e_inc_num_m15plus_hi <int>,
## #   e_inc_num_m <int>, e_inc_num_m_lo <int>, e_inc_num_m_hi <int>,
## #   e_inc_num_014 <int>, e_inc_num_014_lo <int>, e_inc_num_014_hi <int>,
## #   e_inc_num_15plus <int>, e_inc_num_15plus_lo <int>,
## #   e_inc_num_15plus_hi <int>, e_tbhiv_prct <dbl>, e_tbhiv_prct_lo <dbl>,
## #   e_tbhiv_prct_hi <dbl>, e_inc_tbhiv_100k <dbl>,
## #   e_inc_tbhiv_100k_lo <dbl>, e_inc_tbhiv_100k_hi <dbl>,
## #   e_inc_tbhiv_num <int>, e_inc_tbhiv_num_lo <int>,
## #   e_inc_tbhiv_num_hi <int>, e_mort_exc_tbhiv_100k <dbl>,
## #   e_mort_exc_tbhiv_100k_lo <dbl>, e_mort_exc_tbhiv_100k_hi <dbl>,
## #   e_mort_exc_tbhiv_num <int>, e_mort_exc_tbhiv_num_lo <int>,
## #   e_mort_exc_tbhiv_num_hi <int>, e_mort_tbhiv_100k <dbl>,
## #   e_mort_tbhiv_100k_lo <dbl>, e_mort_tbhiv_100k_hi <dbl>,
## #   e_mort_tbhiv_num <int>, e_mort_tbhiv_num_lo <int>,
## #   e_mort_tbhiv_num_hi <int>, e_mort_100k <dbl>, e_mort_100k_lo <dbl>,
## #   e_mort_100k_hi <dbl>, e_mort_num <int>, e_mort_num_lo <int>,
## #   e_mort_num_hi <int>, cfr <dbl>, cfr_lo <dbl>, cfr_hi <dbl>,
## #   c_newinc_100k <dbl>, c_cdr <dbl>, c_cdr_lo <dbl>, c_cdr_hi <dbl>
dict
## # A tibble: 402 x 4
##    variable_name            dataset              code_list   definition   
##    <chr>                    <chr>                <chr>       <chr>        
##  1 bmu                      Community engagement ""          Number of TB…
##  2 bmu_community_impl       Community engagement ""          Number of TB…
##  3 bmu_ref_data             Community engagement ""          Number of Ba…
##  4 bmu_rxsupport_data       Community engagement ""          Number of Ba…
##  5 bmu_rxsupport_data_coh   Community engagement ""          Total number…
##  6 community_data_available Community engagement 0=No; 1=Yes Are data ava…
##  7 notified_ref             Community engagement ""          Total number…
##  8 notified_ref_community   Community engagement ""          Total number…
##  9 rxsupport_community_coh  Community engagement ""          Total number…
## 10 rxsupport_community_succ Community engagement ""          Total number…
## # ... with 392 more rows

We want to explore the case fatality rate (cfr), lets first look at the WHO definition.

search_data_dict("cfr", verbose = FALSE) %>% 
  knitr::kable()
variable_name dataset code_list definition
cfr Estimates Estimated TB case fatality ratio

Maps can be a useful first visualisation for summarising spatial data, although they can also be misleading for more complex comparisons. Lets look at global TB incidence rates (per 100,000 population) in 2016, and the WHO estimate of the case fatality ratio in the same year.

# Map TB incidence rates - 2016
mp1 <- map_tb_burden(year = 2016, verbose = FALSE) +
  scale_fill_viridis(option = "cividis", direction = -1) +
  labs(title = "Map of Tuberculosis Incidence Rates - 2016",
       subtitle = "Incidence rates are per 100,000 population")

mp1 +
  labs(caption = "@seabbs Source: WHO")

# Map TB case fatality ratio -2016
mp2 <- map_tb_burden(metric = "cfr", year = 2016, verbose= FALSE) +
  scale_fill_viridis(option = "cividis", direction = -1) +
  labs(title = "Map of Tuberculosis Case Fatality Ratio - 2016",
       subtitle = "Case fatality rate estimated by the WHO")

mp2 +
  labs(caption = "@seabbs Source: WHO")

Whilst quantitative insights are hard to glean from the above maps we can see that incidence rates appear to be highest in Africa and Asia. It also looks like the case fatality ratios are highest in these regions as well.

The WHO have only provided estimates for the case fatality ratio for 2016, estimating the case fatality ratio for other years will allow trends over time to be explored. We do this by first summarising TB incidence rates, and mortality rates by region. This allows insights into the general global picture to be more easily extracted. The first step is to write a function to summarise rates on regional levels, we do this using dplyr::summarise_at.

## Make function to summarise rates in a given region
rate_region <- function(df = NULL, metric = NULL) {
  
  metric_vars <- c(metric, paste0(metric, "_lo"), paste0(metric, "_hi"))
  
  df <- df %>% 
    group_by(year, g_whoregion) %>% 
    summarise_at(.vars = c(metric_vars, "e_pop_num"),
                 .funs = funs(sum(as.numeric(.), na.rm = T))) %>% 
    ungroup() %>% 
    mutate_at(.vars = metric_vars,
              .funs = funs(. / e_pop_num * 1e5))
}

We then write a reusable plotting function to visualise this data (making use of aes_string).

## Plotting function for rates
plot_rate_region <- function(df = NULL, metric = NULL, title = NULL, subtitle = NULL, 
                             y_lab = NULL, scales = NULL) {
  
  metric_vars <- c(metric, paste0(metric, "_lo"), paste0(metric, "_hi"))
  
  df %>% 
    ggplot(aes_string(x = "year", y = metric, col = "g_whoregion", 
                      fill = "g_whoregion")) +
    geom_point(size = 1.3) +
    geom_ribbon(aes_string(ymin = metric_vars[2], ymax = metric_vars[3]), alpha = 0.3) +
    geom_line(size = 1.1) +
    scale_y_continuous(labels = comma) +
    scale_colour_viridis(discrete = TRUE, option = "E") +
    scale_fill_viridis(discrete = TRUE, option = "E") +
    labs(title = title, subtitle = subtitle, 
         x = "Year", y = y_lab) +
    theme_ipsum() +
    theme(legend.position = "none") +
    facet_wrap(~g_whoregion, scales = scales)
}

We can now quickly plot TB incidence rates by region, as well as mortality rates by region.

plot_inc_region <- tb_burden %>% 
  rate_region(metric = "e_inc_num") %>% 
  plot_rate_region(metric = "e_inc_num",
                   title = "Tuberculosis Incidence Rates",
                   subtitle = "By WHO region: 2000 to 2016",
                   scales = "free_y",
                   y_lab = "Tuberculosis Incidence Rates (per 100,000 population)")

plot_inc_region +
  labs(caption = "@seabbs Source: WHO")

plot_mort_region <- tb_burden %>% 
  rate_region(metric = "e_mort_num") %>% 
  plot_rate_region(metric = "e_mort_num",
                   title = "Tuberculosis Mortality Rates",
                   subtitle = "By WHO region: 2000 to 2016",
                   scales = "free_y",
                   y_lab = "Tuberculosis Mortality Rates (per 100,000 population)")

plot_mort_region +
  labs(caption = "@seabbs Source: WHO")

This gives an encouraging picture with both incidence rates and mortality rates declining in all regions. As we observed from the maps above incidence rates (and mortality rates) are highest in Africa, and South-East Asia. We also see that the rate of decline in incidence rates varies across regions, with South-East Asia and the Eastern Mediterranean experiencing the slowest decreases. Interestingly it also appears that mortality rates in some areas are decreasing more quickly than incidence rates, this is most notable for Europe. As mortality rates and incidence rates are highly correlated to explore in more depth we need to estimate the case fatality ratio for all years in the data.

To validate our estimates we first extract the regional estimates for the case fatality ratio based on the WHO estimates, by taking the mean (and standard deviation) of case fatality ratios by country in each region. This gives an estimate for case fatality rates at the regional level as well as the between country variation.

## Summarise Case fatality rate by region - only availble for 2016
region_case_fat <- tb_burden %>% 
  filter(year %in% 2016) %>% 
  group_by(year, g_whoregion) %>% 
  summarise(mean = mean(cfr, na.rm = TRUE),
            sd = sd(cfr, na.rm = TRUE)) %>% 
  mutate(ll = mean - sd,
         lll = mean - 2*sd,
         hh = mean + sd,
         hhh = mean + 2 * sd)

region_case_fat
## # A tibble: 6 x 8
## # Groups:   year [1]
##    year g_whoregion             mean     sd       ll      lll    hh   hhh
##   <int> <chr>                  <dbl>  <dbl>    <dbl>    <dbl> <dbl> <dbl>
## 1  2016 Africa                0.238  0.0899  0.148    0.0581  0.328 0.418
## 2  2016 Americas              0.132  0.133  -0.00116 -0.134   0.265 0.397
## 3  2016 Eastern Mediterranean 0.131  0.129   0.00173 -0.127   0.260 0.389
## 4  2016 Europe                0.0837 0.0458  0.0379  -0.00792 0.130 0.175
## 5  2016 South-East Asia       0.135  0.0439  0.0916   0.0477  0.179 0.223
## 6  2016 Western Pacific       0.0885 0.0483  0.0402  -0.00810 0.137 0.185

We then estimate the regional case fatality ratio by taking the regional mortality rate and dividing it by the regional incidence rate for each year. We also do this for the high and low mortality rate estimates (using the lowest estimate for TB mortality and the highest estimate for incidence rates to get the lower bound, and vice versa for the high bound). This gives,

region_case_fatality <- tb_burden %>% 
  rate_region(metric = "e_inc_num") %>% 
  left_join(tb_burden %>% 
              rate_region(metric = "e_mort_num")) %>% 
  mutate(case_fat_rate = e_mort_num / e_inc_num,
         case_fat_rate_lo = e_mort_num_lo / e_inc_num_hi,
         case_fat_rate_hi = e_mort_num_hi / e_inc_num_lo)
## Joining, by = c("year", "g_whoregion", "e_pop_num")
region_case_fatality
## # A tibble: 102 x 12
##     year g_whoregion        e_inc_num e_inc_num_lo e_inc_num_hi  e_pop_num
##    <int> <chr>                  <dbl>        <dbl>        <dbl>      <dbl>
##  1  2000 Africa                 332.         210.         481.      6.59e8
##  2  2000 Americas                36.5         28.6         45.6     8.38e8
##  3  2000 Eastern Mediterra…     127.          87.0        175.      4.81e8
##  4  2000 Europe                  52.3         38.1         68.8     8.69e8
##  5  2000 South-East Asia        305.         175.         473.      1.57e9
##  6  2000 Western Pacific        131.          91.9        178.      1.70e9
##  7  2001 Africa                 337.         215.         487.      6.76e8
##  8  2001 Americas                35.0         27.4         43.6     8.48e8
##  9  2001 Eastern Mediterra…     126.          86.2        174.      4.92e8
## 10  2001 Europe                  51.9         37.7         68.3     8.70e8
## # ... with 92 more rows, and 6 more variables: e_mort_num <dbl>,
## #   e_mort_num_lo <dbl>, e_mort_num_hi <dbl>, case_fat_rate <dbl>,
## #   case_fat_rate_lo <dbl>, case_fat_rate_hi <dbl>

We can then plot this using the plot_rate_region function outlined above, combined with addition layers to compare our estimate to that produced by the WHO.

plot_region_case_fatality <- region_case_fatality %>%
  plot_rate_region(metric = "case_fat_rate",
                   title = "Tuberculosis Case Fatality Rate",
                   subtitle = "By WHO region: 2000 to 2016",
                   scales = "free_y",
                   y_lab = "Estimated TB Case Fatality Ratio") +
  labs(caption = "Case fatality ratio estimated by taking the ratio of TB mortality rates and TB incidence rates each year in all years. For 2016 
       the mean regional case fatality ratio estimated by the WHO is also shown (along with one and two standard deviations)") +
  geom_point(data = region_case_fat, aes(y = mean, x = year, fill = g_whoregion), shape = 2, size = 1.3, col = "black") +
  geom_linerange(data = region_case_fat, aes(ymin = ll, ymax = hh, y = NULL), alpha = 0.4, size = 1.2, col = "black") +
  geom_linerange(data = region_case_fat, aes(ymin = lll, ymax = hhh, y = NULL), alpha = 0.2, size = 1.2, col = "black")
  
plot_region_case_fatality +
  labs(caption = "@seabbs Source: WHO")

We see that the estimate using mortality rates / incidence rates compares well to the estimate published for 2016 by the WHO, although there is substantial within country variation. For all regions our estimate has substantial uncertainty. The plot suggests that the TB case fatality ratio is decreasing over time in all regions, although this reduction appears to be very minimal in some regions (such as the Americas, Africa, and the Eastern Mediterranean). It is likely that this trend does not hold across all countries, but this is beyond the scope of this post (watch this space).

As a first look at the case fatality ratios in individual countries we can quickly pull out the 10 countries with the highest case fatality ratio in 2016. These are,

highest_case_fataltity_countries <- tb_burden %>% 
  filter(year %in% 2016) %>% 
  arrange(desc(cfr)) %>% 
  slice(1:10) %>% 
  pull(country)

highest_case_fataltity_countries
##  [1] "Barbados"                    "United Arab Emirates"       
##  [3] "Anguilla"                    "Lesotho"                    
##  [5] "Ghana"                       "Guinea-Bissau"              
##  [7] "Nigeria"                     "United Republic of Tanzania"
##  [9] "Mozambique"                  "Kenya"

We can then plot the TB case fatality ratio in these countries, as well as the TB incidence rates over time.

## Plot overview of cases fatality ratios
plot_high_cfr <- plot_tb_burden_overview(metric = "cfr", 
                                         countries = highest_case_fataltity_countries,
                                         verbose = FALSE) +
  scale_color_viridis(option = "cividis", direction = -1) +
  theme_ipsum() +
  theme(legend.position = "none") +
  labs(title = "Tuberculosis Case Fatality Ratio - 2016",
       subtitle = "For the countries with the 10 highest ratios")

plot_high_cfr +
  labs(caption = "@seabbs Source: WHO")

plot_inc_high_cfr <- plot_tb_burden_overview(countries = highest_case_fataltity_countries,
                                             verbose = FALSE) +
  theme(legend.position = "none") +
  scale_color_viridis(option = "cividis", direction = -1) +
  theme_ipsum() +
  theme(legend.position = "bottom") +
  labs(title = "Tuberculosis Incidence Rates - 2000:2016",
       subtitle = "In the countries with the 10 highest TB case fatality ratios")

plot_inc_high_cfr +
  labs(caption = "@seabbs Source: WHO")

We see that for some countries on this list (i.e Anguilla, the United Arab Emirates, and Barbados) the high case fatality ratio estimates are likely to be based on a very low sample size and therefore may not be trustworthy. Worryingly however for several countries (Lesotho, Tanzania, Kenya, Mozambique, Guinea-Bissau, and Ghana, and Nigeria) with extremely high case fatality ratios incidence rates are also very high, which will correspond to very high mortality rates. These countries are all in Africa, which as a region also had the highest incidence rates and the highest case fatality ratio. It is clear from these plots that Africa, and these countries in particular should be a particular focus of TB control efforts if TB mortality rates are to be reduced more quickly.

Finally we summarise this post using patchwork, a great new package by Thomas Pederson, which lets you easily combine ggplot plots into a single plot. The resulting storyboard (below) neatly summarises the exploratory plots, and the narrative they provide, that we have made in this post. See here for a full size version.

storyboard <- (mp1 | mp2 ) /
    {
      plot_region_case_fatality + {
        plot_high_cfr +
          plot_inc_high_cfr +
          labs(caption = "@seabbs Source: World Health Organisation") +
          plot_layout(ncol = 1)
      } +
        plot_layout(ncol = 2, widths = c(3, 1))
    } +
  plot_layout(ncol = 1, heights = c(2, 4))

ggsave("../../static/img/getTBinR/storyboard_tb_case_fatality_rate.png",
       storyboard, width = 20, height = 15, dpi = 330)

storyboard

The WHO TB data contains many more variables not mentioned in this post and getTBinR contains additional features, see the site for getTBinR for details. See my gists for code snippets to get you started in exploring the data further. Package feature requests are welcome, preferably by filing a GitHub issue, as are suggestions for additional data sources to include.

Related

comments powered by Disqus