Using the package

First load the package. We also load several other packages to help quickly explore the data.

library(getTBinR)
library(ggplot2)
library(knitr)
library(magrittr)
library(dplyr)

Getting TB burden data

Get TB burden data with a single function call. This will download the data if it has never been accessed and then save a local copy to R’s temporary directory (see tempdir()). If a local copy exists from the current session then this will be loaded instead.

tb_burden <- get_tb_burden()
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
#> Saving data to: /tmp/RtmpnhfhRA/TB_burden.rds
#> Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=mdr_rr_estimates
#> Saving data to: /tmp/RtmpnhfhRA/MDR_TB.rds
#> Joining TB burden data and MDR TB data.

tb_burden
#> # A tibble: 3,850 x 68
#>    country iso2  iso3  iso_numeric g_whoregion  year e_pop_num e_inc_100k
#>    <chr>   <chr> <chr>       <int> <chr>       <int>     <int>      <dbl>
#>  1 Afghan… AF    AFG             4 Eastern Me…  2000  20093756        190
#>  2 Afghan… AF    AFG             4 Eastern Me…  2001  20966463        189
#>  3 Afghan… AF    AFG             4 Eastern Me…  2002  21979923        189
#>  4 Afghan… AF    AFG             4 Eastern Me…  2003  23064851        189
#>  5 Afghan… AF    AFG             4 Eastern Me…  2004  24118979        189
#>  6 Afghan… AF    AFG             4 Eastern Me…  2005  25070798        189
#>  7 Afghan… AF    AFG             4 Eastern Me…  2006  25893450        189
#>  8 Afghan… AF    AFG             4 Eastern Me…  2007  26616792        189
#>  9 Afghan… AF    AFG             4 Eastern Me…  2008  27294031        189
#> 10 Afghan… AF    AFG             4 Eastern Me…  2009  28004331        189
#> # … with 3,840 more rows, and 60 more variables: e_inc_100k_lo <dbl>,
#> #   e_inc_100k_hi <dbl>, e_inc_num <int>, e_inc_num_lo <int>,
#> #   e_inc_num_hi <int>, e_tbhiv_prct <dbl>, e_tbhiv_prct_lo <dbl>,
#> #   e_tbhiv_prct_hi <dbl>, e_inc_tbhiv_100k <dbl>,
#> #   e_inc_tbhiv_100k_lo <dbl>, e_inc_tbhiv_100k_hi <dbl>,
#> #   e_inc_tbhiv_num <int>, e_inc_tbhiv_num_lo <int>,
#> #   e_inc_tbhiv_num_hi <int>, e_mort_exc_tbhiv_100k <dbl>,
#> #   e_mort_exc_tbhiv_100k_lo <dbl>, e_mort_exc_tbhiv_100k_hi <dbl>,
#> #   e_mort_exc_tbhiv_num <int>, e_mort_exc_tbhiv_num_lo <int>,
#> #   e_mort_exc_tbhiv_num_hi <int>, e_mort_tbhiv_100k <dbl>,
#> #   e_mort_tbhiv_100k_lo <dbl>, e_mort_tbhiv_100k_hi <dbl>,
#> #   e_mort_tbhiv_num <int>, e_mort_tbhiv_num_lo <int>,
#> #   e_mort_tbhiv_num_hi <int>, e_mort_100k <dbl>, e_mort_100k_lo <dbl>,
#> #   e_mort_100k_hi <dbl>, e_mort_num <int>, e_mort_num_lo <int>,
#> #   e_mort_num_hi <int>, cfr <dbl>, cfr_lo <dbl>, cfr_hi <dbl>,
#> #   c_newinc_100k <dbl>, c_cdr <dbl>, c_cdr_lo <dbl>, c_cdr_hi <dbl>,
#> #   source_rr_new <chr>, source_drs_coverage_new <chr>,
#> #   source_drs_year_new <int>, e_rr_pct_new <dbl>, e_rr_pct_new_lo <dbl>,
#> #   e_rr_pct_new_hi <dbl>, e_mdr_pct_rr_new <int>, source_rr_ret <chr>,
#> #   source_drs_coverage_ret <chr>, source_drs_year_ret <int>,
#> #   e_rr_pct_ret <dbl>, e_rr_pct_ret_lo <dbl>, e_rr_pct_ret_hi <dbl>,
#> #   e_mdr_pct_rr_ret <int>, e_inc_rr_num <int>, e_inc_rr_num_lo <int>,
#> #   e_inc_rr_num_hi <int>, e_mdr_pct_rr <int>,
#> #   e_rr_in_notified_pulm <int>, e_rr_in_notified_pulm_lo <int>,
#> #   e_rr_in_notified_pulm_hi <int>

Searching for variable definitions

The WHO provides a large, detailed data dictionary for use with the TB burden data. However, searching through this dataset can be tedious. To streamline this process, getTBinR provides a search function to find the definition of one or more variables. As with the burden data, the first call downloads the data dictionary to the temporary directory, and subsequent calls load the local copy.
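A lookup of this kind can be sketched as follows. This is a minimal sketch, assuming the search function is `search_data_dict()` and that its `var` argument accepts a character vector of variable names; check the installed version's help page if the call fails. The object name `vars_of_interest` is only illustrative.

```r
library(getTBinR)

# Look up the definitions of a known set of variables.
# Assumption: `var` takes a character vector of variable names.
vars_of_interest <- search_data_dict(
  var = c("country", "e_inc_100k", "e_inc_100k_lo", "e_inc_100k_hi")
)

# knitr is used only to render the result as a table.
knitr::kable(vars_of_interest)
```

The result should have one row per matched variable, as in the table below.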

| variable_name | dataset | code_list | definition |
|---|---|---|---|
| country | Country identification | | Country or territory name |
| e_inc_100k | Estimates | | Estimated incidence (all forms) per 100 000 population |
| e_inc_100k_hi | Estimates | | Estimated incidence (all forms) per 100 000 population, high bound |
| e_inc_100k_lo | Estimates | | Estimated incidence (all forms) per 100 000 population, low bound |

We might also want to search the variable definitions for key phrases, for example mortality.
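A phrase search of this kind might look like the sketch below. It assumes `search_data_dict()` has a `def` argument that matches terms against the definition text; the object name `defs_of_interest` is only illustrative.

```r
library(getTBinR)

# Search the definitions (not the variable names) for a key phrase.
# Assumption: `def` takes one or more search terms.
defs_of_interest <- search_data_dict(def = "mortality")

# knitr is used only to render the result as a table.
knitr::kable(defs_of_interest)
```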

| variable_name | dataset | code_list | definition |
|---|---|---|---|
| e_mort_100k | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population |
| e_mort_100k_hi | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, high bound |
| e_mort_100k_lo | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, low bound |
| e_mort_exc_tbhiv_100k | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population |
| e_mort_exc_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound |
| e_mort_exc_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound |
| e_mort_tbhiv_100k | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population |
| e_mort_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound |
| e_mort_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound |

Finally, we can search for a known variable and for key phrases in variable definitions in the same call.
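Combining the two searches could be sketched as below, assuming `search_data_dict()` accepts `var` and `def` together and returns the union of the matches. The object name `vars_defs_of_interest` is only illustrative.

```r
library(getTBinR)

# Combine a variable-name lookup with a definition phrase search.
# Assumption: supplying both arguments returns rows matching either one.
vars_defs_of_interest <- search_data_dict(
  var = "country",
  def = "mortality"
)

# knitr is used only to render the result as a table.
knitr::kable(vars_defs_of_interest)
```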

| variable_name | dataset | code_list | definition |
|---|---|---|---|
| country | Country identification | | Country or territory name |
| e_mort_100k | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population |
| e_mort_100k_hi | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, high bound |
| e_mort_100k_lo | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, low bound |
| e_mort_exc_tbhiv_100k | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population |
| e_mort_exc_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound |
| e_mort_exc_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound |
| e_mort_tbhiv_100k | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population |
| e_mort_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound |
| e_mort_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound |

Mapping Global Incidence Rates

To start exploring the WHO TB data, we map the most recently available global TB incidence rates. Mapping data can help identify spatial patterns.
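One way to produce such a map is sketched below. It assumes the package's `map_tb_burden()` function takes a `metric` argument, defaults to the most recent year, and fetches the burden data itself when none is supplied. The object name `tb_map` is only illustrative.

```r
library(getTBinR)

# Map estimated incidence per 100,000 population.
# Assumptions: no data frame is needed (the function finds or downloads
# the data) and the latest available year is mapped by default.
tb_map <- map_tb_burden(metric = "e_inc_100k")

# Print the map.
tb_map
```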

Plotting Incidence Rates for All Countries

To showcase how quickly we can go from no data to informative graphs, we explore incidence rates for all countries in the WHO data.
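An all-country overview could be sketched as follows, assuming `plot_tb_burden_overview()` accepts `metric` and `interactive` arguments and retrieves the data itself when none is passed. The object name `overview_plot` is only illustrative.

```r
library(getTBinR)

# Overview of estimated incidence rates for every country in the data.
# Assumption: with no data frame supplied, the function locates or
# downloads the burden data on its own.
overview_plot <- plot_tb_burden_overview(
  metric = "e_inc_100k",
  interactive = FALSE  # static plot rather than an interactive widget
)

overview_plot
```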

Another way to compare incidence rates in countries is to look at the annual percentage change. The plot below only shows countries with a maximum incidence rate above 5 per 100,000.
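The filter and plot described above might be written as in the sketch below. The filtering step follows the wording of the text (maximum incidence rate above 5 per 100,000) and is an assumption about how the original figure was built. `higher_burden_countries` and `max_inc` are illustrative names, and the `countries` and `annual_change` arguments are assumed to behave as their names suggest.

```r
library(getTBinR)
library(dplyr)

tb_burden <- get_tb_burden()

# Keep countries whose maximum estimated incidence rate exceeds
# 5 per 100,000 population (assumed reading of the filter in the text).
higher_burden_countries <- tb_burden %>%
  group_by(country) %>%
  summarise(max_inc = max(e_inc_100k, na.rm = TRUE)) %>%
  filter(max_inc > 5) %>%
  pull(country)

# Plot annual percentage change in incidence for those countries only.
change_plot <- plot_tb_burden_overview(
  tb_burden,
  metric = "e_inc_100k",
  countries = higher_burden_countries,
  annual_change = TRUE,
  interactive = FALSE
)

change_plot
```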

Plotting Incidence Rates over Time in 9 Randomly Sampled Countries

Diving deeper into the data, let's plot a sample of 9 countries using the inbuilt plot_tb_burden function. We again plot incidence rates, but this time with 95% confidence intervals. As you can see, this isn't a hugely informative graph. Let's improve it!
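The first, unfaceted plot described above can be sketched as follows. The seed value and the name `sample_countries` are illustrative choices, so a different seed gives a different set of countries.

```r
library(getTBinR)

tb_burden <- get_tb_burden()

# Draw 9 countries at random; the seed is arbitrary and only makes
# the sample reproducible.
set.seed(1234)
sample_countries <- sample(unique(tb_burden$country), 9)

# Incidence rates over time for the sampled countries, all on one panel.
sample_plot <- plot_tb_burden(
  tb_burden,
  metric = "e_inc_100k",
  countries = sample_countries
)

sample_plot
```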

We have faceted by country so that we can more easily see what is going on. This allows us to easily explore between-country variation; depending on the sample, there is likely to be a lot of this.
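A faceted version can be sketched as below, assuming `plot_tb_burden()` takes a `facet` argument naming the variable to facet by. The sample is redrawn here so the block runs on its own; the seed and the name `sample_countries` are illustrative.

```r
library(getTBinR)

tb_burden <- get_tb_burden()

# Draw 9 countries at random (arbitrary seed for reproducibility).
set.seed(1234)
sample_countries <- sample(unique(tb_burden$country), 9)

# One panel per country, all sharing the same y-axis scale.
faceted_plot <- plot_tb_burden(
  tb_burden,
  metric = "e_inc_100k",
  countries = sample_countries,
  facet = "country"
)

faceted_plot
```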

To explore within-country variation, we need to let the y-axis scale vary between facets.
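Freeing the y-axis might look like the sketch below. It assumes `plot_tb_burden()` exposes a `scales` argument that is passed through to ggplot2's faceting, where `"free_y"` gives each panel its own y-axis range. The seed and the name `sample_countries` are illustrative.

```r
library(getTBinR)

tb_burden <- get_tb_burden()

# Draw 9 countries at random (arbitrary seed for reproducibility).
set.seed(1234)
sample_countries <- sample(unique(tb_burden$country), 9)

# One panel per country, each with its own y-axis range.
free_scale_plot <- plot_tb_burden(
  tb_burden,
  metric = "e_inc_100k",
  countries = sample_countries,
  facet = "country",
  scales = "free_y"
)

free_scale_plot
```

With independent y-axes, trends within each country are easier to see, but heights can no longer be compared across panels.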

We might also be interested in mortality in both HIV-negative and HIV-positive cases in our sample countries. We can look at this using plot_tb_burden as follows. Note that we can do this without specifying the TB burden data: the plotting function will automatically find it, either locally or remotely.
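The two mortality plots could be sketched as below. The metric names are taken from the data dictionary tables earlier in this document; pairing `e_mort_exc_tbhiv_100k` with HIV-negative cases and `e_mort_tbhiv_100k` with HIV-positive cases is an assumption based on their definitions. The burden data is loaded here only to draw the sample and is deliberately not passed to the plotting function.

```r
library(getTBinR)

# Loaded only to sample country names; not passed to the plots below.
tb_burden <- get_tb_burden()

set.seed(1234)  # arbitrary seed for a reproducible sample
sample_countries <- sample(unique(tb_burden$country), 9)

# Mortality excluding HIV-positive cases, per 100,000 population.
# No data frame is supplied, so the function must find the data itself.
mort_hiv_neg_plot <- plot_tb_burden(
  metric = "e_mort_exc_tbhiv_100k",
  countries = sample_countries,
  facet = "country",
  scales = "free_y"
)

# Mortality in HIV-positive cases, per 100,000 population.
mort_hiv_pos_plot <- plot_tb_burden(
  metric = "e_mort_tbhiv_100k",
  countries = sample_countries,
  facet = "country",
  scales = "free_y"
)

mort_hiv_neg_plot
mort_hiv_pos_plot
```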