getTBinR 0.6.0 now on CRAN - from 80 variables to more than 450!

getTBinR 0.6.0 is now on CRAN and should be available on a mirror near you shortly! This update includes multiple new Tuberculosis datasets - increasing the available number of variables through getTBinR from 80 to over 450. To help support these new datasets the package now contains a dataframe listing the available datasets and search_data_dict can now also be used to search the data dictionary for variables by dataset. On top of this, this update contains suggested changes by reviewers (@rrrlw and @strengejacke) from JOSS (see here for the review thread). If any package authors are reading this then I cannot recommend the JOSS review process enough!

The full change log is included below (or can be found here). The JOSS paper can be found here. See the bottom of this post for an example of using the package to explore global Tuberculosis incidence rates.

getTBinR 0.6.0

Feature updates

  • Added a new summarise_metric function to the package. This function was previously used internally by the TB report. For a given year it returns the value of a given metric, along with the regional and global rankings. The average change over the last decade is also supplied. Linked to #57.
  • search_data_dict can now be used to search for a dataset by name. All variables in this dataset are then returned.
  • Added a new dataframe, available_datasets, that lists the datasets available to be imported into R using the package. This dataframe also gives a short description of each dataset, details the time-span of the dataset, and whether or not it is downloaded by default. Used by get_tb_burden as a URL source for downloading the datasets. Linked to #58.
  • get_tb_burden can now import additional datasets (listed in available_datasets), clean them, and then link them with the core dataset. This adds over 400 new variables to the package and provides a near complete list of data used in the WHO Tuberculosis global report. Please open an issue if you find an issue with this dataset.

Package updates

  • Jumped to 0.6.0 to signal a major release.
  • Updated earliest supported R version based on travis testing - now 3.3.0.
  • Added the JOSS paper as the preferred citation for getTBinR and also added this information to the README.
  • URL and data save names have been deprecated from all functions and will be removed in a future release. This allows the number of arguments for many functions to be reduced with no loss of functionality (as data is only saved temporally by package functions).
  • search_data_dict has improved messaging and no longer returns an error when nothing is found in the data dictionary. From #65.
  • search_data_dict has expanded testing to account for new dataset searching and for failing to find results. Linked to #60.
  • Dropped usage of dplyr::funs as soft deprecated.
  • Added tests for summarise_metric and added to documentation.
  • Added tests for additional dataset import in get_tb_burden. #58
  • Added available_datasets and new data import to the README and to the getting started vignette.

getTBinR 0.5.8

Package updates

  • Added package information to license file - suggested during review for JOSS submission by @rrrlw.
  • Updated README introduction to better explain package aim - suggested during review for JOSS submission by @strengejacke
  • Improved package DESCRIPTION for CRAN only users - suggested during review for JOSS submission by @rrrlw.
  • Used usethis::use_tidy_description to improve DESCRIPTION formatting.
  • Added development documentation badge to the README + website.
  • Moved to automated pkgdown deployment using travis. Based on this and the dplyr implementation.
  • Expanded travis testing grid based on dplyr implementation.
  • Updated earliest supported R version based on travis testing - now 3.2.0.
  • Used usethis::use_tidy_versions() to set package to dependent on package versions used during development work. Added this to makefile to make automated.
  • Added a git commit step to the Makefile use with `make message=“your commit message”. This will automatically run all build steps that are required and then commit any changes.

Example: Changes in TB incidence rates in 2017

The code below explores a subset of the available data by first estimating global incidence rates and the annual change between years. The country level annual changes in TB incidence rates are then plotted, first globally and then by region. Finally, the trend in incidence rates is explored using country, regional and global level TB incidence rates. See here for a full size version of the storyboard.

This example is an updated version of the example used for the 0.5.5 release blog post. It showcases some of the many improvements that have been made to the package since this release (see here for a comparison of the two examples)

## Get required packages - managed using pacman
if (!require(pacman)) install.packages("pacman"); library(pacman)

##Pull TB data 
tb_burden <- get_tb_burden(verbose = FALSE) 

## Summarise global changes
global_tb <- summarise_tb_burden(compare_to_world = TRUE,
                                 annual_change = TRUE,
                                 stat = "rate",
                                 verbose = FALSE) %>% 
  filter(area == "Global")

## TB in 2017
tb_2017 <- global_tb %>% 
  filter(year == 2017)

## Global annual change
global_annual_change <- ggplot(global_tb, aes(year, e_inc_num)) +
  geom_smooth(se = FALSE, col = "black", size = 1.2, alpha = 0.7) +
  geom_point(size = 1.2, alpha = 0.8, col = "black") +
  scale_y_continuous(label = scales::percent, minor_breaks = NULL, breaks = seq(-0.025, 0, 0.0025)) +
  theme_minimal() +
    y = "Annual Percentage Change",
    x = "Year",
    title = "Global Annual Percentage Change in Tuberculosis Incidence Rates",
    caption = ""

## Remove countries with incidence below 1000 or incidence rates below 10 per 100,000 to reduce noise and cal country level annual change.
countries_with_tb_burden <- tb_burden %>% 
  filter(year == 2017,
         e_inc_100k > 10,
         e_inc_num > 1000) %>% 

tb_annual_change <- summarise_tb_burden(countries = countries_with_tb_burden, 
                                        compare_to_region = FALSE, 
                                        compare_to_world = FALSE,
                                        metric = "e_inc_100k",
                                        annual_change = TRUE,
                                        stat = "mean",
                                        verbose = FALSE) %>% 
  mutate(annual_change = e_inc_100k) %>% 
  left_join(tb_burden %>% 
              select(country, g_whoregion), 
                     by = c("area" = "country")) %>% 

## Function to plot annual change
plot_annual_change <- function(df, strat = NULL, subtitle = NULL, years = 2000:2017) {
  dist <- df %>% 
    filter(year %in% years) %>% 
    rename(Region = g_whoregion) %>% 
    mutate(year = year %>% 
             factor(ordered = TRUE) %>% 
             fct_rev) %>% 
    ggplot(aes_string(x = "annual_change", y = "year", col = strat, fill = strat)) +
    geom_density_ridges(quantile_lines = TRUE, quantiles = 2, alpha = 0.6) +
    scale_color_viridis(discrete = TRUE, end = 0.9) +
    scale_fill_viridis(discrete = TRUE, end = 0.9) +
    geom_vline(xintercept = 0, linetype = 2, alpha = 0.6) +
    scale_x_continuous(labels = scales::percent, breaks = seq(-0.4, 0.4, 0.1),
                       limits = c(-0.4, 0.4), minor_breaks = NULL) +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(x = paste0("Annual Change in ", search_data_dict("e_inc_100k")$definition),
         y = "Year",
         title = "Annual Percentage Change in Tuberculosis Incidence Rates",
         subtitle = subtitle,
         caption = "")

## Overall country level annual change
overall <- plot_annual_change(tb_annual_change, NULL,
                              years = seq(2001, 2017, 2), subtitle = "By Country") 

## Regional country level annual change
region <-  plot_annual_change(tb_annual_change, "Region",
                              subtitle = "By Region", 
                              years = seq(2001, 2017, 2)) + 
  facet_wrap(~Region) +
  labs(caption = "")

## Regional and Global TB incidence rates over time
regional_incidence <- plot_tb_burden_summary(conf = NULL)

## Map global TB incidence rates for 2017 using getTBinR
map <- map_tb_burden(year = c(2005, 2009, 2013, 2017), facet = "year") +
  theme(strip.background = element_blank()) +
  labs(caption = "",
       title = "Tuberculosis Incidence Rates",
       subtitle = "By Country, per 100,000 population")

## Compose storyboard
storyboard <- (map + regional_incidence + plot_layout(widths = c(2, 1))) /
  (region + (global_annual_change /
               overall + labs(caption = "For country level annual percentages change countries with incidence above 1000 and an incidence rate above 10 per 100,000 are shown.
                    The global annual percentage change is shown with a LOESS fit. 
                    By @seabbs | Made with getTBinR | Source: World Health Organisation")) + plot_layout(widths = c(2, 1))) +
  plot_layout(widths = c(1, 1))

## Save storyboard
       storyboard, width = 20, height = 15, dpi = 330)

For other examples of using getTBinR to visualise the WHO TB data see my gists, previous blog posts, and the getTBinR website. To start using getTBinR in your browser see here for a shiny app or here for an Rstudio client hosted by


comments powered by Disqus