getTBinR 0.5.4 now on CRAN - new data, map updates and a new summary function.

getTBinR 0.5.4 is now on CRAN and should be available on a mirror near you shortly! This update includes an additional data set for 2016 containing variables related to drug resistant Tuberculosis, some aesthetic updates to mapping functionality and a new summarise_tb_burden function for summarising TB metrics. Behind the scenes there has been an extensive test overhaul, with vdiffr being used to test images, and several bugs fixes. See below for a full list of changes and some example code exploring the new functionality.

Feature updates

  • Added MDR-TB data for 2016, see here for the dataset. The MDR-TB data is automatically joined to the WHO TB burden data.
  • Aesthetic updates to map_tb_burden.
  • Added new summarise_tb_burden function for summarising metrics across regions, across custom groups and globally.

Package updates

  • Improved data cleaning, converting Inf and NaN values to NA when the data is read in.
  • Added pgknet report.
  • Improved test robustness and scope
  • Added vdiffr to test plots when not on CRAN.
  • Fixed bug for map_tb_burden which was adding duplicate variables which caused map build to fail.

Example: Exploring Global TB Incidence Rates

For a quick example the code below pulls the WHO TB data, summarises it by region using getTBinR and ggridges, maps global TB incidence rates in 2016, and finally plots an overview of incidence rates in the 10 countries with highest TB incidence rates in 2016. The figures generated are then combined into a single storyboard using the pathwork package. See here for a full size version of the storyboard.


## Get required packages - managed using pacman
if (!require(pacman)) install.packages("pacman"); library(pacman)
p_load("getTBinR")
p_load("ggplot2")
p_load("viridis")
p_load("dplyr")
p_load("forcats")
p_load("ggridges")
p_load_gh("thomasp85/patchwork")

## Pull TB data and summarise TB incidence rates by region using the median
tb_sum <- summarise_tb_burden(metric = "e_inc_100k",
                              stat = "median",
                              compare_all_regions = TRUE,
                              samples = 1000)

## Plot the median and IQR for each region
sum <- tb_sum %>% 
  rename(Region = area) %>% 
  ggplot(aes(x = year, y = e_inc_100k, col = Region, fill = Region)) +
  geom_ribbon(alpha = 0.2, aes(ymin = e_inc_100k_lo, ymax = e_inc_100k_hi)) +
  scale_color_viridis(discrete = TRUE) +
  scale_fill_viridis(discrete = TRUE) +
  geom_line(alpha = 0.6, size = 1.2) +
  geom_point(size = 1.3) +
  theme_minimal() +
  facet_wrap(~Region, scales = "free_y") +
  theme(legend.position = "none") +
  labs(y = search_data_dict("e_inc_100k")$definition,
       x = "Year",
       title = "Regional Summary of Tuberculosis Incidence Rates - 2000 to 2016",
       subtitle = "Median country level incidence rates (with 95% interquartile ranges) are shown")

## Get the full TB burden dataset (including MDR TB)
tb <- get_tb_burden()

## Plot the distribution of country level TB incidence rates using ggridges
dist <- tb %>% 
  rename(Region = g_whoregion) %>% 
  mutate(year = year %>% 
           factor(ordered = TRUE) %>% 
           fct_rev) %>% 
  ggplot(aes(x = e_inc_100k, y = year, col = Region, fill = Region)) +
  geom_density_ridges(alpha = 0.6) +
  scale_color_viridis(discrete = TRUE) +
  scale_fill_viridis(discrete = TRUE) +
  theme_minimal() +
  facet_wrap(~Region, scales = "free_x") +
  theme(legend.position = "none") +
  labs(x = search_data_dict("e_inc_100k")$definition,
       y = "Year",
       title = "Distribution of Country Level Tuberculosis Incidence Rates by Region - 2000 to 2016",
       caption = "By @seabbs | Made with getTBinR | Source: World Health Organisation")

## Map global TB incidence rates for 2016 using getTBinR
map <- map_tb_burden() +
  labs(caption = "",
       title = "Map of Tuberculosis Incidence Rates - 2016")

## Extract the top 10 high incidence countries in 2016.
high_inc_countries <- tb %>% 
  filter(year == 2016) %>% 
  arrange(desc(e_inc_100k)) %>% 
  slice(1:10) %>% 
  pull(country)

## Plot an overview of TB incidence rates in 2016.
high_inc_overview <- plot_tb_burden_overview(countries = high_inc_countries) +
  labs(caption = "",
       title = "10 Countries with the Highest Tuberculosis Incidence Rates - 2016") 

## Compose storyboard
storyboard <- (map + high_inc_overview) /
                 (sum | dist) +
                 plot_layout(heights = c(1, 2))

## Save storyboard
ggsave("storyboard.png",
       storyboard, width = 20, height = 15, dpi = 330)

getTBinR 0.5.4 storyboard

For other examples of using getTBinR to visualise the WHO TB data see my gists, previous blog posts, and the getTBinR website.

Related

comments powered by Disqus