getTBinR 0.7.0 released - more data, {ggplot2} best practices and bug fixes

getTBinR 0.7.0 should now be available on CRAN. This release includes some new experimental data (TB incidence by age and sex) that for now is only partly supported by {getTBinR}. It also brings {getTBinR} into line with new (or new to me) {ggplot2} best practices. This involved two major changes (plans are also afoot for an S3 plot method):

  1. Moving from a full @import of {ggplot2} to only using @importFrom for required functions (something that I had previously been too lazy to do…).

  2. Dropping aes_string, which is now soft depreciated, in favour of aes and rlang::.data for programmatic variables (Note: Yes I could have more fully embraced rlang here but this would have required some major breaking package changes). This was a real joy as aes_string has always been a bit clunky to use and resulted in very messy looking code. A simple pseudo-code example of this can be seen below.

library(ggplot2)

## Variable to plot
y_var <- "disp"

## Old getTBinR approach for programming across variables
ggplot(mtcars, aes_string(x = "cyl", y = y_var)) + 
  geom_point()

## New getTBinR approach using rlang
ggplot(mtcars, aes(x = cyl, y = .data[[y_var]])) + 
  geom_point() +
  ## This is now needed to get the correct label - a small price to pay.
  labs(y = y_var)

Wrapping up this release are several bug fixes for the embedded {shiny} app and {rmarkdown} report. Hopefully, {getTBinR} is now ready to be presented next week at R medicine (see your there)! The full change log is included below (or can be found here). See the bottom of this post for an example of using the new age and sex stratified incidence data.

getTBinR 0.7.0

Feature updates

  • Added experimental support for incidence data stratified by age and sex. Current implementation requires data cleaning before use. See the release post for details.

Package updates

  • Fixed a bug that was preventing render_country_report from producing a country level report. Added tests to flag this in the future.
  • Updated the packages requested for installation by run_tb_dashboard so that render_country_report runs without errors.
  • Switched to using ggplot2 best practices (#77).
  • Updated the README to make identifying types of badges easier.

Example: % of TB incidence in each age group

The code below downloads the new experimental (and in its raw form messy) incidence data stratified by age and sex. As these data are in a different format to the rest of the data in {getTBinR} some cleaning + aggregation is needed to get it into a workable form. From there it is a simple task to use {getTBinR} to produce a global map of the % of TB incidence in each age group. For interest I have also used {ggridges} to look at the regional distribution of TB incidence stratified by age group. Finally everything is pulled together into a single figure using {patchwork}.

## Get required packages - managed using pacman
if (!require(pacman)) install.packages("pacman"); library(pacman)
p_load_gh("getTBinR")
p_load("dplyr")
p_load("tidyr")
p_load("ggplot2")
p_load("forcats")
p_load("ggridges")
p_load("patchwork")

##Pull TB data 
tb_burden <- get_tb_burden(additional_datasets = "Incidence by age and sex",
                           verbose = FALSE) %>% 
  mutate(country = country %>% 
           factor()) %>% 
  mutate(age_group = age_group %>% 
           factor(levels = c("0-4", paste(seq(5, 55, 10), seq(14, 64, 10), sep = "-"), "65plus")) %>% 
         fct_explicit_na()) 

## Aggregate incidence by age group
tb_burden_ag <- tb_burden %>%
filter(age_group != "0-14") %>% 
mutate(age_group = age_group %>% 
         fct_drop()) %>% 
group_by(age_group, country, .drop = FALSE) %>%
summarise_at(.vars = vars(inc_age_sex, inc_age_sex_lo, inc_age_sex_hi), sum) %>%
full_join(
  select(tb_burden, -sex, -age_group, -contains("inc_age_sex")) %>%
    unique
  ) %>% 
ungroup %>% 
  add_count(country, year, wt = inc_age_sex) %>% 
  mutate(inc_age_dist = inc_age_sex / n * 100) %>% 
  filter(!(age_group %in% "(Missing)"))


## Map using getTBinR
map <- map_tb_burden(tb_burden_ag, metric = "inc_age_dist", 
                     metric_label = "% of TB incidence in each age group",
                     facet = c("age_group")) +
  labs(title = "Distribution of Tuberculosis (TB) incidence by age",
       subtitle = "Global map of the % of TB incidence in each age group",
       caption = "") +
  theme(plot.title = element_text(face = "bold", size = 26), text = element_text(size = 20))

## Density plot using ggridges
density <- tb_burden_ag %>% 
drop_na(inc_age_dist) %>% 
mutate(age_group = age_group %>% 
         fct_rev()) %>% 
ggplot(aes(x = inc_age_dist, y = age_group, col = age_group, fill = age_group)) +
  geom_density_ridges(alpha = 0.8) +
  facet_wrap(~g_whoregion) +
  theme_ridges() +
  theme(legend.position = "none", plot.subtitle = element_text(size = 20),
        plot.title = element_text(size = 20), plot.caption = element_text(size = 20),
        text = element_text(size = 20)) +
  labs(title = "Regional differences in the % of TB incidence in each age group",
       subtitle = "+ within country differences for each region",
       x = "Age group", 
       y = "% of TB incidence",
       caption = "By @seabbs | Made with {getTBinR} | Source: World Health Organization") +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") 

## Patchwork storyboard
storyboard <- (map) /
  (density)

## Save everything
ggsave("../../static/img/getTBinR/storyboard-7-0.png",
       storyboard, width = 32, height = 32, dpi = 330)

For other examples of using getTBinR to visualise the WHO TB data see my gists, previous blog posts, and the getTBinR website. To start using getTBinR in your browser see here for an Rstudio client hosted by mybinder.org.

Related

comments powered by Disqus