Posts

getTBinR 0.6.0 is now on CRAN and should be available on a mirror near you shortly! This update includes multiple new Tuberculosis datasets, increasing the number of variables available through getTBinR from 80 to over 450. To help support these new datasets, the package now contains a dataframe listing the available datasets, and search_data_dict can now also be used to search the data dictionary for variables by dataset. On top of this, this update contains changes suggested by reviewers (@rrrlw and @strengejacke) from JOSS (see here for the review thread).


Why? I recently built out a new workstation and have done some benchmarking using xgboost via h2o. In this post I am using the benchmarkme package to get another perspective on performance. Note: The benchmarkme package appears to have some issues when it comes to plotting benchmarks. I ended up having to drop them entirely from this post. Update (2019-02-11): Just checked this issue using rocker/tidyverse:latest and found all benchmarkme functionality is working well.


Why? I recently built out a new workstation to give me some local compute for data science workloads. Now that I have local access to both a CPU with a large number of cores (Threadripper 1950X with 16 cores) and a moderately powerful GPU (Nvidia RTX 2070), I’m interested in knowing when it is best to use CPU vs. GPU for some of the tasks that I commonly do.


getTBinR 0.5.7 is now on CRAN and should be available on a mirror near you shortly! This update mainly focussed on building out new country level Tuberculosis (TB) report functionality, but along the way this led to a new summary plotting function that quickly and easily shows TB trends across regions and globally. I also had some fun developing a hexsticker (Tweet at me with something you made using the package to get a physical version - whilst my postage money lasts…), reducing the dependencies with itdepends and pkgnet, and dealing with some breaking changes from an upcoming dplyr update (my own fault for missing a function import).


Why? I regularly use cloud resources (AWS and GCP) both in my day job and for personal projects but recently I have been finding that having to spin up a cloud instance for quick analysis can be tedious, even when making use of tools for reproducibility like docker. This is particularly the case for self-learning when spending money on cloud resources feels wasteful, especially when I have half an eye on something else (i.


getTBinR 0.5.5 is now on CRAN and should be available on a mirror near you shortly! This update is mainly about highlighting the availability of TB data for 2017, although some small behind the scenes changes were required to get the code set up going forward for yearly updates. A few more plotting options have been added, along with the corresponding tests (definitely the most exciting news). The full changelog is below along with a short example highlighting some of the changes in the 2017 data.


getTBinR 0.5.4 is now on CRAN and should be available on a mirror near you shortly! This update includes an additional data set for 2016 containing variables related to drug resistant Tuberculosis, some aesthetic updates to mapping functionality and a new summarise_tb_burden function for summarising TB metrics. Behind the scenes there has been an extensive test overhaul, with vdiffr being used to test images, and several bug fixes. See below for a full list of changes and some example code exploring the new functionality.


Introduction I recently attended the Public Health Research and Science Conference, run by Public Health England (PHE), at the University of Warwick. I was mainly there to present some work that I have been doing (along with my co-authors) estimating the direct effects of the 2005 change in BCG vaccination policy on Tuberculosis (TB) incidence rates (slides) but it was also a great opportunity to see what research is being done within, and partnered with, PHE.


This is a quick post exploring estimates of the case fatality ratio for Tuberculosis (TB) from data published by the World Health Organisation (WHO). It makes use of getTBinR (which is now on CRAN), pacman for package management, hrbrthemes for plot themes, and patchwork for combining multiple plots into a storyboard. For an introduction to using getTBinR to explore the WHO TB data see this post. It is estimated that in 2016 there were more than 10 million cases of active TB, with 1.


In November I attended Epidemics, which is a conference focused on modelling infectious diseases. There was a lot of great work and, perhaps most excitingly, a lot of work being offered as R packages. I’ve recently begun wrapping all my analytical work in R packages, as it makes producing reproducible research a breeze! Unfortunately all of this work is still making its way towards publication and for a variety of reasons can’t be shared until it has passed this hurdle.
