data science

Benchmarking an Rstats workstation - using benchmarkme

Why? I recently built out a new workstation and benchmarked it using xgboost via h2o. In this post I use the benchmarkme package to get another perspective on performance. Note: The benchmarkme package appeared to have some issues with plotting benchmark results, so I ended up dropping the plots from this post entirely. Update (2019-02-11): I rechecked this using rocker/tidyverse:latest and found that all benchmarkme functionality works well.

Benchmarking an Rstats workstation on realistic workloads - using xgboost via h2o

Why? I recently built out a new workstation to give me some local compute for data science workloads. Now that I have local access to both a CPU with a large number of cores (Threadripper 1950X with 16 cores) and a moderately powerful GPU (Nvidia RTX 2070), I’m interested in knowing when the CPU or the GPU is the better choice for some of the tasks that I commonly run.

Building an Rstats Workstation

Why? I regularly use cloud resources (AWS and GCP) both in my day job and for personal projects, but recently I have found that spinning up a cloud instance for a quick analysis can be tedious, even with reproducibility tools like Docker. This is particularly true for self-learning, where spending money on cloud resources feels wasteful, especially when I have half an eye on something else (i.