I regularly use cloud resources (AWS and GCP) both in my day job and for personal projects but recently I have been finding that having to spin up a cloud instance for quick analysis can be tedious, even when making use of tools for reproducibility like docker. This is particularly the case for self-learning when spending money on cloud resources feels wasteful, especially when I have half an eye on something else (i.e the TV). Often I find that this leads me to not look at the things that I am interested in, although this could also just be my own lack of motivation!
It seemed sensible to finally bite the bullet and build a local workstation that could handle all the compute tasks I can throw at it (up to a point obviously). At the moment I am looking to further explore P-MCMC methods, Deep learning and get involved in a few kaggle competitions. These needs inform my parts choices below.
- A high core number for compute intensive CPU tasks that can easily be parallel enabled such as Sequential Monte Carlo and distributed machine learning (
- Good single core CPU performance as many tasks are natively single core (i.e R).
- Enough RAM to support the CPU cores but this is an area that I could economise as more RAM can be added at a later date. Typically 2Gbs per core is the minimum.
- A GPU with good support and sufficient power to do meaningful deep learning. As of my current reading this essentially means a Nvidia GPU.
- A fast hard drive but a large one is not needed as only the data currently used in analysis needs to be stored (on top of the OS and applications).
Knowing nothing about building PCs, or PC components, I started using a suggested build on pcpartpicker for streaming and editing. This was a good match as editing/streaming require a larger core number than other use cases, whilst still needing a strong GPU. From here, I went through each part in turn and reviewed the alternatives. The main resources that I used for PC part reviews and thoughts were: the thewirecutter, anandtech and tomshardware. From this reading, I settled on a Threadripper CPU from the previous generation as this provided 16 cores, relatively good single core performance and some scope for further overclocking. Another viable option was the current generation Threadripper, which has improved speed and better automatic overclocking but ultimately I decided that the cost/benefit didn’t make sense. The choice of CPU dictated the motherboard choice, which I selected based on the parametric filters pcpartpicker provides (a real lifesaver). As the Threadrippers are very heat intensive I went with a water cooler for the CPU (Note: I probably should have chosen a bigger radiator here for more cooling), again selected based on the parametric filters + reviews. I went with 32GB of 3200 Mhz RAM as the minimum required for this many CPU cores, with a good balance of speed and price. I chose the Nvidia RTX 2070 Windforce for GPU after looking at some benchmarks and because the 2080 was dramatically more expensive. The Windforce also comes with a slightly higher clock speed than other entry level 2070s along with 3 fans for additional cooling. Finally, I chose the best reviewed M2 SSD that I could find as the read/write speeds are dramatically faster than traditional SSDs, going with a smaller disk size (500 Gb) to minimise the cost. The final parts list is here.
I was slightly nervous about the build process but after reading the manuals for all of the parts (quite dull - the motherboard and case manuals turned out to be the most helpful) and watching multiple youtube videos going through the PC building process (there are lots of great channels providing great resources) it went without a hitch. In brief the build process boiled down to the following.
- Installing the CPU into the motherboard
- Installing the RAM into the motherboard
- Installing the SSD into the motherboard (as using an M.2 SSD).
- Installing the motherboard into the case
- Adding the CPU cooler to the case
- Plugging the fans into the correct motherboard sockets
- Plugging the case ports into the motherboard
- Installing the GPU into the motherboard
- Installing the power supply and linking up the power connections
This all sounds very simple and, amazingly, it really was. After going through this process over a few hours (most of the time was spent looking for screws and getting confused over which instruction manual to use), all the parts were installed and everything powered up correctly (having RGB was very reassuring here).
Historically, I have been a Mac user but recently I have been using cloud resources and docker more and more. Both of these use Linux so this seemed like a sensible choice. Not knowing much about the various distros available I wanted to go with one based on Debian, as these use the same package libraries etc. that I have been using on AWS and elsewhere. I settled on Ubuntu Budgie, which is a theme for standard Ubuntu (one of the most commonly used distros). Budgie provides a modern, lightweight interface, that overlays the core Ubuntu distro. I followed a guide for creating an OS USB boot and then followed the installation instructions after rebooting the workstation (Note: During installation I chose to encrypt the boot disk. This means that the computer requires a password before booting - meaning that it can’t be remotely rebooted. I would definitely not choose to do this in the future.)
After completing the Ubuntu set up and installing the usual suspects (i.e Dropbox etc.), my first requirement was something to reproduce the functionality of alfred (an amazing Mac only spotlight replacement that I completely rely on for navigating my computer). After a brief search, I came across albert a great open source Linux alternative. The next step was to get my most commonly used Linux command line tools, namely
htop (to monitor CPU usage) and
tree (to explore the file system). These were installed in the terminal with the following:
sudo apt-get update sudo apt-get install htop tree
To check that everything was working correctly in the build, I needed to be able to monitor CPU temperatures, both ideal and under load. I chose to do this using
s-tui - a great tool that monitors CPU usage, temperatures, power consumption and speed. It provides an interface to the
stress package, allowing load scenarios to be simulated. I also needed to be able to check that the GPU was working as expected. To do this, I used the following command in the terminal:
watch -n 5 nvidia-smi - this calls
nvidia-smi every 5 seconds.
nvidia-smi comes prepackaged with the Nvidia drivers (you may need to update/switch your drivers - I used this post) and allows monitoring of Nvidia GPUs (much like
top for CPUs).
As this workstation is mostly going to be running analyses via rocker-based docker containers, the next key step is to download and install Docker CE. See the Docker documentation for more details on using docker (well worth the time as Docker is the one of the best tools for ensuring that analysis is reproducible). If everything is installed and working correctly, the following should lead to an instance of an Rstudio server at
localhost:8787 with the username and password of
seabbs (you may need to use
docker run -d -p 8787:8787 -e USER=seabbs -e PASSWORD=seabbs --name rstudio rocker/tidyverse
From here, you should have everything you need to do Rstats using your new workstation! The final software that is required is
nvidia-docker, this provides a docker wrapper that allows docker containers to access the Nvidia GPU. After installing, check its working using the following:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Overclocking and stress tests
As the 1950X Threadripper has a relatively low clock speed of 3.4 Mhz, there is general agreement online that with proper cooling there is some good overhead for overclocking. Depending on the use case, this may or may not be worth doing as for mostly low core work flows the turbo-boosting (not available when overclocked) will give improved performance over most overclocks. In my case, I am most interested in optimizing multicore use and am willing to sacrifice a small boost in single core performance (of around 5%) to achieve this.
In order to check that an overclock is both stable and does not increase the temperature of the system to dangerous levels, it is important to stress test both the CPU and GPU over an extended time period (I initially started with an hour and then extended this to 6 hours for the final test). To do this I used
s-tui to both monitor and stress the CPU,
ethminer to stress the GPU by mining Ethereum and
nvidia-smi to monitor the GPU temperature. To overclock the system, I made adjustments in the BIOS, starting with the RAM (to get it running at its maximum specifications, which is not the default), and then moving to increasing the CPU speed by 100 Mhz intervals each time. Following this process, I got to a speed of 3.8 Mhz (see the picture above for the stress test at this level). Going higher than this required an increase in voltage for stability and this lead to dramatic temperature increases (beyond 70C). A possible option is turning off simultaneous multithreading (i.e going from 32 virtual cores to 16 real cores), this would reduce power consumption and allow for a higher overclock at the cost of reduced potential parallelisation.
Static IP and Remote Access
This workstation will primarily be used headlessly from my Macbook Pro so the next important step is to set up remote access. I generally like to connect over SSH but sometimes it is very useful to have an GUI interface to work with. To allow this I installed a VNC server (RealVNC VNC Server). The nice thing about this is that it works out of the box both on the local network and externally. For the SSH setup, I hardened my configuration using this post (I also changed my SSH port from 22 to something else to limit automatic detection). After following these steps, the workstation now needs the public keys of any computers that you want to connect to it using SSH - some tips for this can be found here. To make connections to the workstation consistent it needs a static IP. I did this using the Ubuntu interface rather than using my router as I usually would and it was so easy that I will definitely favour this option in the future. The next step is to allow SSH connection from your computer when your not at home. To do this you need to open a port in your router and link it to the SSH port of the workstation (see the instructions for your router). If, like me, you don’t have a static IP address then you need to find a workaround. I used noip to provide an address to SSH to rather than an IP. Finally, I like to add aliases to avoid having to write out the IP each time I want to SSH. Below I have an example setup (this needs to be copied into the
alias archie_internal='ssh -p 2345 [email protected]' alias archie='ssh -p 2345 [email protected]' alias archie_with_ports='ssh -p 2345 -l localhost:8787:localhost:8787 [email protected]'
Test your connection from the connecting computer using the following:
You should be able to control the workstation in the terminal and see Rstudio at
localhost:8787 (This post was written remotely using this approach and the
seabbs/seabbs.github.io docker container).
Whilst I now have a working workstation there are still some things that need sorting out. The most important of these are:
- Work out how to use GPUs in arbitrary docker containers without first installing cuda etc.
Suggestions for how best to use @nvidia GPUs in rocker #rstats #docker containers. Ideally looking for a solution using nvidia docker compose rather than manually installing cuda into each container? Does this exist or are there good alternatives?— Sam Abbott (@seabbs) December 20, 2018
Explore whether simultaneous multi-threading (hyper-threading) leads to performance improvements in common Rstats work loads. For Intel CPUs, my experience is that only using real cores leads to better performance for intensive compute tasks. If this is the case here then this will allow for greater overclocking headroom.
Things to improve next time
This was a huge learning experience (which I really enjoyed). My main takeaways so far have been:
Think hard about parts trade-off - for example, I went with the last generation Threadripper as a cost saving initiative. In retrospect, the latest generation might have been worth the money because the new automatic overclocking features mean that significant improves are likely for most workloads.
Look carefully at cooling options - for example, a larger radiator would give more head room from CPU overclocking.
If you got through this post then thanks for reading! Hopefully it gave some insights into how to approach building out a workstation for Rstats. I am definitely not an expert so any thoughts would be very welcome. As mentioned above, I am particularly interested in performance comparisons between GPUs and CPUs (both real and virtual cores) and so would welcome any insights into this aspect of things.