Running PowerAI in Docker containers on Power Systems

The Docker containers that IBM provides (on Docker Hub) contain everything you need to get up and running with the PowerAI frameworks, including the required CUDA, cuDNN, and NCCL components, as well as the Python environment managed by Anaconda. The only requirements for the base system are:
- the CUDA drivers for the GPUs
- a Docker installation
- the NVIDIA Docker runtime
The drivers can be downloaded and installed from the NVIDIA website, following the standard documentation for the setup of PowerAI.
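Before moving on to Docker itself, it's worth a quick sanity check that the drivers are working on the host; nvidia-smi is installed alongside the CUDA drivers, so a minimal check might look like:

```shell
# Confirm the NVIDIA driver is installed and the GPUs are visible on the host.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found - check the driver installation" >&2
fi
```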
Docker can be installed from the standard RHEL / Ubuntu repositories, or from the official Docker repositories. As far as I can tell, both options work just fine at the moment.
There is a public repository for the NVIDIA Docker runtime. Add it to the system, then install either nvidia-container-runtime-hook or nvidia-docker2 (depending on which Docker version you are using); either package will also pull in its dependencies.
For RHEL, you will need to add the repo definition to /etc/yum.repos.d/. This is made easier by the pre-built version you can get from https://nvidia.github.io/nvidia-docker/rhel7.6/nvidia-docker.repo.
Simplified code steps:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install nvidia-docker2 (if using docker-ce)
sudo yum install nvidia-container-runtime-hook (if using docker)
For Ubuntu, you will need to add a file to /etc/apt/sources.list.d/ which you can pull from https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list. From that repository you can pull the nvidia-docker2 package.
Simplified code steps:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
For both options you’ll need to restart the Docker daemon to pick up the changes, e.g.:
sudo systemctl restart docker (or docker-ce)
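If you installed nvidia-docker2, you can confirm that the daemon picked up the NVIDIA runtime after the restart; a quick check:

```shell
# 'docker info' lists the registered runtimes; with nvidia-docker2 installed
# you should see 'nvidia' alongside the default 'runc'.
# (The hook-based package for plain docker does not register a separate runtime.)
docker info 2>/dev/null | grep -i runtimes
```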
Note: your user will need to be a member of the right group to interact with the Docker daemon (to run commands like docker ps). This group is commonly docker or docker-root, but the name can vary between installations. It’s worth checking /etc/group for a likely candidate if neither of those works. If in doubt, sudo will work!
sudo usermod -a -G docker <userid>
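Note that group changes only take effect on a new login session. You can check your current groups, and look for docker-like groups on the system, with something like:

```shell
# Groups for the current session (a fresh login is needed after usermod):
id -nG

# Any docker-like groups defined on this system:
getent group | grep -i docker || echo "no docker-like group found"
```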
Once the nvidia-docker runtime is installed and running, deploying a container based on the PowerAI Docker images should work, but you can test with the standard CUDA container, running nvidia-smi to show the status of all GPUs:
docker run --rm nvidia/cuda-ppc64le nvidia-smi
(With nvidia-docker2 you may need to add --runtime=nvidia to the command, unless nvidia has been set as the default runtime.)
Note: SELinux can block access to the NVIDIA devices and drivers from containers. It is possible to set the right contexts for these, but as a troubleshooting step I would suggest trying with SELinux in permissive mode first.
sudo setenforce permissive
If the customer requires SELinux to be enforced, then there are some notes on getting the right contexts in this blog post.
Once you’re successfully running the PowerAI Docker containers, you will have a complete environment, with all of the dependencies and the Anaconda environment already set up. There are various images, depending on the tag used — some include just one framework, while others include all frameworks. There are also Python 2 (standard) and Python 3 (-py3) based versions to choose from. From within the container, you can simply run your Python scripts, using statements like ‘import tensorflow’ as you would in a native environment.
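As an illustration (the image name and tag here are examples only — check the ibmcom/powerai page on Docker Hub for the tags that actually exist), running a quick TensorFlow check from a PowerAI container might look like:

```shell
# Illustrative only - substitute a real tag from Docker Hub.
docker run --rm ibmcom/powerai:1.6.1-all-ubuntu18.04-py3 \
    python -c "import tensorflow; print(tensorflow.__version__)"
```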
The NVIDIA Docker Wiki has some useful information on how to do things like limit the GPUs that a container can access, as well as other utilisation tips.
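For example, one approach described there is to restrict which GPUs a container can see using the NVIDIA_VISIBLE_DEVICES environment variable (the GPU indices match the nvidia-smi output on the host):

```shell
# Expose only GPU 0 to the container:
docker run --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda-ppc64le nvidia-smi

# Expose no GPUs at all:
docker run --rm -e NVIDIA_VISIBLE_DEVICES=void nvidia/cuda-ppc64le nvidia-smi
```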