Installing Docker on IBM AC922 or IC922 server
This short article will show you how to install the Docker runtime on an AC922 or IC922 server with NVIDIA GPUs, and ensure that deployed containers can access the installed GPU resources.

Assumptions
We expect the system to be running RHEL 7.6-alt (the version for POWER9-based systems) and to have the base NVIDIA drivers installed, i.e. you can run `nvidia-smi` and receive no errors. We also assume that the nvidia-persistenced service is enabled and started. Finally, we expect the system to be registered to Red Hat Subscription Manager and attached to a valid subscription; this can be a full RHEL subscription or an evaluation subscription.
These steps have not been tested on CentOS, but the expectation is that the steps are very similar.
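The assumptions above can be verified up front. The snippet below is a small pre-flight sketch (the `check` helper is my own convenience function, not part of any tool); each line wraps one of the commands implied by the assumptions and prints OK or FAIL.

```shell
# Pre-flight sketch: each check mirrors one assumption above.
# Usage: check <description> <command...>
check() {
  desc=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK:   $desc"
  else
    echo "FAIL: $desc"
  fi
}

check "NVIDIA driver responds"        nvidia-smi
check "nvidia-persistenced is active" systemctl is-active nvidia-persistenced
check "registered with RHSM"          subscription-manager status
```

Any FAIL line points at an assumption to fix before continuing.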
Summary of steps
- Enable the Red Hat extras repository
- Install Docker runtime
- Install the NVIDIA runtime hook for Docker
- Test
Installation Steps
- Enable the Red Hat Extras repository for POWER9-based systems. This repository includes the Docker runtime. I would also advise enabling the Optional repository, although it is not required for this installation.
`sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms`
`sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms`
To see which repositories are enabled for your system, you can run:
`sudo yum repolist`
- Install the Docker runtime and all of its dependencies. This should pull in version 1.13.1-102 (if it tries to install 1.13.1-108, be careful, as there is a known issue).
`sudo yum install docker`
Once installed, you will need to start the docker service, and enable it if you want docker to start when the system reboots.
`sudo systemctl start docker.service`
`sudo systemctl enable docker.service`
In a default installation, docker can only be operated by users with root privileges. To allow other users access to docker commands, you can create a docker group and add users to it. You will need to restart the docker service once you create the docker group, and added users will need to log out and back in for the new group membership to take effect.
`sudo groupadd docker`
`sudo systemctl restart docker.service`
`sudo usermod -aG docker <username>`
To interact with the Docker runtime you generally use the `docker` command.
- Install the NVIDIA runtime hook for the Docker runtime to allow access to GPUs from within deployed containers. This requires adding some extra packages from two NVIDIA repositories. We can download the relevant .repo file from a public location to set up these repositories on this system.
`distribution=$(. /etc/os-release;echo $ID$VERSION_ID)`
`wget https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo`
`sudo mv nvidia-container-runtime.repo /etc/yum.repos.d/`
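To see what the `distribution` variable ends up holding, the same construction can be run against a sample os-release file (the contents below are illustrative values for RHEL 7.6; on the real system the command sources `/etc/os-release` directly):

```shell
# Illustrative copy of the two fields the command reads from /etc/os-release
cat > /tmp/os-release-sample <<'EOF'
ID="rhel"
VERSION_ID="7.6"
EOF

# Same construction as above, pointed at the sample file
distribution=$(. /tmp/os-release-sample; echo $ID$VERSION_ID)
echo "$distribution"   # rhel7.6
```

So on RHEL 7.6 the wget line above fetches the .repo file from the `rhel7.6` path of the NVIDIA site.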
This adds two new repositories to the system: libnvidia-container/ppc64le and nvidia-container-runtime/ppc64le. We can run `sudo yum repolist` to check that this worked. Installing the runtime hook will ask us to confirm that we want to install some new gpgkeys for the added repositories.
`sudo yum install nvidia-container-runtime-hook`
This will throw an error, as the signature cannot be verified because NVIDIA have not updated the gpgkeys for their signed packages. This can be fixed either by finding the correct gpgkey for the repository and installing it locally, or by turning off signature checking for those repositories (this is not good security practice, so it is not advised, but it does work). To do the latter, set repo_gpgcheck=0 for each repository in the .repo file.
`sudo nano /etc/yum.repos.d/nvidia-container-runtime.repo`
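If you prefer a one-liner to editing the file by hand, sed can flip the flag. The demonstration below works on a scratch copy with abbreviated, illustrative contents; on the real system you would run the sed line with sudo against `/etc/yum.repos.d/nvidia-container-runtime.repo`.

```shell
# Scratch copy with illustrative contents; the real file is in /etc/yum.repos.d/
cat > /tmp/nvidia-container-runtime.repo <<'EOF'
[libnvidia-container]
repo_gpgcheck=1
gpgcheck=0
[nvidia-container-runtime]
repo_gpgcheck=1
gpgcheck=0
EOF

# Turn off repository signature checking (not good security practice)
sed -i 's/^repo_gpgcheck=1$/repo_gpgcheck=0/' /tmp/nvidia-container-runtime.repo

grep -c '^repo_gpgcheck=0$' /tmp/nvidia-container-runtime.repo   # 2
```

The grep confirms that both repository sections were changed.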
Then you can install the runtime hook and all dependencies from those repositories.
`sudo yum install nvidia-container-runtime-hook`
If you have SELinux enabled on the system, you need to set the right contexts on the NVIDIA devices to allow access from within container instances.
`sudo chcon -t container_file_t /dev/nvidia*`
- To check that the Docker runtime works and that the NVIDIA runtime hook is operational, you can deploy a CUDA-enabled docker image and check that it has access to the system's GPUs.
`docker run --rm nvidia/cuda-ppc64le nvidia-smi`
This will pull the nvidia/cuda-ppc64le image (tag latest) from docker.io, run one instance of that image executing the command nvidia-smi, and then exit and remove the deployment. The output should be the same as running nvidia-smi on the host system.
Troubleshooting
Recently, some people have been seeing difficulties when using the NVIDIA container runtime hook on their systems, with an error message similar to one of the following when trying to run containers (see Step 4 [Test] above):
`Your docker environment is not setup properly to support GPUs`
or
`could not decode OCI spec: json: cannot unmarshal array into Go struct field`
Due to some changes in the NVIDIA repository, the installation process will install the NVIDIA container toolkit, but not the runtime hook that is needed to allow GPU acceleration within the container instances.
The fix requires you to force the installation of the correct packages to allow for the use of GPUs within containers. You will first need to uninstall any libnvidia-container packages that are on your system, and then install the versions that are known to work:
`sudo yum remove libnvidia-container*`
`sudo yum install libnvidia-container1-1.0.0`
`sudo yum install libnvidia-container-tools-1.0.0`
`sudo yum install --setopt=obsoletes=0 nvidia-container-runtime-hook-1.4.0-2`
Restart the docker service:
`sudo systemctl restart docker`
Once you have the right packages installed, you may need to set the SELinux contexts for the NVIDIA devices once again, and then you're ready to test the docker install.