NVIDIA-docker

This article describes how to install NVIDIA-docker on a Linux VM in two parts.
Kudos to Bob van Dijk - Amsterdam UMC.
Download and ingress the attachment for easy copy/paste.

Part 1: install and start the latest NVIDIA drivers for your GPU card

  1. Open up a terminal on the CentOS VM
  2. Run as root (sudo su)
  3. Install lshw (list hardware), which is missing from the standard CentOS 7.5 image:
    1. yum install lshw
  4. Identify the graphics card model and driver:
    1. lshw -numeric -C display
      Note the current driver name (let us say it is <drivername>).
      You now have the information needed to find the proper driver for your card and CentOS on the NVIDIA website (nvidia.com).
  5. Download that driver, which comes as an installation file named NVIDIA-Linux-x86-… (more precisely: download it to your local machine, upload it to the fileshare, and copy it to a disk on your VM).
    To install the driver, you will need to compile this download and install it in place of the current driver.
  6. Before that, some preparation is needed on your VM:
    1. yum groupinstall "Development Tools"
    2. yum install kernel-devel epel-release
  7. Make a back-up copy of the existing GRUB file (/etc/default/grub)
  8. Edit the GRUB file and set the parameter <drivername>.modeset to 0 (that is, add <drivername>.modeset=0 to the GRUB_CMDLINE_LINUX line)
  9. Regenerate the GRUB configuration so that the current driver is no longer loaded at boot (still as root) by running the following 2 commands:
    1. grub2-mkconfig -o /boot/grub2/grub.cfg
    2. grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
  10. Reboot the VM. After the restart, the old driver should no longer appear when you list the hardware (see step 4).
  11. Stop the X server (still as root):
    1. systemctl isolate multi-user.target
  12. Install the NVIDIA driver (still as root):
    1. bash NVIDIA-Linux-x86-*
      There will probably be some questions that you need to answer. Almost all defaults can be used. However, do not agree to automatically update your driver: there is no direct internet connection to nvidia.com, so that would generate errors.
  13. If needed you can now run nvidia-settings and configure your GPU card.
  14. Finally, reboot once more (to restart Xorg and make sure everything still works after the next restart).
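
The driver lookup and the GRUB edit from Part 1 can be scripted. The following is a minimal sketch, not a tested installer. The helper names driver_from_lshw and disable_modeset are made up for this example, and it assumes that the GRUB parameter meant above is the kernel option <drivername>.modeset=0 appended to the GRUB_CMDLINE_LINUX line.

```shell
# Print the kernel driver named in `lshw -C display` output read from stdin.
# Assumes the usual "configuration: driver=<name> ..." line is present.
driver_from_lshw() {
    sed -n 's/.*configuration:.*driver=\([^ ]*\).*/\1/p' | head -n 1
}

# Back up a GRUB defaults file (once) and append "<driver>.modeset=0" to
# GRUB_CMDLINE_LINUX. Arguments: driver name, path to the GRUB file.
disable_modeset() {
    driver=$1
    grub_file=${2:-/etc/default/grub}
    # Keep the first backup so a second run does not overwrite it.
    [ -e "$grub_file.bak" ] || cp -p "$grub_file" "$grub_file.bak" || return 1
    # Do nothing if the parameter is already present.
    grep -q "${driver}\.modeset=0" "$grub_file" && return 0
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 ${driver}.modeset=0\"/" "$grub_file"
}
```

Possible usage, as root: run `lshw -numeric -C display | driver_from_lshw` to get the driver name, then `disable_modeset <drivername>`. The two grub2-mkconfig commands and the reboot are still needed afterwards.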

Part 2: install the nvidia-docker packages

  1. Still as root
    1. yum install -y nvidia-docker2
  2. Signal the Docker daemon to reload its configuration
    1. pkill -SIGHUP dockerd
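
To check that the package registered itself with Docker, one option is to look for the runtime entry in the daemon configuration. This is a sketch under the assumption that nvidia-docker2 registers a runtime named "nvidia" in /etc/docker/daemon.json. The function name has_nvidia_runtime is made up for this example.

```shell
# Succeed if the given Docker daemon config mentions an "nvidia" runtime.
# Argument: path to daemon.json (defaults to /etc/docker/daemon.json).
has_nvidia_runtime() {
    config=${1:-/etc/docker/daemon.json}
    [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}
```

For example, `has_nvidia_runtime && echo "nvidia runtime registered"`. A fuller check would be to run nvidia-smi inside a CUDA container started with `docker run --runtime=nvidia`, which requires a suitable image to be available on the VM.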

    Remarks:
    1. Having no internet connection from the Linux VM makes the process a lot more cumbersome. If there is no data in the workspace yet, you can ask your agent to open up internet access for the workspace temporarily.
    2. The nvidia-docker2 package is not in the present yum repository. The workaround:
      1. on a system with internet:
        1. LOCALDIR=/var/lib/nvidia-docker-repo
        2. mkdir -p $LOCALDIR && cd $LOCALDIR
        3. mkdir -p /var/lib/nvidia-docker-repo/libnvidia-container
        4. mkdir -p /var/lib/nvidia-docker-repo/nvidia-container-runtime
        5. mkdir -p /var/lib/nvidia-docker-repo/nvidia-docker
        6. wget https://api.github.com/repos/nvidia/libnvidia-container/tarball/gh-pages -O - | tar -zx --strip-components=1 -C ./libnvidia-container
        7. wget https://api.github.com/repos/nvidia/nvidia-container-runtime/tarball/gh-pages -O - | tar -zx --strip-components=1 -C ./nvidia-container-runtime
        8. wget https://api.github.com/repos/nvidia/nvidia-docker/tarball/gh-pages -O - | tar -zx --strip-components=1 -C ./nvidia-docker
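
The directory and download commands follow one pattern, so they can be written as a loop. This is a sketch, assuming each directory is filled from the GitHub repository of the same name. The function names repo_tarball_url and fetch_nvidia_repos are made up for this example.

```shell
# Build the gh-pages tarball URL for one NVIDIA repository.
repo_tarball_url() {
    printf 'https://api.github.com/repos/nvidia/%s/tarball/gh-pages\n' "$1"
}

# Download and unpack the three repositories below a local directory.
# Argument: target directory (defaults to the path used in this article).
fetch_nvidia_repos() {
    localdir=${1:-/var/lib/nvidia-docker-repo}
    for repo in libnvidia-container nvidia-container-runtime nvidia-docker; do
        mkdir -p "$localdir/$repo" || return 1
        wget "$(repo_tarball_url "$repo")" -O - |
            tar -zx --strip-components=1 -C "$localdir/$repo" || return 1
    done
}
```

Calling `fetch_nvidia_repos` on the system with internet should leave the same directory layout as the individual commands.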

          Change /var/lib/nvidia-docker-repo/nvidia-docker/centos7/nvidia-docker.repo like this:

          [libnvidia-container]
          name=libnvidia-container
          baseurl=file:///var/lib/nvidia-docker-repo/libnvidia-container/centos7/$basearch
          repo_gpgcheck=1
          gpgcheck=0
          enabled=1
          gpgkey=file:///var/lib/nvidia-docker-repo/libnvidia-container/gpgkey
          sslverify=1
          sslcacert=/etc/pki/tls/certs/ca-bundle.crt

          [nvidia-container-runtime]
          name=nvidia-container-runtime
          baseurl=file:///var/lib/nvidia-docker-repo/nvidia-container-runtime/centos7/$basearch
          repo_gpgcheck=1
          gpgcheck=0
          enabled=1
          gpgkey=file:///var/lib/nvidia-docker-repo/nvidia-container-runtime/gpgkey
          sslverify=1
          sslcacert=/etc/pki/tls/certs/ca-bundle.crt

          [nvidia-docker]
          name=nvidia-docker
          baseurl=file:///var/lib/nvidia-docker-repo/nvidia-docker/centos7/$basearch
          repo_gpgcheck=1
          gpgcheck=0
          enabled=1
          gpgkey=file:///var/lib/nvidia-docker-repo/nvidia-docker/gpgkey
          sslverify=1
          sslcacert=/etc/pki/tls/certs/ca-bundle.crt
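
The three sections differ only in the repository name, so the file can also be generated instead of edited by hand. This is a sketch. The function names repo_stanza and nvidia_docker_repo_file are made up for this example, and the values are copied from the sections above.

```shell
# Print one yum repository section for a locally mirrored NVIDIA repo.
# Arguments: repository name, root directory of the local mirror.
repo_stanza() {
    name=$1
    root=${2:-/var/lib/nvidia-docker-repo}
    # \$basearch is escaped so that yum, not the shell, expands it.
    cat <<EOF
[$name]
name=$name
baseurl=file://$root/$name/centos7/\$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=file://$root/$name/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

EOF
}

# Print the complete nvidia-docker.repo content (all three sections).
# Argument: root directory of the local mirror (optional).
nvidia_docker_repo_file() {
    for repo in libnvidia-container nvidia-container-runtime nvidia-docker; do
        repo_stanza "$repo" "$1"
    done
}
```

Redirecting the output of `nvidia_docker_repo_file` to /var/lib/nvidia-docker-repo/nvidia-docker/centos7/nvidia-docker.repo would replace the manual edit.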

          The result is a home-built nvidia-docker repo that you can then upload to the fileshare of your workspace. On the VM, copy it to the same path (/var/lib/nvidia-docker-repo), because the baseurl entries point there.

          As root, temporarily move the content (if any) out of /etc/yum.repos.d and copy your home-built repo file into it:
          cp /var/lib/nvidia-docker-repo/nvidia-docker/centos7/nvidia-docker.repo /etc/yum.repos.d/nvidia-docker.repo

          Then install the package with yum (as in Part 2, step 1):
          yum install -y nvidia-docker2

          Restart Docker.
          Finally, put the old content of /etc/yum.repos.d back.
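
Moving the existing repo files aside and back again can be scripted as well. This is a sketch. The function names stash_repos and restore_repos and the stash location /root/yum.repos.d.stash are made up for this example.

```shell
# Move every *.repo file out of a yum repo directory into a stash directory.
# Arguments: repo directory, stash directory (both optional).
stash_repos() {
    repo_dir=${1:-/etc/yum.repos.d}
    stash_dir=${2:-/root/yum.repos.d.stash}
    mkdir -p "$stash_dir" || return 1
    for f in "$repo_dir"/*.repo; do
        [ -e "$f" ] || continue   # the directory may be empty
        mv "$f" "$stash_dir"/ || return 1
    done
}

# Move the stashed *.repo files back into the yum repo directory.
# Arguments: repo directory, stash directory (both optional).
restore_repos() {
    repo_dir=${1:-/etc/yum.repos.d}
    stash_dir=${2:-/root/yum.repos.d.stash}
    for f in "$stash_dir"/*.repo; do
        [ -e "$f" ] || continue
        mv "$f" "$repo_dir"/ || return 1
    done
}
```

The order would be: `stash_repos`, copy nvidia-docker.repo into /etc/yum.repos.d, run the yum install, restart Docker, then `restore_repos`.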
