Offline installation of CUDA 11.7.1, cuDNN 8.9.3.28, NCCL 2.18.3, and TensorRT 8.6.1 on Ubuntu 22.04

Published: 2023-07-18 11:31:30  Author: 至尊王者

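I have recently been using PaddleOCR and need it to recognize a few special symbols. I only have two machines on hand: one with a single 1080 Ti (Windows 11) and one with dual 1080 Ti cards (Ubuntu 22.04). Out of habit I went for the newest version PaddlePaddle supports, CUDA 11.7. In truth CUDA 10 is enough for a 1080 Ti, and the newer versions bring no noticeable performance gain.
On Windows the installation is a no-brainer; on Linux it is more involved, so I am recording the process here.
CUDA and cuDNN depend on the NVIDIA driver version and on the kernel. The minimum driver version for CUDA 11.7 is 450.80.02; for details see https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#cudnn-versions-linux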

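Note: offline installation requires registering an NVIDIA developer account to download the installation packages. The cuDNN, NCCL, and TensorRT download links sit behind that login, so fetching them with a plain wget may fail; in that case download the files in a browser and copy them to the target machine.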

  • Clean up any leftover NVIDIA drivers

    sudo apt autoremove -y --purge 'nvidia*'
    sudo rm -f /etc/apt/sources.list.d/cuda*
    sudo apt-get autoremove && sudo apt-get autoclean
    sudo rm -rf /usr/local/cuda*
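
To confirm the cleanup actually removed everything, the installed package list can be filtered for NVIDIA-related names. This is a minimal sketch, not part of the original steps: the helper name leftover_nvidia_packages and the list of name prefixes are my own assumptions, so adjust them to whatever was installed on your machine.

```shell
# leftover_nvidia_packages: read `dpkg -l` output on stdin and print the
# status code and name of every package whose name starts with one of the
# assumed NVIDIA/CUDA prefixes. Header lines are skipped because their
# first field is not a two-letter status code.
leftover_nvidia_packages() {
    awk '$1 ~ /^[a-zA-Z][a-zA-Z]$/ && $2 ~ /^(nvidia|libnvidia|cuda|libcudnn|libnccl)/ { print $1, $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    leftovers="$(dpkg -l | leftover_nvidia_packages)"
    if [ -n "$leftovers" ]; then
        echo "packages still present:"
        echo "$leftovers"
    else
        echo "no NVIDIA/CUDA packages left"
    fi
else
    echo "dpkg not found; this check only applies to Debian/Ubuntu"
fi
```

A status of rc means the package was removed but its configuration files remain, which usually indicates the purge did not cover it.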
    
  • Update the graphics driver

    ubuntu-drivers devices
    sudo ubuntu-drivers autoinstall
    sudo apt install -y nvidia-driver-525
    sudo reboot
    

    After rebooting, run nvidia-smi to check that the driver was installed correctly.
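
Beyond eyeballing the nvidia-smi table, the driver version can be compared against the minimum that CUDA 11.x needs. The sketch below assumes GNU sort (for the -V version ordering) and uses 450.80.02 as the floor; the helper name version_ge is hypothetical.

```shell
# version_ge A B: succeeds when version A >= version B.
# Relies on GNU sort -V to order dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum driver for CUDA 11.x on Linux.
MIN_DRIVER="450.80.02"

if command -v nvidia-smi >/dev/null 2>&1; then
    # All GPUs share one driver, so the first line is enough.
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if [ -n "$driver" ] && version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver is new enough (>= $MIN_DRIVER)"
    else
        echo "driver '$driver' is missing or older than $MIN_DRIVER"
    fi
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```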

  • Install CUDA 11.7.1: https://developer.nvidia.com/cuda-toolkit-archive https://developer.nvidia.com/cuda-11-7-1-download-archive

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt update
    sudo apt -y install cuda-11-7
    
  • Install cuDNN 8.9.3 for CUDA 11: https://developer.nvidia.com/rdp/cudnn-download

    wget https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.3/local_installers/11.x/cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb
    sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.3.28_1.0-1_amd64.deb
    sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.3.28/cudnn-local-7F7A158C-keyring.gpg /usr/share/keyrings/
    sudo apt update
    sudo apt -y install libcudnn8=8.9.3.28-1+cuda11.8 libcudnn8-dev=8.9.3.28-1+cuda11.8
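
A quick way to confirm which cuDNN the -dev package put on the system is to read the version macros from its header. This is a sketch under assumptions: the helper name cudnn_version is hypothetical, and the default header path /usr/include/cudnn_version.h is where I would expect the Debian package to expose it; override CUDNN_HEADER if yours lives elsewhere.

```shell
# cudnn_version FILE: print MAJOR.MINOR.PATCHLEVEL parsed from the
# #define lines of a cudnn_version.h header. Prints nothing if the
# CUDNN_MAJOR macro is not found.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Assumed header location; set CUDNN_HEADER to point somewhere else.
header="${CUDNN_HEADER:-/usr/include/cudnn_version.h}"
if [ -r "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
else
    echo "cuDNN header not found at $header"
fi
```

The header only carries three components, so a correct install of 8.9.3.28 should report 8.9.3.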
    
  • Install NCCL 2.18.3 for CUDA 11: https://developer.nvidia.com/nccl/nccl-download

    wget https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.18.3/agnostic/x64/nccl_2.18.3-1+cuda11.0_x86_64.txz
    tar xvf nccl_2.18.3-1+cuda11.0_x86_64.txz
    sudo mv nccl_2.18.3-1+cuda11.0_x86_64 /usr/local/nccl_2.18.3
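
Because NCCL is unpacked by hand rather than installed through apt, it is worth checking that the moved directory has the layout the later LD_LIBRARY_PATH setting relies on. A minimal sketch, assuming the archive contains include/nccl.h and lib/libnccl.so; the helper name check_nccl_tree and the NCCL_ROOT variable are hypothetical.

```shell
# check_nccl_tree ROOT: report any expected file missing under ROOT.
# Returns 0 when both the header and the shared library are present.
check_nccl_tree() {
    root="$1"
    missing=0
    for f in include/nccl.h lib/libnccl.so; do
        if [ ! -e "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Install location used in this post; override NCCL_ROOT if you moved it.
NCCL_ROOT="${NCCL_ROOT:-/usr/local/nccl_2.18.3}"
if check_nccl_tree "$NCCL_ROOT"; then
    echo "NCCL tree under $NCCL_ROOT looks complete"
else
    echo "NCCL tree under $NCCL_ROOT is incomplete"
fi
```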
    
  • Install TensorRT 8.6.1 for CUDA 11: https://developer.nvidia.com/nvidia-tensorrt-8x-download

    wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.6.1/local_repos/nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-11.8_1.0-1_amd64.deb
    sudo dpkg -i nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-11.8_1.0-1_amd64.deb
    sudo cp /var/nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-11.8/nv-tensorrt-local-0628887B-keyring.gpg /usr/share/keyrings/
    sudo apt update
    sudo apt -y install tensorrt=8.6.1.6-1+cuda11.8
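
The CUDA, cuDNN, and TensorRT steps all follow the same pattern: register the local repo .deb, copy its keyring, refresh apt, install pinned packages. The sketch below wraps that pattern in one function with a dry-run mode so the commands can be reviewed before anything is changed. The names run, install_local_repo, and DRY_RUN are hypothetical, and matching the keyring by the *-keyring.gpg glob is my assumption based on the three file names seen above.

```shell
# run CMD...: print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_local_repo DEB REPO_DIR PKG...
# DEB      local repo package downloaded from NVIDIA
# REPO_DIR directory under /var that the .deb unpacks to
# PKG...   packages (optionally pinned with =version) to install
install_local_repo() {
    deb="$1"
    repo_dir="$2"
    shift 2
    run sudo dpkg -i "$deb"
    # The keyring name embeds a per-release id, so match it by glob.
    for key in "$repo_dir"/*-keyring.gpg; do
        run sudo cp "$key" /usr/share/keyrings/
    done
    run sudo apt update
    run sudo apt -y install "$@"
}

# Dry run of the TensorRT step: only prints the commands.
DRY_RUN=1 install_local_repo \
    nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-11.8_1.0-1_amd64.deb \
    /var/nv-tensorrt-local-repo-ubuntu2204-8.6.1-cuda-11.8 \
    tensorrt=8.6.1.6-1+cuda11.8
```

In a dry run the repo directory does not exist yet, so the keyring line is printed with the unexpanded glob; in a real run the glob resolves after dpkg has unpacked the repo.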
    
  • Add the paths to your environment variables or to .bashrc

    export PATH=/usr/local/cuda-11.7/bin:~/.local/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:/usr/local/nccl_2.18.3/lib:$LD_LIBRARY_PATH
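
If the exports go into .bashrc, rerunning the setup should not stack duplicate lines. One way to keep that idempotent is sketched here; the helper name append_once is hypothetical, and it only defines the function so nothing is written until you call it.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line
# is already there. Creates FILE if it does not exist.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x: whole-line match, -F: treat LINE as a fixed string, not a regex.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once 'export PATH=/usr/local/cuda-11.7/bin:~/.local/bin:$PATH' ~/.bashrc adds the PATH line once; the single quotes keep $PATH from being expanded at the time of writing.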
    

    Run nvcc --version to check the CUDA version.
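
The same check can be scripted by pulling the release number out of the nvcc banner. A minimal sketch, assuming the banner contains a "release X.Y" phrase as current nvcc versions print; the helper name nvcc_release is hypothetical.

```shell
# nvcc_release: read `nvcc --version` output on stdin and print the
# major.minor release number (for example 11.7).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

EXPECTED="11.7"

if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc --version | nvcc_release)"
    if [ "$found" = "$EXPECTED" ]; then
        echo "nvcc reports CUDA $found"
    else
        echo "expected CUDA $EXPECTED, nvcc reports: $found"
    fi
else
    echo "nvcc is not on PATH; check the PATH export above"
fi
```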