Installing NVIDIA drivers on Linux can be a bit tricky, especially when juggling CUDA and cuDNN and keeping their versions compatible. Here’s my firsthand account of how I upgraded my Ubuntu system to a newer NVIDIA driver, CUDA Toolkit, and cuDNN while avoiding common pitfalls!


1. Why Match CUDA and Driver Versions?

Before diving in, I knew it was crucial to align the NVIDIA driver version with the CUDA Toolkit, because each toolkit release requires a minimum driver version. For my setup, I chose CUDA 12.6.0 paired with driver 560.28.03 (the version embedded in the CUDA runfile name, cuda_12.6.0_560.28.03_linux.run). Using the driver the toolkit was packaged with avoids version-mismatch errors at runtime. I also downloaded cuDNN v8.9.2 (the build for CUDA 12.x) from NVIDIA’s archives.
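
As a rough sketch of how the driver version can be read off the runfile name, the snippet below uses plain shell parameter expansion. It assumes the cuda_<toolkit>_<driver>_linux.run naming pattern seen above; the function name driver_version_from_runfile is made up for this example and is not an NVIDIA tool.

```shell
# Derive the bundled driver version from a CUDA runfile name.
# Assumption: the name follows cuda_<toolkit>_<driver>_linux.run.
driver_version_from_runfile() {
    local name="${1##*/}"        # strip any leading directory
    name="${name%_linux.run}"    # drop the suffix
    printf '%s\n' "${name##*_}"  # keep what follows the last underscore
}

runfile="cuda_12.6.0_560.28.03_linux.run"
driver_version="$(driver_version_from_runfile "$runfile")"
echo "NVIDIA-Linux-x86_64-${driver_version}.run"
```

For the runfile used in this post, the last line prints NVIDIA-Linux-x86_64-560.28.03.run, which is the standalone driver installer to look for.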

Download Links:

  • CUDA Installer: CUDA 12.6 Archive
    • Purpose: This is the CUDA Toolkit installer. CUDA (Compute Unified Device Architecture) is a parallel computing platform and API model created by NVIDIA.
    • Function: Installing CUDA provides the necessary compiler, libraries, and runtime to develop and run GPU-accelerated applications. It includes tools like nvcc (CUDA compiler), CUDA libraries, and runtime environment.
  • NVIDIA Driver: NVIDIA-Linux-x86_64-560.28.03.run (version number taken from the CUDA runfile name).
    • Purpose: This is the NVIDIA GPU driver installer for Linux. It contains the proprietary driver required for your system to interface with an NVIDIA GPU.
    • Function: Installing this ensures that your system can properly utilize the GPU for graphics rendering and computational tasks (such as CUDA applications).
  • cuDNN: cuDNN Archive.
    • Purpose: This is the cuDNN (CUDA Deep Neural Network) library, which is optimized for deep learning operations.
    • Function: It provides highly optimized GPU implementations for neural network operations such as convolution, pooling, normalization, and activation functions. cuDNN is widely used in deep learning frameworks like TensorFlow and PyTorch.

2. Preparing the System: Stopping the GUI

To avoid conflicts with the running X server, I followed these steps:

  1. Close VNC/GUI Sessions: Ensure no graphical sessions are active.
  2. Stop the Display Manager:
    sudo systemctl stop gdm.service  # Replace 'gdm' with your DM (e.g., lightdm).  
    
  3. Switch to Text Mode (Runlevel 3):
    sudo init 3  # On systemd, equivalent to: sudo systemctl isolate multi-user.target
    
    This switches the system to a text-based interface, which stops the X server and prevents driver conflicts.
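
If you are not sure which display manager you are running, the sketch below guesses the unit name. It assumes a Debian/Ubuntu layout where /etc/X11/default-display-manager holds a path such as /usr/sbin/gdm3, and it falls back to the generic display-manager.service alias otherwise. The function name display_manager_unit is hypothetical, and the result is only a guess, so confirm it with systemctl status before stopping anything.

```shell
# Guess the systemd unit of the active display manager.
# Assumption: /etc/X11/default-display-manager exists on Debian/Ubuntu
# and contains the path of the display manager binary.
display_manager_unit() {
    local file="${1:-/etc/X11/default-display-manager}" path=""
    [ -r "$file" ] && read -r path < "$file"
    if [ -n "$path" ]; then
        echo "${path##*/}.service"      # e.g. /usr/sbin/lightdm -> lightdm.service
    else
        echo "display-manager.service"  # generic systemd alias
    fi
}

echo "Display manager unit (guess): $(display_manager_unit)"
# To stop it: sudo systemctl stop "$(display_manager_unit)"
```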

3. Installing the NVIDIA Driver

Step 1: Install the NVIDIA Driver
Navigate to your downloads directory, make the installer executable, and run it:

cd ~/Downloads  
chmod +x NVIDIA-Linux-x86_64-560.28.03.run  
sudo ./NVIDIA-Linux-x86_64-560.28.03.run  

The installer automatically detected my existing NVIDIA driver (530) and upgraded it to 560.28.03. Follow the on-screen prompts to complete the installation.


4. Installing CUDA Toolkit 12.6

Step 1: Prepare the CUDA Installer

chmod +x cuda_12.6.0_560.28.03_linux.run  

Step 2: Run the Installer

sudo ./cuda_12.6.0_560.28.03_linux.run  

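Here’s where I hit a snag! The installer failed when I selected the NVIDIA kernel module option (nvidia-fs, the GPUDirect Storage module). To resolve this, I:

  1. Reran the installer without checking that option.
  2. Selected only the essential components (the CUDA Toolkit itself). The driver was already installed in the previous section, so the driver bundled with the runfile can stay unchecked.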
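
The same selection can probably be made without the interactive menu. The sketch below only builds and prints a command line; it does not run the installer. --silent and --toolkit are options of the CUDA runfile installer (note that --silent implies accepting the EULA), while the helper name toolkit_only_command is made up for this example.

```shell
# Print (but do not run) a non-interactive, toolkit-only install command.
# With --silent, only the components named on the command line are
# installed, so the driver and nvidia-fs are left out.
toolkit_only_command() {
    local runfile="$1"
    printf 'sudo sh %s --silent --toolkit\n' "$runfile"
}

toolkit_only_command ./cuda_12.6.0_560.28.03_linux.run
```

Review the printed command before pasting it into a terminal.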

Troubleshooting Tip: If you encounter errors, try:

sudo apt-get update && sudo apt-get install linux-headers-$(uname -r)  

to ensure kernel headers are installed.
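
To check for the headers before running an installer, something like the following should work. It assumes the usual layout where /lib/modules/<kernel release>/build points at the headers for that kernel; the function name kernel_headers_present is hypothetical.

```shell
# Succeed if headers for the given kernel release appear to be installed.
# Assumption: /lib/modules/<release>/build exists (usually a symlink
# into /usr/src) when the matching headers package is present.
kernel_headers_present() {
    local release="${1:-$(uname -r)}"
    local root="${2:-/lib/modules}"
    [ -e "$root/$release/build" ]
}

if kernel_headers_present; then
    echo "Headers found for $(uname -r)"
else
    echo "Headers missing for $(uname -r); install linux-headers-$(uname -r)"
fi
```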


5. Installing cuDNN

Step 1: Extract and Copy Files

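tar -xvf cudnn-linux-x86_64-8.9.2.26_cuda12-archive.tar.xz -C /tmp  # .tar.xz is xz-compressed, so no -z
# The archive unpacks into a directory named after the tarball
sudo cp /tmp/cudnn-linux-x86_64-8.9.2.26_cuda12-archive/include/cudnn*.h /usr/local/cuda-12.6/include/  
sudo cp -P /tmp/cudnn-linux-x86_64-8.9.2.26_cuda12-archive/lib/libcudnn* /usr/local/cuda-12.6/lib64/  # -P keeps symlinks
sudo chmod a+r /usr/local/cuda-12.6/include/cudnn*.h /usr/local/cuda-12.6/lib64/libcudnn*  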
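
To confirm the copy worked, you can read the version macros from the installed header. This sketch assumes cuDNN 8.x, where CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL are defined in cudnn_version.h, and the install path used in this post; the function name cudnn_version is made up for this example.

```shell
# Print the cuDNN version recorded in cudnn_version.h as major.minor.patch.
# Assumption: the header sits in the CUDA 12.6 include directory used above.
cudnn_version() {
    local header="${1:-/usr/local/cuda-12.6/include/cudnn_version.h}"
    [ -r "$header" ] || { echo "cannot read $header" >&2; return 1; }
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

cudnn_version || echo "cuDNN header not installed yet"
```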

Step 2: Set Environment Variables
Add these lines to your ~/.bashrc or ~/.zshrc:

export PATH=/usr/local/cuda-12.6/bin:$PATH  
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}  # no trailing ':' if the variable was unset

Reload the shell:

source ~/.bashrc  # or ~/.zshrc  
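
A quick way to confirm the variables took effect is to look for the CUDA directories in them. The helper below is a minimal sketch with a hypothetical name, path_contains; it matches whole entries only, so a directory that merely starts with the same prefix does not count.

```shell
# Succeed if a directory appears as a whole entry in a colon-separated
# list. The list defaults to $PATH when no second argument is given.
path_contains() {
    local dir="$1" list="${2-$PATH}"
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

if path_contains /usr/local/cuda-12.6/bin; then
    echo "CUDA bin directory is on PATH"
else
    echo "CUDA bin directory is NOT on PATH"
fi

if path_contains /usr/local/cuda-12.6/lib64 "${LD_LIBRARY_PATH:-}"; then
    echo "CUDA lib64 directory is on LD_LIBRARY_PATH"
else
    echo "CUDA lib64 directory is NOT on LD_LIBRARY_PATH"
fi
```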

6. Final Steps: Reboot and Verify

Step 1: Reboot Safely
Before rebooting, ensure no virtual machines or critical processes are running:

sudo systemctl stop libvirtd  # If using KVM  
sudo reboot  

Step 2: Test the Setup
After rebooting, verify driver and CUDA versions:

nvidia-smi  # Check driver version and GPU status  
nvcc --version  # Confirm CUDA Toolkit version  
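
If you would rather compare versions in a script than read them by eye, the sketch below pulls out just the numbers. It assumes nvcc prints a line containing "release <major>.<minor>," and uses nvidia-smi's --query-gpu=driver_version option; the helper name nvcc_release is made up, and both tools are skipped if they are not installed.

```shell
# Extract the toolkit release (e.g. 12.6) from `nvcc --version` output
# supplied on standard input.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "Toolkit release: $(nvcc --version | nvcc_release)"
fi

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU; the first is enough on a single-driver system.
    echo "Driver version: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
fi
```

With the setup from this post, the two values to expect are toolkit release 12.6 and driver version 560.28.03.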

7. What I Learned

  • Dependency Management: Install kernel headers and dependencies first to avoid surprises.
  • Runlevel 3: Switching to text mode is critical for smooth driver installation.
  • cuDNN Compatibility: Always match cuDNN with your CUDA version.
  • Avoid Fancy Options: Install only the components you need; the optional nvidia-fs module was what broke my CUDA install.

Final Thoughts

While NVIDIA driver installations can be intimidating, breaking the process into steps and keeping versions compatible made it manageable. If you encounter errors, check NVIDIA’s Installation Guide or the logs in /var/log/nvidia-installer.log (driver) and /var/log/cuda-installer.log (CUDA runfile). Happy computing!

Your Turn: Did you face similar challenges? Share your experiences in the comments!


Hope this guide helps you navigate the world of NVIDIA drivers on Linux! 🚀