I'm trying to get the NVIDIA GPU drivers and related software installed or upgraded on a Debian bullseye system and am having trouble. I followed the instructions for installing CUDA, but when I get to step 13.2.1, "Install Persistence Daemon", it fails with the error:
nvidia-persistenced failed to initialize. Check syslog for more details.
The log file shows:
Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 0 has read and write permissions for those files.
There are no nvidia* files in /dev.
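One way to double-check the missing device files is a small shell helper like the one below. This is only a sketch, assuming a POSIX shell; `count_nvidia_nodes` is a hypothetical name made up for illustration and is not part of any NVIDIA tool.

```shell
# Hypothetical helper: count entries named nvidia* in a directory (default: /dev).
count_nvidia_nodes() {
    dir="${1:-/dev}"
    count=0
    for node in "$dir"/nvidia*; do
        # When the glob matches nothing, the literal pattern is kept and -e is false.
        if [ -e "$node" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Prints the number of nvidia* entries found directly under /dev.
count_nvidia_nodes /dev
```

A result of 0 would match the error from nvidia-persistenced, which expects /dev/nvidia* to exist.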
/usr/local/ has the following:
$ ls -dl /usr/local/cuda*
lrwxrwxrwx 1 root root 22 Sep 30 20:15 /usr/local/cuda -> /etc/alternatives/cuda
drwxr-xr-x 16 root root 4096 Jun 16 16:35 /usr/local/cuda-11.3
lrwxrwxrwx 1 root root 25 Sep 30 20:15 /usr/local/cuda-12 -> /etc/alternatives/cuda-12
drwxr-xr-x 15 root root 4096 Sep 30 20:15 /usr/local/cuda-12.2
$ ls -dl /etc/alternatives/cuda*
lrwxrwxrwx 1 root root 20 Sep 30 20:15 /etc/alternatives/cuda -> /usr/local/cuda-12.2
lrwxrwxrwx 1 root root 20 Sep 30 20:15 /etc/alternatives/cuda-12 -> /usr/local/cuda-12.2
The GPU appears to be there:
$ sudo nvidia-smi
Sat Sep 30 21:51:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0              49W / 400W |      4MiB / 40960MiB |     26%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
When this GCE system was originally built, it had a working cuda-11 installation, but I fear I've messed everything up and I'm not sure how to proceed.