Containers using the NVIDIA Container Toolkit fail after upgrade to Fedora 42

Hello all! I have been experiencing an issue with containers that use the NVIDIA Container Toolkit, specifically GPU-enabled Ollama, since the Fedora 42 upgrade. After the upgrade, the container fails to start and hangs in a status of Unknown. The error when starting is:

FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createRuntime hook #0: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted.

I am using nerdctl 2.0.4 with containerd in rootless mode. I could potentially use Docker; however, I am investigating nerdctl as a Docker replacement.

After some Googling, I tried the suggested

sysctl -w kernel.unprivileged_bpf_disabled=0

This did not fix the issue. I even went so far as to put SELinux into permissive mode, which did not work either.
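For anyone retracing these steps, here is a minimal read-only sketch for confirming that the two changes above actually took effect before ruling them out. The helper name `describe_bpf_setting` is made up for illustration; the value meanings (0, 1, 2) follow the kernel's sysctl documentation. The cgroup check is an extra that was not part of the original troubleshooting, included only because the error mentions the cgroup device filter.

```shell
#!/bin/sh
# Diagnostic sketch: reports current settings only, changes nothing.

# Hypothetical helper: translate a kernel.unprivileged_bpf_disabled value
# into a short description.
describe_bpf_setting() {
    case "$1" in
        0) echo "unprivileged bpf() allowed" ;;
        1) echo "unprivileged bpf() disabled until reboot" ;;
        2) echo "unprivileged bpf() disabled, admin may change" ;;
        *) echo "unknown value: $1" ;;
    esac
}

bpf_file=/proc/sys/kernel/unprivileged_bpf_disabled
if [ -r "$bpf_file" ]; then
    value=$(cat "$bpf_file")
    echo "kernel.unprivileged_bpf_disabled=$value ($(describe_bpf_setting "$value"))"
else
    echo "kernel.unprivileged_bpf_disabled: not readable on this system"
fi

# SELinux mode, if the SELinux tools are installed.
if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux: $(getenforce)"
else
    echo "SELinux: getenforce not found"
fi

# Filesystem type of the cgroup mount; cgroup2fs means the unified (v2) hierarchy.
if [ -d /sys/fs/cgroup ]; then
    echo "cgroup filesystem: $(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)"
fi
```

If the first line still reports a value other than 0 after running the sysctl command, the write did not stick. A value of 1 cannot be changed back without a reboot.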

I only got my GPU-enabled containers working again by rebooting back into Fedora 41. I need the GPU because I am doing a lot of AI projects. Sure, I can disable GPU access in the containers, but what is the point? Moreover, I am not looking forward to the weekly snapshot if it rolls over my current working 41 slot…

Any suggestions or anything that I overlooked would be much appreciated.

I have been using Aurora for the last 3 months and have been really happy with it until this issue came along.

Hello All!

A bit of a self reply as I work through potential resolution scenarios.

The first is that Docker, running as a root daemon and properly configured to use
the NVIDIA Container Toolkit, works without issue. This seems to further indicate that
the cause is a change in permission configuration between Aurora 41 and 42.

nerdctl, talking to a rootless, user-level containerd process and requesting
GPU services, worked in Aurora 41. In 42, not so much.

For the record, I am interested in using nerdctl as a CLI to containerd because it is
a proving ground for other technologies like image signing, encrypted images, on-demand
loading of image layers, and others. Podman is fine, but has its own niggles, and
nerdctl works as a drop-in replacement without having to worry about whether the
image runs as root or not, and requires no configuration. Podman does.