Hello all! I have been experiencing an issue with containers that use the NVIDIA Container Toolkit, specifically GPU-enabled Ollama, after upgrading to Fedora 42. Since the upgrade, the container fails to start and hangs in a status of Unknown. The error when starting is:
FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createRuntime hook #0: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted.
I am using nerdctl 2.0.4/containerd in rootless mode. I could potentially use Docker, but I am investigating nerdctl as a Docker replacement.
After some Googling, I tried the suggested fix:
sysctl -w kernel.unprivileged_bpf_disabled=0
This did not fix the issue. I even went so far as to put SELinux into permissive mode, which did not work either.
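In case it helps anyone compare against their own system, here is a rough sketch of how the settings mentioned above can be checked. The helper name describe_bpf_setting is my own (hypothetical), and the value meanings in it are my reading of the kernel sysctl documentation, so please treat this as a diagnostic aid rather than a fix:

```shell
#!/bin/sh
# Diagnostic sketch only: prints the state of the settings I changed while debugging.

# Hypothetical helper: translate a kernel.unprivileged_bpf_disabled value into words.
describe_bpf_setting() {
    case "$1" in
        0) echo "unprivileged BPF allowed" ;;
        1) echo "unprivileged BPF disabled (locked until reboot)" ;;
        2) echo "unprivileged BPF disabled (root can change it)" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Current value of the sysctl I set with "sysctl -w".
bpf_file=/proc/sys/kernel/unprivileged_bpf_disabled
if [ -r "$bpf_file" ]; then
    describe_bpf_setting "$(cat "$bpf_file")"
fi

# Filesystem type of the cgroup mount; "cgroup2fs" indicates cgroup v2.
if [ -d /sys/fs/cgroup ]; then
    stat -fc %T /sys/fs/cgroup
fi

# SELinux mode (Enforcing, Permissive, or Disabled), if the tool is installed.
if command -v getenforce >/dev/null 2>&1; then
    getenforce
fi
```

Running this as the same unprivileged user that starts the rootless container should show whether the sysctl change and the permissive SELinux mode actually took effect.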
I only got my GPU-enabled containers working again by rebooting back into Fedora 41. I need the GPU because I am doing a lot of AI projects. Sure, I could disable GPU access in the containers, but what would be the point? Moreover, I am not looking forward to the weekly snapshot if it rolls over my current working 41 slot…
Any suggestions or anything that I overlooked would be much appreciated.
I have been using Aurora for the last 3 months and have been really happy with it until this issue came along.