Hello guys, I've been running into this issue a lot lately. Here is the issue tracker item:
opened 04:44PM - 17 Jan 25 UTC
bug
### Describe the bug
# Bluefin Bug Report: System instability with recurring GPU errors and NVMe issues
## System Information
- **OS**: Bluefin 40 (based on Fedora Silverblue)
- **Kernel**: Linux 6.11.8-200.fc40.x86_64
- **Hardware**: ASUS TUF GAMING X570-PLUS (WI-FI)
- **GPU**: AMD Radeon Vega Series (Picasso/Raven 2)
- **Driver**: xorg-x11-drv-amdgpu-23.0.0-3
- **Memory**: 32GB (29Gi available)
- **Current Version**: gts-40.20250115 (2025-01-15T01:08:05Z)
## Issue Description
The system frequently hangs unrecoverably after running for a while. Each hang lasts a few minutes, until the OS crashes.
AI analysis of the logs suggests that the crashes are related to GPU driver issues and NVMe storage problems.
## Critical Errors
### 1. GPU-related errors
```
amdgpu 0000:0a:00.0: amdgpu: Secure display: Generic Failure
amdgpu 0000:0a:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
```
### 2. NVMe errors
```
nvme nvme0: failed to set APST feature (2)
nvme nvme1: failed to set APST feature (2)
```
### 3. System service errors
```
systemd[3217]: Failed to start app-gnome-gnome\x2dkeyring\x2dssh-3529.scope
systemd[3217]: Failed to start app-gnome-xdg\x2duser\x2ddirs-3556.scope
```
### 4. Display manager errors
```
gdm[1812]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
gdm[1812]: Gdm: on_display_removed: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
```
## System State
- Memory usage is normal (23GB free, no swap used)
- System load is normal (load average: 0.07, 0.19, 0.23)
- GPU temperature: 36.0°C (normal)
- NVMe temperatures: 33.9°C and 35.9°C (normal)
## Installed Packages
### Layered Packages
```
docker-compose
eza
java-17-openjdk-devel
nodejs
unetbootin
```
### Local Packages
```
appimagelauncher-2.2.0-travis995~0f91801.x86_64
balena-etcher-1.19.21-1.x86_64
```
## Steps to Reproduce
1. Normal system usage after fresh boot
2. System becomes unstable after some time
3. Eventually crashes or hangs
## Additional Notes
- Issues persist across reboots
- GPU errors appear consistently in system logs
- Both NVMe drives show APST errors during boot
- Multiple GNOME-related services fail to start properly
- System is running a recent deployment from January 15, 2025
## Attempted Solutions
- Cleaned up and updated the system using the relevant `ujust` commands
- Ran scheduled, frequent memory-freeing commands
- Reported this issue to track the problem and get help resolving the GPU and NVMe-related errors.
_Logs and additional system information available upon request._
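
For anyone who wants to look for the same messages on their own machine, here is a minimal sketch of how the kernel errors listed above could be pulled out of the journal. The helper name `filter_gpu_nvme` is made up for illustration, and reading the previous boot with `journalctl -b -1` only works if the journal is persisted across reboots.

```shell
# Hypothetical helper (the name is illustrative): read log lines on stdin
# and keep only those that mention the amdgpu driver or an NVMe device.
filter_gpu_nvme() {
  grep -E 'amdgpu|nvme'
}

# Usage after a hang and reboot (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | filter_gpu_nvme
#
# Usage for the current boot:
#   journalctl -k -b 0 --no-pager | filter_gpu_nvme
```

The filter is kept separate from the `journalctl` call so it can also be run against a saved log file.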
### What did you expect to happen?
To not hang.
### Output of `bootc status`
```shell
No staged image present
Current booted state is native ostree
Current rollback state is native ostree
```
### Output of `groups`
```shell
falbertengo wheel docker incus-admin lxd libvirt
```
### Extra information or context
_No response_
TL;DR: My machine's UI hangs completely. I specify the UI because once, when the hang occurred, I was listening to music while working, and I could still hear it in my headphones until the machine crashed 3 minutes later.
Needless to say, this is highly detrimental to my UX.
I'm a developer and I tend to push my machine quite a bit. When I do, the hang seems to occur sooner.
Turn off automatic updates and see if that helps. As far as I have noticed, automatic updates run regardless of whether the machine is idle. I think they are set to run every 2 or 3 hours, and while they are running, things stutter here and there.
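
To see which update-related timers are actually scheduled before turning anything off, something like the following might help. It is only a sketch: `filter_update_timers` is an invented helper name, and timer unit names differ between images, so none are assumed here.

```shell
# Hypothetical helper (the name is illustrative): read `systemctl list-timers`
# output on stdin and keep only lines that look update-related.
filter_update_timers() {
  grep -i -E 'update|upgrade|automatic'
}

# Usage (system timers, then per-user timers):
#   systemctl list-timers --all --no-pager | filter_update_timers
#   systemctl --user list-timers --all --no-pager | filter_update_timers
#
# A timer found this way can be stopped and disabled with:
#   sudo systemctl disable --now <timer-unit>
# where <timer-unit> is a placeholder for whatever name shows up above.
```

Comparing the "NEXT" and "LAST" columns of the matching timers with the times of the hangs would show whether the two line up.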