My RX 7900 XTX keeps crashing on Bazzite

Hi, I recently bought an RX 7900 XTX to replace my old RTX 3070, because the 3070 crashed as soon as it loaded any graphically intensive game. The RX 7900 XTX doesn’t crash when I load a graphically intensive game, but it does crash very sporadically. The card kept overclocking itself without any prompting, so I used LACT to limit the clocks to stock. That seems to have helped a bit, but it’s still crashing. I’m currently undervolting it by 50. I ran:

sudo nano /etc/default/grub

And pasted in:

GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.pcie_gen_cap=0 pcie_aspm=off"

This has not fixed the crashing.
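
One thing worth checking here: Bazzite is an image-based system, and kernel arguments on it are normally managed with rpm-ostree kargs rather than by editing /etc/default/grub, so the arguments above may never have reached the running kernel. A minimal sketch for confirming that, assuming a POSIX shell; has_karg is a hypothetical helper name, not part of any tool:

```shell
# Succeed if a kernel command line contains the given parameter as a whole word.
# Usage: has_karg "<cmdline string>" "<parameter>"
has_karg() {
    # Pad both sides with spaces so only complete, space-separated arguments match.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

To test the running kernel, something like has_karg "$(cat /proc/cmdline)" pcie_aspm=off && echo applied should work. If nothing is printed, sudo rpm-ostree kargs --append-if-missing=pcie_aspm=off followed by a reboot is the usual route on rpm-ostree systems.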

Specs

CPU: Ryzen 7 9700X

GPU: ASRock Phantom Gaming OC RX 7900 XTX

RAM: Corsair Vengeance 64 GB DDR5 5200 MT/s

Motherboard: Gigabyte Aorus Pro X870E

PSU: 850 watt Corsair RM850e

An AI assistant suggested it could be a PSU issue, with the PSU not being powerful enough to deal with power spikes. I’d be very surprised if that were the cause, as the crashes often happen as soon as I boot into the OS. It could also just be the graphics card. I know the RX 7900 XTX has a reputation for being temperamental and crashing randomly, with no one knowing how to resolve it.

Here are the crash logs:

Nov 06 15:02:09 bazzite kernel: hub 12-0:1.0: config failed, hub doesn't have any ports! (err -19)
Nov 06 15:02:09 bazzite systemd-tmpfiles[395]: /usr/lib/tmpfiles.d/static-nodes-permissions.conf:12: Failed to resolve group 'audio': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[395]: /usr/lib/tmpfiles.d/static-nodes-permissions.conf:13: Failed to resolve group 'audio': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[395]: /usr/lib/tmpfiles.d/static-nodes-permissions.conf:14: Failed to resolve group 'disk': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[395]: /usr/lib/tmpfiles.d/static-nodes-permissions.conf:18: Failed to resolve group 'kvm': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[395]: /usr/lib/tmpfiles.d/static-nodes-permissions.conf:19: Failed to resolve group 'kvm': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[395]: /usr/lib/tmpfiles.d/static-nodes-permissions.conf:20: Failed to resolve group 'kvm': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[476]: /usr/lib/tmpfiles.d/systemd.conf:11: Failed to resolve group 'utmp': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[476]: /usr/lib/tmpfiles.d/var.conf:15: Failed to resolve group 'utmp': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[476]: /usr/lib/tmpfiles.d/var.conf:16: Failed to resolve group 'utmp': No such process
Nov 06 15:02:09 bazzite systemd-tmpfiles[476]: /usr/lib/tmpfiles.d/var.conf:17: Failed to resolve group 'utmp': No such process
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:38 Unknown group 'tty', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:39 Unknown group 'tty', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:40 Unknown group 'tty', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:41 Unknown group 'tty', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:44 Unknown group 'kmem', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:46 Unknown group 'input', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:49 Unknown group 'video', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:50 Unknown group 'video', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:51 Unknown group 'video', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:52 Unknown group 'video', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:53 Unknown group 'video', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:54 Unknown group 'video', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:56 Unknown group 'render', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:57 Unknown group 'render', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:58 Unknown group 'render', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:60 Unknown group 'sgx', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:61 Unknown group 'sgx', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:67 Unknown group 'audio', ignoring.
Nov 06 15:02:09 bazzite systemd-udevd[536]: /usr/lib/udev/rules.d/50-udev-default.rules:77 Unknown group 'audio', ignoring.

Whenever I do a fresh install, the first thing it shows during the install is the opening error “config failed, hub doesn't have any ports! (err -19)”. It then continues with the rest of the install with no other issues (until the desktop starts crashing). :') Any help would be greatly appreciated :smile:
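
The log lines quoted in this post are early-boot messages about a USB hub and missing groups rather than GPU errors, so they probably do not show the crash itself. A rough sketch for pulling GPU-related lines out of a kernel log, assuming a POSIX shell with grep; filter_gpu_lines is a hypothetical helper, and the patterns are only guesses at what an amdgpu hang usually logs:

```shell
# Keep only log lines that mention the GPU driver or a GPU reset.
# Reads log text on stdin and writes the matching lines to stdout.
filter_gpu_lines() {
    # -i: case-insensitive, -E: extended regex.
    # "|| true" keeps the function from failing when nothing matches.
    grep -iE 'amdgpu|\[drm\]|GPU reset' || true
}
```

After a crash and reboot, journalctl -k -b -1 | filter_gpu_lines would show the kernel messages from the previous boot (-b -1) that match, which is usually where a driver timeout or reset shows up.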

In what way does it crash? Does the image just freeze?

The screen shows four evenly spaced vertical blue lines: one down the left edge, one down the right edge, and two more spaced evenly across the middle. Behind the lines the screen is grey. There is also a crash that occurs when I play Clover Pit, where the screen just freezes and can’t be recovered.

I had lines with my 7900XT, and it turned out to be because I had set changes to clock speeds in LACT.

I noticed it by disabling the LACT systemd service (sudo systemctl disable lactd) and rebooting. No lines. I reset the configuration and this time I didn’t touch the clock speeds at all. It’s been working since.

As for the image freezing, this may be this bug. Some people report being able to work around it with the kernel boot argument amdgpu.dcdebugmask=0x10.

My workaround on my EndeavourOS/Arch system is to keep using an old 6.14 kernel. I haven’t given the boot argument a try with more recent kernels.

That’s exactly the same issue. The reason I had to change the clock speeds was that the card kept overclocking past the stock clock speeds and thermal throttling. It was doing that before I set the clock speeds to stock, but I will try what you suggested, as the problem is eerily similar.

So I did exactly what you said, and it started overclocking and crashed.

I’ve now just stopped the service, but I’m not changing the clock speed. LACT seems to think that my GPU clock speed is 500 MHz higher than stock. I’m doing exactly what you said, but this time not resetting the clock speed. I will let you know how it goes.
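
To cross-check what LACT reports, the amdgpu driver exposes its own clock levels in sysfs. A small sketch, assuming a POSIX shell and the usual pp_dpm_sclk layout where the active level is marked with an asterisk; current_sclk_mhz is a hypothetical helper, and the card0 index in the usage line may differ on your system:

```shell
# Print the currently active shader clock in MHz.
# Reads pp_dpm_sclk-style text on stdin, e.g. lines like "1: 2615Mhz *".
current_sclk_mhz() {
    # Print only the number from the line that ends with the "*" marker.
    sed -n 's/^[0-9][0-9]*: *\([0-9][0-9]*\)[Mm][Hh]z *\*.*$/\1/p'
}
```

It could be run as current_sclk_mhz < /sys/class/drm/card0/device/pp_dpm_sclk while a game is running, then compared with the figure LACT shows.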

Can’t you just lower the power limit? Or is it at the lowest setting already?

In my experience, lowering the power limit didn’t really do anything. The clock speed is the main issue, as the card decides to overclock itself to 3 GHz when its maximum is 2.615 GHz. If it crashes again I will try what you suggested. When I lowered the clock of the card to stock, the card stopped overheating. If I let the card run at the frequency LACT sets, I get temps of 110 degrees and thermal throttling. It needs to be set to stock clock speeds. So even if I lower the power limit and keep the overclocked speed, it will cause a massive hot spot on the junction and memory. Furthermore, crashes were happening far more frequently until I clocked it to stock.

Edit: I could drastically lower the wattage; I never see it go over 300 watts.

I see.

The power limit makes a big difference for me. I set up a profile to max it out (333W) when it detects something was run via gamemode, and it gives me an easy FPS boost in games.
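
For reference, the power limit LACT shows can also be read straight from the driver's hwmon files, which report values in microwatts. A tiny sketch, assuming a POSIX shell; uw_to_watts is a hypothetical helper, and the card0 path in the usage line is an assumption that may differ per system:

```shell
# Convert a hwmon power value in microwatts to whole watts.
# Usage: uw_to_watts <microwatts>
uw_to_watts() {
    # Integer division; any fractional watt is dropped.
    echo "$(( $1 / 1000000 ))"
}
```

For example, uw_to_watts "$(cat /sys/class/drm/card0/device/hwmon/hwmon*/power1_cap)" prints the current limit, and reading power1_cap_max the same way shows the highest limit the card accepts.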

Why is your card overheating though? Is that not a case for warranty? Edit: Right, overclocking itself.

Did you replace the paste yourself and put too little?

Is it starved of air in the case?

It’s a second-hand card that I bought like new from an eBay seller, so I’m unsure if the card is still under warranty. They suggested putting thermal pads on the card.

The card seems to think its correct clock speed is 3.xxx GHz. When I opened LACT, the wattage of the card was set to 327 watts and it was still crashing. It wasn’t until I enabled overclocking that I discovered the card was clocked way too high.

Edit: I am planning on getting more case fans around Christmas so it can cool better. I’m also looking at investing in an AIO for the CPU, as it likes to run at around 90 degrees.

As soon as I start messing with the clock speeds, I get the grey screens. So if yours is like my system, I think whatever solution you find will have to include not touching those. At all. Like, reset LACT completely if you did to make sure they’re restored to defaults. Merely dragging the slider back to the wanted value was not enough for me for some reason. I had to nuke the settings and start over, but after that it worked and has kept working.

Replacing the thermal paste is not a bad idea. If you have to replace the pads that’s more annoying, since you have to order some of the right thickness.

Otherwise I don’t know what could cause a video card to overclock itself, short of a modified BIOS.

I’ve barely touched the BIOS. The only thing I really did was change the memory to XMP 1 so it would work at advertised clock speeds.

As for the GPU clock speeds, if I leave them at the speeds set by LACT, the issues occur far more often. I’ve booted in three times in a row and it crashed each time. When I lowered the clock speed, the crashing would only happen every few hours.

Another issue at the previous clock speed was that if I left the PC idle for about 10 minutes and then booted it back up, it would immediately crash. Since reducing clock speeds, those issues haven’t occurred.

I’ve reset it anyway and haven’t had any issues yet. I will be using it throughout today and see if any more crashes happen.

Another element is that I’ve had to put in a custom fan curve, because it was taking a while for the fans to kick in. Again, it’s a very similar story: when I put the fan curve in, I got better temps and less crashing, although that doesn’t mean the crashing vanished. It crashed twice last night, once under a light load and once under a heavy load, but that was after I’d been gaming for 3 or 4 hours with zero issues. It only crashed afterwards, once when I went to look at YouTube and once when I was trying out a new Proton setting.