Whenever I connect a monitor to its USB-C ports (both the Thunderbolt and the USB 3.2 Gen 2 one), it later crashes Wayland, or, if it does not crash Wayland, it later hangs by never waking up from sleep (which also occurs when using X11).
So currently I am on the bluefin-dx-nvidia:stable branch, using X11 and I make sure to never connect external displays (which is quite annoying) in order to avoid the instability that arises from that…
I was wondering:
Does anyone have a HYBRID LAPTOP (not desktop) that “just works” with Nvidia?
What branch are you using?
Is there an nvidia branch that has their open source version of the driver? How do I switch to that? In fact, how can I tell if the active driver is the proprietary version, or if it is the newer OSS version?
I have a laptop with Nvidia 3050 dGPU and Intel Iris Xe onboard. I’m using the aurora-dx-nvidia:latest image with zero issues. I connect/disconnect external displays via HDMI daily.
My question is, how do I know if this is the open source driver? I believe Nvidia has started using open source kernel drivers and recommends using those going forward?
I know I am not using the old nouveau driver via lsmod:
What I don’t know is whether the above nvidia_... modules are from the old closed source driver, or if they are from the newer open source “nvidia” driver.
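One way to approach this question, sketched here under assumptions rather than as an official method: as far as I know, the closed kernel module declares its license as "NVIDIA", while NVIDIA's open kernel modules declare "Dual MIT/GPL". The `license` field that `modinfo` reports for the `nvidia` module can therefore be used as a hint. The function names below are made up for illustration, and the string matching is a heuristic, not something the driver guarantees.

```python
import subprocess


def classify_nvidia_license(license_field: str) -> str:
    """Classify the license string that modinfo reports for the nvidia module.

    Assumption: the open kernel modules report a GPL-compatible license
    (e.g. "Dual MIT/GPL") and the closed module reports "NVIDIA".
    """
    text = license_field.strip().upper()
    if not text:
        return "unknown"
    if "GPL" in text:
        return "open"
    if "NVIDIA" in text:
        return "proprietary"
    return "unknown"


def query_nvidia_license() -> str:
    """Return the raw output of `modinfo -F license nvidia`, or "" on failure."""
    try:
        result = subprocess.run(
            ["modinfo", "-F", "license", "nvidia"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # modinfo is not installed or not on PATH
        return ""
    return result.stdout
```

On a machine with the driver installed, `classify_nvidia_license(query_nvidia_license())` would give a quick answer; treat an "unknown" result as a reason to inspect the `modinfo nvidia` output by hand.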
Meanwhile, new driver version 565.57.01 has been deployed, so I’ll be testing the DisplayPort external monitor again. I hope it works (at least with X11 which is otherwise stable)…
EDIT: This is day 2 of me using an external DisplayPort monitor through the Thunderbolt connection. The laptop slept/resumed multiple times between disconnecting the day before and reconnecting the next day. This is quite encouraging. Note that I am on the GTS channel as I want to stay on Fedora 40. AFAIK the only change is the driver 565.57.01; I am still waiting on the above fixes.
Yeah, this didn’t last long. I’ll wait for the 3 fixes you mentioned above to trickle down to the GTS release. I hope you plan to include them in GTS? Or will these only be put in fedora 41-based builds?
Kernel log:
Oct 30 16:16:31 myhost kernel: BUG: unable to handle page fault for address: 00000000000a6304
Oct 30 16:16:31 myhost kernel: #PF: supervisor write access in kernel mode
Oct 30 16:16:31 myhost kernel: #PF: error_code(0x0002) - not-present page
Oct 30 16:16:31 myhost kernel: PGD 0 P4D 0
Oct 30 16:16:31 myhost kernel: Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
Oct 30 16:16:31 myhost kernel: CPU: 3 UID: 0 PID: 670520 Comm: kworker/3:2 Tainted: P W O 6.11.3-200.fc40.x86_64 #1
Oct 30 16:16:31 myhost kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
Oct 30 16:16:31 myhost kernel: Hardware name: LENOVO 83DN/INVALID, BIOS NKCN24WW 01/16/2024
Oct 30 16:16:31 myhost kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Oct 30 16:16:31 myhost kernel: RIP: 0010:_nv054270rm+0x5e/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel: Code: 1f 44 00 00 48 8d 7d 18 e8 af f5 ff ff 85 c0 75 e5 48 8b 7d 18 48 85 ff 74 dc e8 bd f0 ff ff 85 c0 75 d3 48 8b 7d 18 89 45 0c <83> 4f 4c 2>
Oct 30 16:16:31 myhost kernel: RSP: 0018:ffffa7e30999bd38 EFLAGS: 00010246
Oct 30 16:16:31 myhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000225c17d03
Oct 30 16:16:31 myhost kernel: RDX: 0000000000000000 RSI: ffffa7e30999bcd8 RDI: 00000000000a62b8
Oct 30 16:16:31 myhost kernel: RBP: ffff99d3c0212fc0 R08: 0000000000000000 R09: 00000000000a3b38
Oct 30 16:16:31 myhost kernel: R10: 000000000000000d R11: 000000000000000d R12: 0000000000000000
Oct 30 16:16:31 myhost kernel: R13: 00000000000a3b38 R14: 0000000000001002 R15: ffff99d39ad27808
Oct 30 16:16:31 myhost kernel: FS: 0000000000000000(0000) GS:ffff99dadef80000(0000) knlGS:0000000000000000
Oct 30 16:16:31 myhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 30 16:16:31 myhost kernel: CR2: 00000000000a6304 CR3: 00000005ac42a003 CR4: 0000000000f70ef0
Oct 30 16:16:31 myhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 16:16:31 myhost kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Oct 30 16:16:31 myhost kernel: PKRU: 55555554
Oct 30 16:16:31 myhost kernel: Call Trace:
Oct 30 16:16:31 myhost kernel: <TASK>
Oct 30 16:16:31 myhost kernel: ? __die_body.cold+0x19/0x27
Oct 30 16:16:31 myhost kernel: ? page_fault_oops+0x15a/0x2f0
Oct 30 16:16:31 myhost kernel: ? exc_page_fault+0x7e/0x180
Oct 30 16:16:31 myhost kernel: ? asm_exc_page_fault+0x26/0x30
Oct 30 16:16:31 myhost kernel: ? _nv054270rm+0x5e/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel: ? _nv054270rm+0x53/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel: _nv049414rm+0x1c1/0x360 [nvidia]
Oct 30 16:16:31 myhost kernel: rm_acpi_nvpcf_notify+0x35/0xf0 [nvidia]
Oct 30 16:16:31 myhost kernel: acpi_ev_notify_dispatch+0x48/0x80
Oct 30 16:16:31 myhost kernel: acpi_os_execute_deferred+0x17/0x30
Oct 30 16:16:31 myhost kernel: process_one_work+0x176/0x330
Oct 30 16:16:31 myhost kernel: worker_thread+0x252/0x390
Oct 30 16:16:31 myhost kernel: ? __pfx_worker_thread+0x10/0x10
Oct 30 16:16:31 myhost kernel: kthread+0xcf/0x100
Oct 30 16:16:31 myhost kernel: ? __pfx_kthread+0x10/0x10
Oct 30 16:16:31 myhost kernel: ret_from_fork+0x31/0x50
Oct 30 16:16:31 myhost kernel: ? __pfx_kthread+0x10/0x10
Oct 30 16:16:31 myhost kernel: ret_from_fork_asm+0x1a/0x30
Oct 30 16:16:31 myhost kernel: Modules linked in: snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet r8152 mii overlay uinput snd_seq_>
Oct 30 16:16:31 myhost kernel: soundwire_generic_allocation coretemp snd_soc_acpi soundwire_bus snd_hda_scodec_tas2781_i2c libarc4 snd_hda_intel snd_soc_tas2781_fmwlib spi_no>
Oct 30 16:16:31 myhost kernel: intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec int3400_thermal intel_hid pmt_telemetry acpi_tad acpi_thermal_rel pmt_class spar>
Oct 30 16:16:31 myhost kernel: Unloaded tainted modules: nvidia_peermem(PO):1
Oct 30 16:16:31 myhost kernel: CR2: 00000000000a6304
Oct 30 16:16:31 myhost kernel: ---[ end trace 0000000000000000 ]---
Oct 30 16:16:31 myhost kernel: RIP: 0010:_nv054270rm+0x5e/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel: Code: 1f 44 00 00 48 8d 7d 18 e8 af f5 ff ff 85 c0 75 e5 48 8b 7d 18 48 85 ff 74 dc e8 bd f0 ff ff 85 c0 75 d3 48 8b 7d 18 89 45 0c <83> 4f 4c 2>
Oct 30 16:16:31 myhost kernel: RSP: 0018:ffffa7e30999bd38 EFLAGS: 00010246
Oct 30 16:16:31 myhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000225c17d03
Oct 30 16:16:31 myhost kernel: RDX: 0000000000000000 RSI: ffffa7e30999bcd8 RDI: 00000000000a62b8
Oct 30 16:16:31 myhost kernel: RBP: ffff99d3c0212fc0 R08: 0000000000000000 R09: 00000000000a3b38
Oct 30 16:16:31 myhost kernel: R10: 000000000000000d R11: 000000000000000d R12: 0000000000000000
Oct 30 16:16:31 myhost kernel: R13: 00000000000a3b38 R14: 0000000000001002 R15: ffff99d39ad27808
Oct 30 16:16:31 myhost kernel: FS: 0000000000000000(0000) GS:ffff99dadef80000(0000) knlGS:0000000000000000
Oct 30 16:16:31 myhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 30 16:16:31 myhost kernel: CR2: 00000000000a6304 CR3: 00000005ac42a003 CR4: 0000000000f70ef0
Oct 30 16:16:31 myhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 16:16:31 myhost kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Oct 30 16:16:31 myhost kernel: PKRU: 55555554
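Two details in the log above are worth pulling out when triaging a trace like this: the taint flags and the module named on the `RIP:` line. The small sketch below (helper names are hypothetical, written only for this illustration) extracts both from journal lines in the format shown. A `P` flag means a module with a non-GPL-compatible license is loaded, which is consistent with the closed nvidia module; the `[nvidia]` suffix on the `RIP:` line names the module the faulting instruction belongs to.

```python
import re
from typing import Dict, Optional


def parse_taint_line(line: str) -> Dict[str, str]:
    """Extract {flag: meaning} pairs from a 'Tainted: [P]=..., [W]=...' line."""
    return dict(re.findall(r"\[(\w)\]=(\w+)", line))


def faulting_module(rip_line: str) -> Optional[str]:
    """Return the module name in brackets at the end of a 'RIP:' line.

    Returns None when the line carries no module suffix, which is the
    case when the faulting address is in the core kernel image.
    """
    match = re.search(r"RIP:.*\[(\w+)\]\s*$", rip_line)
    return match.group(1) if match else None


taint = parse_taint_line(
    "Oct 30 16:16:31 myhost kernel: Tainted: "
    "[P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE"
)
module = faulting_module(
    "Oct 30 16:16:31 myhost kernel: RIP: 0010:_nv054270rm+0x5e/0x70 [nvidia]"
)
print(taint)
print(module)
```

This only reads what the kernel already printed; it does not say why the fault happened, just which module the instruction pointer was in and which taint flags were set at the time.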
At the moment bluefin-nvidia:gts and bluefin-nvidia:stable are the same. Stable will only promote to F41 when CoreOS gives us a “stable” F41 build.
What that means:
All the nvidia fixes linked above are in GTS, the same as in stable and latest. (Actually they are there for ALL new builds of any ublue nvidia image.)
For now, you have the nvidia “closed” driver, as that’s all we are providing in bluefin. We have done work to prepare to ship the “open” driver, but it’s a bit messy since it can really only be used for Ampere and newer (officially Turing, but there are several Turing specific bugs in the open driver, especially around power management). So when we do that it’ll complicate the communication around which ublue image to use.
I’m running bluefin-nvidia:stable (but as mentioned, same as GTS for now), with an Ampere hybrid laptop. I do have one known-issue kernel trace in my dmesg output (yes, I know that’s a report on “open” but it seems to be the same on “closed”). So the one I linked is being addressed by nvidia, and I guess we just need to wait for the fix. Thankfully, on my hybrid laptop, I’m still able to use the card well enough.
Your stack trace is not something I’ve seen (nor have a few other members of ublue project). It seems a bit more generic than the known issue… so it could be a kernel 6.11 (I think you should be on kernel 6.11.3 if current) issue with your machine that’s not directly related to nvidia. I’m not sure.
I know one person with an Ada desktop card. I’ll see if they can reproduce this.
I would just like to report that this driver version seems to have improved Wayland sleep/resume. Early days still, but so far no sleep/resume crashes (though I have not tried connecting DisplayPort so far).
Hopefully this is what was plaguing me. I’ll give it a couple more days of closing/opening the lid without rebooting to make sure this works fine, but so far I’ve had no crashes (as long as I stay away from the DisplayPort connection).
So, it’s been 3 days of suspending/resuming while using also my Thunderbolt docking station to drive a DisplayPort monitor. I’ve been through 16 cycles so far without issue: