Best branch for Nvidia hybrid laptop?

Hello everyone,

I have a Yoga Pro 9I (2024 model 16IMH9) which has an Nvidia 4060 dGPU alongside the Intel Arc graphics onboard the 185H Meteor Lake CPU.

Unfortunately the Nvidia card support still has a couple of super-annoying issues, namely:

  • GTK4 rendering under Wayland is broken and you need to force ngl
  • Whenever I connect a monitor to one of its USB-C ports (both the Thunderbolt and the 3.2 Gen 2 one), it later crashes Wayland, or, if it does not crash Wayland, it later hangs by never waking up from sleep (which also occurs when using X11).

So currently I am on the bluefin-dx-nvidia:stable branch, using X11 and I make sure to never connect external displays (which is quite annoying) in order to avoid the instability that arises from that…

I was wondering:

  1. Does anyone have a HYBRID LAPTOP (not desktop) that “just works” with Nvidia?
  2. What branch are you using?
  3. Is there an Nvidia branch that has their open source version of the driver? How do I switch to that? In fact, how can I tell if the active driver is the proprietary version, or if it is the newer OSS version?
  4. How can I tell if a connected monitor uses Displaylink and could this be related or is it an nvidia issue?

I would really like to at least get rid of the external display issue, as it is a major usability hurdle…

I have a laptop with Nvidia 3050 dGPU and Intel Iris Xe onboard. I’m using the aurora-dx-nvidia:latest image with zero issues. I connect/disconnect external displays via HDMI daily.

13th Gen Intel(R) Core(TM) i9-13900H (20) @ 5.40 GHz
NVIDIA GeForce RTX 3050 4GB Laptop GPU [Discrete]
Intel Iris Xe Graphics @ 1.50 GHz [Integrated]

Ah, good old HDMI. I do have a port but I never use it.

Do you know if nvidia:latest is using the new OSS driver from nvidia?

NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6

So, running nvidia-smi gives me the same:

> nvidia-smi 
Mon Oct 14 15:52:46 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8              1W /   55W |     207MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
...

My question is, how do I know if this is the open source driver? I believe Nvidia has started using open source kernel drivers and recommends using those going forward?

I know I am not using the old nouveau driver via lsmod:

 ~> lsmod | grep -E "nouveau|nvidia"
nvidia_wmi_ec_backlight    12288  0
nvidia_drm            135168  3
nvidia_modeset       1650688  4 nvidia_drm
video                  81920  5 nvidia_wmi_ec_backlight,ideapad_laptop,xe,i915,nvidia_modeset
wmi                    32768  4 video,nvidia_wmi_ec_backlight,wmi_bmof,ideapad_laptop
nvidia_uvm           6848512  4
nvidia              72577024  81 nvidia_uvm,nvidia_modeset

What I don’t know is whether the above nvidia_... modules are from the old closed source driver, or if they are from the newer open source “nvidia” driver.

I have found a merged pull request about this, "feat: build nvidia open source kernel module" by p5 (Pull Request #220 · ublue-os/akmods · GitHub), but I am not sure how you actually use the new “open” drivers.

Ok so bsherman and qoijjj spent the better part of 2 days digging through the pain. Here are the issues filed that cover a bunch of things:

The corresponding fixes have been merged, we’re rebuilding new images now but this should bring in some fixes for optimus users.

I’m excited for these fixes!

Meanwhile, new driver version 565.57.01 has been deployed, so I’ll be testing the DisplayPort external monitor again. I hope it works (at least with X11 which is otherwise stable)…

EDIT: This is day 2 of me using an external DisplayPort monitor through the Thunderbolt connection. The laptop slept/resumed multiple times between disconnecting the day before and reconnecting the next day. This is quite encouraging. Note that I am on the GTS channel as I want to stay on Fedora 40. AFAIK the only change is the driver 565.57.01; I am still waiting on the above fixes.

Yeah, this didn’t last long. I’ll wait for the 3 fixes you mentioned above to trickle down to the GTS release. I hope you plan to include them in GTS? Or will these only be put in fedora 41-based builds?

Kernel log:

Oct 30 16:16:31 myhost kernel: BUG: unable to handle page fault for address: 00000000000a6304
Oct 30 16:16:31 myhost kernel: #PF: supervisor write access in kernel mode
Oct 30 16:16:31 myhost kernel: #PF: error_code(0x0002) - not-present page
Oct 30 16:16:31 myhost kernel: PGD 0 P4D 0 
Oct 30 16:16:31 myhost kernel: Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
Oct 30 16:16:31 myhost kernel: CPU: 3 UID: 0 PID: 670520 Comm: kworker/3:2 Tainted: P        W  O       6.11.3-200.fc40.x86_64 #1
Oct 30 16:16:31 myhost kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
Oct 30 16:16:31 myhost kernel: Hardware name: LENOVO 83DN/INVALID, BIOS NKCN24WW 01/16/2024
Oct 30 16:16:31 myhost kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Oct 30 16:16:31 myhost kernel: RIP: 0010:_nv054270rm+0x5e/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel: Code: 1f 44 00 00 48 8d 7d 18 e8 af f5 ff ff 85 c0 75 e5 48 8b 7d 18 48 85 ff 74 dc e8 bd f0 ff ff 85 c0 75 d3 48 8b 7d 18 89 45 0c <83> 4f 4c 2>
Oct 30 16:16:31 myhost kernel: RSP: 0018:ffffa7e30999bd38 EFLAGS: 00010246
Oct 30 16:16:31 myhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000225c17d03
Oct 30 16:16:31 myhost kernel: RDX: 0000000000000000 RSI: ffffa7e30999bcd8 RDI: 00000000000a62b8
Oct 30 16:16:31 myhost kernel: RBP: ffff99d3c0212fc0 R08: 0000000000000000 R09: 00000000000a3b38
Oct 30 16:16:31 myhost kernel: R10: 000000000000000d R11: 000000000000000d R12: 0000000000000000
Oct 30 16:16:31 myhost kernel: R13: 00000000000a3b38 R14: 0000000000001002 R15: ffff99d39ad27808
Oct 30 16:16:31 myhost kernel: FS:  0000000000000000(0000) GS:ffff99dadef80000(0000) knlGS:0000000000000000
Oct 30 16:16:31 myhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 30 16:16:31 myhost kernel: CR2: 00000000000a6304 CR3: 00000005ac42a003 CR4: 0000000000f70ef0
Oct 30 16:16:31 myhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 16:16:31 myhost kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Oct 30 16:16:31 myhost kernel: PKRU: 55555554
Oct 30 16:16:31 myhost kernel: Call Trace:
Oct 30 16:16:31 myhost kernel:  <TASK>
Oct 30 16:16:31 myhost kernel:  ? __die_body.cold+0x19/0x27
Oct 30 16:16:31 myhost kernel:  ? page_fault_oops+0x15a/0x2f0
Oct 30 16:16:31 myhost kernel:  ? exc_page_fault+0x7e/0x180
Oct 30 16:16:31 myhost kernel:  ? asm_exc_page_fault+0x26/0x30
Oct 30 16:16:31 myhost kernel:  ? _nv054270rm+0x5e/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel:  ? _nv054270rm+0x53/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel:  _nv049414rm+0x1c1/0x360 [nvidia]
Oct 30 16:16:31 myhost kernel:  rm_acpi_nvpcf_notify+0x35/0xf0 [nvidia]
Oct 30 16:16:31 myhost kernel:  acpi_ev_notify_dispatch+0x48/0x80
Oct 30 16:16:31 myhost kernel:  acpi_os_execute_deferred+0x17/0x30
Oct 30 16:16:31 myhost kernel:  process_one_work+0x176/0x330
Oct 30 16:16:31 myhost kernel:  worker_thread+0x252/0x390
Oct 30 16:16:31 myhost kernel:  ? __pfx_worker_thread+0x10/0x10
Oct 30 16:16:31 myhost kernel:  kthread+0xcf/0x100
Oct 30 16:16:31 myhost kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 16:16:31 myhost kernel:  ret_from_fork+0x31/0x50
Oct 30 16:16:31 myhost kernel:  ? __pfx_kthread+0x10/0x10
Oct 30 16:16:31 myhost kernel:  ret_from_fork_asm+0x1a/0x30
Oct 30 16:16:31 myhost kernel: Modules linked in: snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet r8152 mii overlay uinput snd_seq_>
Oct 30 16:16:31 myhost kernel:  soundwire_generic_allocation coretemp snd_soc_acpi soundwire_bus snd_hda_scodec_tas2781_i2c libarc4 snd_hda_intel snd_soc_tas2781_fmwlib spi_no>
Oct 30 16:16:31 myhost kernel:  intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec int3400_thermal intel_hid pmt_telemetry acpi_tad acpi_thermal_rel pmt_class spar>
Oct 30 16:16:31 myhost kernel: Unloaded tainted modules: nvidia_peermem(PO):1
Oct 30 16:16:31 myhost kernel: CR2: 00000000000a6304
Oct 30 16:16:31 myhost kernel: ---[ end trace 0000000000000000 ]---
Oct 30 16:16:31 myhost kernel: RIP: 0010:_nv054270rm+0x5e/0x70 [nvidia]
Oct 30 16:16:31 myhost kernel: Code: 1f 44 00 00 48 8d 7d 18 e8 af f5 ff ff 85 c0 75 e5 48 8b 7d 18 48 85 ff 74 dc e8 bd f0 ff ff 85 c0 75 d3 48 8b 7d 18 89 45 0c <83> 4f 4c 2>
Oct 30 16:16:31 myhost kernel: RSP: 0018:ffffa7e30999bd38 EFLAGS: 00010246
Oct 30 16:16:31 myhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000225c17d03
Oct 30 16:16:31 myhost kernel: RDX: 0000000000000000 RSI: ffffa7e30999bcd8 RDI: 00000000000a62b8
Oct 30 16:16:31 myhost kernel: RBP: ffff99d3c0212fc0 R08: 0000000000000000 R09: 00000000000a3b38
Oct 30 16:16:31 myhost kernel: R10: 000000000000000d R11: 000000000000000d R12: 0000000000000000
Oct 30 16:16:31 myhost kernel: R13: 00000000000a3b38 R14: 0000000000001002 R15: ffff99d39ad27808
Oct 30 16:16:31 myhost kernel: FS:  0000000000000000(0000) GS:ffff99dadef80000(0000) knlGS:0000000000000000
Oct 30 16:16:31 myhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 30 16:16:31 myhost kernel: CR2: 00000000000a6304 CR3: 00000005ac42a003 CR4: 0000000000f70ef0
Oct 30 16:16:31 myhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 16:16:31 myhost kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Oct 30 16:16:31 myhost kernel: PKRU: 55555554
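
As a side note, the `Tainted:` legend line in a trace like this already hints at which driver flavor is loaded: the `P` flag means a module with a proprietary license is present. Below is a minimal sketch that checks kernel log text for that flag. The function name `taint_summary` is made up for illustration, and the idea that an open-licensed module would leave `P` unset is my assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Report whether an oops "Tainted:" legend line carries the P flag.
# Reads kernel log text on stdin and prints one of:
#   proprietary     - legend line found and it lists [P]=PROPRIETARY_MODULE
#   no-proprietary  - legend line found without the P flag
#   no-taint-line   - no legend line in the input
taint_summary() {
    local legend
    # Take the first legend line only; "|| true" keeps a non-match from
    # aborting scripts that run with "set -e".
    legend="$(grep -m1 -E 'Tainted: \[' || true)"
    if [ -z "$legend" ]; then
        echo "no-taint-line"
    elif printf '%s\n' "$legend" | grep -q -F '[P]=PROPRIETARY_MODULE'; then
        echo "proprietary"
    else
        echo "no-proprietary"
    fi
}

# Example (kernel messages of the current boot):
#   journalctl -k -b | taint_summary
```

For the log above this would print `proprietary`, which fits the closed driver.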

Yeah everything is in there, this may be something else, I’ll wait for one of the experts to chime in.

At the moment bluefin-nvidia:gts and bluefin-nvidia:stable are the same. Stable will only promote to F41 when CoreOS gives us a “stable” F41 build.

What that means:

All the nvidia fixes linked above are in GTS, same as stable and latest. (Actually they are there for ALL new builds of any ublue nvidia image.)

For now, you have the nvidia “closed” driver, as that’s all we are providing in bluefin. We have done work to prepare to ship the “open” driver, but it’s a bit messy since it can really only be used for Ampere and newer (officially Turing, but there are several Turing specific bugs in the open driver, especially around power management). So when we do that it’ll complicate the communication around which ublue image to use.
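
If you want to confirm which flavor is loaded on a given machine, the module's license string is one place to look. Here is a minimal sketch, assuming the closed module reports `NVIDIA` and the open module reports `Dual MIT/GPL` as its license; the function name `classify_nvidia_license` is made up for illustration.

```shell
#!/usr/bin/env bash
# Classify an NVIDIA kernel module by its license string.
# Assumption: the closed module declares license "NVIDIA" and the open
# module declares "Dual MIT/GPL"; anything else is reported as unknown.
classify_nvidia_license() {
    case "$1" in
        "NVIDIA")       echo "closed" ;;
        "Dual MIT/GPL") echo "open" ;;
        *)              echo "unknown" ;;
    esac
}

# Only query the system when modinfo can actually find the nvidia module,
# so the script stays silent on machines without the driver.
if license="$(modinfo -F license nvidia 2>/dev/null)"; then
    classify_nvidia_license "$license"
fi
```

On a current bluefin nvidia image this should print `closed`, if the license assumption holds.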

I’m running bluefin-nvidia:stable (but as mentioned, same as GTS for now), with an Ampere hybrid laptop. I do have one known-issue kernel trace in my dmesg output (yes, I know that’s a report on “open” but it seems to be the same on “closed”). The one I linked is being addressed by nvidia, and I guess we just need to wait for the fix. Thankfully, on my hybrid laptop, I’m still able to use the card well enough.

Your stack trace is not something I’ve seen (nor have a few other members of ublue project). It seems a bit more generic than the known issue… so it could be a kernel 6.11 (I think you should be on kernel 6.11.3 if current) issue with your machine that’s not directly related to nvidia. I’m not sure.

I know one person with an Ada desktop card. I’ll see if they can reproduce this.

The one person I know with Ada desktop could not reproduce this particular error. Granted it was a desktop not a laptop.

I would just like to report that this driver version seems to have improved Wayland sleep/resume. Early days still, but so far no sleep/resume crashes (though I have not tried connecting DisplayPort so far).

nvidia-driver-565.57.01-4.fc41.x86_64

From Nvidia release notes:

Hopefully this is what was plaguing me. I’ll give it a couple more days of closing/opening the lid without rebooting to make sure this works fine, but so far I’ve had no crashes (as long as I stay away from the DisplayPort connection).

So, it’s been 3 days of suspending/resuming while also using my Thunderbolt docking station to drive a DisplayPort monitor. I’ve been through 16 cycles so far without issue:

> journalctl -b -t systemd | grep -E "nvidia-suspend|nvidia-resume" | grep Starting | wc -l
32
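
The 32 matches are presumably 16 suspend starts plus 16 resume starts. Here is a minimal sketch that counts the two separately, so a suspend without a matching resume would stand out. The function name `count_cycles` is made up for illustration, and it assumes each relevant journal line contains "Starting" plus the unit name, as in the pipeline above.

```shell
#!/usr/bin/env bash
# Count suspend and resume starts separately from journal text on stdin.
# Assumption: each relevant line contains "Starting" plus either
# "nvidia-suspend" or "nvidia-resume", matching the grep pipeline above.
count_cycles() {
    awk '
        /Starting/ && /nvidia-suspend/ { s++ }
        /Starting/ && /nvidia-resume/  { r++ }
        END { printf "suspend=%d resume=%d\n", s, r }
    '
}

# Example (journal of the current boot):
#   journalctl -b -t systemd | count_cycles
```

Equal counts would mean every suspend was followed by a resume.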

So everything finally works well for me! I can finally say Nvidia has a workable driver…

It’s like having a new laptop now!