Unstable Nvidia - "unable to handle page fault for address"

Hello,

I’ve been having trouble with Nvidia on Wayland/GNOME for the past month or so, and I’ve been using Xorg to avoid the system freezes. I tried Wayland again today, and within hours my laptop froze again. Here is the stack trace:

Jul 07 14:44:27 myhostname kernel: BUG: unable to handle page fault for address: ffff9aae10357fe8
Jul 07 14:44:27 myhostname kernel: #PF: supervisor write access in kernel mode
Jul 07 14:44:27 myhostname kernel: #PF: error_code(0x0003) - permissions violation
Jul 07 14:44:27 myhostname kernel: PGD 36c801067 P4D 36c801067 PUD 102b51063 PMD 10874a063 PTE 8000000110357121
Jul 07 14:44:27 myhostname kernel: Oops: 0003 [#1] PREEMPT SMP NOPTI
Jul 07 14:44:27 myhostname kernel: CPU: 13 PID: 12881 Comm: kworker/13:2 Tainted: P        W  OE      6.9.7-200.fc40.x86_64 #1
Jul 07 14:44:27 myhostname kernel: Hardware name: LENOVO 83DN/LNVNB161216, BIOS NKCN25WW 02/05/2024
Jul 07 14:44:27 myhostname kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Jul 07 14:44:27 myhostname kernel: RIP: 0010:_nv044549rm+0x10/0x30 [nvidia]
Jul 07 14:44:27 myhostname kernel: Code: 00 00 00 00 00 0f 1f 44 00 00 66 0f 1f 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 48 83 ec 08 48 83 ed 10 48 8d 7d 08 <48> c7 45 08 00 00 00 00>
Jul 07 14:44:27 myhostname kernel: RSP: 0018:ffffae8461dfbd18 EFLAGS: 00010282
Jul 07 14:44:27 myhostname kernel: RAX: 0000000000000000 RBX: ffffae8440ce78e8 RCX: ffff9ab55f4b57e8
Jul 07 14:44:27 myhostname kernel: RDX: ffff9aae01726208 RSI: 00000000000000c0 RDI: ffff9aae10357fe8
Jul 07 14:44:27 myhostname kernel: RBP: ffff9aae10357fe0 R08: 6e6d5e686f62606a R09: ffff9aae01a4d280
Jul 07 14:44:27 myhostname kernel: R10: 000000000000000d R11: 000000000000000d R12: 0000000000000004
Jul 07 14:44:27 myhostname kernel: R13: 0000000000000000 R14: ffffae8440ca9008 R15: ffff9aae1ac78008
Jul 07 14:44:27 myhostname kernel: FS:  0000000000000000(0000) GS:ffff9ab55f480000(0000) knlGS:0000000000000000
Jul 07 14:44:27 myhostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 07 14:44:27 myhostname kernel: CR2: ffff9aae10357fe8 CR3: 000000036b428004 CR4: 0000000000f70ef0
Jul 07 14:44:27 myhostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 07 14:44:27 myhostname kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Jul 07 14:44:27 myhostname kernel: PKRU: 55555554
Jul 07 14:44:27 myhostname kernel: Call Trace:
Jul 07 14:44:27 myhostname kernel:  <TASK>
Jul 07 14:44:27 myhostname kernel:  ? __die_body.cold+0x19/0x27
Jul 07 14:44:27 myhostname kernel:  ? page_fault_oops+0x15a/0x2c0
Jul 07 14:44:27 myhostname kernel:  ? search_module_extables+0x19/0x60
Jul 07 14:44:27 myhostname kernel:  ? search_bpf_extables+0x5f/0x80
Jul 07 14:44:27 myhostname kernel:  ? exc_page_fault+0x170/0x180
Jul 07 14:44:27 myhostname kernel:  ? asm_exc_page_fault+0x26/0x30
Jul 07 14:44:27 myhostname kernel:  ? _nv044549rm+0x10/0x30 [nvidia]
Jul 07 14:44:27 myhostname kernel:  _nv014876rm+0x4d/0x90 [nvidia]
Jul 07 14:44:27 myhostname kernel:  _nv050384rm+0x18/0x60 [nvidia]
Jul 07 14:44:27 myhostname kernel:  _nv027133rm+0x61/0x90 [nvidia]
Jul 07 14:44:27 myhostname kernel:  rm_acpi_nvpcf_notify+0x1c/0xe0 [nvidia]
Jul 07 14:44:27 myhostname kernel:  ? acpi_os_release_object+0xe/0x20
Jul 07 14:44:27 myhostname kernel:  ? acpi_evaluate_object+0x1d8/0x340
Jul 07 14:44:27 myhostname kernel:  acpi_ev_notify_dispatch+0x48/0x80
Jul 07 14:44:27 myhostname kernel:  acpi_os_execute_deferred+0x17/0x30
Jul 07 14:44:27 myhostname kernel:  process_one_work+0x186/0x340
Jul 07 14:44:27 myhostname kernel:  worker_thread+0x278/0x3b0
Jul 07 14:44:27 myhostname kernel:  ? __pfx_worker_thread+0x10/0x10
Jul 07 14:44:27 myhostname kernel:  kthread+0xcf/0x100
Jul 07 14:44:27 myhostname kernel:  ? __pfx_kthread+0x10/0x10
Jul 07 14:44:27 myhostname kernel:  ret_from_fork+0x31/0x50
Jul 07 14:44:27 myhostname kernel:  ? __pfx_kthread+0x10/0x10
Jul 07 14:44:27 myhostname kernel:  ret_from_fork_asm+0x1a/0x30
Jul 07 14:44:27 myhostname kernel:  </TASK>
Jul 07 14:44:27 myhostname kernel: Modules linked in: uinput xt_addrtype xt_nat xt_mark xt_conntrack xt_comment xt_MASQUERADE nft_compat veth bridge stp llc overlay snd_seq_dummy rfcomm snd_hr>
Jul 07 14:44:27 myhostname kernel:  snd_sof_utils uvcvideo snd_soc_hdac_hda snd_hda_ext_core btusb snd_soc_acpi_intel_match rapl iwlmvm uvc soundwire_generic_allocation btrtl btintel videobuf2>
Jul 07 14:44:27 myhostname kernel:  int340x_thermal_zone snd soundcore intel_vsec int3400_thermal pmt_telemetry acpi_thermal_rel pmt_class acpi_tad intel_hid acpi_pad sparse_keymap joydev brcm>
Jul 07 14:44:27 myhostname kernel: Unloaded tainted modules: nvidia_peermem(POE):1
Jul 07 14:44:27 myhostname kernel: CR2: ffff9aae10357fe8
Jul 07 14:44:27 myhostname kernel: ---[ end trace 0000000000000000 ]---
Jul 07 14:44:27 myhostname kernel: RIP: 0010:_nv044549rm+0x10/0x30 [nvidia]
Jul 07 14:44:27 myhostname kernel: Code: 00 00 00 00 00 0f 1f 44 00 00 66 0f 1f 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 48 83 ec 08 48 83 ed 10 48 8d 7d 08 <48> c7 45 08 00 00 00 00>
Jul 07 14:44:27 myhostname kernel: RSP: 0018:ffffae8461dfbd18 EFLAGS: 00010282
Jul 07 14:44:27 myhostname kernel: RAX: 0000000000000000 RBX: ffffae8440ce78e8 RCX: ffff9ab55f4b57e8
Jul 07 14:44:27 myhostname kernel: RDX: ffff9aae01726208 RSI: 00000000000000c0 RDI: ffff9aae10357fe8
Jul 07 14:44:27 myhostname kernel: RBP: ffff9aae10357fe0 R08: 6e6d5e686f62606a R09: ffff9aae01a4d280
Jul 07 14:44:27 myhostname kernel: R10: 000000000000000d R11: 000000000000000d R12: 0000000000000004
Jul 07 14:44:27 myhostname kernel: R13: 0000000000000000 R14: ffffae8440ca9008 R15: ffff9aae1ac78008
Jul 07 14:44:27 myhostname kernel: FS:  0000000000000000(0000) GS:ffff9ab55f480000(0000) knlGS:0000000000000000
Jul 07 14:44:27 myhostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 07 14:44:27 myhostname kernel: CR2: ffff9aae10357fe8 CR3: 000000036b428004 CR4: 0000000000f70ef0
Jul 07 14:44:27 myhostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 07 14:44:27 myhostname kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Jul 07 14:44:27 myhostname kernel: PKRU: 55555554
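
As a reading aid for the oops above, here is a rough Python sketch that decodes the `error_code(0x0003)` value and the flag bits of the `PTE` entry from the `PGD ... PTE` line. It assumes the standard x86-64 page-fault error code and page-table entry bit layout; the function names are made up for illustration and do not come from any kernel tool.

```python
# Decode the x86-64 page-fault error code and PTE flag bits from the oops.
# Assumption: standard x86-64 bit layout; helper names are hypothetical.

def decode_error_code(code: int) -> dict:
    """Split the #PF error code into its low flag bits."""
    return {
        "protection_violation": bool(code & 0x01),  # 0 would mean "page not present"
        "write": bool(code & 0x02),                 # 0 would mean a read access
        "user_mode": bool(code & 0x04),             # 0 means supervisor (kernel) mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }

def decode_pte(pte: int) -> dict:
    """Extract the commonly inspected flag bits of a page-table entry."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),
        "accessed": bool(pte & (1 << 5)),
        "dirty": bool(pte & (1 << 6)),
        "global": bool(pte & (1 << 8)),
        "no_execute": bool(pte & (1 << 63)),
    }

# Values copied from the log lines above.
fault = decode_error_code(0x0003)
pte = decode_pte(0x8000000110357121)

print(fault)
print(pte)
```

Read this way, the fault is a kernel-mode write to a page that is present but mapped read-only, which matches the "supervisor write access" and "permissions violation" lines. It does not by itself say why the driver wrote to that address.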

My laptop is a Lenovo Yoga Pro 9 16IMH9 (type 83DN) with an RTX 4060.

Has anyone else experienced similar crashes?

I should also note that my kernel has been logging this on every boot as well:

Jul 07 14:59:23 myhostname kernel: ------------[ cut here ]------------
Jul 07 14:59:23 myhostname kernel: Unpatched return thunk in use. This should not happen!
Jul 07 14:59:23 myhostname kernel: WARNING: CPU: 4 PID: 1103 at arch/x86/kernel/cpu/bugs.c:3023 __warn_thunk+0x2a/0x40
Jul 07 14:59:23 myhostname kernel: Modules linked in: wl(POE+) bluetooth(+) intel_uncore(+) snd_soc_acpi mac80211 soundwire_bus wmi_bmof processor_thermal_device_pci pcspkr processor_thermal_d>
Jul 07 14:59:23 myhostname kernel:  scsi_dh_alua sunrpc kvmfr(OE) loop dm_multipath nfnetlink zram dm_crypt xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper hid_sensor_hub in>
Jul 07 14:59:23 myhostname kernel: Unloaded tainted modules: nvidia_peermem(POE):1
Jul 07 14:59:23 myhostname kernel: CPU: 4 PID: 1103 Comm: (udev-worker) Tainted: P           OE      6.9.7-200.fc40.x86_64 #1
Jul 07 14:59:23 myhostname kernel: Hardware name: LENOVO 83DN/LNVNB161216, BIOS NKCN25WW 02/05/2024
Jul 07 14:59:23 myhostname kernel: RIP: 0010:__warn_thunk+0x2a/0x40
Jul 07 14:59:23 myhostname kernel: Code: 66 0f 1f 00 0f 1f 44 00 00 80 3d 01 19 77 02 00 74 05 c3 cc cc cc cc 48 c7 c7 68 86 b3 a8 c6 05 ec 18 77 02 01 e8 b6 51 0c 00 <0f> 0b c3 cc cc cc cc 66>
Jul 07 14:59:23 myhostname kernel: RSP: 0018:ffffa93241a97af0 EFLAGS: 00010282
Jul 07 14:59:23 myhostname kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
Jul 07 14:59:23 myhostname kernel: RDX: ffff918a9f0218c8 RSI: 0000000000000001 RDI: ffff918a9f0218c0
Jul 07 14:59:23 myhostname kernel: RBP: ffffa93241a97b40 R08: 0000000000000000 R09: ffffa93241a97a80
Jul 07 14:59:23 myhostname kernel: R10: ffffffffa8b3869f R11: 0000000000000000 R12: ffffffffc64ffbb8
Jul 07 14:59:23 myhostname kernel: R13: ffffa93241a97b88 R14: 00007fd9582a307d R15: ffffa93241a97c18
Jul 07 14:59:23 myhostname kernel: FS:  00007fd957aba980(0000) GS:ffff918a9f000000(0000) knlGS:0000000000000000
Jul 07 14:59:23 myhostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 07 14:59:23 myhostname kernel: CR2: 00007feb136ef3bc CR3: 0000000112dd0003 CR4: 0000000000f70ef0
Jul 07 14:59:23 myhostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 07 14:59:23 myhostname kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
Jul 07 14:59:23 myhostname kernel: PKRU: 55555554
Jul 07 14:59:23 myhostname kernel: Call Trace:
Jul 07 14:59:23 myhostname kernel:  <TASK>
Jul 07 14:59:23 myhostname kernel:  ? __warn_thunk+0x2a/0x40
Jul 07 14:59:23 myhostname kernel:  ? __warn.cold+0x8e/0xe8
Jul 07 14:59:23 myhostname kernel:  ? __warn_thunk+0x2a/0x40
Jul 07 14:59:23 myhostname kernel:  ? report_bug+0xff/0x140
Jul 07 14:59:23 myhostname kernel:  ? handle_bug+0x3c/0x80
Jul 07 14:59:23 myhostname kernel:  ? exc_invalid_op+0x17/0x70
Jul 07 14:59:23 myhostname kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jul 07 14:59:23 myhostname kernel:  ? __warn_thunk+0x2a/0x40
Jul 07 14:59:23 myhostname kernel:  warn_thunk_thunk+0x1a/0x30
Jul 07 14:59:23 myhostname kernel:  getvar+0x20/0x70 [wl]
Jul 07 14:59:23 myhostname kernel:  ? __UNIQUE_ID_vermagic434+0x56cd7fc90ebc/0x56cd7fc90ebc [wl]
Jul 07 14:59:23 myhostname kernel:  wl_module_init+0x17/0xa0 [wl]
Jul 07 14:59:23 myhostname kernel:  ? do_one_initcall+0x58/0x310
Jul 07 14:59:23 myhostname kernel:  ? do_init_module+0x90/0x250
Jul 07 14:59:23 myhostname kernel:  ? __do_sys_init_module+0x17a/0x1b0
Jul 07 14:59:23 myhostname kernel:  ? do_syscall_64+0x82/0x160
Jul 07 14:59:23 myhostname kernel:  ? vfs_read+0x237/0x360
Jul 07 14:59:23 myhostname kernel:  ? syscall_exit_to_user_mode_prepare+0x149/0x170
Jul 07 14:59:23 myhostname kernel:  ? syscall_exit_to_user_mode+0x75/0x230
Jul 07 14:59:23 myhostname kernel:  ? do_syscall_64+0x8e/0x160
Jul 07 14:59:23 myhostname kernel:  ? do_user_addr_fault+0x34e/0x620
Jul 07 14:59:23 myhostname kernel:  ? exc_page_fault+0x7e/0x180
Jul 07 14:59:23 myhostname kernel:  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jul 07 14:59:23 myhostname kernel:  </TASK>
Jul 07 14:59:23 myhostname kernel: ---[ end trace 0000000000000000 ]---

This does not crash the kernel: the system still boots and I am able to use the laptop. But maybe it’s related to the instability.

Has anyone else seen similar freezes on hybrid Nvidia laptops? Do you get the “Unpatched return thunk in use” warning at startup?