23 Nov 2024 - by 'Maurits van der Schee'
My main computer is a DeskMini X600 with an AMD 8700G, running Linux Mint 22 with kernel 6.8.0-49-generic. Suddenly I started seeing graphics artifacts and random crashes. I suspected the memory and tested it with memtest86+, but that showed no faults. I tried rolling back the latest kernel upgrade, but that didn't help. I tried stress testing and gaming, but the system didn't fail under load. This made me realize I had an interesting problem at hand, and I started searching for more crash reports.
Some of the reports of similar issues on the open source AMD driver are:
Similar behavior happens on Steam Deck devices and the GPD Win Max 2, but also on systems with external graphics cards.
This shows as both screens completely freezing for a few seconds, the screens going black, then flashing as the graphics crash and recover.
In Windows this may show up in the Windows Event Viewer as "Display driver amduw23g stopped responding and has successfully recovered" or in Linux as "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout".
The GPU crashes, and the message "ring gfx_0.0.0 timeout" appears in the kernel log "/var/log/syslog" as:
2024-11-23T02:04:17.243245+01:00 deskmini-8700g kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=343534, emitted seq=343536
2024-11-23T02:04:17.243257+01:00 deskmini-8700g kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1771 thread Xorg:cs0 pid 1953
2024-11-23T02:04:17.243257+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
2024-11-23T02:04:17.470236+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
2024-11-23T02:04:17.894254+01:00 deskmini-8700g kernel: message repeated 2 times: [ amdgpu 0000:04:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839]
2024-11-23T02:04:18.032236+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.032246+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.162235+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.162241+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.292215+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.292220+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.422235+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.422242+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.552235+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.552241+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.682239+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.682246+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.812237+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.812243+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:18.942240+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:18.942248+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:19.072239+01:00 deskmini-8700g kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
2024-11-23T02:04:19.072246+01:00 deskmini-8700g kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
2024-11-23T02:04:19.278237+01:00 deskmini-8700g kernel: [drm:gfx_v11_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
2024-11-23T02:04:19.279244+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: MODE2 reset
2024-11-23T02:04:19.314228+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
2024-11-23T02:04:19.314232+01:00 deskmini-8700g kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
2024-11-23T02:04:19.314232+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
2024-11-23T02:04:19.316239+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
2024-11-23T02:04:19.317238+01:00 deskmini-8700g kernel: [drm] DMUB hardware initialized: version=0x08003700
2024-11-23T02:04:19.475266+01:00 deskmini-8700g kernel: [drm] kiq ring mec 3 pipe 1 q 0
2024-11-23T02:04:19.477278+01:00 deskmini-8700g kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
2024-11-23T02:04:19.477285+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
2024-11-23T02:04:19.477286+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
2024-11-23T02:04:19.477286+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
2024-11-23T02:04:19.477287+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
2024-11-23T02:04:19.477289+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
2024-11-23T02:04:19.477290+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
2024-11-23T02:04:19.477291+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
2024-11-23T02:04:19.477301+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
2024-11-23T02:04:19.477302+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
2024-11-23T02:04:19.477302+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
2024-11-23T02:04:19.477303+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
2024-11-23T02:04:19.477303+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
2024-11-23T02:04:19.477304+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
2024-11-23T02:04:19.477305+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
2024-11-23T02:04:19.479237+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
2024-11-23T02:04:19.479244+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
2024-11-23T02:04:19.479244+01:00 deskmini-8700g kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(2) succeeded!
2024-11-23T02:04:19.578663+01:00 deskmini-8700g kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
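Because the crashes are rare and hard to reproduce, it can help to count them over time instead of reading the log by hand. The following is a minimal Python sketch, not a definitive tool. It assumes the syslog line format shown in the excerpt above (timestamp, hostname, "kernel:", message); the function name find_gpu_timeouts and both regular expressions are my own illustration and may need adjusting for other distributions or kernel versions.

```python
import re

# Matches the ring-timeout line in the format of the log excerpt above.
# Assumption: lines look like "<timestamp> <host> kernel: <message>".
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\S+) \S+ kernel: \[drm:amdgpu_job_timedout \[amdgpu\]\] "
    r"\*ERROR\* ring (?P<ring>\S+) timeout"
)

# Matches the follow-up line that names the process whose job hung.
PROCESS_RE = re.compile(
    r"Process information: process (?P<name>\S+) pid (?P<pid>\d+)"
)


def find_gpu_timeouts(lines):
    """Return one (timestamp, ring, process) tuple per ring timeout.

    The process is None when no "Process information" line follows
    the timeout line.
    """
    events = []
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            events.append([timeout.group("ts"), timeout.group("ring"), None])
            continue
        process = PROCESS_RE.search(line)
        # Attach the process name to the most recent timeout, once.
        if process and events and events[-1][2] is None:
            events[-1][2] = process.group("name")
    return [tuple(event) for event in events]
```

To run it against the real log, pass an open file, for example find_gpu_timeouts(open("/var/log/syslog", errors="replace")). Reading that file may require root or membership of the "adm" group, depending on the system.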
It is annoying, as you can lose unsaved work, but the graphics artifacts in Firefox and VSCode (both use hardware acceleration) at least warn you that a crash is about to happen.
Notice how some (not all) letters "r" are misplaced by Firefox in a plain textarea (where I am writing this post in Markdown).
I am not saying that none of these work, but let's list the options in the order I would try them:
Set "power_dpm_force_performance_level" to "high"
I have been trying all of them, but because the problem is hard to reproduce, it is also hard to tell whether they work. So far, nothing I have tried seems to have fixed it. I'm still considering swapping out hardware components (motherboard, CPU and RAM).
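For reference, the "power_dpm_force_performance_level" option mentioned above is exposed by the amdgpu driver as a sysfs file. Below is a minimal Python sketch of reading and changing it. The path assumes the GPU is "card0" (it may be "card1" on other systems, check /sys/class/drm/), the helper names get_level and set_level are hypothetical, and the set of accepted values is only a subset of what the driver documents. Writing the file requires root, and the setting does not survive a reboot.

```python
from pathlib import Path

# Assumption: the GPU is card0. Adjust the card index for your system.
DEFAULT_SYSFS = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)

# A subset of the levels documented for the amdgpu driver.
VALID_LEVELS = {"auto", "low", "high", "manual"}


def get_level(path=DEFAULT_SYSFS):
    """Return the current performance level as a string."""
    return Path(path).read_text().strip()


def set_level(level, path=DEFAULT_SYSFS):
    """Write a new performance level and return the value read back."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    # Needs root when writing to the real sysfs file.
    Path(path).write_text(level + "\n")
    return get_level(path)
```

Calling set_level("high") as root pins the GPU clocks to their highest state, and set_level("auto") restores the default behavior. Whether this actually prevents the timeouts is exactly what is hard to verify, as described above.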