
Odi's astoundingly incomplete notes


Graphics board thermal death anatomy

[Image: stripes under cursor]

The Radeon board of my old MacBook Pro is dying. The laptop may have overheated during an update. Suddenly the screen got garbled. Since then, every few boots end in a kernel panic (segfault). Compositing refuses to use OpenGL and falls back to XRender. The mouse cursor now also has stripes filling the normally empty area of the cursor sprite (see image). The kernel log contains the following messages:
[drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
OS X doesn't want to boot any more. It gets stuck on an empty blue screen. Booting it in safe mode (holding down the shift key) or in single-user mode (holding command-S) works, though. The cursor shows the same stripy pattern there, too.
rEFIt still boots and is usable, although most of the time I only get a black screen when it initializes the graphics card in "BIOS mode". So I can only control GRUB blindly, and I get no early boot messages from the kernel until the KMS driver loads.
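
If you want to check a saved kernel log for the same symptoms, the messages quoted in this post and in the comments below can be matched with a few regular expressions. This is only a rough sketch: the function name `classify` and the labels are made up for illustration, and the patterns are derived solely from the log lines shown on this page, so other kernel versions may word them differently.

```python
import re

# Patterns derived from the log lines quoted on this page (post and comments).
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "ring_test_failed": re.compile(
        r"\[drm:r100_ring_test\] \*ERROR\* radeon: ring test failed "
        r"\(scratch\((0x[0-9A-Fa-f]+)\)=(0x[0-9A-Fa-f]+)\)"
    ),
    "cp_not_working": re.compile(
        r"\[drm:r100_cp_init\] \*ERROR\* radeon: cp isn't working \((-?\d+)\)"
    ),
    "ring_stalled": re.compile(
        r"radeon \S+: ring (\d+) stalled for more than (\d+)msec"
    ),
    "gpu_lockup": re.compile(r"radeon \S+: GPU lockup"),
}


def classify(lines):
    """Return (label, captured_groups) for every line matching a known pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((label, match.groups()))
                break  # one label per line is enough
    return hits


# Sample input: the messages as quoted on this page.
SAMPLE = [
    "[drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)",
    "[drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).",
    "radeon 0000:03:00.0: ring 0 stalled for more than 10232msec",
    "radeon 0000:03:00.0: GPU lockup (current fence id 0x000000000001932a "
    "last fence id 0x000000000001932b on ring 0)",
]

if __name__ == "__main__":
    for label, groups in classify(SAMPLE):
        print(label, groups)
```

To run it against a real machine, pass the lines of a saved `dmesg` dump to `classify` instead of `SAMPLE`.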

Neat workaround for the funny barcode cursor: use the SWCursor option of the radeon driver.
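
For reference, a configuration fragment along these lines should enable it. Treat it as a sketch: the `Identifier` value is an arbitrary label, and where the fragment belongs (for example a file under `/etc/X11/xorg.conf.d/`) depends on the distribution, which is an assumption here.

```
Section "Device"
    Identifier "Radeon"          # arbitrary label, chosen for this example
    Driver     "radeon"
    Option     "SWCursor" "on"   # draw the cursor in software instead of using the hardware sprite
EndSection
```

X has to be restarted for the change to take effect.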

Update:
After a few months it got worse. With the radeon driver I got a lot of horizontal stripes and the system was totally unstable. The machine would hang every so often. Falling back to the xf86-video-modesetting driver works fine. I still get a stripy mouse cursor, but at least it's stable.

Seems it's time to replace this old buddy with a new one soon.

posted on 2012-05-06 16:15 UTC in Code | 3 comments | permalink
All the best! Looking forward to a new Linux Install guide on the latest Macbook Pro.

- KSS
Thanks for the kernel configs up to 3.3. My MBP 3.1 2.2 GHz is still kicking and I don't think I need the generic Ubuntu kernel :)
TLDR:
Hibernate (not just suspend) and resume might fix the error mentioned in the blog post.


I know it's been 10 years, but the same thing just happened to me yesterday and during investigation I stumbled upon your blog post. I was using an old Sony Vaio VGN-A317M laptop (GPU: ATI Mobility Radeon X600, OS: Xubuntu 18.04) to control a PC via RDP, when suddenly glitches appeared all over the screen, making it virtually unreadable. At first I thought the hardware was permanently damaged since the error was persistent: even the BIOS POST logo was garbled and trying to boot Windows 7 caused a BSOD. So I booted Xubuntu again, ssh'd into it and had a look at the kernel log.

This was logged when the glitches first appeared:
radeon 0000:03:00.0: ring 0 stalled for more than 10232msec
radeon 0000:03:00.0: GPU lockup (current fence id 0x000000000001932a last fence id 0x000000000001932b on ring 0)

From then on, during subsequent boots, the "[drm:r100_ring_test]" entries mentioned in the blog post were logged.

Then I found this bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1029159
The reporter experienced the "[drm:r100_ring_test] *ERROR* radeon: ring test failed" error when resuming from suspend. As a workaround they resumed from *hibernate* to fix it. So I gave it a try, and simply running "systemctl hibernate" and resuming did indeed fix it! The glitches and error messages were gone and even Windows 7 booted again.