Saturday, December 4, 2010

Very peculiar system instability event

Just now I came home from some grocery shopping, popped dinner into the microwave, and turned on my main desktop computer. From the LILO menu, I decided to boot Windows. It got as far as "Disk Read Error", and then the system rebooted.

Okay, I thought, well, that drive is getting a bit old. But now LILO didn't even load; I got the good old "L01 L01 L01 L01"-of-death screen. Not just a bad block, but some sort of progressive failure?

Okay, I thought, maybe I'll try the other LILO install I have on another MBR in case of emergencies like this. "L01 L01 L01"... and then it surprised me by actually loading, but that was as far as it went, as it behaved very erratically. I didn't manage to actually boot anything, and even navigating the menu was sluggish.

Now, the prospect of simultaneous disk failure seems a bit far-fetched, and no other electronics have failed, so I didn't consider a power spike particularly likely either. The CPU seems fine, as the BIOS worked without trouble during the incident, and the system temperature was also fine, so a fan failure doesn't seem a likely explanation either.

Anyway, I'm digressing from the story. At this point I shut it down and let the computer rest for a while as I ate my now-cooked dinner. Then I tried again, got as far as "L01 L01 L01", and then the main LILO suddenly worked. I managed to boot into Linux, and there were no obvious signs of error. So I rebooted into Windows, and it worked too.

I have no idea what this event was about. The most plausible explanation I've come up with is some sort of interference from the microwave, although I use it on a daily basis, typically with the computer on, and with no issues.

So fingers crossed that this was some sort of fluke event that won't repeat itself, but in case it was its way of saying "Goodbye cruel world!", I've made several off-site backups of everything of importance.

Edit: Additional tests

I've run some diagnostic tests on the machine: a bad block search with fsck and a memtest86 sweep. No errors. But there is still some residual instability. It seems that if I get past booting, everything is stable and okay, but there's a chance things go bad at the boot loader or during the kernel loading. Maybe the MBR, or some other early block (that wouldn't have been tested by fsck), is damaged?
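
One rough way to poke at the MBR theory is to read the first sector of the disk a few times and see whether it looks sane each time. The Python sketch below is only an illustration, not something from the original tests: it assumes a classic 512-byte MBR ending in the 0x55 0xAA boot signature, and the function names are made up for this example.

```python
import hashlib

SECTOR_SIZE = 512  # assumption: classic 512-byte MBR sector


def read_first_sector(path):
    """Return the first SECTOR_SIZE bytes of a disk device or image file."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def mbr_signature_ok(sector):
    """True if the sector is full-sized and ends with the 0x55 0xAA signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def reads_are_consistent(path, attempts=5):
    """Read the first sector several times and check that every read matches.

    Caveat: the operating system may serve repeated reads from its cache,
    so a pass here does not prove the disk itself is healthy.
    """
    digests = {
        hashlib.sha256(read_first_sector(path)).hexdigest()
        for _ in range(attempts)
    }
    return len(digests) == 1
```

On Linux this could be pointed at something like /dev/sda (a hypothetical device name here, and reading it needs root), or more safely at an image of the first sector copied off with dd. A missing signature or mismatching reads would point toward the early blocks; a clean result would not rule much out, since the failure seems to be intermittent.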
