Sector read error during RAID resync
I run a 4-drive RAID 10. Basically this means data is duplicated and then striped. Each drive is 2 TB, so I get 4 TB of usable space. Theoretically the array can survive up to two drive failures, as long as the two failed drives aren't a stripe half and its duplicate (that is, both members of the same mirrored pair).
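As a sanity check on that claim, here's a quick enumeration of the possible two-drive failures. It assumes the usual 4-drive RAID 10 layout where drives 0 and 1 hold one mirrored half of the stripe and drives 2 and 3 hold the other; the drive numbering is purely illustrative, not anything mdadm reports.

```shell
#!/bin/sh
# Hypothetical layout: drives 0+1 are one mirror pair, drives 2+3 the other.
# A two-drive failure kills the array only if it wipes out a whole pair.
survive=0
total=0
for a in 0 1 2 3; do
  for b in 0 1 2 3; do
    [ "$a" -lt "$b" ] || continue
    total=$((total + 1))
    if { [ "$a" -eq 0 ] && [ "$b" -eq 1 ]; } || \
       { [ "$a" -eq 2 ] && [ "$b" -eq 3 ]; }; then
      echo "drives $a and $b fail -> array lost (a full mirror is gone)"
    else
      echo "drives $a and $b fail -> array survives"
      survive=$((survive + 1))
    fi
  done
done
echo "$survive of $total two-drive failures are survivable"
```

Four of the six possible two-drive failures leave the array readable, which is why losing "half the box" in transit was survivable at all.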
I shipped this collection of drives via USPS during my move from college to my apartment. The box arrived looking pretty beat up and, even though I thought I had packed everything well, many of the drives had circuit board damage, physical interface damage, and one had platter damage (something sounded loose inside the drive).
Two of the four drives in the array showed no visible damage, so I hoped they held a complete copy of the data (one survivor from each mirrored pair). They did. Two replacement drives from Newegg arrived, and I began the resync.
Then I get a sector read failure on one of the original two drives. FFFFFFFUUUUUUUUUUUU.
The resync stops and the array is effectively dead. My computer is still up, so I try to log in at the console and get an ext4 read error instead. (This happened because all of my binaries and configs for this system lived on the RAID.) This is really bad news. I boot a live CD and the RAID refuses to assemble.
Here’s how I got my RAID back.
First, I let Western Digital’s Data Lifeguard tool on the Ultimate Boot CD remap the sector and recover whatever it could. It said it was successful.
Then, I booted into an Ubuntu 10.04 live distribution and forced mdadm to assemble the degraded array. The assemble command also wants the md device node; /dev/md0 below stands in for whatever yours is called:

mdadm --assemble --run /dev/md0 --uuid=<uuid of array>
Then I added the two replacement drives from Newegg and let the resync finish.
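For the record, adding the replacements back looks roughly like the sketch below. The device names /dev/md0, /dev/sdc1, and /dev/sdd1 are placeholders for whatever your array and new partitions are actually called; the commands need root and a real array, so the sketch bails out politely on a machine with no md device.

```shell
#!/bin/sh
# Placeholder device names; substitute your own array and new-drive partitions.
MD=/dev/md0
NEW1=/dev/sdc1
NEW2=/dev/sdd1
if [ -b "$MD" ]; then
  # Add both replacements; mdadm starts rebuilding onto them immediately.
  mdadm "$MD" --add "$NEW1" "$NEW2"
  # Watch the rebuild progress.
  cat /proc/mdstat
else
  echo "no array at $MD; edit the device names above"
fi
```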
After the resync finished, I ran fsck over the filesystems on the RAID, let it fix a pile of orphaned-inode errors, and everything seems to work flawlessly now.
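The fsck step, sketched with the same caveats: /dev/md0 is a placeholder for wherever the filesystem actually lives, the filesystem must be unmounted while you check it, and e2fsck's -y auto-answers the orphaned-inode repair prompts (use -n first if you'd rather preview the damage without touching anything).

```shell
#!/bin/sh
# Placeholder device; substitute the filesystem on your array.
FS=/dev/md0
if [ -b "$FS" ]; then
  # -f forces a full check even if the superblock claims the fs is clean;
  # -y answers yes to repair prompts (orphaned inodes and the like).
  fsck.ext4 -f -y "$FS"
else
  echo "no device at $FS; edit FS above"
fi
```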
Needless to say, I got really lucky.