
10th April 2009, 04:36 PM
Registered User
Join Date: Jun 2005
Location: Mission Control
Posts: 1,229

RAID 5 - Need Help - I'm an idiot!
Ok, I have a RAID-5 array consisting of 5x500GB SATA disks... sda-sde.
I've been shifting the hard disks around to make space between them (for cooling) and, being an absolute moron, forgot to check that all the disks were wired in correctly. I booted the array with one disk missing power - oops!
So sdc is now "removed". Then, to add to the fun, sdd died, though not thoroughly: it throws I/O errors but is still accessible after a power cycle. I've been fitting the disks into self-cooled caddies and suspect these could have caused the I/O errors, as they're not simple pass-throughs, so I'm rebuilding now without the caddies - 53% so far.
I've tried re-assembling the array with --force, adding sdc back into the mix, but it gets to about 58% rebuilt, then sdd gets upset and it all goes belly up.
I can access the data on the array while it's rebuilding, and all essential data is backed up, but I'd rather not lose the other 1TB of data.
Anyone got any suggestions on this, other than taking myself outside and shooting myself?
I've been thinking about trying to do a dd if=/dev/sdd of=/dev/newdisk, and then trying to rebuild the array with a new disk for sdd.
Any magical bodges like this that could get my array back to life would be greatly appreciated.
Thanks, a feeling rather dense Savage
_____________
Update 1: Removing the caddies didn't help. I'm currently using dd to make a clone of sdd, and will then try to re-assemble the array with the new disk. Is there a way to fsck a "Linux RAID" partition?
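For anyone following along, here is a minimal sketch of the dd clone idea. It is not the exact command used in this thread: two scratch files stand in for /dev/sdd and the new disk so nothing real gets overwritten, and the 64 KiB block size is an arbitrary assumption. conv=noerror tells dd to carry on past read errors, and sync pads any short read with zeros so the rest of the copy stays at the right offsets.

```shell
# Scratch files stand in for the failing disk and the new disk (assumption:
# on real hardware you would use the device nodes and need root).
workdir=$(mktemp -d)
src="$workdir/old_disk.img"
dst="$workdir/new_disk.img"

# Build a 4 MiB fake "disk" with random contents.
head -c 4194304 /dev/urandom > "$src"

# Clone it. noerror = keep going after a read error,
# sync = pad short or failed reads with zeros so offsets do not shift.
dd if="$src" of="$dst" bs=65536 conv=noerror,sync 2>/dev/null

# Confirm the copy is byte-for-byte identical to the source.
if cmp -s "$src" "$dst"; then
    clone_status="identical"
else
    clone_status="different"
fi
echo "clone is $clone_status"
```

On a real disk with bad sectors the copy will not be identical: the unreadable blocks come out as zeros, so it is worth comparing or checksumming afterwards to see how much was lost.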
Last edited by savage; 10th April 2009 at 05:48 PM.

11th April 2009, 05:36 AM
Registered User
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302

I think your plan (dd the entire image and use the new disk) is a wise choice.
IIRC the disk ident comes from a software-generated UUID, so I think it's a clean
solution.
Please post if this works (or not), and also ...
how long is the rebuild taking?

11th April 2009, 03:35 PM
Registered User
Join Date: Jun 2005
Location: Mission Control
Posts: 1,229

Thanks for the reply. I have successfully dd'd the disk and it didn't seem to error, but it won't let me rebuild the array with it because it doesn't have a superblock, which has really thrown me off as I thought dd would copy EVERYTHING!
So anyway, I've got another 2x500GB disks (they were destined to replace faulty disks), but I am using them as temporary storage and currently backing up data to those. I am confused, as I haven't had any kind of error retrieving data from the array running with 4 disks (including sdd, which fails consistently during rebuilds).
Once I've got the bulk of the stuff I don't want to lose, I'm going to try to add a new disk as sdc (the one I removed power from), hopefully writing a new superblock, and then 'dd' sdd onto it again.
I've seen another post on linuxquestions.org that suggests that using 'mdadm --create' won't shaft the data, but will write a superblock, but I wanted backups before trying that.
A normal rebuild of this array takes a good 3-6 hours.
As much as this does suck, my confidence levels with mdadm and RAID have gone through the roof.
Like I say, all the essential stuff is safe, but there's a lot of movies and music on there that I'm trying to save too. They're ripped from family DVDs, so there's no issue re-ripping them, but I'd rather avoid that if I can (most of my DVDs have been boxed for years).
I'll keep this thread updated with what happens. Once I've got as much off as possible, I'll be free to go to town on it and try anything and everything.
Savage
Update: I spoke too soon. The array fell over and then I got read errors copying some movies. I'll re-assemble and skip on to the next set of directories.
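The "copy what you can and skip on to the next directory" approach can be sketched as a loop like the one below. This is only an illustration under assumptions: the temporary directories and their contents are made up to stand in for the mounted array and the backup disk, and on the real array a failed copy would usually also mean re-assembling before carrying on.

```shell
# Hypothetical stand-ins for the array mount point and the backup disk.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/music" "$src/movies"
echo "track" > "$src/music/a.txt"
echo "film" > "$src/movies/b.txt"

failed=""
copied=0
for dir in "$src"/*; do
    # Only top-level directories are handled in this sketch.
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    # Copy one directory at a time; a failure is recorded rather than
    # aborting the whole run, so the remaining directories are still tried.
    if cp -a "$dir" "$dst/$name" 2>/dev/null; then
        copied=$((copied + 1))
    else
        failed="$failed $name"
    fi
done
echo "copied $copied directories; failed:${failed:- none}"
```

Keeping the list of failed directories makes it easy to go back for a second attempt at just those once the array is up again.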
Last edited by savage; 11th April 2009 at 04:31 PM.

11th April 2009, 04:39 PM
Registered User
Join Date: Jun 2005
Location: Mission Control
Posts: 1,229

Another question, I am getting really confused.
If the disk can be copied entirely without error, that says to me the disk itself is OK, but when rebuilding/copying data I get errors.
Is it possible for an ext3 file system error to trip up mdadm and make it think the disk is at fault?
----
I am well and truly beyond confused now. smartctl says that the disk is fine. fsck on the md1 array running with 4 disks reports it's fine. Rebuilding failed.
The only good thing to come of this is that sdc, which I just tried adding, should now have a valid superblock, so I'm dd'ing sdd onto that again. It'll take a few hours, but hopefully it will work. Then with any luck I can recover the array with that disk.
All in all, I really don't understand what's going on here. If the disk is fine, and the file system is fine, where's the problem!?
Last edited by savage; 11th April 2009 at 10:01 PM.

12th April 2009, 06:32 PM
Registered User
Join Date: Jun 2005
Location: Mission Control
Posts: 1,229

Well in the end I was defeated, but I did manage to recover roughly 90% of the data, and important stuff was backed up.
For anyone else who accidentally removes 2 disks from a RAID-5 array, don't panic, it's not that bad to fix (provided you don't have satanic hard disks like me):
Code:
mdadm --assemble /dev/mdX --force
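--force is required because the disks are flagged as removed; it tells mdadm to treat members whose superblocks look out of date as working again so the array can start.
If it complains the superblock is missing, you can apparently re-create the array without damaging data*, as long as the RAID level, number of devices, device order and chunk size match the original: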
Code:
mdadm --create /dev/mdX -l5 -n5 /dev/sda1 /dev/sdb1...
You'll then need to assemble it, as above.
* I didn't try this, as no matter what I did, it always failed around 53%.
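As a rough sketch of the assemble step above, the script below only prints the commands it would run, so it is safe to try anywhere. The array name /dev/md1 comes from earlier in the thread; the member partitions, the run helper and the DRY_RUN switch are assumptions made for illustration. Setting DRY_RUN=0 would execute the commands for real, which needs root and the right device names.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
ARRAY=/dev/md1
# Assumed member partitions; adjust to match the real array.
MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recovery_plan() {
    # Look at the superblock on each member before forcing anything.
    for dev in $MEMBERS; do
        run mdadm --examine "$dev"
    done
    # Stop a half-assembled array, then force assembly from the members.
    run mdadm --stop "$ARRAY"
    run mdadm --assemble --force "$ARRAY" $MEMBERS
    # Check the array state and any rebuild progress.
    run cat /proc/mdstat
}

plan=$(recovery_plan)
echo "$plan"
```

Running --examine first is deliberate: it shows which members mdadm considers stale before --force overrides that judgement.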

12th April 2009, 11:23 PM
Registered User
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302

I greatly doubt (though I'm not sure) that an ext3 error could glitch your rebuild.
Glad you recovered all the important stuff, but 2 failures on a 5-disk RAID 5 is beyond my experience.
As Meatloaf sang, "four out of five ain't bad". 3 of five is a problem.

13th April 2009, 05:33 PM
Registered User
Join Date: Jun 2005
Location: Mission Control
Posts: 1,229

Quote:
Originally Posted by stevea
I greatly doubt (though I'm not sure) that an ext3 error could glitch your rebuild.
Glad you recovered all the important stuff, but 2 failures on a 5-disk RAID 5 is beyond my experience.
As Meatloaf sang, "four out of five ain't bad". 3 of five is a problem.
|
I know big corps like to blame the technology, but ultimately it was my fault. I was putting the disks into self-cooled caddies and missed the power off one, dropping the array to 4 disks, and while that disk was being re-added, another disk failed. If I hadn't missed the power, it wouldn't have been a problem.
As for the disk that failed, I have no idea. I moved it into this PC, reformatted it and left it creating a file from /dev/zero; it didn't error or crash, it just filled up.
I've been running a home server since 2005 (when I got into Linux), and that was my first major disaster with it. I came out pretty well, and am amazed by Linux for it.
I restored user documents from backup to the new array last night. As soon as I did, I restarted a few services that were failing, and everything just clicked and was back running. I was massively impressed; 2 days ago I didn't expect to come out of this with just a few scratches.
Initially that "(Filesystem Recovery #): " prompt was intimidating, but soon became my best friend, and a relaxed environment where I could experiment.
I've come out of this a lot more confident about RAID and mdadm, and loving Linux even more.
Quote:
Originally Posted by Ralph Waldo Emerson
Bad times have a scientific value. These are occasions a good learner would not miss.