Fedora Linux Support Community & Resources Center

  #1  
Old 11th July 2012, 03:59 AM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
e2fsck on a large volume taking days to run

I have a 6x2TB software RAID array configured as RAID6 (two redundant drives, 8TB of usable space) running in Fedora 17. After the power issues in the NE last week, the system booted up with some problems with the array, and maybe some other issues, since it was forced into maintenance mode. I ran some checks with mdadm and found that I may have lost one drive, but I still had 5 online and the array was usable. I also found that the problem with mounting the array was in the filesystem on top of the array, not the array itself.
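
For anyone checking the numbers, the 8TB figure follows from RAID6 reserving two drives' worth of space for parity. A minimal sketch of that arithmetic; the function name is only illustrative:

```shell
# Usable capacity of a RAID6 array: two drives' worth of space holds parity.
raid6_usable_tb() {
    drives=$1     # number of member drives
    size_tb=$2    # capacity of each drive, in TB
    echo $(( (drives - 2) * size_tb ))
}

raid6_usable_tb 6 2   # prints 8
```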

I tried running e2fsck right away, but it couldn't detect the superblock. I ran testdisk and was able to identify the array and browse the files, so I knew I had found my data. I exported the superblock locations and ran e2fsck with one of them specified. Then I started getting errors that e2fsck was running out of memory (shocking!), so I added a 500GB hard drive as a swap partition, and this fixed the memory problem. Now e2fsck is running...
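
As a rough sketch of where backup superblocks usually live: with the sparse_super feature, copies sit in block group 1 and in groups that are powers of 3, 5, and 7. The function below lists candidate block numbers under two assumptions that may not match a given filesystem: 4 KiB blocks and 32768 blocks per group. The function name and the device /dev/mdX are hypothetical.

```shell
# List candidate backup-superblock block numbers up to a given block group.
# Assumes sparse_super layout and 32768 blocks per group (4 KiB block size).
backup_superblocks() {
    max_group=$1
    blocks_per_group=${2:-32768}
    {
        echo 1                              # group 1 always holds a backup
        for base in 3 5 7; do               # plus powers of 3, 5 and 7
            g=$base
            while [ "$g" -le "$max_group" ]; do
                echo "$g"
                g=$((g * base))
            done
        done
    } | sort -n | while read -r g; do
        echo $((g * blocks_per_group))
    done
}

backup_superblocks 30

# One of the printed numbers can then be passed to e2fsck (run as root,
# on an unmounted filesystem; /dev/mdX is a placeholder):
#   e2fsck -b 32768 /dev/mdX
```

If the filesystem was created with different parameters, the real locations will differ, so treat the output as candidates to try rather than as guaranteed offsets.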

The problem is it's been running for 6 days. I realize it's a huge filesystem, but 6 days seems like a long time. I know it hasn't stalled because I get a few update messages each day. They all seem to be "multiply claimed blocks". The original file has dates that are in line with my files' dates, but the other entries claiming the same blocks have weird dates like 2015, 2040, 2070, etc. I know what these errors mean (sort of), but I don't know if finding them says anything about the chances of recovering data.

Any ideas what's going on? Any likelihood of getting my data back? Seeing my files in testdisk made me feel better about getting it back, but why is it taking so long? Thanks for any information you can provide!

Last edited by warnockm; 11th July 2012 at 11:16 AM.
  #2  
Old 11th July 2012, 10:39 AM
george_toolan Offline
Registered User
 
Join Date: Dec 2006
Posts: 1,717
Re: e2fsck on a large volume taking days to run

Only 8 GiB of usable space? Now that's what I call redundant.

I strongly hope you have a backup ;-)

What kind of file system are you using? Ext3 and ext4 file systems have a journal, so recovering from a power failure should only take a couple of seconds.

If you did "lose" one of your drives you really should add another drive since you don't seem to have a hot spare and use your raid manufacturer's tools to rebuild the array set. Maybe using RAID5 with only one parity drive and one hot spare would've been a better idea?

Unfortunately e2fsck doesn't know anything about your raid and it is trying to scrape files off your disks where about 1/4 of everything is missing. This will probably destroy your data.

How much memory do you have and how much memory is e2fsck using right now?

Checking a 2 TiB drive could take a couple of hours, but since you're running out of memory it would probably take forever.

Last edited by george_toolan; 11th July 2012 at 10:41 AM.
  #3  
Old 12th July 2012, 03:00 AM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
Re: e2fsck on a large volume taking days to run

Quote:
Originally Posted by george_toolan View Post
Unfortunately e2fsck doesn't know anything about your raid and it is trying to scrape files off your disks where about 1/4 of everything is missing. This will probably destroy your data.
Also I had a question about this. What do you mean 1/4 of everything is missing? Are you referring to the missing drive? Because it's RAID6, I have two redundant drives, so the partition should be intact and usable to the OS.
  #4  
Old 12th July 2012, 03:57 AM
Gareth Jones Offline
Official Gnome 3 Sales Rep. (and Adminstrator)
 
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,707
Re: e2fsck on a large volume taking days to run

Quote:
Originally Posted by warnockm View Post
Also I had a question about this. What do you mean 1/4 of everything is missing? Are you referring to the missing drive? Because it's RAID6, I have two redundant drives, so the partition should be intact and usable to the OS.
If the RAID6 really only has one (or two) failed drive(s), fsck should see a complete file-system as you say. However the "missing" data must be regenerated on the fly from the remaining data and parity, which presumably could be slow with software RAID6. I've no idea if it should be that slow though!
  #5  
Old 12th July 2012, 04:01 AM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
Re: e2fsck on a large volume taking days to run

Quote:
Originally Posted by Gareth Jones View Post
If the RAID6 really only has one (or two) failed drive(s), fsck should see a complete file-system as you say. However the "missing" data must be regenerated on the fly from the remaining data and parity, which presumably could be slow with software RAID6. I've no idea if it should be that slow though!
I'm getting the impression that there is no good reason for this to be taking 8 days. Even with an 8TB file system, even 90% full, even with large files. But the problem is I can't find any data that says otherwise... I just noticed a few minutes ago that it might have locked up...
  #6  
Old 11th July 2012, 11:27 AM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
Re: e2fsck on a large volume taking days to run

Whoops! 8TB of usable space. Also, I've used 7 of the 8TB, so it's mostly in use. Most are large 1GB files, so I guess I have fewer, larger files filling the drive.

As for the journal, I was running ext4, but e2fsck originally gave me an error saying that my superblock has a bad journal and asked if I wanted to remove it. I assumed this reverted it to an ext2 filesystem until it was rebuilt.

As for the array, I don't think I need to rebuild it because it is running correctly (but degraded by one drive). I could lose a second drive and still be running. I do plan to check that drive, replace it if it is truly bad, and reintegrate it into the array. I was also wondering if I had another drive going bad (running, but with poor performance) that was reading very slowly and causing e2fsck to run slowly. Not sure how I'd check that. SMART tools?
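
One way to confirm the "degraded by one drive" state is the "[total/active]" field that the md driver prints in /proc/mdstat, for example "[6/5]". A small sketch that extracts it; the function name is made up for illustration, and it only inspects the first array it finds:

```shell
# Print how many member devices the first md array is missing,
# given /proc/mdstat-style text on stdin.
md_missing_members() {
    # The status line carries a "[total/active]" field, e.g. "[6/5]".
    field=$(grep -Eo '\[[0-9]+/[0-9]+\]' | head -n 1 | tr -d '[]')
    [ -n "$field" ] || return 1       # no array status found
    total=${field%/*}
    active=${field#*/}
    echo $((total - active))
}

# Usage on a live system (read-only):
#   md_missing_members < /proc/mdstat
```

A result of 1 on a RAID6 array means it can still survive one more failure; `mdadm --detail /dev/mdX` (device name is a placeholder) reports the same state in more detail.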

I'm running 2GB of RAM on this machine (I don't run a GUI) and I have a 500GB hard drive temporarily connected as a swap partition. I've seen other postings about e2fsck taking a few days (none ever say if it ever stopped). I'm just wondering if this is normal and I should let it run.

Thanks!
  #7  
Old 12th July 2012, 05:23 AM
stevea Offline
Registered User
 
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,300
Re: e2fsck on a large volume taking days to run

First - it's not unusual for fsck to take days on really large file systems, sad but true. You can google examples of filesystems much smaller than yours taking ~6 days. Here is a really crude way to estimate e2fsck time, if your iostat reads are as fast as the post's and ...
http://gparted-forum.surf4.info/viewtopic.php?id=13613

So I might guess yours will take several weeks to complete. Fortunately subsequent runs can be a lot faster but ....

I assume you are running fsck on an unmounted volume; that is unclear from your statements.
fsck can end up in loops, but I think it's pretty clear that yours will take a lot more than 1 week.


Just an FYI for next time. from mkfs.ext4 ...
Quote:
-G number-of-groups
Specify the number of block groups that will be packed together to create a larger virtual block group (or "flex_bg group") in an ext4 filesystem. This improves meta-data locality and performance on meta-data heavy workloads. The number of groups must be a power of 2 and may only be specified if the flex_bg filesystem feature is enabled.
This would greatly improve several aspects of disk performance.
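
Since the man page requires the -G value to be a power of 2, a quick check before building the mkfs command can save a failed run. A minimal sketch; the function name and the device /dev/mdX are placeholders, and the mkfs line is left commented out because it destroys existing data:

```shell
# Succeeds (exit status 0) only if the argument is a positive power of two.
is_power_of_two() {
    n=$1
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

groups=16
if is_power_of_two "$groups"; then
    echo "-G $groups is acceptable"
    # mkfs.ext4 -G "$groups" /dev/mdX   # DESTRUCTIVE: creates a new filesystem
fi
```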

Given the size of the filesystem and large files you might also consider xfs.
btrfs has a lot of features that could help, but IMO it's too green to trust with your valuable data.

Also, you can and should start e2fsck so that it shows progress, or send it SIGUSR1 as the man page indicates.
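
A hedged sketch of both options. Starting e2fsck with `-C 0` prints a completion bar on the terminal, and an already-running e2fsck begins reporting progress when it receives SIGUSR1. The helper name below is made up, and /dev/mdX is a placeholder. Only send SIGUSR1 to programs known to handle it, since the default action for that signal terminates the process.

```shell
# Send SIGUSR1 to every running process with exactly the given name.
# Returns 1 (and sends nothing) if no such process exists.
signal_progress() {
    name=$1
    pids=$(pgrep -x "$name") || {
        echo "no $name process found" >&2
        return 1
    }
    for pid in $pids; do
        kill -USR1 "$pid"
    done
}

# Usage from a second shell, as root:
#   signal_progress e2fsck
# Or, when starting a fresh check:
#   e2fsck -C 0 /dev/mdX
```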
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe
  #8  
Old 12th July 2012, 09:39 AM
george_toolan Offline
Registered User
 
Join Date: Dec 2006
Posts: 1,717
Re: e2fsck on a large volume taking days to run

How much swap space is already in use?

Code:
free

cat /proc/swaps
Swapping to a HDD is extremely slow especially if you're using more than 2 GiB.
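
To turn the /proc/swaps output into a single number, the Size and Used columns (third and fourth, in KiB) can be summed. A small sketch that assumes the usual five-column layout with one header line; the function name is illustrative:

```shell
# Print the percentage of swap in use (rounded down),
# given /proc/swaps-style text on stdin.
swap_used_pct() {
    awk 'NR > 1 { size += $3; used += $4 }
         END { if (size > 0) printf "%d\n", used * 100 / size; else print 0 }'
}

# Usage on a live system:
#   swap_used_pct < /proc/swaps
```

If the result is well above zero on a machine with 2GB of RAM, e2fsck's working set is probably living on the swap disk, which would go a long way toward explaining the run time.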
  #9  
Old 12th July 2012, 11:49 AM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
Re: e2fsck on a large volume taking days to run

Quote:
First - it's not unusual for fsck to take days on really large file systems, sad but true.
I appreciate the information. I'm glad to hear that this might be normal... on the other hand, it's a mixed blessing because I might have to wait this long for it to complete.

Quote:
How much swap space is already in use?
Unfortunately, because I'm in maintenance mode I can't log in from another terminal or SSH in, so I can't run any other commands.

As I said in the previous post, it appears to have locked up. I let it continue running overnight in case it would recover, but it still hasn't. I haven't canceled or rebooted it, but I'm curious what effect this might have. Does fsck fix on the fly or does it write the changes at the end? Do I have a week's worth of fixes complete or do I start over?

I also read that testdisk and debugfs can extract some files off a bad filesystem. As I said, I can browse the filesystem with testdisk...

Thanks for everyone's help!
  #10  
Old 12th July 2012, 05:15 PM
Gareth Jones Offline
Official Gnome 3 Sales Rep. (and Adminstrator)
 
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,707
Re: e2fsck on a large volume taking days to run

I think fsck fixes as it goes along. What makes you think it's hung though?

As for the proper way to proceed, it depends what your priorities are.

Fsck's primary purpose is to get the file-system into a consistent and safe state. Recovering data is a potential side-effect which fsck obviously tries to maximize, but fsck's fixes can also over-write lost data, causing irrecoverable loss.

Programs like testdisk (as I understand it – I've never actually needed to use it) are primarily concerned with recovering data, which crucially means not modifying the file-system in-place.

If you have a back-up, the fastest approach might be to simply run mkfs on the array (after checking the hardware) and restore the files.

If not, you're probably best off using testdisk etc. to retrieve the files, and then recreate the file-system.
  #11  
Old 12th July 2012, 05:40 PM
mikee Offline
Registered User
 
Join Date: Aug 2011
Location: Minnesota
Posts: 435
Re: e2fsck on a large volume taking days to run

Are the lights on the disk drives flashing at all as to indicate activity?

---------- Post added at 11:40 AM ---------- Previous post was at 11:31 AM ----------

If there is a bad drive in the RAID, it could greatly slow the performance. I am not familiar with RAID 6, but RAID in general stores parity information, and if a drive fails, it uses that information to reconstruct the data. Doing this will greatly slow things down.

Personal experience: we use ZFS on Solaris in a raidz2 config on a SATABeast with 14 LUNs. If a LUN (disk) blows, it takes a full week to reconstruct it from the parity. This is a 20TB filesystem. The beast has big, slow disks.
  #12  
Old 12th July 2012, 06:50 PM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
Re: e2fsck on a large volume taking days to run

EDIT: It was non-responsive because the keyboard got disconnected. It's still chugging along...

Hard drive lights are blinking. When the monitor went to sleep, pressing "CTRL" on the keyboard would wake it up. That's not the case anymore. After hitting a few keys some weird text came up. It looks like a dmesg timestamp: [7 . 7] but without any numbers except the 7s. (It's been running for 700,000-and-some-odd seconds... 8 days, so that makes sense.) I just can't get it to respond.
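
The seconds-to-days conversion behind that estimate, as a quick sketch (the function name is illustrative; kernel timestamps count seconds since boot):

```shell
# Convert an uptime in seconds to whole days (86400 seconds per day).
secs_to_days() {
    echo $(( $1 / 86400 ))
}

secs_to_days 700000   # prints 8
```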

Last edited by warnockm; 12th July 2012 at 08:33 PM.
  #13  
Old 12th July 2012, 09:20 PM
mikee Offline
Registered User
 
Join Date: Aug 2011
Location: Minnesota
Posts: 435
Re: e2fsck on a large volume taking days to run

I suspect it could still be checking things then!

It may fix stuff as it goes along, but I believe that if you stop it, it will go through all the disk again.
e2fsck normally doesn't run, as the filesystem has a replay log. The log must have been corrupted by your power outage.

Has it printed what pass it's on at all?

Last edited by mikee; 12th July 2012 at 09:22 PM. Reason: fix typo
  #14  
Old 13th July 2012, 02:26 AM
warnockm Offline
Registered User
 
Join Date: Aug 2004
Posts: 36
Re: e2fsck on a large volume taking days to run

Quote:
e2fsck normally doesn't run, as the filesystem has a replay log. The log must have been corrupted by your power outage.
I'm not sure I followed this. I tried searching on some of the terms but didn't find anything. Are you referring to a log file (i.e. in /var/log) or is this some kind of logging in the file system?

I have not seen anything on what pass it is on. Just lots of "multiply claimed blocks" for certain inodes and files. I can't get a straight answer on what that is or why it's happening.

I did notice that in the last 24 hours there are a lot more of these and they are showing up faster. For instance, earlier this week I would check the status every couple of hours and see a few of these messages. Now they come up every few minutes. I used to see the main file have a normal date that I'd expect (i.e. 2010, 2011) while the other inodes had dates that were out of whack, like 2040, 2070, etc. Now both the main and the duplicate have the same date. Again, I don't understand this message, so I don't know what it means.

Just trying to pass the time as this thing runs. Thanks for helping me out!
  #15  
Old 13th July 2012, 04:29 PM
george_toolan Offline
Registered User
 
Join Date: Dec 2006
Posts: 1,717
Re: e2fsck on a large volume taking days to run

He's talking about the journal ;-)

Even your "maintenance mode" has job control, doesn't it?

You should be able to suspend the process by pressing ctrl-z (don't use ctrl-c for obvious reasons) and then you should get your prompt back where you can use programs like free.

To test the reading speed of an individual drive or the whole raid you could use something like

Code:
hdparm -t -T /dev/sdX
If you type bg, e2fsck will continue running as a background process, as if it had been started with "e2fsck &"; or type fg to keep it running in the foreground.

e2fsck fixes problems as it goes along which could be destructive.

"multiply claimed blocks" means your file system is out of whack. Large files consist of many blocks, but one block should only belong to one file. This problem can easily be solved and your file system put into a "consistent" state, but some of your files will be broken. You should note their file names so you can check them later.

Maybe you should've fixed your raid first. If it was running flawlessly then the file system wouldn't see any errors.
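
To make the "one block should only belong to one file" rule concrete, here is a toy illustration. It is not how e2fsck is implemented; it only shows the idea of spotting a block that more than one inode claims. The input format ("inode block" pairs, one per line) and the function name are invented for the example.

```shell
# Print each block claimed by more than one inode, with the inodes claiming it.
# Input on stdin: one "inode block" pair per line.
find_multiply_claimed() {
    awk '{
             if (count[$2]++) owners[$2] = owners[$2] "," $1
             else             owners[$2] = $1
         }
         END {
             for (b in count)
                 if (count[b] > 1) print b ": " owners[b]
         }' | sort -n
}

# Inodes 12 and 57 both claim block 101:
printf '12 100\n12 101\n57 101\n57 102\n' | find_multiply_claimed
# prints: 101: 12,57
```

When e2fsck resolves a case like this, at most one of the claimants can hold the data that block really belongs to, which is why the other files involved may end up damaged.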