Fedora Linux Support Community & Resources Center
  #1  
Old 26th September 2007, 12:35 PM
dr400 Offline
Registered User
 
Join Date: Sep 2007
Posts: 6
FC5 freeze + fsck failed

Hi everybody,

First, as you can guess, I'm quite new to Linux.
I'm running FC5 on my PC, with two hard drives:
- a small one with several partitions containing everything related to my OSs (currently FC5 and WinXP)
- a bigger one for data, with one partition for WinXP and one ext3 partition mounted as /home for Linux.

Recently FC5 froze completely while the only program running was Firefox (downloading):
- GNOME GUI completely frozen (no keyboard, no mouse, etc.)
- tried to kill Mozilla from a non-graphical shell (as root) -> no result
- tried 'shutdown now' (as root) -> no result
I felt all that was left for me to do was a hard reboot (wrong).

Unfortunately (but understandably), during Linux initialization a corrupted file system was found on /home. It dropped me to a shell (as root) telling me to run fsck, which I did ("fsck -v /dev/sdb2" in my case).
- fsck.e2fsck version 1.38
- fsck started asking about fixing corrupted and/or orphaned inodes; I answered 'yes' every time.
- after a few fixes I got a whole screen covered with strange output, which I can only describe as "insults" (now that's professional). Things like "irq sequences..." "unknown boot...".
- all activity ceased afterwards (no disk access, no return to the shell, nothing else happened for 10 minutes).
- one 'funny' thing was that all the LEDs on my keyboard (Caps Lock, etc.) were flashing in this state. I found another post describing this behaviour, but with no explanation:
http://www.fedoraforum.org/forum/sho...highlight=fsck
Does anyone know more about this?

Not knowing what to do, I hard-rebooted once again and got back to fsck, which worked this time around.
My PC worked fine this weekend, then yesterday the same problem happened again (Mozilla freeze -> fsck failed)... only this time fsck no longer miraculously succeeds; it keeps giving me this strange output and the flashing keyboard LEDs.

I'm beginning to suspect a virus or, more likely, a dying hard disk... I didn't have much time yesterday, but this evening I'm thinking about checking /var/log/messages and using the rescue disk.
I'm open to any good suggestions (keep in mind I'm a noob, so please forgive me if I missed something obvious).

thanks.
Reply With Quote
  #2  
Old 26th September 2007, 12:51 PM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
Quote:
Originally Posted by dr400
My PC worked fine this weekend, then yesterday the same problem happened again (Mozilla freeze -> fsck failed)... only this time fsck no longer miraculously succeeds; it keeps giving me this strange output and the flashing keyboard LEDs.
You might want to check for 'Drive...' entries in /var/log/messages. Chances are something is (physically) going wrong with your harddrive.
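
For anyone who would rather not scroll through the log by hand, the check above can be sketched in a few lines of Python. This is only a rough sketch: the patterns are assumptions about what a failing drive or a panicking kernel might log (old IDE drivers print words like DriveReady and SeekComplete in their error lines), and the function name is made up for illustration. Adjust the list to whatever your kernel actually prints.

```python
import re

# Assumed patterns, not an exhaustive list: adjust to your own kernel's messages.
SUSPECT_PATTERNS = [
    re.compile(r"kernel panic", re.IGNORECASE),
    re.compile(r"\bOops\b"),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"DriveReady|SeekComplete|DriveStatusError"),
]


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a log file (usually readable by root only) for suspect lines."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Run as root, something like `for n, text in scan_log(): print(n, text)` would list the candidate lines. An empty result does not prove the drive is healthy; it only means none of the assumed patterns showed up.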

I had a similar situation in a system with 3 hard drives. Apparently, the power supply was insufficient for the power demands. At night, one of the hard drives would spin down due to insufficient power. Once it had spun down, it consumed less power, and with the extra power then available it would spin back up, only to spin down again later, and the cycle would repeat itself ad infinitum/nauseam. Eventually the disk was unrecoverable.

Thus, check the capacity of your system's power supply. It also seems time to use the 'smartctl' utilities. Something like 'smartctl -a /dev/sda' (replace /dev/sda with /dev/hda for fc5) and look for attributes like 'Raw_Read_Error_Rate' and 'Seek_Error_Rate'.

David
Reply With Quote
  #3  
Old 26th September 2007, 07:04 PM
dr400 Offline
Registered User
 
Join Date: Sep 2007
Posts: 6
Hi,

First, thanks for your answer.
I checked /var/log/messages and found no Drive error.
I tried smartctl; it recognized my HDD all right (option -a) and reported the device as working properly (option --health -> OK). However, it says the device doesn't support auto-save when I try to read the error logs (and I can't enable auto-save either).

So, where am I now?
- checked connections / power supply: everything looks OK. No scratching noise from the HDD, unlike the last dead HDD I encountered.
- smartctl seems to find everything's OK.
- fsck still freezes after some fixes. I got a snapshot of this; I'll try to post it.
- my keyboard LEDs still flash while fsck is frozen...

I still need to check the Samsung site for an HDD test program, and I'll also check the BIOS, just to be sure... After that, rescue mode?

Any other ideas, anyone?

thanks
Reply With Quote
  #4  
Old 27th September 2007, 07:50 AM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
Quote:
Originally Posted by dr400
Hi,

First, thanks for your answer.
I checked /var/log/messages and found no Drive error.
Is there something that suggests a kernel panic?

Quote:
Originally Posted by dr400
I tried smartctl; it recognized my HDD all right (option -a) and reported the device as working properly (option --health -> OK). However, it says the device doesn't support auto-save when I try to read the error logs (and I can't enable auto-save either).
That's fine. auto-save is apparently a feature your drive doesn't support.

Quote:
Originally Posted by dr400
So, where am I now?
- checked connections / power supply: everything looks OK. No scratching noise from the HDD, unlike the last dead HDD I encountered.
Any rating on the power supply's capacity? 250W? 450W?

Quote:
Originally Posted by dr400
- smartctl seems to find everything's OK.
- fsck still freezes after some fixes. I got a snapshot of this; I'll try to post it.
- my keyboard LEDs still flash while fsck is frozen...
Flashing keyboard LEDs are normally an indicator of a kernel panic. If the disk doesn't seem to be faulty, then it could be a faulty memory module. I assume you haven't overclocked your system or anything like that. Normally, I'd try a kernel compile and see if it gets through without any segmentation violations. If there are segfaults, then I return my memory modules to the shop.

I believe the Fedora CDs/DVDs include a memtest86 utility? You could try that, although it doesn't tax the rest of your system while testing the memory. Checking the BIOS may reveal motherboard and CPU temperature and CPU fan speed. You could also try 'lm_sensors' from Linux.

Good luck,

David
Reply With Quote
  #5  
Old 27th September 2007, 08:26 AM
dr400 Offline
Registered User
 
Join Date: Sep 2007
Posts: 6
Quote:
Is there something that suggests a kernel panic?
Actually, I usually run my PC with hyperthreading enabled and the SMP kernel (I never really considered this issue before). Yesterday I tried disabling hyperthreading and booting the non-SMP kernel: at the end of the failure screen for fsck 2, more lines appeared showing a kernel panic (interrupt failure).
Of course, as I never tried this while my computer was still working, it doesn't lead me anywhere, but that's all I ever saw of a kernel panic on my installation.

Quote:
Any rating on the power supply's capacity? 250W? 450W?
Fortron FSP400-60GLN - 400W, for 1 motherboard + 2 HDDs + 2 DVD drives + 1 floppy drive...

Quote:
The keyboard LEDs flashing is normally an indicator of a kernel panic.
Ah, thanks. At least I will have learned something

Quote:
Normally, I'd try a kernel compile and see if it gets through without any segmentation violations.
I believe the Fedora CDs/DVDs include a memtest86 utility? You could try that, although it doesn't tax the rest of your system while testing the memory.
All right, I'll investigate in the direction of a faulty memory module. Maybe I could also try different settings / slots for the memory modules.
I've never compiled the kernel before; now is a good time to learn how (but can I do this in runlevel 1 or the rescue shell?). The thing is, it'll be hard to know whether a failure comes from my own mistakes or from faulty memory.

Quote:
Checking the BIOS may reveal motherboard and CPU temperature and CPU fan speed. You could also try 'lm_sensors' from Linux.
I already checked this when I looked through the BIOS settings yesterday. Motherboard 36 deg, CPU 45 deg, fan 1200~1600 rpm. That didn't seem wrong to me...

Quote:
Good luck,

David
thanks again for your help.
Reply With Quote
  #6  
Old 27th September 2007, 10:23 AM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
Quote:
Originally Posted by dr400
Actually, I usually run my PC with hyperthreading enabled and the SMP kernel (I never really considered this issue before). Yesterday I tried disabling hyperthreading and booting the non-SMP kernel: at the end of the failure screen for fsck 2, more lines appeared showing a kernel panic (interrupt failure).
Of course, as I never tried this while my computer was still working, it doesn't lead me anywhere, but that's all I ever saw of a kernel panic on my installation.
I assume you're back to the hyperthread+smp kernel?

Quote:
Originally Posted by dr400

Fortron FSP400-60GLN - 400W, for 1 motherboard + 2 HDDs + 2 DVD drives + 1 floppy drive...
Seems sufficient.
Quote:
Originally Posted by dr400

Ah, thanks. At least I will have learned something

All right, I'll investigate in the direction of a faulty memory module. Maybe I could also try different settings / slots for the memory modules.
I've never compiled the kernel before; now is a good time to learn how (but can I do this in runlevel 1 or the rescue shell?). The thing is, it'll be hard to know whether a failure comes from my own mistakes or from faulty memory.
I'd first try memtest. I think it's an option when you boot the Fedora install/rescue CD/DVD.

Quote:
Originally Posted by dr400
I already checked this when I looked through the BIOS settings yesterday. Motherboard 36 deg, CPU 45 deg, fan 1200~1600 rpm. That didn't seem wrong to me...
Seems fine.

BTW, did you run smartctl on the faulty drive? 'smartctl -a /dev/sda' tests your first drive, 'smartctl -a /dev/sdb' tests the 2nd (apparently faulty) drive. You should get some attributes, such as aforementioned 'Raw_Read_Error_Rate' etc.

David
Reply With Quote
  #7  
Old 27th September 2007, 11:57 AM
dr400 Offline
Registered User
 
Join Date: Sep 2007
Posts: 6
Quote:
I assume you're back to the hyperthread+smp kernel?
Yup. This configuration worked for more than a year, and I never ran the non-SMP kernel. So I think it would be best to stick with SMP, which at least I know worked at some point.
My only worry was that running fsck from the rescue shell might somehow be incompatible with hyperthreading/SMP.

Quote:
I'd first try memtest. I think it's an option when you boot the Fedora install/rescue CD/DVD.
I found a version for a bootable floppy disk; I think I'll try this. It should allow testing with all peripherals unplugged (except the floppy drive) to narrow the error down to the memory modules. As I have 2x512 MB modules, I'll also try different configurations.
I can't wait to do all this fun testing tonight.

Quote:
BTW, did you run smartctl on the faulty drive? 'smartctl -a /dev/sda' tests your first drive, 'smartctl -a /dev/sdb' tests the 2nd (apparently faulty) drive. You should get some attributes, such as aforementioned 'Raw_Read_Error_Rate' etc.
I ran smartctl on the drive.
- it gave me a perfect ID for the drive (manufacturer, model, etc.)
- it told me the drive was "healthy" (smartctl -H, if I remember correctly).
- it told me the drive doesn't support error logging; I tried to activate it with 'smartctl -S on' as it suggested, but that failed (not supported).
- when I asked for '--attributes', I think it said the same as above (some 'feature not supported' message). I'll check this tonight to be sure, as I don't remember whether I tried this option alone or with '--log'.

The thing is, apart from the file system being corrupted (which can be explained by the hard reboot during a download) and the fsck crashes (which I can't explain as of now), I can't seem to find anything wrong with my HDD (not hot, no noise, no problem with smartctl, no "drive ..." in /var/log/messages).
So I guess that after a detailed look into the BIOS settings (which haven't changed for the last year) and the memory module tests, I'll:
- try to format the HDD and see if the problem is (permanently) solved. But then I'll lose all my (not-so-precious-after-all) data. (Let's be realistic: after crashing fsck on this drive a dozen times, I don't think I'll get my data back anyway.)
- or buy a new one... A good way to justify buying a new HDD twice as big as the previous one.

dr400
Reply With Quote
  #8  
Old 27th September 2007, 03:41 PM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
Quote:
Originally Posted by dr400
Yup. This configuration worked for more than a year, and I never ran the non-SMP kernel. So I think it would be best to stick with SMP, which at least I know worked at some point.
My only worry was that running fsck from the rescue shell might somehow be incompatible with hyperthreading/SMP.



I found a version for a bootable floppy disk,
This floppy won't contain the memtest (iirc). It's just to boot a system that can't boot from HD, CD/DVD or network. You'll still be prompted for the rescue/install CD/DVD, so if you can boot from CD/DVD, then you might as well skip the floppy.

Quote:
Originally Posted by dr400
I think I'll try this. It should allow testing with all peripherals unplugged (except the floppy drive) to narrow the error down to the memory modules. As I have 2x512 MB modules, I'll also try different configurations.
I can't wait to do all this fun testing tonight.



I ran smartctl on the drive.
- it gave me a perfect ID for the drive (manufacturer, model, etc.)
- it told me the drive was "healthy" (smartctl -H, if I remember correctly).
- it told me the drive doesn't support error logging; I tried to activate it with 'smartctl -S on' as it suggested, but that failed (not supported).
- when I asked for '--attributes', I think it said the same as above (some 'feature not supported' message). I'll check this tonight to be sure, as I don't remember whether I tried this option alone or with '--log'.
Doesn't smartctl (-a) give you values like this:

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   253   252   000    Old_age   Always       -       3
  3 Spin_Up_Time            0x0027   226   226   063    Pre-fail  Always       -       8598
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       278
  5 Reallocated_Sector_Ct   0x0033   251   251   063    Pre-fail  Always       -       6
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   247   236   187    Pre-fail  Always       -
...
Then, you may want to perform a longer (surface) test:

Code:
smartctl --test=long /dev/sdb
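
If the drive does report an attribute table like the one above, reading it by eye gets tedious. Here is a small Python sketch of how the columns could be parsed and checked. It assumes the ten-column layout shown above, and the function names are made up for illustration. The rule it applies is the usual SMART one: an attribute is in trouble when its normalized VALUE has dropped to or below its THRESH (attributes with a threshold of 000 are skipped, since they can never trip).

```python
def parse_smart_attributes(text):
    """Parse the attribute table printed by smartctl into a list of dicts."""
    rows = []
    for line in text.splitlines():
        # The raw value can contain spaces, so split into at most 10 fields.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, truncated row, or unrelated line
        try:
            rows.append({
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "type": fields[6],
                "when_failed": fields[8],
                "raw": fields[9].strip(),
            })
        except ValueError:
            continue  # not an attribute row after all
    return rows


def attributes_at_risk(rows):
    """Names of attributes whose normalized value is at or below the threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]
```

The text to feed in would be the saved output of 'smartctl -a /dev/sdb'. An empty list from attributes_at_risk only says that no threshold has been crossed; a growing raw Reallocated_Sector_Ct is still worth keeping an eye on.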
Quote:
Originally Posted by dr400
The thing is, apart from the file system being corrupted (which can be explained by the hard reboot during a download) and the fsck crashes (which I can't explain as of now), I can't seem to find anything wrong with my HDD (not hot, no noise, no problem with smartctl, no "drive ..." in /var/log/messages).
So I guess that after a detailed look into the BIOS settings (which haven't changed for the last year) and the memory module tests, I'll:
- try to format the HDD and see if the problem is (permanently) solved. But then I'll lose all my (not-so-precious-after-all) data. (Let's be realistic: after crashing fsck on this drive a dozen times, I don't think I'll get my data back anyway.)
A lot of data may have been recovered in the 'lost+found' directory at the mount point of the drive/partition.

Quote:
Originally Posted by dr400
- or buy a new one... A good way to justify buying a new HDD twice as big as the previous one

dr400
The filesystem could be severely corrupted. But it's strange that fsck would stall or fail, which suggests something else (a possible hardware failure) is going on.

Anyway, yet again; good luck,

David
Reply With Quote
  #9  
Old 27th September 2007, 08:01 PM
dr400 Offline
Registered User
 
Join Date: Sep 2007
Posts: 6
Quote:
This floppy won't contain the memtest (iirc). It's just to boot a system that can't boot from HD, CD/DVD or network. You'll still be prompted for the rescue/install CD/DVD, so if you can boot from CD/DVD, then you might as well skip the floppy.
Apparently I didn't explain myself very well.
I was talking about a bootable floppy that launches memtest86 without the need for any OS, the simplest / lightest configuration IMO, which I found here:
http://www.memtest.org/download/1.70....70.floppy.zip

Quote:
Doesn't smartctl (-a) give you values like this:
[...]
Then, you may want to perform a longer (surface) test
In fact, smartctl says the device doesn't support SMART (I checked that it was enabled in the BIOS, by the way).
The test finishes without displaying anything...

Quote:
The filesystem could be severely corrupted. But it's strange that fsck would stall/fail suggesting something else (possible hardware failure) taking place.
I think I may have found the problem.
I ran memtest and, to make it short, one of the two memory modules showed 4000+ errors in both DDR slots, while the other one got through a few test passes each time.
I then remembered to enable the full boot test in the BIOS. With the second module the reported memory size was never the same (100 or 200 instead of 448 MB), and half of the time the BIOS displayed a memory R/W error.

I'm now running FC5 on the PC with only the good module. At least fsck was able to finish normally; I've already restarted a few times and encountered no problems... I think I'll try to burn a few DVDs now.
I'll see if I can test the faulty module on another motherboard, but I guess it will end with me buying a new one (the lesser evil). I just hope it was an 'accident' and not the consequence of seating the modules wrongly in the DDR slots (I double-checked in the manual the first time, but...)

Quote:
A lot of data may have been recovered in the 'lost+found' directory at the mount point of the drive/partition.
this directory looks like this now :
Code:
total 66644
-rw------- 1 auclair auclair 69419008 Sep 20 18:05 #22686051
I don't know what I can/should do with this

In any case, the lesson is learned: in hindsight, I think the symptoms pointed to a memory fault (FC5 freezing, fsck crashing with segfault-like messages), so maybe I could have enabled the memory test in the BIOS before hard-rebooting a dozen times...

Well, David, let me thank you again for your help and your good advice on this matter.
You taught me a few useful things on the way.
I hope this time the problem is solved

dr400
Reply With Quote
  #10  
Old 27th September 2007, 09:51 PM
David Becker Offline
Registered User
 
Join Date: Feb 2006
Posts: 780
Quote:
Originally Posted by dr400
Apparently I didn't explain myself very well.
I was talking about a bootable floppy that launches memtest86 without the need for any OS, the simplest / lightest configuration IMO, which I found here:
http://www.memtest.org/download/1.70....70.floppy.zip
Alright
Quote:
Originally Posted by dr400
I think I may have found the problem.
I ran memtest and, to make it short, one of the two memory modules showed 4000+ errors in both DDR slots, while the other one got through a few test passes each time.
I then remembered to enable the full boot test in the BIOS. With the second module the reported memory size was never the same (100 or 200 instead of 448 MB), and half of the time the BIOS displayed a memory R/W error.

I'm now running FC5 on the PC with only the good module.
Less is more.
Quote:
Originally Posted by dr400
At least fsck was able to finish normally; I've already restarted a few times and encountered no problems... I think I'll try to burn a few DVDs now.
That's great! What a relief.
Quote:
Originally Posted by dr400

I'll see if I can test the faulty module on another motherboard, but I guess it will end with me buying a new one (the lesser evil). I just hope it was an 'accident' and not the consequence of seating the modules wrongly in the DDR slots (I double-checked in the manual the first time, but...)



this directory looks like this now :
Code:
total 66644
-rw------- 1 auclair auclair 69419008 Sep 20 18:05 #22686051
I don't know what I can/should do with this
Examine the file contents to see whether it's something worth saving or salvaging.
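
A quick first look is to check the first few bytes of the file, since many formats start with a fixed signature. The standard `file` command does this far more thoroughly; the Python below is only a rough sketch of the idea, with a short, assumed list of signatures and made-up function names.

```python
# A few well-known signatures; this list is illustrative, not complete.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also used by ODF and JAR files)"),
    (b"\x1f\x8b", "gzip data"),
    (b"BZh", "bzip2 data"),
    (b"\xed\xab\xee\xdb", "RPM package"),
]


def guess_type(header):
    """Guess a file type from its leading bytes, or return 'unknown'."""
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def guess_file_type(path):
    """Read the first bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        return guess_type(handle.read(16))
```

For the file above that would be guess_file_type("lost+found/#22686051"), run from the /home mount point. Given that a download was in progress when the machine froze, the file may well be a partial download, in which case it could be recognized but still incomplete.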
Quote:
Originally Posted by dr400
In any case, the lesson is learned: in hindsight, I think the symptoms pointed to a memory fault (FC5 freezing, fsck crashing with segfault-like messages), so maybe I could have enabled the memory test in the BIOS before hard-rebooting a dozen times...
That doesn't necessarily safeguard against these situations. Maybe it's time to read http://www.bitwizard.nl/sig11/

Quote:
Originally Posted by dr400
Well, David, let me thank you again for your help and your good advice on this matter.
You taught me a few useful things on the way.
I hope this time the problem is solved

dr400
Fingers crossed.

David
Reply With Quote

Tags
failed, fc5, freeze, fsck
