Fedora 10 freezing



arnott
2nd March 2009, 04:27 AM
Hi all,
My computer keeps freezing up. It seems to have gotten better after I changed the nvidia graphics driver (http://forums.fedoraforum.org/showthread.php?t=54240&page=91). It used to freeze whenever I right-clicked in Firefox. Now it seems better.

But it froze again a few minutes ago and I had to reboot. I checked the files in /var/log and did not find anything. Should I be looking for something specific in any log file ?

# uname -a
Linux pinehaven 2.6.27.15-170.2.24.fc10.i686 #1 SMP Wed Feb 11 23:58:12 EST 2009 i686 athlon i386 GNU/Linux

My computer : http://h10025.www1.hp.com/ewfrf/wc/document?lc=en&cc=us&docname=c00068941&dlc=en. I have upgraded memory to 1GB.

Could it be something wrong with my harddrive ?

Thanks
Arnott

Hlingler
2nd March 2009, 04:33 AM
Desktop Environment ? I-Kandi/Compositing/Effects enabled ?? Graphics hardware and specs (OK, I see the specs - all updates applied?) ???

If you are concerned about hardware problems, there are a number of utilities to check/test. For HDDs, that would be package smartmontools, executable=smartctl.
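As a rough sketch of what such a check looks like with smartctl: the options used below (-H, -t short, -t long, -l selftest) are standard smartmontools flags, while the device name /dev/sda, the function names, and the DRY_RUN switch are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: wraps the usual smartctl calls. Run as root.
# DRY_RUN is a hypothetical switch for this example, not a smartctl feature:
# when set to 1, commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Queue a self-test inside the drive. $1 = device, $2 = short | long
smart_start_test() {
    run smartctl -t "${2:-short}" "${1:-/dev/sda}"
}

# Print the overall health verdict and the self-test log. $1 = device
smart_show_results() {
    run smartctl -H "${1:-/dev/sda}"
    run smartctl -l selftest "${1:-/dev/sda}"
}

# Typical session (the test runs in the background on the drive itself,
# so wait for the time smartctl estimates before reading the log):
#   smart_start_test /dev/sda short
#   smart_show_results /dev/sda
```

The short test takes a couple of minutes; the long test scans the whole surface and can take an hour or more, depending on the drive.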

V

arnott
2nd March 2009, 05:01 AM
Desktop environment : Gnome
I had compiz turned on the last time when it froze.

I have installed all updates.

Output of 2 minute test:

smartctl -l selftest /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%       253         263258973

What should I do ? Should I run the long test too ?

Thanks
Arnott

Hlingler
2nd March 2009, 05:05 AM
Uh-Oh!!!

Quote:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%       253         263258973
What should I do ? Should I run the long test too ?

Yes, but... I would expect similar results.

I strongly suggest that you back up all valuable data NOW.

V

P.S. Look also for HDD errors logged to /var/log/messages. Try searching for '/dev'.
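A minimal sketch of such a search, assuming the log lives at /var/log/messages. The function name disk_errors and the pattern are illustrative choices: kernel messages from the ATA layer are usually tagged ataN rather than /dev/..., so the pattern matches on that prefix plus the common "media error" and "I/O error" strings.

```shell
#!/bin/sh
# Sketch: list kernel log lines that usually point at a failing disk.
# $1 = log file; /var/log/messages is assumed (reading it normally needs root).
# The pattern is a rough, hand-picked filter, not an exhaustive list.
disk_errors() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: (error|exception|failed)|media error|I/O error' \
        "${1:-/var/log/messages}"
}

# Usage:
#   disk_errors /var/log/messages | tail -n 20
```

No output does not prove the disk is healthy; it only means nothing matched this particular filter.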

P.P.S. Disabling desktop effects/Compiz[-Fusion] may help also while you troubleshoot/salvage.

Here's an example of "healthy/pass" results:
~]$ sudo /usr/sbin/smartctl -l selftest /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     16507         -
# 2  Short offline       Completed without error       00%     11548         -
# 3  Short offline       Completed without error       00%      6093         -
# 4  Short offline       Completed without error       00%      5098         -
# 5  Extended offline    Completed without error       00%      4686         -
# 6  Extended offline    Completed without error       00%      3889         -
# 7  Short offline       Completed without error       00%      3643         -
# 8  Short offline       Completed without error       00%      2522         -
# 9  Extended offline    Completed without error       00%      2020         -
#10  Short offline       Completed without error       00%      2002         -
#11  Short offline       Completed without error       00%       376         -
#12  Short offline       Completed without error       00%         0         -

arnott
2nd March 2009, 05:30 AM
Thanks for your help. I am backing up the data now. I did not have compiz running when I ran smartctl.

I am running the long test too.

JonathanR
2nd March 2009, 07:13 AM
Could be bad RAM. Run a memtest.

arnott
3rd March 2009, 04:43 AM
Output of long test. Computer crashed in the middle of backup.

smartctl -l selftest /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       267         263258973
# 2  Extended offline    Completed: read failure       90%       253         263258973
# 3  Short offline       Completed: read failure       90%       253         263258973

From /var/log/messages:

Mar 1 23:38:38 localhost kernel: res 51/40:20:5d:03:b1/00:00:00:00:00/ef Emask 0x9 (media error)
Mar 1 23:38:38 localhost kernel: ata1.00: status: { DRDY ERR }
Mar 1 23:38:38 localhost kernel: ata1.00: error: { UNC }
Mar 1 23:38:38 localhost kernel: ata1.00: configured for UDMA/100
Mar 1 23:38:38 localhost kernel: ata1: EH complete

Does it mean anything ?

Hlingler
3rd March 2009, 08:20 AM
Quote:
Does it mean anything ?

Yes: the long tests completed, and found a failure at the exact same Logical Block Address (LBA) as the short test, all at LBA=263258973. I do not know the significance of the log messages, but "kernel: ata1.00: error: { UNC }" does not look good. Neither does "kernel: ata1.00: status: { DRDY ERR }". Google those blurbs. Or is ata1.00 the CD/DVD drive ? /dev/sda should be ata0.00 IIRC.

The HDD has only 267 hours shown as Life Time - still under warranty ?? If so, you could try returning it as defective, but that won't save your data. Try backing up in smaller chunks: you may be successful, and may even find the location of the bad sector.
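To get a feel for where that sector sits on the disk, the LBA can be turned into a byte offset. This is only a sketch: it assumes 512-byte logical sectors (typical for drives of this age, but not confirmed for this one), and the function names are made up for the example.

```shell
#!/bin/sh
# Sketch: convert an LBA from the SMART log into a position on the disk.
# Assumes 512-byte logical sectors and a shell with 64-bit arithmetic.
lba_to_bytes() {   # $1 = LBA, $2 = sector size in bytes (default 512)
    echo $(( $1 * ${2:-512} ))
}

lba_to_mib() {     # same arguments, result rounded down to whole MiB
    echo $(( $1 * ${2:-512} / 1048576 ))
}

# Usage:
#   lba_to_bytes 263258973
#   lba_to_mib 263258973
#
# Read-only probe of that single sector (expect an I/O error if it is bad):
#   dd if=/dev/sda of=/dev/null bs=512 skip=263258973 count=1
```

Comparing the LBA with the partition start and end sectors reported by fdisk -l (with units set to sectors) should show which partition the bad sector falls in.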

I don't see how or why a memory test will alter these facts: looks clearly like a bad HDD to me, based on results of S.M.A.R.T. tests.

V

arnott
4th March 2009, 04:44 AM
Ran memtest for 3 passes. It did not find any errors.

Copying files to external hard disc.

The HDD came with the computer and is more than 5 years old. I have switched on the SMART daemons too.

If there are bad sectors in the HDD, can I ask the OS to ignore those ?

I have Windows XP in a dual-boot setup and it's not complaining. So do I need to replace the HDD completely ?

Should I use ddrescue ?

Thanks
arnott

Hlingler
4th March 2009, 04:59 AM
IIRC, you can flag sectors as "bad" - in fact, I thought that this was automatic, but since I've never experienced an HDD failure of any kind with Linux, I don't know for sure how/what/where. However, such errors are usually regarded as signs of impending catastrophic failure.

Win* may not see errors on partitions that it doesn't use/mount (like your Fedora partitions).

V