Palimpsest Disk Utility (SMART errors)


ACiD GRiM
14th April 2009, 07:27 PM
Is it just me, or did yesterday's SMART update, pushed later in the day (US Mountain Time), give more people an error? I have a 13-day-old laptop, and Palimpsest's SMART report says I have 52 bad sectors all of a sudden. It only appeared in my notification tray after I restarted. Should I be worried?

cheesie
14th April 2009, 09:10 PM
Same message

ACiD GRiM
14th April 2009, 10:07 PM

OK, good. How "same" is it? Does it say 52 bad sectors, or some other number? (I have a 320GB HDD.)

PeTzZz
14th April 2009, 10:45 PM
I have one failing that needs replacing soon, it seems. The oldest one is still quite OK. Screenshots (120, 160 and 500 GB):
http://img140.imageshack.us/img140/6759/124gbataic35l120avv2071.th.png (http://img140.imageshack.us/my.php?image=124gbataic35l120avv2071.png) http://img530.imageshack.us/img530/5742/160gbatast3160023asmart.th.png (http://img530.imageshack.us/my.php?image=160gbatast3160023asmart.png) http://img22.imageshack.us/img22/6272/500gbatawdcwd5000aaks7s.th.png (http://img22.imageshack.us/my.php?image=500gbatawdcwd5000aaks7s.png)

http://fedoraproject.org/wiki/Features/DeviceKit
How To Test
...
5. Verify that palimpsest correctly reports smart data from disks which support it.

I don't have any experience with the SMART system, so I don't have any suggestions how to test the correctness of the reports.

ACiD GRiM
15th April 2009, 12:42 AM
I'm going to check the smart result on rescuecdlinux (gentoo) to compare when I get home.

glennzo
15th April 2009, 11:15 AM
Ladies and Gentlemen. Trouble in paradise this morning. I booted Fedora 11 Beta on my laptop and saw a strange (to me at least) icon in the top panel. Hovering the mouse pointer over the icon got me a tooltip that said "One or more disks are failing". I'm thinking that this can't be good, but this is Beta so I'm a little sceptical. I'll be booting Fedora 10 and Vista to see what happens there. In the meantime here are 3 screenshots of the Palimpsest Disk Utility.
http://img147.imageshack.us/img147/8356/screenshotharddisk1.th.png (http://img147.imageshack.us/my.php?image=screenshotharddisk1.png)
http://img147.imageshack.us/img147/8775/screenshotharddisk2.th.png (http://img147.imageshack.us/my.php?image=screenshotharddisk2.png)
http://img301.imageshack.us/img301/9241/screenshotharddisk3.th.png (http://img301.imageshack.us/my.php?image=screenshotharddisk3.png)
I'll post back with results from F10 and Vista.

glennzo
15th April 2009, 11:53 AM
No one has any idea? Not to bump the thread but just to update, I've since booted to Fedora 10. No notifications. No Palimpsest Disk Utility either. Then I booted to Windows Vista (Home Premium). No errors there. I imagine that this may have to do with new software for F11 (I assume that the disk utility is new) that has a few glitches. Time will tell.

Hlingler
15th April 2009, 11:57 AM
Hi Glenn:

You can run smartctl manually in F10 to verify:
yum install smartmontools
smartctl -a /dev/sd[a-z]

Test (pick one test type: short, long or offline):
smartctl -t short /dev/sd[a-z]

Insert the correct HDD designator in place of sd[a-z] (entire disk only, not a partition); it looks like this one is sda.

V

glennzo
15th April 2009, 12:25 PM
Just to update, and to be fair to the Fedora developers I guess, the Palimpsest Disk utility is part of the package gnome-disk-utility. It just wasn't installed on the Fedora 10 system. It is now.
sudo yum install gnome-disk-utility

@Vince
Yes, there is only one disk in the laptop.
[glenn@coolhand ~]$ sudo smartctl -t short /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Wed Apr 15 07:20:31 2009

Use smartctl -X to abort test.
[glenn@coolhand ~]$

Hlingler
15th April 2009, 12:30 PM
Wait at least 2 minutes as instructed, then check results:
sudo smartctl -l error /dev/sda

V

glennzo
15th April 2009, 12:36 PM
[root@coolhand ~]# smartctl -a /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA MK1637GSX
Serial Number: 272SF17WS
Firmware Version: DL030M
User Capacity: 160,041,885,696 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Apr 15 07:33:59 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 92) minutes.
SCT capabilities: (0x0001) SCT Status supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1691
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2002
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       27
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       9263
 10 Spin_Retry_Count        0x0033   139   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1873
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       213
193 Load_Cycle_Count        0x0032   026   026   000    Old_age   Always       -       746120
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       43 (Lifetime Min/Max 14/55)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       27
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       154
222 Loaded_Hours            0x0032   083   083   000    Old_age   Always       -       7027
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       322
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 9263 -
# 2 Short offline Aborted by host 40% 9263 -
# 3 Short offline Completed without error 00% 6016 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Hlingler
15th April 2009, 12:39 PM
Same results:
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 27
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 27

Whether or not this indicates impending HDD failure is the question. I am not sure.... It certainly appears to show 27 bad sectors.

V

P.S. Try a long test (92 minutes), but I would expect identical results.
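
On the question of whether 27 reallocated sectors counts as "failing": smartctl marks an attribute as failed when its normalized VALUE drops to or below THRESH. The RAW_VALUE (27 here) is a vendor-specific count and is not what gets compared. Below is a minimal Python sketch of that comparison, using the two attribute lines from this post as sample input. The function names `parse_attributes` and `failing_attributes` are illustrative assumptions and not part of smartmontools.

```python
# Sketch: flag SMART attributes whose normalized VALUE is at or below THRESH.
# Input lines follow the `smartctl -a` attribute layout quoted in this thread.

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       27
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       27
"""


def parse_attributes(text):
    """Return one dict per attribute line found in the text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute lines start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),   # normalized value
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": " ".join(fields[9:]),  # raw value may contain spaces
        })
    return rows


def failing_attributes(rows):
    """Attributes that are failing now: normalized value <= threshold."""
    return [r for r in rows if r["value"] <= r["thresh"]]


if __name__ == "__main__":
    rows = parse_attributes(SAMPLE)
    for r in rows:
        print(r["id"], r["name"], "value", r["value"], "thresh", r["thresh"], "raw", r["raw"])
    print("failing:", [r["name"] for r in failing_attributes(rows)])
```

On these two lines nothing is flagged, which is consistent with smartctl's overall PASSED assessment: the raw count is 27, but the normalized value is still 100 against a threshold of 50. Whether 27 reallocations is worrying in practice is a separate judgment that this check does not answer.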

glennzo
15th April 2009, 12:40 PM
Wait at least 2 minutes as instructed, then check results:
sudo smartctl -l error /dev/sda

V
Thank you Vince. You're too fast. I was looking at documentation on the home page for smartmontools.

[glenn@coolhand ~]$ sudo smartctl -l error /dev/sda
Password:
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

This looks good to me. I logged in to F11 this morning (after just a sniff of coffee) to this error. Thought it was going to be a long day which might have included a visit to NewEgg or TigerDirect, credit card in hand :eek:

glennzo
15th April 2009, 12:43 PM
Same results:
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 27
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 27

Whether or not this indicates impending HDD failure is the question. I am not sure.... It certainly appears to show 27 bad sectors.

V
Beat me again :) Maybe it doesn't look so good? I should probably back some stuff up then. Never know what will happen at the least convenient time. Of course, since I don't want the disk to be bad, all I saw was
SMART Error Log Version: 1
No Errors Logged

Hlingler
15th April 2009, 12:45 PM
Need someone more knowledgeable to state what the results mean. smartctl seems to think that the drive is performing acceptably. I'm uncertain.

V

P.S. Yeah, the short test simply ignored/by-passed the marked bad sectors.

~]$ sudo /usr/sbin/smartctl -a /dev/sda
[...]
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
[...]
~]$ sudo /usr/sbin/smartctl -a /dev/sdb
[...]
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
[...]

glennzo
15th April 2009, 01:04 PM
I certainly appreciate your help Vince. I'll continue looking at documentation and see if there is anything of use there. Meanwhile, I'll keep using the lappy as if nothing is wrong and keep the links to NewEgg and TigerDirect handy ;) If a new drive is warranted then maybe I can upgrade this 160GB to something like 500GB for relatively short money.

Edit: Well, we're not buying yet, but TigerDirect has drives for as little as $110 for 500GB, Western Digital (which I've sworn off long ago), Seagate 500GB for $99 or Hitachi 500GB for $99.

Hlingler
15th April 2009, 01:30 PM
I have three Western Digital Caviar paperweights sitting on my desk.... :rolleyes:

V

glennzo
15th April 2009, 01:33 PM
I too have a pile of WD paperweights / doorstops. I used to use nothing but, but I ran into a string of drives that failed and couldn't really afford to replace them at the time. Tends to make one wary of the brand name. I've since switched to Seagate and so far no troubles.

Hlingler
15th April 2009, 01:41 PM
A little info here on the attributes in question: http://smartmontools.sourceforge.net/faq.html

V

EDIT: Ooh! Here's an RPM of a nifty GUI for smartmontools: http://download.opensuse.org/repositories/home:/alex_sh/Fedora_10/

Very nice, gives much additional info on attributes: http://www.geocities.com/hlingler/gsmartcontrol-1.png

glennzo
15th April 2009, 02:12 PM
Vince, I downloaded and installed the GUI from the Suse repos. Nice tool. Instead of me posting more images here, check this link: http://johnson.homelinux.net/mywiki/Hard_Disk_Is_Failing. I posted them there at the bottom of the page. The last 2 images.

Hlingler
15th April 2009, 02:21 PM
Hmm... according to the tooltip on gsmartcontrol, this could simply mean that operations are pending....

I'd still like to hear from an "expert". That would not be me.

V

glennzo
15th April 2009, 02:31 PM
Hopefully someone will see the thread and jump in. Thanks a million for all your interest :)

cheesie
16th April 2009, 06:38 PM
Here are my screenshots:

Screenshot 1 (http://www.dietenrain.com/Diverses/Screenshot1.png)

Screenshot 2 (http://www.dietenrain.com/Diverses/Screenshot2.png)

Screenshot 3 (http://www.dietenrain.com/Diverses/Screenshot3.png)

ACiD GRiM
16th April 2009, 10:25 PM
lol it looks like you need a new hard drive! 65 thousand bad sectors!!! I'm burning RescueCD Linux right now to check against a more reliable smart reader.

hephasteus
16th April 2009, 11:48 PM
I have one failing that needs replacing soon, it seems. The oldest one is still quite OK. Screenshots (120, 160 and 500 GB):

http://fedoraproject.org/wiki/Features/DeviceKit

I don't have any experience with the SMART system, so I don't have any suggestions how to test the correctness of the reports.

A good way to check it is to boot Windows and download the Everest utility. The SMART reporting from it is first rate.

ACiD GRiM
16th April 2009, 11:57 PM
It seems that somehow Fedora 11 nerfed the SMART stats, because even Gentoo says the same thing. I'm running an extended scan to see if it will clear up some errors.

hephasteus
17th April 2009, 05:01 AM
It seems that somehow Fedora 11 nerfed the SMART stats, because even Gentoo says the same thing. I'm running an extended scan to see if it will clear up some errors.

There has to be some way to clear it. Did you do power off and restart before switching or did you carry the junk smart info with ya through reboot?

The only thing I believe from your pics is the hard drive temps. The 64k sector count just happens to be the max for the drive. The 6 second spin-up just happens to be the point where a drive resets itself and tries to achieve its stable operating RPM, etc. So somehow wrong numbers are getting reported as the limits of the values, because, well, that's what happens when wrong numbers with limits get reported by computers. LOL

ACiD GRiM
17th April 2009, 05:08 AM
I just ran an extended check and nothing changed. :(

gcell
17th April 2009, 06:32 AM
I have the same problem too. Just after the updates, I got the message "One or more disks are failing".
Details: the attributes "Reallocate sector count" and "Reallocation Count" were marked as failed.
Is this a bug or a hardware problem?

ACiD GRiM
17th April 2009, 06:36 AM
I think it's a bug that CAUSED a hardware problem. I think we're going to have to get new drives, because I have a second opinion from Gentoo telling me the exact same numbers, even after an offline check. I'm running fsck right now to see if it will confirm or deny SMART. If it confirms it, I don't know. But if it comes back saying that there's nothing wrong sector-wise, F11 screwed SMART, which I didn't think was possible.

gcell
17th April 2009, 07:00 AM
Here are two associated bug reports:
https://bugzilla.redhat.com/show_bug.cgi?id=496153
https://bugzilla.redhat.com/show_bug.cgi?id=496087

glennzo
21st April 2009, 09:59 AM
OK, I posted a question last week about how under Fedora 11 the Palimpsest disk utility kept informing me that my laptop hard disk was failing. See this thread, http://forums.fedoraforum.org/showthread.php?t=219823. I'm now wondering if maybe this is a bug in the Palimpsest program itself as I get absolutely no notifications if I boot this laptop to Fedora 10, Windows Vista or Windows 7 Beta. I'm off to bugzilla land to look at what's going on there regarding palimpsest. I guess I'm just wondering if anyone else running F11 beta has seen this warning from this application?

RahulSundaram
21st April 2009, 10:18 AM
Hi,

Yep. I have seen it and am not sure if it is bogus. Previous versions of Fedora had no integration between the desktop and the SMART daemon, so it is not surprising that you didn't get any notifications.

glennzo
21st April 2009, 10:40 AM
Hi Rahul. Thanks for your time. Just not sure if I should take this seriously or consider it a bug in the package for F11 beta.

glennzo
21st April 2009, 10:59 AM
I'm thinking that this is a bug in Palimpsest and that my disk is not actually failing.
https://bugzilla.redhat.com/show_bug.cgi?id=495956

gcell
21st April 2009, 11:49 AM
I have the same problem. I really hope someone can fix it. Thanks.

glennzo
21st April 2009, 11:56 AM
Hi gcell. If it is indeed a bug then I'm sure it will be fixed in due time. In the interim, do what I did. Backup critical data just in case disaster strikes.

gcell
21st April 2009, 12:06 PM
Hi gcell. If it is indeed a bug then I'm sure it will be fixed in due time. In the interim, do what I did. Backup critical data just in case disaster strikes.
A libatasmart bug shouldn't cause a disaster to strike, so I don't think the backup is necessary...

Demz
21st April 2009, 12:07 PM
Better to back up data than be sorry :)

glennzo
21st April 2009, 12:23 PM
A libatasmart bug shouldn't cause a disaster to strike, so I don't think the backup is necessary...
My point is that Palimpsest is telling me that the disk is failing. I'm fairly sure that this is an error as I quad boot this laptop and don't see any disk error notifications in the other three OS's. Just the same, I've backed up critical data just in case Palimpsest is correct. As Demz says, better safe than sorry.

phoenixpb
21st April 2009, 02:42 PM
No, it's a bug; your hard drive is OK.
I have 3 hard drives here and it says all 3 are damaged :eek:

PeTzZz
21st April 2009, 04:15 PM
I merged 3 threads about the same thing. The talk didn't get messy, but just in case here are the links to the old threads: 1 (http://forums.fedoraforum.org/showthread.php?t=219784), 2 (http://forums.fedoraforum.org/showthread.php?t=219823), 3 (http://forums.fedoraforum.org/showthread.php?t=220215) (closed).

My point is that Palimpsest is telling me that the disk is failing. I'm fairly sure that this is an error as I quad boot this laptop and don't see any disk error notifications in the other three OS's. Just the same, I've backed up critical data just in case Palimpsest is correct. As Demz says, better safe than sorry.

You cannot conclude that from the missing notifications alone, because previous Fedora versions and those other OSs (probably) do not have the SMART error notification feature, as Rahul mentioned.

sideways
5th May 2009, 01:55 PM
I'm ignoring this silly palimpsest notification. It's telling me the pending sector count is bad but shows a negative number. 'devkit-disks --show-info' at least gives +ve numbers and tells me there are over 42 trillion current-pending-sectors, and it says that number exceeds a threshold - no **** sherlock!

$ devkit-disks --show-info /dev/sda
Showing information for /org/freedesktop/DeviceKit/Disks/devices/sda
native-path: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
device: 8:0
device-file: /dev/sda
by-id: /dev/disk/by-id/ata-FUJITSU_MHY2250BH_K432T81269H2
by-id: /dev/disk/by-id/scsi-SATA_FUJITSU_MHY2250_K432T81269H2
....
ATA SMART: Updated at Tue 05 May 2009 13:41:47 BST
assessment: PASSED
bad sectors: Yes
attributes: One ore more attributes exceed threshold
temperature: 46° C / 115° F
powered on: 2.37e+04 days
offline data: never collected (1009 second(s) to complete)
self-test status: success or never (0% remaining)
...
===============================================================================
Attribute                 Current/Worst/Threshold  Status  Value                    Type     Updates
===============================================================================
raw-read-error-rate       100/100/ 46              good    116185                   Prefail  Online
spin-up-time              100/100/ 25              good    1 msec                   Prefail  Online
start-stop-count           88/ 88/  0              n/a     247613                   Old-age  Online
reallocated-sector-count  100/100/ 24              good    0 sectors                Prefail  Online
power-on-hours             82/ 82/  0              n/a     2.37e+04 days            Old-age  Online
power-cycle-count         100/100/  0              n/a     479                      Old-age  Online
g-sense-error-rate        100/100/  0              n/a     638                      Old-age  Online
power-off-retract-count   100/100/  0              n/a     33                       Old-age  Online
temperature-celsius-2     100/100/  0              n/a     46C / 115F               Old-age  Online
reallocated-event-count   100/100/  0              n/a     433570                   Old-age  Online
current-pending-sector    100/100/  0              FAIL    42412030754819 sectors   Old-age  Online
offline-uncorrectable     100/100/  0              n/a     162448070672387 sectors  Old-age  Online
udma-crc-error-count      100/100/  0              n/a     22249107                 Old-age  Online
multi-zone-error-rate     100/100/  0              n/a     56715108                 Old-age  Online
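
One way to sanity-check the trillion-sector figures above is to look at them in hex. SMART raw values are 48-bit fields, and a plausible but unconfirmed reading is that only part of the field is the actual sector count, with the rest being other vendor data. The Python sketch below splits the two reported values into 16-bit words. Treating the low word as the real count is an assumption made purely for illustration; nothing in this thread, devkit-disks or the drive vendor confirms it, and `split_raw48` is a hypothetical helper name.

```python
# Sketch: split a 48-bit SMART raw value into three 16-bit words.
# Assumption (hypothetical): the low word may hold the real sector count.


def split_raw48(raw):
    """Return the (high, mid, low) 16-bit words of a 48-bit raw value."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Values copied from the devkit-disks output above.
REPORTED = {
    "current-pending-sector": 42412030754819,
    "offline-uncorrectable": 162448070672387,
}

if __name__ == "__main__":
    for name, raw in REPORTED.items():
        high, mid, low = split_raw48(raw)
        print(f"{name}: 0x{raw:012x} high=0x{high:04x} mid=0x{mid:04x} low={low}")
```

Both values come out with 3 in the low word, which hints that the drive may really be reporting 3 sectors and that the upper bytes are being misread as part of the count. That is a guess from the bit pattern, not a diagnosis.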


In the meantime I'm gonna read through the whole disk and see if it improves the analysis, this will take a couple of hours (250GB), will post back results
sudo dd if=/dev/sda of=/dev/null bs=2M

(check progress by typing 'sudo kill -USR1 $(pidof dd)' in another terminal, progress is output in the terminal running dd)

sideways
5th May 2009, 04:30 PM
No change, dd read through the whole disk without reporting errors, but 'devkit-disks --show-info /dev/sda' still shows over 42 trillion current-pending-sectors.

Wonder how many queries we'll get on this, not 42 trillion I hope. ;)

EDIT:
$ sudo dd if=/dev/sda of=/dev/null bs=2M
119237+1 records in
119237+1 records out
250059350016 bytes (250 GB) copied, 6031.74 s, 41.5 MB/s


and, just to check the full disk was read:
$ sudo fdisk -l /dev/sda

Disk /dev/sda: 250.0 GB, 250059350016 bytes
...

Note that 2M = 2*1024*1024 = 2097152 bytes, and
$ bc <<< "250059350016/2097152"
119237

i.e. 119237 full blocks of 2M were read. The "+1" in dd's output is the final partial block that covers the remainder of the disk.
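
The same check can be done in Python, including the remainder that bc's integer division drops. This is only arithmetic on the numbers dd and fdisk printed above; the variable names are illustrative.

```python
# Sketch: reconcile dd's "119237+1 records" with the disk size from fdisk.

DISK_BYTES = 250059350016      # from `fdisk -l /dev/sda`
BLOCK_BYTES = 2 * 1024 * 1024  # dd bs=2M
ELAPSED_S = 6031.74            # from dd's summary line

# Whole 2M blocks, plus the bytes left over for the final partial block.
full_blocks, leftover = divmod(DISK_BYTES, BLOCK_BYTES)

# dd reports throughput in decimal megabytes (10**6 bytes) per second.
rate_mb_s = DISK_BYTES / ELAPSED_S / 1e6

if __name__ == "__main__":
    print(f"full blocks: {full_blocks}")
    print(f"bytes in final partial block: {leftover}")
    print(f"throughput: {rate_mb_s:.1f} MB/s")
```

The leftover is non-zero, which is why dd prints "119237+1": every byte of the disk was read, with the last read being shorter than 2M.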

sideways
5th May 2009, 10:01 PM
I submitted a bug report, there are a few for palimpsest. I would advise people not to get too worried just yet if palimpsest tells you you have a failing disk.

I really hope they fix this for F11 Final, imagine the reaction of people new to Fedora booting up only to be greeted by this quite scary message. There was/is a similar bug with gnome-power-manager that tells you your battery may be broken, nice. Perhaps these utilities should default to not plastering messages all over the screen until they are fully bug tested, how about just logging the messages until then?