lots of bad blocks


vinboy
2009-04-04, 10:05 PM CDT
My new WD hard drive is showing lots of errors.
How can I check the total working hours of this drive?

# sudo badblocks -sv -o badblocks.txt /dev/sdc1
Checking blocks 0 to 488384000
Checking for bad blocks (read-only test): done
Pass completed, 136 bad blocks found.

# cat badblocks.txt
126976
127012
127013
127014
127015
127016
127017
127018
127019
127020
127021
127022
127023
127024
127025
127026
127027
127028
127029
127030
127031
127032
127033
127034
127035
127036
127037
127038
127039
127040
127041
127042
127043
127044
127045
127046
127047
127048
127049
127050
127051
127052
127053
127054
127055
127056
127057
127058
127059
127060
127061
127062
127063
127064
127065
127066
127067
127068
127069
127070
127071
127072
127073
127074
127075
127076
127077
127078
127079
127080
127081
127082
127083
127084
127085
127086
127087
127088
127089
127090
127091
127092
127093
127094
127095
127096
127097
127098
127099
127100
127101
127102
127103
127104
127105
127106
127107
179448
179492
179493
179494
179495
32236504
32236548
32236549
32236550
32236551
41032952
41032992
41032993
41032994
41032995
116205720
116205776
116205777
116205778
116205779
116992904
116992908
116992909
116992910
116992911
117172352
117172388
117172389
117172390
117172391
118086616
118086660
118086661
118086662
118086663
118261816
118261817
118261818
118261819
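A badblocks list like this is easier to read once consecutive block numbers are collapsed into runs. The short Python sketch below does that for a file written with badblocks -o (one decimal block number per line); the script and its function names are illustrative, not part of e2fsprogs. Applied to the sdc1 list above, it should reduce the 136 entries to 17 runs: eight single blocks, eight runs of four, and one run of 96 blocks starting at 127012.

```python
import sys


def read_badblocks_file(path):
    """Read a file written by `badblocks -o`: one decimal block number per line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def group_runs(blocks):
    """Collapse block numbers into sorted (first, last) runs of consecutive blocks."""
    runs = []
    for b in sorted(set(blocks)):
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b          # b continues the current run
        else:
            runs.append([b, b])      # gap found: start a new run
    return [tuple(r) for r in runs]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 runs.py badblocks.txt
    for first, last in group_runs(read_badblocks_file(sys.argv[1])):
        n = last - first + 1
        print(f"{first}-{last} ({n} blocks)" if n > 1 else str(first))
```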

Hlingler
2009-04-04, 11:13 PM CDT
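> How can I check the total working hours of this drive?

Install the smartmontools package, then run the command below; the RAW_VALUE of Power_On_Hours is the drive's total powered-on time in hours:
~]$ sudo /usr/sbin/smartctl -a /dev/sdc

Here is the relevant part of the output for one of my own drives:
~]$ sudo /usr/sbin/smartctl -a /dev/sda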
[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   094   016    Pre-fail  Always   -           65536
  2 Throughput_Performance  0x0005   160   100   050    Pre-fail  Offline  -           200
  3 Spin_Up_Time            0x0007   113   100   024    Pre-fail  Always   -           308 (Average 308)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always   -           309
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always   -           0
  8 Seek_Time_Performance   0x0005   140   100   020    Pre-fail  Offline  -           29
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always   -           21626
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           295
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always   -           359
193 Load_Cycle_Count        0x0012   100   100   050    Old_age   Always   -           359
194 Temperature_Celsius     0x0002   130   101   000    Old_age   Always   -           42 (Lifetime Min/Max 17/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always   -           0
[...]

V
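The smartctl attribute table can also be read by a script, for example to log Power_On_Hours for several drives. The Python sketch below is one way to do it, assuming the smartctl 5.x -A layout shown above (ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then a raw value that may contain spaces) and assuming the raw value starts with a plain integer; parse_attributes and raw_int are hypothetical helper names. smartctl's exit status is a bit mask, so the sketch does not treat a non-zero exit code as a failure.

```python
import re
import subprocess
import sys

# One row of the "Vendor Specific SMART Attributes" table: ID, name, flag,
# VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then the raw value, which
# can contain spaces, e.g. "42 (Lifetime Min/Max 17/54)".
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s*$"
)


def parse_attributes(text):
    """Map attribute name -> raw value string for every table row in `text`."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = m.group(10)
    return attrs


def raw_int(attrs, name):
    """Leading integer of a raw value, e.g. '308 (Average 308)' -> 308."""
    return int(attrs[name].split()[0])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as root, hypothetical script name): python3 poh.py /dev/sda /dev/sdc
    for dev in sys.argv[1:]:
        # No check=True: smartctl's exit status is a bit mask, not a plain pass/fail.
        out = subprocess.run(["smartctl", "-A", dev],
                             capture_output=True, text=True).stdout
        print(dev, "Power_On_Hours:", raw_int(parse_attributes(out), "Power_On_Hours"))
```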

Gambitt
2009-04-04, 11:34 PM CDT
Oh, that's not good. I have heard that Linux on certain laptops reduces the life of hard disks to 8 months.
To query the S.M.A.R.T. data of your drive you need to install smartmontools:
$ sudo aptitude install smartmontools
(that is for Debian/Ubuntu; on Fedora use: $ sudo yum install smartmontools)
and then use the command Hlingler posted.

vinboy
2009-04-05, 12:07 AM CDT
LMAO, Power_On_Hours is only 99 and I already have so many bad sectors...
This is sad.

What do I do now? Any advice?

# sudo smartctl -a /dev/sdc
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: WDC WD5000AACS-00ZUB0
Serial Number: WD-WCASU7529034
Firmware Version: 01.01B01
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Apr 5 14:06:47 2009 MYT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (13200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 154) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   199   198   051    Pre-fail  Always   -           282
  3 Spin_Up_Time            0x0003   190   165   021    Pre-fail  Always   -           3500
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           120
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -           99
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           117
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           108
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always   -           4764
194 Temperature_Celsius     0x0022   115   111   000    Old_age   Always   -           32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           8
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   100   253   051    Old_age   Offline  -           0

SMART Error Log Version: 1
ATA Error Count: 524 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 524 occurred at disk power-on lifetime: 97 hours (4 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8e e0 03 40

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 87 e0 03 00 08 13:08:50.071 READ FPDMA QUEUED
27 00 00 00 00 00 00 08 13:08:50.071 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 13:08:50.071 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 13:08:50.070 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 13:08:50.070 READ NATIVE MAX ADDRESS EXT

Error 523 occurred at disk power-on lifetime: 97 hours (4 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8e e0 03 40

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 87 e0 03 00 08 13:08:45.749 READ FPDMA QUEUED
27 00 00 00 00 00 00 08 13:08:45.749 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 13:08:45.748 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 13:08:45.748 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 13:08:45.748 READ NATIVE MAX ADDRESS EXT

Error 522 occurred at disk power-on lifetime: 97 hours (4 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8e e0 03 40

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 87 e0 03 00 08 13:08:41.670 READ FPDMA QUEUED
27 00 00 00 00 00 00 08 13:08:41.670 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 13:08:41.669 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 13:08:41.669 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 13:08:41.669 READ NATIVE MAX ADDRESS EXT

Error 521 occurred at disk power-on lifetime: 97 hours (4 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8e e0 03 40

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 87 e0 03 00 08 13:08:37.702 READ FPDMA QUEUED
27 00 00 00 00 00 00 08 13:08:37.702 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 13:08:37.702 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 13:08:37.701 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 13:08:37.701 READ NATIVE MAX ADDRESS EXT

Error 520 occurred at disk power-on lifetime: 97 hours (4 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8e e0 03 40

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 87 e0 03 00 08 13:08:33.734 READ FPDMA QUEUED
27 00 00 00 00 00 00 08 13:08:33.734 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 13:08:33.734 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 13:08:33.734 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 13:08:33.733 READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
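The error log above can be cross-checked against the badblocks list. The version-1 SMART error log keeps only the 28-bit register values for each entry, so the Python sketch below rebuilds the LBA from SN, CL, CH and the low four bits of DH, and reads the error bits from ER and ST (0x40 in ER is UNC, an uncorrectable data error; bit 0 of ST is ERR). READ FPDMA QUEUED is a 48-bit command whose upper LBA bits are not in this log, so the sketch assumes they are zero; lba28 and error_flags are hypothetical helper names.

```python
def lba28(sn, cl, ch, dh):
    """LBA from the Sector Number, Cylinder Low/High and Device/Head registers.

    Bits 0-23 come from SN/CL/CH and bits 24-27 from the low nibble of DH.
    For a 48-bit command the upper bits are not logged, so this assumes they are 0.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def error_flags(er, st):
    """Name the error bits relevant here."""
    flags = []
    if st & 0x01:
        flags.append("ERR")   # Status: the command completed with an error
    if er & 0x40:
        flags.append("UNC")   # Error: uncorrectable data error
    return flags


# "After command completion occurred, registers were:" (identical for errors 520-524)
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "40 51 00 8e e0 03 40".split())
failed_lba = lba28(sn, cl, ch, dh)

# The READ FPDMA QUEUED that failed: for queued commands the sector count is
# carried in FR (0x08), and SN/CL/CH = 87 e0 03 give the starting LBA.
read_start = lba28(0x87, 0xE0, 0x03, 0x00)
read_count = 0x08

print(f"read of LBAs {read_start}..{read_start + read_count - 1} "
      f"failed at LBA {failed_lba} with {error_flags(er, st)}")
```

Under those assumptions every logged error is the same 8-sector read starting at LBA 254087, failing on its last sector, LBA 254094.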

marcrblevins
2009-04-05, 03:22 AM CDT
Don't feel bad. I bought three 230 GB IDE Maxtor drives. One died within a month, and it took a month to get it replaced. A month later a different drive died, and it took another month to get that one replaced through RMA while Maxtor and Seagate were merging! Replacing drives through RMA is slow.

My rig runs 24/7.

Hlingler
2009-04-05, 10:26 AM CDT
> LMAO, Power_On_Hours is only 99 and I already have so many bad sectors...
> This is sad.
>
> What do I do now? Any advice?

Return the HDD for a refund/exchange. Do NOT mention that you installed Linux. Simply say that you used the smartmontools S.M.A.R.T. diagnostic after getting "bad sector" errors.

V

vinboy
2009-04-05, 12:39 PM CDT
> Return the HDD for a refund/exchange. Do NOT mention that you installed Linux. Simply say that you used the smartmontools S.M.A.R.T. diagnostic after getting "bad sector" errors.

Why? What's wrong with installing Linux?

vinboy
2009-04-05, 12:41 PM CDT
Wow, I just checked another WD drive and it has lots of bad sectors too... Maybe I got counterfeit WD HDDs?
This is scary.


# sudo badblocks -sv -o badblocks-sdd.txt /dev/sdd1
Checking blocks 0 to 488384000
Checking for bad blocks (read-only test): done
Pass completed, 113 bad blocks found.


35010240
35010260
35010261
35010262
35010263
35010264
35010265
35010266
35010267
35010268
35010269
35010270
35010271
35010272
35010273
35010274
35010275
35010276
35010277
35010278
35010279
35010280
35010281
35010282
35010283
35010284
35010285
35010286
35010287
35010288
35010289
35010290
35010291
35010292
35010293
35010294
35010295
35010296
35010297
35010298
35010299
35010300
35010301
35010302
35010303
35010304
35010305
35010306
35010307
35010308
35010309
35010310
35010311
35010312
35010313
35010314
35010315
35010316
35010317
35010318
35010319
35010320
35010321
35010322
35010323
35010324
35010325
35010326
35010327
35010328
35010329
35010330
35010331
35010332
35010333
35010334
35010335
35010336
35010337
35010338
35010339
35010340
35010341
35010342
35010343
35010344
35010345
35010346
35010347
35010348
35010349
35010350
35010351
35010352
35010353
35010354
35010355
35010356
35010357
35010358
35010359
35010360
35010361
35010362
35010363
35010364
35010365
35010366
35010367
35010368
35010369
35010370
35010371


# sudo smartctl -A /dev/sdd
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   184   153   021    Pre-fail  Always   -           3766
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           76
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   200   200   051    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -           513
 10 Spin_Retry_Count        0x0032   100   253   051    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   051    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           57
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           76
194 Temperature_Celsius     0x0022   106   101   000    Old_age   Always   -           41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline  -           0
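For the bad-block question, the raw sector counters matter more than the normalized VALUE column: on both drives every normalized value is still above its threshold, which is consistent with smartctl reporting PASSED for /dev/sdc. The Python sketch below takes the raw values posted above for both drives and spells out what they say. It assumes, as is usual but not guaranteed since raw values are vendor-specific, that these raw values are plain counts; sector_report is a hypothetical helper.

```python
# Raw values copied from the smartctl output posted above for each drive.
DRIVES = {
    "/dev/sdc": {"Power_On_Hours": 99, "Reallocated_Sector_Ct": 0,
                 "Current_Pending_Sector": 8, "Offline_Uncorrectable": 0},
    "/dev/sdd": {"Power_On_Hours": 513, "Reallocated_Sector_Ct": 0,
                 "Current_Pending_Sector": 1, "Offline_Uncorrectable": 0},
}


def sector_report(attrs):
    """Describe the sector-related raw counters, assuming they are plain counts."""
    notes = []
    if attrs["Current_Pending_Sector"]:
        notes.append(f"{attrs['Current_Pending_Sector']} unreadable sector(s) pending "
                     "(typically cleared or remapped when rewritten)")
    if attrs["Reallocated_Sector_Ct"]:
        notes.append(f"{attrs['Reallocated_Sector_Ct']} sector(s) already remapped to spares")
    if attrs["Offline_Uncorrectable"]:
        notes.append(f"{attrs['Offline_Uncorrectable']} sector(s) uncorrectable in offline scan")
    return notes or ["no pending, reallocated or offline-uncorrectable sectors"]


for dev, attrs in DRIVES.items():
    print(f"{dev} ({attrs['Power_On_Hours']} h):", "; ".join(sector_report(attrs)))
```

Feeding in fresh smartctl -A numbers after a read-write badblocks pass would show whether the pending sectors were simply rewritten (Current_Pending_Sector back to 0, Reallocated_Sector_Ct unchanged) or remapped (Reallocated_Sector_Ct up).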

Hlingler
2009-04-05, 12:41 PM CDT
Many vendors will simply try to squirm out of the warranty by claiming that you installed "unapproved" and/or "unsupported" software.

V

marcrblevins
2009-04-05, 07:27 PM CDT
Are you sure you are getting hardware error messages while using Fedora itself?

Try a low-level format with the tool you can download from the hard drive maker's support site.

Hlingler
2009-04-05, 09:01 PM CDT
I just ran badblocks on my two HDDs on machine #1, and came up with blank output - nothing, nada, so: no bad blocks.
sda: 160 GB Hitachi Deskstar T7K250 series, 21647 power on hours
sdb: 250 GB Seagate Barracuda 7200.8 family, 20829 power on hours

/Methinks I'll avoid WD HDDs from here on out. BTW: I have three WD Caviar paperweights sitting on my desk right now.

V

vinboy
2009-04-06, 09:43 AM CDT
Nothing against WD HDDs.

I used to have a WD drive that lasted for many, many years with no problems.
The quality of my new WD HDDs is unacceptable. Maybe I got unlucky? Or did I get counterfeit drives?
:(

I'll try to get the store to replace it without the RMA waiting period.

leigh123linux
2009-04-06, 10:10 AM CDT
Why are you testing a partition for bad blocks? Try testing the whole device instead!


For example:


badblocks -sv -o badblocks-sdd.txt /dev/sdd
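The difference matters when interpreting the numbers: badblocks counts blocks (1024 bytes by default) from the start of whatever it is given, so block numbers from a partition test are offsets into the partition, while numbers from a whole-device test are offsets into the disk. The Python sketch below converts between the two and checks the device block count against smartctl's User Capacity. It assumes /dev/sdc1 starts at sector 63, the traditional DOS partition layout; the thread never shows the partition table, so check it with fdisk -lu before relying on that. The helper names are hypothetical.

```python
SECTOR_BYTES = 512
BLOCK_BYTES = 1024                # badblocks' default block size (-b)
PART_START = 63                   # ASSUMPTION: first sector of /dev/sdc1
DEVICE_BYTES = 500_107_862_016    # "User Capacity" from smartctl -a /dev/sdc

SECTORS_PER_BLOCK = BLOCK_BYTES // SECTOR_BYTES


def partition_block_to_lbas(block, part_start=PART_START):
    """Device LBAs covered by a badblocks block number taken from a partition test."""
    first = part_start + block * SECTORS_PER_BLOCK
    return list(range(first, first + SECTORS_PER_BLOCK))


def device_block_of_lba(lba):
    """badblocks block number, in a whole-device test, that contains `lba`."""
    return lba // SECTORS_PER_BLOCK


device_blocks = DEVICE_BYTES // BLOCK_BYTES
print("whole-device test range: 0 to", device_blocks - 1)
print("sdc1 block 127015 covers LBAs", partition_block_to_lbas(127015))
print("LBA 254094 is device block", device_block_of_lba(254094))
```

The computed device range, 0 to 488386583, matches the "From block 0 to 488386583" line of the whole-device run on /dev/sdc. With the assumed offset, block 127015 from the sdc1 list covers LBA 254094, the address encoded in the SMART error log registers (SN/CL/CH = 8e e0 03), but in a whole-device test the same sector would be reported as block 127047.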

vinboy
2009-04-06, 10:19 AM CDT
> Why are you testing a partition for bad blocks? Try testing the whole device instead!

Does it make any difference?
Anyway, I'm testing the whole device now, in read-write mode:
# sudo badblocks -swv /dev/sdd
Checking for bad blocks in read-write mode
From block 0 to 625131863
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: ^C7.29% done, 9:34:32 elapsed (I STOPPED HERE)


sudo badblocks -swv /dev/sdc
Checking for bad blocks in read-write mode
From block 0 to 488386583
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: ^C3.66% done, 7:35:25 elapsed (I STOPPED HERE)


Looks like the bad blocks are gone?
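One caution about -w: it is the destructive write-mode test. It writes the patterns 0xaa, 0x55, 0xff and 0x00 over every block and reads each one back, so everything on /dev/sdc and /dev/sdd, partition tables included, was overwritten by the runs above; badblocks -n is the non-destructive read-write alternative. The Python sketch below models only the write-then-compare idea on a hypothetical in-memory disk (FlakyDisk is invented for illustration) with one bit stuck at zero. It also shows why several patterns are used: 0xaa and 0x00 do not expose that fault, while 0x55 and 0xff do. Real drives more often fail with an I/O error than return wrong data, which this sketch does not model.

```python
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)   # the four patterns badblocks -w uses by default


class FlakyDisk:
    """Hypothetical in-memory disk; bit 0 of the first byte of `bad_block` is stuck at 0."""

    def __init__(self, nblocks, block_size=1024, bad_block=None):
        self.nblocks = nblocks
        self.block_size = block_size
        self.bad_block = bad_block
        self.data = bytearray(nblocks * block_size)

    def write(self, block, buf):
        off = block * self.block_size
        self.data[off:off + self.block_size] = buf
        if block == self.bad_block:
            self.data[off] &= 0xFE        # the stuck bit ignores writes of 1

    def read(self, block):
        off = block * self.block_size
        return bytes(self.data[off:off + self.block_size])


def write_mode_test(disk, patterns=PATTERNS):
    """Write each pattern to every block, then read all blocks back and compare.
    Like badblocks -w, this destroys whatever was on the disk."""
    bad = set()
    for pattern in patterns:
        expected = bytes([pattern]) * disk.block_size
        for b in range(disk.nblocks):
            disk.write(b, expected)
        for b in range(disk.nblocks):
            if disk.read(b) != expected:
                bad.add(b)
    return sorted(bad)


print(write_mode_test(FlakyDisk(16, bad_block=5)))                        # [5]
print(write_mode_test(FlakyDisk(16, bad_block=5), patterns=(0xAA, 0x00)))  # []
```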

vinboy
2009-04-07, 12:38 AM CDT
WOW, bad sectors gone!

sudo badblocks -sv -o badblocks-sdd-new.txt /dev/sdd
Checking blocks 0 to 625131863
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.


sudo badblocks -sv -o badblocks-sdc-new.txt /dev/sdc
Checking blocks 0 to 488386583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.

leigh123linux
2009-04-07, 01:32 AM CDT
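Ok, I have looked at man badblocks. It reads (and, with -w, overwrites) every block of whatever device or partition you point it at, so the blocks it reports are ones the drive could not read at the time of the test. That is not the same thing as a corrupt filesystem (the main cause of which is a bad shutdown), and it is not necessarily permanent physical damage either: a sector the drive cannot read is typically counted as pending (Current_Pending_Sector in the SMART output) until it is written again, and when the -w test rewrote it, the drive either stored the data successfully or remapped the sector to a spare. That would explain why the bad blocks are gone now; smartctl -A should show which happened (Reallocated_Sector_Ct goes up if sectors were remapped).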

http://en.wikipedia.org/wiki/Bad_sector


man badblocks


Filesystem corruption can be fixed with this command (unmount the partition first):

http://en.wikipedia.org/wiki/Ext3


su -
e2fsck -f -v -y /dev/sdd1


P.S. I didn't know badblocks existed until I saw this thread.
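Following the "unmount the partition first" advice for e2fsck, a wrapper can refuse to run the check while the partition is still listed as mounted. The Python sketch below looks for the device in the first field of each /proc/mounts line before calling e2fsck with the same -f -v -y options. is_mounted is a hypothetical helper, and it only catches a mount listed under that exact device name, not one listed through an alias such as a /dev/disk/by-uuid path.

```python
import subprocess
import sys


def is_mounted(device, mounts_text):
    """True if `device` is the source (first field) of any /proc/mounts line."""
    return any(line.split()[:1] == [device] for line in mounts_text.splitlines())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as root, hypothetical script name): python3 fsck_unmounted.py /dev/sdd1
    dev = sys.argv[1]
    with open("/proc/mounts") as f:
        if is_mounted(dev, f.read()):
            sys.exit(f"{dev} is mounted; unmount it before running e2fsck")
    # Same options as the post: force a check (-f), verbose (-v), answer yes (-y).
    sys.exit(subprocess.run(["e2fsck", "-f", "-v", "-y", dev]).returncode)
```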