FedoraForum.org - Fedora Support Forums and Community
  1. #1
    Join Date
    Feb 2009
    Location
    Island of Roatan in the Caribbean
    Posts
    281

    Knowledgeable smartctl person needed

    After experiencing 4 drive failures in about 10 days, I decided to start using smartd. I also installed F26 from scratch on all my boxes and set smartd up to send all reports to my email address.

    For an older laptop I get the same email sent to me every morning:
    SMART error (CurrentPendingSector) detected on host: airportlaptop
    The contents of which claim:
    Device: /dev/sda [SAT], 65532 Currently unreadable (pending) sectors
    With a promise at the end:
    No additional messages about this problem will be sent.
    But I get this same message at every cold boot.
    The box works fine. I wrote a script (eatDisk) that writes a 1G file from urandom to disk and then reads that file to write a duplicate and then read the duplicate to create the next duplicate, etc till disk exhaustion. No new errors.
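
    The eatDisk script itself was not posted. A minimal sketch of the idea described above, assuming a POSIX shell with head, cp and cmp available; the function name, file names and the cmp verification step are illustrative additions, not the original script:

```shell
#!/bin/sh
# eat_disk DIR [CHUNK_BYTES] [MAX_COPIES]
# Hypothetical reconstruction of the "eatDisk" idea: seed one file from
# /dev/urandom, then copy each file to the next one until a copy fails
# (disk full) or MAX_COPIES is reached (0 means no limit).
# Prints the number of copies that succeeded.
eat_disk() {
    dir=$1
    chunk=${2:-1073741824}   # 1 GiB by default, as in the post
    max=${3:-0}
    n=0
    head -c "$chunk" /dev/urandom > "$dir/eat.0" || return 1
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        next=$((n + 1))
        # cp reads the previous copy back and writes a new one
        if ! cp "$dir/eat.$n" "$dir/eat.$next" 2>/dev/null; then
            rm -f "$dir/eat.$next"   # drop the partial file left by a full disk
            break
        fi
        # cmp re-reads both files (an addition, not in the original description)
        if ! cmp -s "$dir/eat.$n" "$dir/eat.$next"; then
            echo "mismatch at copy $next" >&2
            return 2
        fi
        n=$next
    done
    echo "$n"
}
```

    Called as eat_disk /some/dir it runs until the file system is full. Reads that are served from the page cache never touch the platters, so a clean run of a script like this does not prove that every sector is readable.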

    I decided to use that box as a test bed to increase my knowledge about SMART and smartctl. No matter what command sequence I enter, the test appears to abort at the 10% mark, always reporting the same specific LBA_of_first_error. This happens even if I specify a range of LBAs to test that omits that LBA. Maddening!

    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure  90%        14498            533327240
    # 2  Selective offline   Completed: read failure  90%        14493            533327240
    # 4  Extended offline    Completed: read failure  90%        14492            533327240
    # 5  Selective offline   Completed: read failure  90%        14492            533327240
    #13  Extended offline    Completed: read failure  90%        14491            533327240
    #16  Extended offline    Completed: read failure  90%        14482            533300216
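
    For anyone wanting to reproduce the selective tests in the log above, here is a small dry-run sketch. It only prints smartctl commands for the LBA ranges on either side of a known bad LBA and executes nothing. The function name is made up for illustration, and the last LBA (625142446) is an assumption derived from the 625142447 LBA count quoted in this post.

```shell
#!/bin/sh
# print_selective_tests DEVICE BAD_LBA LAST_LBA
# Dry run: prints (does not execute) selective self-test commands that
# cover the ranges below and above a known bad LBA.
print_selective_tests() {
    dev=$1
    bad=$2
    last=$3
    if [ "$bad" -gt 0 ]; then
        echo "smartctl -t select,0-$((bad - 1)) $dev"
    fi
    if [ "$bad" -lt "$last" ]; then
        echo "smartctl -t select,$((bad + 1))-$last $dev"
    fi
}

print_selective_tests /dev/sda 533327240 625142446
# prints:
#   smartctl -t select,0-533327239 /dev/sda
#   smartctl -t select,533327241-625142446 /dev/sda
```

    The results of a selective test show up in the same self-test log (smartctl -l selftest /dev/sda) as the extended tests.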

    Questions:
    1) If SMART sees an error, why doesn't the drive take care of it by redirecting to a good LBA?
    2) What happened to LBA 533300216 which was the first error reported and is less than 533327240?
    3) The drive has 625142447 LBAs. Remaining is always 90%, which suggests the testing is being done from highest LBA to lowest, i.e. (625142447-533327240)/625142447 is about 15%, which is in the neighborhood of the 10% already tested.
    4) How do I get the drive to remap the bad spot?
    5) How do I get smartctl to scan the whole drive and NOT quit with 90% remaining?
    6) How can I see the supposed 65532 Currently unreadable (pending) sectors list?
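
    The arithmetic behind question 3 can be checked with a few lines of shell. This is only a sketch using the numbers quoted in the post; the helper name lba_percent is made up for illustration.

```shell
#!/bin/sh
# lba_percent LBA TOTAL_LBAS
# Prints LBA as a percentage of the drive, to one decimal place.
lba_percent() {
    awk -v lba="$1" -v total="$2" 'BEGIN { printf "%.1f\n", lba * 100 / total }'
}

total=625142447     # LBA count quoted in the post
first=533300216     # LBA logged by test #16
second=533327240    # LBA logged by every later test

echo "from start of drive: $(lba_percent "$second" "$total")%"               # 85.3%
echo "from end of drive:   $(lba_percent "$((total - second))" "$total")%"   # 14.7%
echo "gap between the two logged LBAs: $((second - first)) sectors"          # 27024
```

    So the failing LBA sits about 85% of the way into the drive counting from LBA 0, or about 15% counting back from the end, which is the figure used in question 3.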

  2. #2
    Join Date
    Feb 2005
    Location
    London, UK
    Posts
    642

    Re: Knowledgeable smartctl person needed

    I'm not an expert on smartctl, but this blog post looks useful:

    http://nwsmith.blogspot.co.uk/2007/0...nreadable.html

  3. #3
    Join Date
    Feb 2009
    Location
    Island of Roatan in the Caribbean
    Posts
    281

    Re: Knowledgeable smartctl person needed

    HaydnH:
    Thanks for the link, but the article is way out of date. Almost nothing in it works on F26.

    It did give me an idea for further testing though, so it did provide a spark.

  4. #4
    Join Date
    Sep 2009
    Posts
    2,173

    Re: Knowledgeable smartctl person needed

    You have to force a write, so you will need to install hdparm if it's not already installed. This thread tells you how.

    dd_wizard
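
    The linked thread is not reproduced here. As a rough sketch of what forcing a write with hdparm usually looks like, the helper below prints the commands without running them. The function name is made up, and the LBA passed in is whatever smartctl reports as LBA_of_first_error. Note that --write-sector overwrites that sector with zeros, so whatever data was in it is lost.

```shell
#!/bin/sh
# print_hdparm_rewrite DEVICE LBA
# Dry run: prints the hdparm commands for read, zero-write, read again.
# Nothing is executed; copy the lines out only after checking the LBA.
print_hdparm_rewrite() {
    dev=$1
    lba=$2
    # 1) confirm the sector really is unreadable
    echo "hdparm --read-sector $lba $dev"
    # 2) overwrite it with zeros so the drive can reallocate it
    echo "hdparm --write-sector $lba --yes-i-know-what-i-am-doing $dev"
    # 3) read it back; this should now succeed
    echo "hdparm --read-sector $lba $dev"
}

print_hdparm_rewrite /dev/sda 533327240
```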

  5. #5
    Join Date
    Feb 2009
    Location
    Island of Roatan in the Caribbean
    Posts
    281

    Re: Knowledgeable smartctl person needed

    dd_wizard:

    I want to find out what file is involved with the failure, if any, and try to take corrective action before blasting the disk. If there is a file involved, I'd rename it to some trash name and copy in the file from another box, assuming it's an O/S file. Writing nulls to the renamed file for the file length should then do as you suggest.

    I found this page: https://www.smartmontools.org/wiki/B...wto#LVMrepairs that worked well till I discovered that dumpe2fs won't open my LVM volume. Notice that's the smartmontools site handing out information that no longer works. Typical of Linux documentation.

  6. #6
    Join Date
    Jul 2005
    Posts
    882

    Re: Knowledgeable smartctl person needed

    ..till I discovered that dumpe2fs won't open my LVM volume
    Double-check the path you used to your lvm volume device, the example in their wiki is strange-looking. I just ran dumpe2fs /dev/mapper/fedora_raptor-root, where fedora_raptor is the volume group name and root is the logical volume name. This test worked fine on this f25 laptop.
    ======
    Doug G
    ======
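
    Both spellings of a logical volume's device path can be derived from the volume group and logical volume names. A small sketch, with a made-up helper name; the one non-obvious rule it encodes is that device-mapper doubles any hyphen that appears inside a name:

```shell
#!/bin/sh
# lv_paths VG LV
# Prints the two usual device paths for a logical volume.
lv_paths() {
    vg=$1
    lv=$2
    # under /dev/mapper a single hyphen separates VG and LV, so hyphens
    # inside either name are written as "--"
    vg_dm=$(printf '%s' "$vg" | sed 's/-/--/g')
    lv_dm=$(printf '%s' "$lv" | sed 's/-/--/g')
    echo "/dev/$vg/$lv"
    echo "/dev/mapper/$vg_dm-$lv_dm"
}

lv_paths fedora_raptor root
# prints:
#   /dev/fedora_raptor/root
#   /dev/mapper/fedora_raptor-root
```

    Running lvdisplay as root shows the actual "LV Path" for each volume, which is the safest thing to copy into dumpe2fs or debugfs.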

  7. #7
    Join Date
    Feb 2009
    Location
    Island of Roatan in the Caribbean
    Posts
    281

    Re: Knowledgeable smartctl person needed

    Doug G

    I've been at this for 2 days.

    I discovered the "secret" to getting debugfs to open my volume. It's as you indicated: use the LV Path, which in my case is /dev/fedora_airportlaptop/root. I love the Linux O/S but the documentation is atrocious.

    I booted the box using the USB stick containing F26 and just opted to run the O/S in its test setting so the real hard drive isn't mounted. Executing an su - in a terminal session provides necessary root access.

    I ran a badblocks on the volume and it discovered 4 errors in 7 hours. I used debugfs to test those bad blocks to see if any of them were in existing files, and none were. Then I ran an e2fsck using that badblock list to account for those blocks. I ran badblocks and e2fsck separately because I wanted to know how many bad blocks would turn up: I started this because smartctl sent me emails indicating 64K worth of blocks were bad, which just didn't seem reasonable on a drive that was working perfectly. I knew that the -c option to e2fsck would run badblocks itself, but I wanted the information and the opportunity to test those blocks to see if files were involved.

    In case anyone reads this later:
    As root on an unmounted file system:
    badblocks -nvs -b 4096 -o badBlocks /dev/fedora_airportlaptop/root

    debugfs
    open /dev/fedora_airportlaptop/root
    testb 1234567890
    ...
    ...
    quit

    e2fsck -B 4096 -l badBlocks /dev/fedora_airportlaptop/root
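
    With more than a handful of entries in the badBlocks file, typing testb by hand gets tedious. The sketch below (helper name made up) prints one non-interactive debugfs call per block instead of executing anything. It assumes the block numbers in the file use the same block size as the file system, which is the case when badblocks was run with -b 4096 on a 4096-byte-block ext file system.

```shell
#!/bin/sh
# print_block_checks DEVICE BADBLOCKS_FILE
# Dry run: for every block number in the file, prints debugfs commands
# that report whether the block is in use (testb) and which inode, if
# any, owns it (icheck).
print_block_checks() {
    dev=$1
    list=$2
    while read -r blk || [ -n "$blk" ]; do
        case $blk in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        echo "debugfs -R 'testb $blk' $dev"
        echo "debugfs -R 'icheck $blk' $dev"
    done < "$list"
}

# usage: print_block_checks /dev/fedora_airportlaptop/root badBlocks
```

    If icheck reports an inode number, debugfs -R 'ncheck INODE' on the same device turns it into a path name.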

    Bounce the box and let it boot off the hard drive.

    As root, a quick
    smartctl -t short /dev/sda
    followed by
    smartctl -a /dev/sda
    2 minutes later revealed no Current_Pending_Sectors

    I'll start a
    smartctl -t long /dev/sda
    and go to bed (it's 1AM here). I'll check in the morning.
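
    Rather than scanning the whole smartctl -a report by eye, the pending-sector count can be pulled out of the attribute table. A minimal sketch, assuming the usual ten-column layout of smartctl -A output where the raw value is the last column; the function name is made up:

```shell
#!/bin/sh
# pending_sectors
# Reads `smartctl -A` output on stdin and prints the raw value of the
# Current_Pending_Sector attribute (10th column of the attribute table).
# Returns non-zero if the drive does not report that attribute.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

# usage (as root):
#   smartctl -A /dev/sda | pending_sectors
```

    The self-test history, including LBA_of_first_error, is a separate log and is shown on its own by smartctl -l selftest /dev/sda.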

    A good URL to use is https://lsandig.org/blog/2015/06/yat...bad-blocks/en/
    Last edited by BillGradwohl; 8th September 2017 at 03:39 PM.

  8. #8
    Join Date
    Feb 2009
    Location
    Island of Roatan in the Caribbean
    Posts
    281

    Re: Knowledgeable smartctl person needed

    At 1AM I didn't notice that the short test DID log a new LBA_of_first_error even though it DID NOT have a Current_Pending_Sectors count above 0. I don't understand how it can locate an error after I run a badblocks for 7 hours and that new error doesn't raise the Current_Pending_Sectors to at least 1.

    The subsequent long test stumbled over the same new LBA_of_first_error and still didn't raise the Current_Pending_Sectors to at least 1. I don't get it.

    I'm running another badblocks as before to see if it finds anything. I'll know in 7 hours.
    Last edited by BillGradwohl; 8th September 2017 at 11:25 PM.

  9. #9
    Join Date
    Feb 2009
    Location
    Island of Roatan in the Caribbean
    Posts
    281

    Re: Knowledgeable smartctl person needed

    For anyone that may stumble upon this ...
    These are my conclusions, YMMV.
    I discovered that there are two approaches to handling bad blocks in a 'linear' LVM partition; the SMART hardware version and the Linux software file system version. Long story short, I finally got the SMART hardware version working after spending many hours learning about the software method.

    The software method:
    If your smartd is set up properly, it will send you a message when it detects an issue. It will tell you which partition is involved and then you can take the brute force software approach to remove bad blocks from being considered as storage locations. This involves running the badblocks utility against the partition as explained in a previous post and then running e2fsck to mask them off from further consideration BY THE OPERATING SYSTEM. They have not been handled at the hardware level, so smartctl will continue to report on them. Not ideal. PITA actually.

    Along the way I learned that running badblocks repeatedly against a partition returns not only new bad blocks but also the same bad blocks over and over again, even after they have been shunted aside by an e2fsck run as previously explained.

    The hardware method:
    This involves taking the output from smartctl -a /dev/something and working through bad blocks one by one. smartctl only lists the FIRST bad block on its report and there's no indication how many more there may be as it quits after finding one. The smartmontools web page previously listed by me offers a lengthy set of instructions for calculating a bad block number relative to a file system that IS the equivalent of the bad block number displayed by smartctl relative to the device.

    Getting that file system block number is involved. I wrote a spreadsheet that does all the calculations once many utilities are executed to provide values for the spreadsheet cells. I took all the example values listed in the smartmontools web page previously mentioned in a prior post and inserted them into the spreadsheet and I got their answer, so I believe the spreadsheet works to do the arithmetic.
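
    The spreadsheet's formulas are not reproduced in this thread. As a rough sketch of the kind of arithmetic involved, the function below handles only the simplest case: a linear logical volume made of one contiguous run of physical extents, on a drive with 512-byte logical sectors. The function name and parameter names are illustrative, and the inputs are assumed to come from tools such as fdisk, pvdisplay and lvdisplay.

```shell
#!/bin/sh
# lv_fs_block LBA PART_START PE_START PE_SECTORS LV_FIRST_PE FS_BLOCK_BYTES
#   LBA             sector reported by smartctl (relative to the whole drive)
#   PART_START      first sector of the partition holding the PV
#   PE_START        offset of the first physical extent inside the PV, in sectors
#   PE_SECTORS      size of one physical extent, in sectors
#   LV_FIRST_PE     number of the first physical extent used by the LV
#   FS_BLOCK_BYTES  file system block size (e.g. 4096)
# Prints the file system block number that contains the LBA.
# Assumption: single-segment linear LV, 512-byte sectors.
lv_fs_block() {
    lba=$1
    part_start=$2
    pe_start=$3
    pe_sectors=$4
    first_pe=$5
    bs=$6
    # sector offset of the LBA counted from the start of the LV
    off=$(( lba - part_start - pe_start - first_pe * pe_sectors ))
    if [ "$off" -lt 0 ]; then
        echo "LBA lies before the start of this LV" >&2
        return 1
    fi
    # convert 512-byte sectors to file system blocks (integer division)
    echo $(( off * 512 / bs ))
}
```

    An LV that is split across several segments or physical volumes needs the per-segment mapping from lvdisplay --maps, which this sketch does not attempt.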

    Getting a clean smartctl -a listing amounts to looping a set of steps, namely:
    1. run smartctl -a /dev/.... to see there is a bad block
    2. run numerous utilities to get certain pieces of information needed to calculate a file system equivalent bad block
    3. run debugfs, if you're so inclined, to see if the bad block is inside an existing file so you can take some corrective action
    4. run two versions of the dd command to tell SMART at the drive level to mask off a bad block
    5. run smartctl -t (short or long) /dev/.... and wait the specified amount of time
    Start over at 1 until the smartctl -a report is clean.
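
    The two dd commands in step 4 are not spelled out in the post. The sketch below assumes they are the read-then-zero pair used in the smartmontools howto, and it only prints them. The function name is made up; the device is the logical volume path and the block number is the file system block worked out in step 2. The second command destroys whatever was stored in that block.

```shell
#!/bin/sh
# print_dd_pair DEVICE FS_BLOCK FS_BLOCK_BYTES
# Dry run: prints the read check and the zero write for one block.
print_dd_pair() {
    dev=$1
    blk=$2
    bs=$3
    # 1) try to read the block; an I/O error confirms this is the bad spot
    echo "dd if=$dev of=/dev/null bs=$bs skip=$blk count=1"
    # 2) overwrite it with zeros so the drive can reallocate the sector
    echo "dd if=/dev/zero of=$dev bs=$bs seek=$blk count=1"
}

# usage: print_dd_pair /dev/fedora_airportlaptop/root BLOCK 4096
```

    Note the asymmetry: skip= positions the read on the input device, while seek= positions the write on the output device.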

    I'm going to attempt to attach the spreadsheet to this post.
    Column A has the labels for column B's values.
    Column C has instructions on what utility to execute for any associated column B cell.
    Column A's cell width isn't wide enough to see some cell contents, so click a cell to see its contents in the formula bar. The spreadsheet makes sense if you review the smartmontools web page https://www.smartmontools.org/wiki/B...wto#LVMrepairs and follow along.
    The values in the cells are for one of my cases. If you insert the values from the smartmontools page you'll get their example answer.

    Instructions for using the spreadsheet are listed as command line commands in column A with column B providing context. Using the spreadsheet, I was able to get a clean smartctl report.

  10. #10
    Join Date
    Nov 2017
    Location
    Cuenca, Ecuador
    Posts
    1

    Re: Knowledgeable smartctl person needed

    Quote Originally Posted by BillGradwohl
    For anyone that may stumble upon this ...

    ...

    I'm going to attempt to attach the spreadsheet to this post.

    ...

    Instructions for using the spreadsheet are listed as command line commands in column A with column B providing context. Using the spreadsheet, I was able to get a clean smartctl report.
    Thank you! I've done the smartctl/dd sector remap shuffle before but never with LVM, and the calculations were a little miserable. Your spreadsheet was just the ticket: the calculations worked for me as well (I verified them) and I'm slowly decrementing my 'Current_Pending_Sector' count on an irritating disk. Much appreciated.
