[SOLVED] f16 gpt corrupt header, repair it and later the other one gets corrupted



rtguille
21st November 2011, 01:21 PM
I recently installed F16 and later noticed that one of the GPT headers gets corrupted;
I am trying to find out why.

# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.1

Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sda: 625142448 sectors, 298.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4F85375A-FCC5-4582-B687-916C9DE4D12B
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 625142414
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)  End (sector)  Size        Code  Name
   1          2048          4095      1024.0 KiB  EF02
   2          4096       2101247      1024.0 MiB  EF00  ext4
   3       2101248      75501567      35.0 GiB    0700
   4      75501568     390381567      150.1 GiB   0700
   5     390381568     474267647      40.0 GiB    0700
   6     474267648     482656255      4.0 GiB     8200
   7     482656256     625141759      67.9 GiB    0700


# parted -l
Error: The backup GPT table is corrupt, but the primary appears OK, so that will
be used.
OK/Cancel? ok
Model: ATA SAMSUNG HM320HJ (scsi)
Disk /dev/sda: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  1076MB  1074MB  ext4            ext4  boot
 3      1076MB  38.7GB  37.6GB  ext4
 4      38.7GB  200GB   161GB   ext4
 5      200GB   243GB   42.9GB
 6      243GB   247GB   4295MB  linux-swap(v1)
 7      247GB   320GB   73.0GB  ext4


Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/luks-0de9acb9-cb50-4a89-b93e-f9bc74b5724f: 42.9GB
Sector size (logical/physical): 512B/512B

Partition Table: loop

Number Start End Size File system Flags
1 0.00B 42.9GB 42.9GB ext4


Something happens at boot that "corrupts" either the main or the backup GPT header: one time it is the main,
another time the backup. I created the GPT partitions with anaconda at install time, using custom partitioning.

To repair it, I do:

# gdisk /dev/sda
GPT fdisk (gdisk) version 0.8.1

Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************

Command (? for help): r

Recovery/transformation command (? for help): ?
b use backup GPT header (rebuilding main)
c load backup partition table from disk (rebuilding main)
d use main GPT header (rebuilding backup)
e load main partition table from disk (rebuilding backup)
f load MBR and build fresh GPT from it
g convert GPT into MBR and exit
h make hybrid MBR
i show detailed information on a partition
l load partition data from a backup file
m return to main menu
o print protective MBR data
p print the partition table
q quit without saving changes
t transform BSD disklabel partition
v verify disk
w write table to disk and exit
x extra functionality (experts only)
? print this menu

Recovery/transformation command (? for help): d

Recovery/transformation command (? for help): p
Disk /dev/sda: 625142448 sectors, 298.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4F85375A-FCC5-4582-B687-916C9DE4D12B
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 625142414
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)  End (sector)  Size        Code  Name
   1          2048          4095      1024.0 KiB  EF02
   2          4096       2101247      1024.0 MiB  EF00  ext4
   3       2101248      75501567      35.0 GiB    0700
   4      75501568     390381567      150.1 GiB   0700
   5     390381568     474267647      40.0 GiB    0700
   6     474267648     482656255      4.0 GiB     8200
   7     482656256     625141759      67.9 GiB    0700

Recovery/transformation command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT).
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.

# partprobe

# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.1

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 625142448 sectors, 298.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4F85375A-FCC5-4582-B687-916C9DE4D12B
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 625142414
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)  End (sector)  Size        Code  Name
   1          2048          4095      1024.0 KiB  EF02
   2          4096       2101247      1024.0 MiB  EF00  ext4
   3       2101248      75501567      35.0 GiB    0700
   4      75501568     390381567      150.1 GiB   0700
   5     390381568     474267647      40.0 GiB    0700
   6     474267648     482656255      4.0 GiB     8200
   7     482656256     625141759      67.9 GiB    0700

# parted -l
Model: ATA SAMSUNG HM320HJ (scsi)
Disk /dev/sda: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  1076MB  1074MB  ext4            ext4  boot
 3      1076MB  38.7GB  37.6GB  ext4
 4      38.7GB  200GB   161GB   ext4
 5      200GB   243GB   42.9GB
 6      243GB   247GB   4295MB  linux-swap(v1)
 7      247GB   320GB   73.0GB  ext4


Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/luks-0de9acb9-cb50-4a89-b93e-f9bc74b5724f: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: loop

Number Start End Size File system Flags
1 0.00B 42.9GB 42.9GB ext4


I believe it only gets corrupted sometimes. Sometimes, but not always,
when I power on the laptop, systemd asks for the root password or to press Ctrl+D,
with no visible cause for the question (no error is shown); I just
press Ctrl+D and it boots OK. Also, grub2 adds about 10-15 seconds to the start of the
boot process (on top of the 5-second menu timeout); since this is the first
time I have used it, I will not change it for now.

srs5694
23rd November 2011, 04:23 AM
What, if anything, is installed on the computer other than Fedora 16? Are you using any unusual boot loaders or other pre-boot software? Some such programs, especially if they're GPT-unaware, will overwrite some of the sectors used by the primary GPT data, which could be at least part of the problem.

You might try running Boot Info Script (http://sourceforge.net/projects/bootinfoscript/) and posting the RESULTS.TXT file that it generates. That provides information on your boot loaders and other low-level boot information. (Please post using code tags, though -- [ code ] before the text and [ /code ] after, but closing up the spaces within the brackets. That preserves columnar formatting, which the forum software otherwise removes.)

Another possibility is that you're running into hardware problems -- a bad hard disk, disk interface, RAM, or even CPU. You could try checking your SMART data (accessible from Palimpsest Disk Utility or other tools), or check dmesg output for hints of hardware problems.

rtguille
23rd November 2011, 11:15 AM
F16 uses the whole drive. The partitions were created by anaconda (using custom partitioning).
Boot time is unusually long compared to F15 (which was previously installed); in particular, the
grub2 menu takes a very long time to appear (about 10 s), and then it waits another 5 s.
I do not run anything in particular related to boot loaders. I suspect that something at boot time corrupts the header, probably grub2.

# cat /etc/grub2.cfg
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub2-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
load_env
fi
set default="0"
if [ "${prev_saved_entry}" ]; then
set saved_entry="${prev_saved_entry}"
save_env saved_entry
set prev_saved_entry=
save_env prev_saved_entry
set boot_once=true
fi

function savedefault {
if [ -z "${boot_once}" ]; then
saved_entry="${chosen}"
save_env saved_entry
fi
}

function load_video {
insmod vbe
insmod vga
insmod video_bochs
insmod video_cirrus
}

set timeout=5
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Fedora (3.1.1-2.fc16.x86_64)' --class fedora --class gnu-linux --class gnu --class os {
load_video
set gfxpayload=keep
insmod gzio
insmod part_gpt
insmod ext2
set root='(hd0,gpt2)'
search --no-floppy --fs-uuid --set=root f09f1c85-31ca-48d6-a2d3-825b2b8bd4cc
echo 'Loading Fedora (3.1.1-1.fc16.x86_64)'
linux /vmlinuz-3.1.1-2.fc16.x86_64 root=UUID=7a5bcedb-cbc1-4c07-962a-b44c86be8c0d ro rd.md=0 rd.lvm=0 rd.dm=0 quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8
echo 'Loading initial ramdisk ...'
initrd /initramfs-3.1.1-2.fc16.x86_64.img
}
menuentry 'Fedora (3.1.1-1.fc16.x86_64)' --class fedora --class gnu-linux --class gnu --class os {
load_video
set gfxpayload=keep
insmod gzio
insmod part_gpt
insmod ext2
set root='(hd0,gpt2)'
search --no-floppy --fs-uuid --set=root f09f1c85-31ca-48d6-a2d3-825b2b8bd4cc
echo 'Loading Fedora (3.1.1-1.fc16.x86_64)'
linux /vmlinuz-3.1.1-1.fc16.x86_64 root=UUID=7a5bcedb-cbc1-4c07-962a-b44c86be8c0d ro rd.md=0 rd.lvm=0 rd.dm=0 quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8
echo 'Loading initial ramdisk ...'
initrd /initramfs-3.1.1-1.fc16.x86_64.img
}
menuentry 'Fedora Linux, with Linux 3.1.0-7.fc16.x86_64' --class fedora --class gnu-linux --class gnu --class os {
load_video
set gfxpayload=keep
insmod gzio
insmod part_gpt
insmod ext2
set root='(hd0,gpt2)'
search --no-floppy --fs-uuid --set=root f09f1c85-31ca-48d6-a2d3-825b2b8bd4cc
echo 'Loading Linux 3.1.0-7.fc16.x86_64 ...'
linux /vmlinuz-3.1.0-7.fc16.x86_64 root=UUID=7a5bcedb-cbc1-4c07-962a-b44c86be8c0d ro rd.md=0 rd.lvm=0 rd.dm=0 quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8
echo 'Loading initial ramdisk ...'
initrd /initramfs-3.1.0-7.fc16.x86_64.img
}
menuentry 'Fedora Linux, with Linux 3.1.0-7.fc16.x86_64 (recovery mode)' --class fedora --class gnu-linux --class gnu --class os {
load_video
set gfxpayload=keep
insmod gzio
insmod part_gpt
insmod ext2
set root='(hd0,gpt2)'
search --no-floppy --fs-uuid --set=root f09f1c85-31ca-48d6-a2d3-825b2b8bd4cc
echo 'Loading Linux 3.1.0-7.fc16.x86_64 ...'
linux /vmlinuz-3.1.0-7.fc16.x86_64 root=UUID=7a5bcedb-cbc1-4c07-962a-b44c86be8c0d ro single rd.md=0 rd.lvm=0 rd.dm=0 quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8
echo 'Loading initial ramdisk ...'
initrd /initramfs-3.1.0-7.fc16.x86_64.img
}
### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_linux_xen ###
### END /etc/grub.d/20_linux_xen ###

### BEGIN /etc/grub.d/30_os-prober ###
### END /etc/grub.d/30_os-prober ###

### BEGIN /etc/grub.d/40_custom ###
# This file provides an easy way to add custom menu entries. Simply type the
# menu entries you want to add after this comment. Be careful not to change
# the 'exec tail' line above.
### END /etc/grub.d/40_custom ###

### BEGIN /etc/grub.d/41_custom ###
if [ -f $prefix/custom.cfg ]; then
source $prefix/custom.cfg;
fi
### END /etc/grub.d/41_custom ###

### BEGIN /etc/grub.d/90_persistent ###
### END /etc/grub.d/90_persistent ###


# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Fedora"
GRUB_DEFAULT=saved
GRUB_CMDLINE_LINUX="rd.md=0 rd.lvm=0 rd.dm=0 quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 KEYTABLE=es LANG=en_US.UTF-8"

dmesg:
[ 1.959237] scsi 1:0:0:0: CD-ROM MATSHITA DVD+-RW UJ8A2 1.02 PQ: 0 ANSI: 5
[ 1.986195] Alternate GPT is invalid, using primary GPT.
[ 1.986250] sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7

srs5694
23rd November 2011, 05:11 PM
Please post the Boot Info Script results. Also, please use code tags!

rtguille
24th November 2011, 02:30 AM
Boot Info Script 0.60 from 17 May 2011


============================= Boot Info Summary: ===============================

=> Grub2 (v1.99) is installed in the MBR of /dev/sda and looks at sector 2048
of the same hard drive for core.img. core.img is at this location and
looks for ?? on this drive.

sda1: __________________________________________________________________________

File system: BIOS Boot partition
Boot sector type: Grub2's core.img
Boot sector info:

sda2: __________________________________________________________________________

File system: ext4
Boot sector type: -
Boot sector info:
Operating System:
Boot files: /grub2/core.img

sda3: __________________________________________________________________________

File system: ext4
Boot sector type: -
Boot sector info:
Operating System: Fedora release 16 (Verne)
Kernel on an ()
Boot files: /etc/fstab /boot/grub2/core.img

sda4: __________________________________________________________________________

File system: ext4
Boot sector type: -
Boot sector info:
Operating System:
Boot files:

sda5: __________________________________________________________________________

File system: crypto_LUKS
Boot sector type: Unknown
Boot sector info:

sda6: __________________________________________________________________________

File system: swap
Boot sector type: -
Boot sector info:

sda7: __________________________________________________________________________

File system: ext4
Boot sector type: -
Boot sector info:
Operating System:
Boot files:

============================ Drive/Partition Info: =============================

Drive: sda _____________________________________________________________________

Disk /dev/sda: 320.1 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders, total 625142448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes

Partition Boot Start Sector End Sector # of Sectors Id System

/dev/sda1 1 625,142,447 625,142,447 ee GPT


GUID Partition Table detected.

Partition Start Sector End Sector # of Sectors System
/dev/sda1 2,048 4,095 2,048 BIOS Boot partition
/dev/sda2 4,096 2,101,247 2,097,152 EFI System partition
/dev/sda3 2,101,248 75,501,567 73,400,320 Data partition (Windows/Linux)
/dev/sda4 75,501,568 390,381,567 314,880,000 Data partition (Windows/Linux)
/dev/sda5 390,381,568 474,267,647 83,886,080 Data partition (Windows/Linux)
/dev/sda6 474,267,648 482,656,255 8,388,608 Swap partition (Linux)
/dev/sda7 482,656,256 625,141,759 142,485,504 Data partition (Windows/Linux)

"blkid" output: ________________________________________________________________

Device UUID TYPE LABEL

/dev/mapper/luks-0de9acb9-cb50-4a89-b93e-f9bc74b5724f d258aa4c-e613-4491-a226-45bd1863e294 ext4
/dev/sda2 f09f1c85-31ca-48d6-a2d3-825b2b8bd4cc ext4
/dev/sda3 7a5bcedb-cbc1-4c07-962a-b44c86be8c0d ext4 _Fedora-16-x86_6
/dev/sda4 e21fac71-cc35-4c18-a736-1284b04ebccd ext4
/dev/sda5 0de9acb9-cb50-4a89-b93e-f9bc74b5724f crypto_LUKS
/dev/sda6 a0e6bd27-80e6-43b7-928b-4a39e1963654 swap
/dev/sda7 db0a257a-83a4-48e6-a640-c4278b26eb89 ext4

========================= "ls -R /dev/mapper/" output: =========================

/dev/mapper:
control
luks-0de9acb9-cb50-4a89-b93e-f9bc74b5724f

================================ Mount points: =================================

Device Mount_Point Type Options

/dev/mapper/luks-0de9acb9-cb50-4a89-b93e-f9bc74b5724f /home ext4 (rw,relatime,seclabel,user_xattr,acl,barrier=1,data=ordered)
/dev/sda2 /boot ext4 (rw,relatime,seclabel,user_xattr,acl,barrier=1,data=ordered)
/dev/sda3 / ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
/dev/sda4 /virtual ext4 (rw,relatime,seclabel,user_xattr,acl,barrier=1,data=ordered)
/dev/sda7 /data1 ext4 (rw,relatime,seclabel,user_xattr,acl,barrier=1,data=ordered)


=================== sda2: Location of files loaded by Grub: ====================

GiB - GB File Fragment(s)

0.158065796 = 0.169721856 initramfs-3.1.0-7.fc16.x86_64.img 1
0.181510925 = 0.194895872 initramfs-3.1.1-1.fc16.x86_64.img 1
0.205032349 = 0.220151808 initramfs-3.1.1-2.fc16.x86_64.img 1
0.135044098 = 0.145002496 vmlinuz-3.1.0-7.fc16.x86_64 1
0.164344788 = 0.176463872 vmlinuz-3.1.1-1.fc16.x86_64 1
0.187652588 = 0.201490432 vmlinuz-3.1.1-2.fc16.x86_64 1

================= sda2: Location of files loaded by Syslinux: ==================

GiB - GB File Fragment(s)

0.129699707 = 0.139264000 extlinux/cat.c32 1
0.130611420 = 0.140242944 extlinux/chain.c32 1
0.129665375 = 0.139227136 extlinux/cmd.c32 1
0.129814148 = 0.139386880 extlinux/config.c32 1
0.129707336 = 0.139272192 extlinux/cpuid.c32 1
0.130130768 = 0.139726848 extlinux/cpuidtest.c32 1
0.130554199 = 0.140181504 extlinux/disk.c32 1
0.129787445 = 0.139358208 extlinux/dmitest.c32 1
0.130462646 = 0.140083200 extlinux/elf.c32 1
0.129852295 = 0.139427840 extlinux/ethersel.c32 1
0.130592346 = 0.140222464 extlinux/gfxboot.c32 1
0.130210876 = 0.139812864 extlinux/gpxecmd.c32 1
0.130882263 = 0.140533760 extlinux/hdt.c32 1
0.130226135 = 0.139829248 extlinux/host.c32 1
0.129806519 = 0.139378688 extlinux/ifcpu64.c32 1
0.129871368 = 0.139448320 extlinux/ifcpu.c32 1
0.129890442 = 0.139468800 extlinux/ifplop.c32 1
0.130622864 = 0.140255232 extlinux/kbdmap.c32 1
0.130432129 = 0.140050432 extlinux/linux.c32 1
0.130168915 = 0.139767808 extlinux/ls.c32 1
0.130119324 = 0.139714560 extlinux/lua.c32 1
0.130207062 = 0.139808768 extlinux/mboot.c32 1
0.129821777 = 0.139395072 extlinux/meminfo.c32 1
0.130542755 = 0.140169216 extlinux/menu.c32 1
0.130413055 = 0.140029952 extlinux/pcitest.c32 1
0.130569458 = 0.140197888 extlinux/pmload.c32 1
0.130546570 = 0.140173312 extlinux/pwd.c32 1
0.130172729 = 0.139771904 extlinux/reboot.c32 1
0.130485535 = 0.140107776 extlinux/rosh.c32 1
0.129791260 = 0.139362304 extlinux/sanboot.c32 1
0.130157471 = 0.139755520 extlinux/sdi.c32 1
0.129749298 = 0.139317248 extlinux/sysdump.c32 1
0.129882812 = 0.139460608 extlinux/vesainfo.c32 1
0.130378723 = 0.139993088 extlinux/vesamenu.c32 1
0.129798889 = 0.139370496 extlinux/vpdtest.c32 1
0.129753113 = 0.139321344 extlinux/whichsys.c32 1


============== sda2: Version of COM32(R) files used by Syslinux: ===============

extlinux/cat.c32 : COM32R module (v4.xx)
extlinux/chain.c32 : COM32R module (v4.xx)
extlinux/cmd.c32 : COM32R module (v4.xx)
extlinux/config.c32 : COM32R module (v4.xx)
extlinux/cpuid.c32 : COM32R module (v4.xx)
extlinux/cpuidtest.c32 : COM32R module (v4.xx)
extlinux/disk.c32 : COM32R module (v4.xx)
extlinux/dmitest.c32 : COM32R module (v4.xx)
extlinux/elf.c32 : COM32R module (v4.xx)
extlinux/ethersel.c32 : COM32R module (v4.xx)
extlinux/gfxboot.c32 : COM32R module (v4.xx)
extlinux/gpxecmd.c32 : COM32R module (v4.xx)
extlinux/hdt.c32 : COM32R module (v4.xx)
extlinux/host.c32 : COM32R module (v4.xx)
extlinux/ifcpu64.c32 : COM32R module (v4.xx)
extlinux/ifcpu.c32 : COM32R module (v4.xx)
extlinux/ifplop.c32 : COM32R module (v4.xx)
extlinux/kbdmap.c32 : COM32R module (v4.xx)
extlinux/linux.c32 : COM32R module (v4.xx)
extlinux/ls.c32 : COM32R module (v4.xx)
extlinux/lua.c32 : COM32R module (v4.xx)
extlinux/mboot.c32 : COM32R module (v4.xx)
extlinux/meminfo.c32 : COM32R module (v4.xx)
extlinux/menu.c32 : COM32R module (v4.xx)
extlinux/pcitest.c32 : COM32R module (v4.xx)
extlinux/pmload.c32 : COM32R module (v4.xx)
extlinux/pwd.c32 : COM32R module (v4.xx)
extlinux/reboot.c32 : COM32R module (v4.xx)
extlinux/rosh.c32 : COM32R module (v4.xx)
extlinux/sanboot.c32 : COM32R module (v4.xx)
extlinux/sdi.c32 : COM32R module (v4.xx)
extlinux/sysdump.c32 : COM32R module (v4.xx)
extlinux/vesainfo.c32 : COM32R module (v4.xx)
extlinux/vesamenu.c32 : COM32R module (v4.xx)
extlinux/vpdtest.c32 : COM32R module (v4.xx)
extlinux/whichsys.c32 : COM32R module (v4.xx)

=============================== sda3/etc/fstab: ================================

--------------------------------------------------------------------------------

#
# /etc/fstab
# Created by anaconda on Fri Nov 18 22:22:17 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=7a5bcedb-cbc1-4c07-962a-b44c86be8c0d / ext4 defaults 1 1
UUID=f09f1c85-31ca-48d6-a2d3-825b2b8bd4cc /boot ext4 defaults 1 2
UUID=db0a257a-83a4-48e6-a640-c4278b26eb89 /data1 ext4 defaults 1 2
/dev/mapper/luks-0de9acb9-cb50-4a89-b93e-f9bc74b5724f /home ext4 defaults 1 2
UUID=e21fac71-cc35-4c18-a736-1284b04ebccd /virtual ext4 defaults 1 2
UUID=a0e6bd27-80e6-43b7-928b-4a39e1963654 swap swap defaults 0 0
--------------------------------------------------------------------------------

=================== sda3: Location of files loaded by Grub: ====================

GiB - GB File Fragment(s)

1.158065796 = 1.243463680 boot/initramfs-3.1.0-7.fc16.x86_64.img 1
1.181510925 = 1.268637696 boot/initramfs-3.1.1-1.fc16.x86_64.img 1
1.205032349 = 1.293893632 boot/initramfs-3.1.1-2.fc16.x86_64.img 1
1.135044098 = 1.218744320 boot/vmlinuz-3.1.0-7.fc16.x86_64 1
1.164344788 = 1.250205696 boot/vmlinuz-3.1.1-1.fc16.x86_64 1
1.187652588 = 1.275232256 boot/vmlinuz-3.1.1-2.fc16.x86_64 1

================= sda3: Location of files loaded by Syslinux: ==================

GiB - GB File Fragment(s)

1.129699707 = 1.213005824 boot/extlinux/cat.c32 1
1.130611420 = 1.213984768 boot/extlinux/chain.c32 1
1.129665375 = 1.212968960 boot/extlinux/cmd.c32 1
1.129814148 = 1.213128704 boot/extlinux/config.c32 1
1.129707336 = 1.213014016 boot/extlinux/cpuid.c32 1
1.130130768 = 1.213468672 boot/extlinux/cpuidtest.c32 1
1.130554199 = 1.213923328 boot/extlinux/disk.c32 1
1.129787445 = 1.213100032 boot/extlinux/dmitest.c32 1
1.130462646 = 1.213825024 boot/extlinux/elf.c32 1
1.129852295 = 1.213169664 boot/extlinux/ethersel.c32 1
1.130592346 = 1.213964288 boot/extlinux/gfxboot.c32 1
1.130210876 = 1.213554688 boot/extlinux/gpxecmd.c32 1
1.130882263 = 1.214275584 boot/extlinux/hdt.c32 1
1.130226135 = 1.213571072 boot/extlinux/host.c32 1
1.129806519 = 1.213120512 boot/extlinux/ifcpu64.c32 1
1.129871368 = 1.213190144 boot/extlinux/ifcpu.c32 1
1.129890442 = 1.213210624 boot/extlinux/ifplop.c32 1
1.130622864 = 1.213997056 boot/extlinux/kbdmap.c32 1
1.130432129 = 1.213792256 boot/extlinux/linux.c32 1
1.130168915 = 1.213509632 boot/extlinux/ls.c32 1
1.130119324 = 1.213456384 boot/extlinux/lua.c32 1
1.130207062 = 1.213550592 boot/extlinux/mboot.c32 1
1.129821777 = 1.213136896 boot/extlinux/meminfo.c32 1
1.130542755 = 1.213911040 boot/extlinux/menu.c32 1
1.130413055 = 1.213771776 boot/extlinux/pcitest.c32 1
1.130569458 = 1.213939712 boot/extlinux/pmload.c32 1
1.130546570 = 1.213915136 boot/extlinux/pwd.c32 1
1.130172729 = 1.213513728 boot/extlinux/reboot.c32 1
1.130485535 = 1.213849600 boot/extlinux/rosh.c32 1
1.129791260 = 1.213104128 boot/extlinux/sanboot.c32 1
1.130157471 = 1.213497344 boot/extlinux/sdi.c32 1
1.129749298 = 1.213059072 boot/extlinux/sysdump.c32 1
1.129882812 = 1.213202432 boot/extlinux/vesainfo.c32 1
1.130378723 = 1.213734912 boot/extlinux/vesamenu.c32 1
1.129798889 = 1.213112320 boot/extlinux/vpdtest.c32 1
1.129753113 = 1.213063168 boot/extlinux/whichsys.c32 1

============== sda3: Version of COM32(R) files used by Syslinux: ===============

boot/extlinux/cat.c32 : COM32R module (v4.xx)
boot/extlinux/chain.c32 : COM32R module (v4.xx)
boot/extlinux/cmd.c32 : COM32R module (v4.xx)
boot/extlinux/config.c32 : COM32R module (v4.xx)
boot/extlinux/cpuid.c32 : COM32R module (v4.xx)
boot/extlinux/cpuidtest.c32 : COM32R module (v4.xx)
boot/extlinux/disk.c32 : COM32R module (v4.xx)
boot/extlinux/dmitest.c32 : COM32R module (v4.xx)
boot/extlinux/elf.c32 : COM32R module (v4.xx)
boot/extlinux/ethersel.c32 : COM32R module (v4.xx)
boot/extlinux/gfxboot.c32 : COM32R module (v4.xx)
boot/extlinux/gpxecmd.c32 : COM32R module (v4.xx)
boot/extlinux/hdt.c32 : COM32R module (v4.xx)
boot/extlinux/host.c32 : COM32R module (v4.xx)
boot/extlinux/ifcpu64.c32 : COM32R module (v4.xx)
boot/extlinux/ifcpu.c32 : COM32R module (v4.xx)
boot/extlinux/ifplop.c32 : COM32R module (v4.xx)
boot/extlinux/kbdmap.c32 : COM32R module (v4.xx)
boot/extlinux/linux.c32 : COM32R module (v4.xx)
boot/extlinux/ls.c32 : COM32R module (v4.xx)
boot/extlinux/lua.c32 : COM32R module (v4.xx)
boot/extlinux/mboot.c32 : COM32R module (v4.xx)
boot/extlinux/meminfo.c32 : COM32R module (v4.xx)
boot/extlinux/menu.c32 : COM32R module (v4.xx)
boot/extlinux/pcitest.c32 : COM32R module (v4.xx)
boot/extlinux/pmload.c32 : COM32R module (v4.xx)
boot/extlinux/pwd.c32 : COM32R module (v4.xx)
boot/extlinux/reboot.c32 : COM32R module (v4.xx)
boot/extlinux/rosh.c32 : COM32R module (v4.xx)
boot/extlinux/sanboot.c32 : COM32R module (v4.xx)
boot/extlinux/sdi.c32 : COM32R module (v4.xx)
boot/extlinux/sysdump.c32 : COM32R module (v4.xx)
boot/extlinux/vesainfo.c32 : COM32R module (v4.xx)
boot/extlinux/vesamenu.c32 : COM32R module (v4.xx)
boot/extlinux/vpdtest.c32 : COM32R module (v4.xx)
boot/extlinux/whichsys.c32 : COM32R module (v4.xx)


======================== Unknown MBRs/Boot Sectors/etc: ========================

Unknown BootLoader on sda5

00000000 4c 55 4b 53 ba be 00 01 61 65 73 00 00 00 00 00 |LUKS....aes.....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 78 74 73 2d 70 6c 61 69 |........xts-plai|
00000030 6e 36 34 00 00 00 00 00 00 00 00 00 00 00 00 00 |n64.............|
00000040 00 00 00 00 00 00 00 00 73 68 61 31 00 00 00 00 |........sha1....|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 40 |...............@|
00000070 a0 46 63 f9 ed a9 13 1b 28 06 e5 14 cf 71 ca 71 |.Fc.....(....q.q|
00000080 0b 41 39 78 3a e4 12 37 99 84 aa 91 40 03 7c 94 |.A9x:..7....@.|.|
00000090 e8 a4 c7 a4 ea 03 8e 6f 2a 74 52 5a b1 58 9a ad |.......o*tRZ.X..|
000000a0 a2 d2 ef 70 00 00 a6 81 30 64 65 39 61 63 62 39 |...p....0de9acb9|
000000b0 2d 63 62 35 30 2d 34 61 38 39 2d 62 39 33 65 2d |-cb50-4a89-b93e-|
000000c0 66 39 62 63 37 34 62 35 37 32 34 66 00 00 00 00 |f9bc74b5724f....|
000000d0 00 ac 71 f3 00 02 9b 7b ae ce e2 07 96 56 04 c9 |..q....{.....V..|
000000e0 69 8c a9 04 ef 21 06 82 c2 75 25 11 38 07 7d be |i....!...u%.8.}.|
000000f0 66 e6 c7 7c f0 e8 94 2d 00 00 00 08 00 00 0f a0 |f..|...-........|
00000100 00 ac 71 f3 00 02 ac e3 b6 7d d7 30 b6 c6 05 fa |..q......}.0....|
00000110 ba e5 6f da 45 11 4a ad e8 da cc e1 68 7b 45 b2 |..o.E.J.....h{E.|
00000120 f4 47 42 a9 19 bf 04 bf 00 00 02 00 00 00 0f a0 |.GB.............|
00000130 00 00 de ad 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 00 00 00 00 00 03 f8 00 00 0f a0 |................|
00000160 00 00 de ad 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 05 f0 00 00 0f a0 |................|
00000190 00 00 de ad 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001b0 00 00 00 00 00 00 00 00 00 00 07 e8 00 00 0f a0 |................|
000001c0 00 00 de ad 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 00 00 00 00 00 00 09 e0 00 00 0f a0 |................|
000001f0 00 00 de ad 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000200


=============================== StdErr Messages: ===============================

File descriptor 7 (pipe:[81051]) leaked on lvscan invocation. Parent PID 7712: /bin/bash
No volume groups found
mdadm: No arrays found in config file or automatically


OK, that is all of it.

srs5694
24th November 2011, 05:16 PM
I don't see anything in your Boot Info Script output that clearly identifies the source of the problem. I only have one idea at this point, and it's a long shot: I see you're using LUKS for disk encryption. I know very little about LUKS, but if it were misconfigured to access the whole disk rather than just /dev/sda5, it's conceivable that it would damage the partition table data. You might therefore look into your LUKS configuration, but as I know next to nothing about it, I can't help you with more precise suggestions.

chrismurphy
24th November 2011, 11:23 PM
This is a physical machine, not a VM? I know nothing about LUKS either but if it were misconfigured for whole disk instead of just sda5, I think it would in short order hose the entire disk and make it unbootable.

It's interesting that something is causing just a problem with the backup GPT. I'd look at smartctl:

smartctl -s on /dev/sda    # enable SMART
smartctl -x /dev/sda       # show all info on device

I'd check the SMART Attributes section and see if you're getting any errors: raw read error, seek error, reallocation events, pending sectors, CRC or zone errors. And in any event I'd do a long test, and walk away from the computer - you can boot in recovery mode and do this, which will avoid services interrupting the testing since it's an offline test (i.e. the test occurs when the disk is idle).

smartctl -t long /dev/sda

It will give you a time estimate for completion as a result. After that time you can:
smartctl -x /dev/sda

And see if any new errors have appeared. If so, just get rid of the disk. If there are no errors, realize SMART is only catching something like 40% of drive failures in advance of failure.

I wonder if it would be useful to see the full main and backup GPT entries if you're able to reproduce this problem, to see exactly how the backup GPT is being hosed.

srs5694
25th November 2011, 04:23 PM
Chris has some good suggestions. If you care to show us the raw GPT data, you'll need to use dd to create two files:



dd if=/dev/sda of=primary.gpt bs=512 count=34
dd if=/dev/sda of=backup.gpt bs=512 count=33 skip=625142415


I'm not sure how the forum software copes with attaching raw binary files of a type that it doubtless would consider unknown, but you can e-mail them to me, if you like. (rodsmith@rodsbooks.com)
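
For anyone repeating this on a different disk, the counts and the skip value in the dd commands above follow from the GPT layout. This is a minimal sketch, assuming 512-byte logical sectors and the default 128-entry table that gdisk reports for this drive (128 entries of 128 bytes each, so 32 sectors); the function names are made up for illustration:

```shell
#!/bin/sh
# Sketch: derive the dd parameters for dumping both GPT copies.
# Assumption: 512-byte sectors and a 128-entry table (32 sectors of entries).

gpt_entry_sectors=32

# Primary copy: protective MBR (LBA 0) + header (LBA 1) + entries (LBA 2-33).
primary_count() {
    echo $((1 + 1 + gpt_entry_sectors))
}

# Backup copy: the entries, followed by the header in the very last sector.
backup_count() {
    echo $((gpt_entry_sectors + 1))
}

# First sector of the backup copy, given the disk's total sector count ($1).
backup_skip() {
    echo $(($1 - gpt_entry_sectors - 1))
}
```

With this disk's 625142448 sectors, `backup_skip 625142448` gives the skip value used above. On a live system the total could be read with `blockdev --getsz /dev/sda`, which reports the size in 512-byte sectors.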

rtguille
26th November 2011, 01:49 AM
For some strange reason, my post did not show up. :confused: (It was rejected for being too big; I just
assumed it had worked...)

It seems that -x is much more informative than just -a (which is what I normally use to check drives).

I extracted the GPT headers, then repaired the backup, then extracted them again for comparison.
After powering off and booting sysresccd, the backup was corrupted again.
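
A quick first check on dumps like these is whether the header sector still starts with the `EFI PART` signature. The sketch below assumes the files were made with the dd commands srs5694 posted (so the header is sector 1 of primary.gpt and sector 32 of backup.gpt) and 512-byte sectors; `gpt_has_signature` is a hypothetical helper name. It does not verify the header CRC, so a passing check does not prove the header is valid.

```shell
#!/bin/sh
# Sketch: succeed if the given sector of a dump starts with "EFI PART".
# $1 = dump file, $2 = index of the header sector inside that file.
# Assumption: 512-byte sectors. This is only a signature check, not a CRC check.
gpt_has_signature() {
    # Read the first 8 bytes of the sector; drop NUL bytes so the shell
    # can hold the result in a variable.
    sig=$(dd if="$1" bs=1 skip=$(($2 * 512)) count=8 2>/dev/null | tr -d '\000')
    [ "$sig" = "EFI PART" ]
}
```

For example, `gpt_has_signature primary.gpt 1 && echo "primary signature present"` and `gpt_has_signature backup.gpt 32 && echo "backup signature present"`.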

It is a laptop, a Dell E4310.

rtguille
26th November 2011, 02:03 AM
Here are:

* main ok, backup bad
* main ok, backup ok
  (after fixing it, I powered off the laptop and booted into sysresccd, which reported that
  the backup was bad again)
* backup bad again

chrismurphy
26th November 2011, 02:07 AM
Someone correct me if I'm reading the attached smartctl report incorrectly!!

I am seeing four problems: a few read errors, but 3200+ spin-up time errors. If either of these reaches a certain threshold, it will cause the drive to report a pre-failure. So in a sense maybe you're in pre-pre-failure?

But then there's the g-sense error rate and free fall rate. Looks like maybe the drive has been dropped or banged around a bit?

I would for sure do a long test:

smartctl -t long /dev/sda
And come back and do another -a or -x to see if any of the attributes have changed. I'm wondering if you just so happen to have a bad sector right where this backup GPT is located. It's strange for the same thing to keep happening. Could still also be a bug somewhere. But I'd try to track it down. Bad sectors tend to grow.

chrismurphy
26th November 2011, 02:24 AM
Only difference between good and bad backup GPTs I'm seeing is in the screen shot attached.

rtguille
26th November 2011, 02:59 AM
Thanks, i did use hexedit and noticed it after posting.

Currently i managed to boot without corruption.
I will try to reboot several times and also perform several power-cycles to confirm.

Currently i installed just F16, but previously there was a dualboot work-setup.
I switched from RAID to AHCI since it is no longer necessary.
Was it a RAID mode "metadata" issue from the long gone other-os? (too suspicious to
be a faulty sector, but murphy is between us every time...)

I will start testing reboot & power cycle immediately.

---------- Post added at 10:59 PM ---------- Previous post was at 10:36 PM ----------

I power cycled it two times and rebooted other three times.
Both main and backup gpt headers are ok for now.

I think it was related to RAID mode in the sata controller.
The firmware was slow to boot f16, i thought it was grub2... but evidently it was the firmware itself.
The controller comes in RAID mode :doh: , no option when dual-booting, but as i installed f16 on the whole disk i was
able to try with ahci.

Thanks everybody for the help provided.
I will mark it as resolved if all is ok for the following 10 reboots or power-cycles.

chrismurphy
26th November 2011, 03:40 AM
I still suggest a long smartctl test. The spin up numbers alone concern me. But you'd have to dig up a technical document from the manufacturer on this drive delineating the meaning of the SMART raw values, which aren't the same between vendors.

rtguille
26th November 2011, 02:33 PM
I confirmed that it was the SATA controller in RAID mode. In AHCI i found no problems,
rebooted several times, power-cycled, all ok, no gpt corruption.
Then i switched back to RAID and the corruption of the secondary header returned in the
first boot and went away after switching to AHCI (and repairing it, of course).


So:

When the SATA controller in a Dell e4310 with FW/BIOS A05 (in my case) is set to RAID,
it corrupts the secondary/backup gpt header at power-on, and a long pause (5 or more seconds)
is added to the time it takes for the grub2 menu to appear.

There was no problem with MBR boot and RAID with dual-boot, using F15.
F16 was installed with the whole disk, using custom partitioning.


The issue manifested with these partitions:


# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.1

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 625142448 sectors, 298.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4F85375A-FCC5-4582-B687-916C9DE4D12B
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 625142414
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              2048            4095  1024.0 KiB EF02
   2              4096         2101247  1024.0 MiB EF00  ext4
   3           2101248        75501567  35.0 GiB   0700
   4          75501568       390381567  150.1 GiB  0700
   5         390381568       474267647  40.0 GiB   0700
   6         474267648       482656255  4.0 GiB    8200
   7         482656256       625141759  67.9 GiB   0700


# fdisk -l /dev/sda

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sda: 320.1 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders, total 625142448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1   625142447   312571223+  ee  GPT





SATA Controller in RAID mode:
00:1f.2 RAID bus controller [0104]: Intel Corporation Mobile 82801 SATA RAID Controller [8086:282a] (rev 05)



SATA Controller in AHCI mode:
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [8086:3b2f] (rev 05)
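
The two lines above differ in the PCI class code: [0104] for the RAID bus controller and [0106] for the SATA controller. A small sketch for checking which mode the firmware is currently set to, assuming `lspci -nn` output shaped like the lines above. The helper name sata_mode is made up:

```shell
# Hypothetical helper: classify `lspci -nn` lines on stdin by PCI class
# code. Class 0104 is a RAID bus controller; class 0106 is a SATA
# controller, which on this laptop corresponds to AHCI mode.
sata_mode() {
    while IFS= read -r line; do
        case $line in
            *"[0104]"*) echo "RAID" ;;
            *"[0106]"*) echo "AHCI" ;;
        esac
    done
}

# Usage:
#   lspci -nn | sata_mode
```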

Regarding the disk:

The only option i have is to wait for it to fail, making sure to keep backups, because it won't be replaced as long as it is not 'broken' or
the other os still works, etc. And the hdd's current condition does not help at all.

I really appreciate the help provided. I have learned that RAID mode can also corrupt disks.
Thanks for the support provided. :cool:

chrismurphy
26th November 2011, 08:06 PM
I would not expect a controller to behave in this manner. Even if it is misconfigured, damaging a backup GPT this consistently sounds like a bug. Much silent data corruption points to firmware bugs, not just to SNR, temperature, particles, oxidation of the media surface, etc.

http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html

srs5694
29th November 2011, 06:29 PM
AFAIK, any sort of RAID mode necessarily involves writing extra data to the disk. For the most part, firmware-based RAID options are software RAID, and if the Linux kernel doesn't support the firmware's version of RAID, or if Linux is configured to ignore it, I could easily see the firmware's RAID feature attempting to use sectors of the disk that Linux wants to use for other things -- like a partition table. Thus, the solution doesn't seem all that surprising to me, although this is the first time I've heard of this specific complication.

I agree with Chris that the SMART errors are troubling. I note that the machine was described as a laptop, so that could account for the g-sense errors -- moving the computer while it's powered on, or using the computer in a vehicle that bounces around, could easily produce such problems. I recommend trying to avoid such uses in the future, if at all possible. Spinning discs are very delicate things, and although manufacturers do a remarkable job in making them resistant to damage caused by small bumps, there's only so much they can do on this score.

chrismurphy
29th November 2011, 08:44 PM
I wonder if the firmware is GPT unaware. MBR doesn't have a backup located at the end of a disk like GPT does, so I wonder if this particular firmware's RAID implementation possibly predates GPT, and thinks the region it's mucking up is fair game.

---------- Post added at 12:44 PM ---------- Previous post was at 12:35 PM ----------



So:

When the SATA controller in a Dell e4310 with FW/BIOS A05 (in my case) is set to RAID,
it corrupts the secondary/backup gpt header at poweron

There is an Oct 2011 BIOS update for this model. I didn't read what it fixes. Whether you've applied it or not, I'd report your results thus far to Dell support. You've already done all of the work. No apparent corruption with MBR or other data, but consistent corruption of just the backup GPT when RAID is enabled. Methinks they haven't anticipated GPT with their RAID implementation, and probably not just on this laptop.

AdamW
29th November 2011, 08:52 PM
yeah, I've had issues with SATA controller set to RAID mode in the past. On my system, for e.g., it can't boot a native EFI install with the SATA controller set to RAID mode, for some reason. Flipping it to AHCI makes the system boot, no other changes. Yay, BIOSes.

srs5694
30th November 2011, 12:04 AM
I wonder if the firmware is GPT unaware. MBR doesn't have a backup located at the end of a disk like GPT does, so I wonder if this particular firmware's RAID implementation possibly predates GPT, and thinks the region it's mucking up is fair game.

I don't think it has to do with GPT-awareness or -unawareness. My hypothesis is just this: When set to RAID mode, the disk controller writes data to the hard disk where its RAID implementation normally reserves data. The intention is that drivers in the OS will then read that data, combine together portions of multiple disks, and present the combined disk portions as a single disk to the rest of the OS. (The "portions" would be the vast majority of the disk, though.) In the case in this thread, what happened was that the firmware wrote the data to the disk in the expectation of a RAID-using OS, but Linux didn't use the RAID data, and instead accessed the disk directly.

Put another way, the firmware used its own RAID system, with the intention of having the OS use just part of the disk, but Linux ignored this and used the whole disk. The two systems just worked under different assumptions and created a collision.

By this analysis, the firmware might not know or care what type of partitioning scheme is in use. (A pure BIOS system doesn't need to know a thing about partitioning.) It just happened that GPT was sensitive to damage caused by the collision, which resulted in its detection. If an MBR disk's last few sectors were overwritten, the issue might never have caused data corruption -- or if a partition were placed in that space, the result might have been a damaged filesystem or damaged file data, once that area was used.

bodo666
21st April 2012, 11:52 AM
Hi All,

just wanted to thank rtguille for posting step-by-step (first post), it rescued my 3tb wd ezrx hdd contents.

I am running a very simple setup: no lvm, no raid, no nothing. just gpt, to be able to use the whole disk with 1 single ext4 partition. nothing could go wrong, right? ;-)
looks like gpt is nowhere near stability of old-style msdos label system. good though there are all these rescue options.
what did i do to deserve the trouble of the partition not mounting anymore? nothing really, I was just adding 2 hdds to my system, setting the bios boot order and that was all. after booting up, the old msdos 2tb came up with no fuss, the 3tb was gone just like that!
using gentoo x64, with kernel 2.6.39
on mounting, or any fsck (with different superblock addresses etc), or other troubleshooting attempts, I was getting the infamous:
wrong fs type, bad option, bad superblock on /dev/sda7, missing codepage or helper program

for a moment there I thought i was in really deep trouble, then found another (now I know) great tool: testdisk, which had no problem at all reading the drive... like nothing happened! mount on the OS still didn't work though, and the rescue options from testdisk didn't work either unfortunately (write command)
then eventually landed here and thanks again to OP!!!

I really wonder what happened? my best idea - possibly flaky implementation of gpt in the kernel...

srs5694
21st April 2012, 03:50 PM
bodo666,

I have never heard of GPT problems in the kernel. You haven't presented enough details of what your problem is to make me willing to venture a diagnosis, but for rtguille's problem, my suggestion in post #20 is still my best guess: An errant RAID setting in the firmware was causing inconsistencies in how the firmware and Linux itself treated the disk, which in turn led to recurring corruption of the GPT data. That might be what's happened to you, too, or it could be something completely unrelated. (Yes, I know you say you're not using RAID -- but have you checked your firmware's RAID settings? They can be enabled even if your OS isn't using them, and if the two don't match, it's a recipe for problems!) Note that I'm the author of GPT fdisk, a Linux GPT partitioning tool, so I understand GPT pretty well and I've had lots of experience with it.

One more point: Don't rely too much on TestDisk. It's a useful tool, but by its very nature it's unreliable. Sometimes it works wonders, but other times it fails miserably, and can even make matters worse. I'm not trying to criticize TestDisk, just point out its limitations. It's meant for recovering partitions when their partition table entries have been completely lost, but leftover data from old partition setups can sometimes make this task very difficult. I consider it a partitioning tool of last resort (or next-to-last resort, the last being going in with a sector editor to try to find your partitions manually -- but that's well beyond most people's ability).

chrismurphy
21st April 2012, 08:06 PM
looks like gpt is nowhere near stability of old-style msdos label system.

I really wonder what happened? my best idea - possibly flaky implementation of gpt in the kernel...

Really? That's your best idea? The GPT is a static thing. If it's correct, the kernel has the contents read, makes a decision, and then moves on to other things that have nothing to do with the GPT. It's not at all like a file system, which is constantly being read, modified, and written. Yet you think something that is rarely, often never again, modified is the problem?

Either you want to track down the real source of your problem, and you're willing to do the work required to track it down. Or not. Let us know which it is and try not to sound like you're trolling.

bodo666
21st April 2012, 10:53 PM
hey guys, didn't mean to troll, sometimes it just comes out that way :doh:, apologies. also had no idea would find such an audience here :-). first time tried gpt and such an ugly surprise after just 2 weeks using it after all I did was shifting a few disks around.

if you are interested in following it and maybe helping i can share further experiences here...bear in mind this might get gentoo-specific. yeah, problem is not completely fixed, although i got a good step closer thx to info from op.

so here is the latest from what it looks like:
- gpt table got messed up a bit by testdisk [write] command, like srs hinted might happen
- it appears table was good the whole time (yes, apologies again :) )
- anyway i was getting very similar messages/errors like op mentioned, same steps as mentioned by op seemed to fix and get me my data back
- ...but all stopped working after reboot, again "wrong fs type, bad option, bad superblock on /dev/sda7,missing codepage or helper program"
- ok, so i rechecked the tables as before with gdisk/parted etc, all seemed well and good this time, so this didnt compute at all
- tried redetection then with "partprobe" and voila! could magically mount the fs again right after
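
A rough way to see what the kernel itself knows about the disk before and after partprobe, assuming the usual /proc/partitions layout (device name in the fourth column). The helper name kernel_parts is made up:

```shell
# Hypothetical helper: list the partitions of one disk as the kernel
# currently sees them, reading /proc/partitions-style text on stdin.
# The fourth column of that file is the device name.
kernel_parts() {
    awk -v disk="$1" '$4 ~ ("^" disk "[0-9]+$") { print $4 }'
}

# Usage (partprobe needs root):
#   kernel_parts sdd < /proc/partitions    # empty if the kernel saw no partitions
#   partprobe /dev/sdd
#   kernel_parts sdd < /proc/partitions    # lists sdd1 if the table was picked up
```

If the first call prints nothing while gdisk shows a valid table, the table on disk is fine and the problem is in how the kernel scanned it at boot.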

what does it tell me? not yet sure, don't know this part of OS too well. any hints/clues welcomed. I will do some further research in the meantime and possibly upgrade to kernel 3.2-something, oughta be good fun - didn't do it in a while.

about the controller - this is jetway nf93 mobo, penryn based itx board, i am using onboard intel ICH9, set to AHCI in BIOS(A07) (1-SSD,3 big HDDs, 4 ports total)
in linux 2 drivers enabled only:
- sata ahci
- pata_jmicron (i have a 2.5 hdd pata connected here too, currently wiped (dd if=/dev/zero etc) to be used later)


current dmesg:
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133
ata6.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata6.00: configured for UDMA/133
scsi 5:0:0:0: Direct-Access ATA WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5
sd 5:0:0:0: [sdd] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
sd 5:0:0:0: [sdd] 4096-byte physical blocks
sd 5:0:0:0: [sdd] Write Protect is off
sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdd: detected capacity change from 0 to 3000592982016
sdd: sdd1
sd 5:0:0:0: [sdd] Attached SCSI disk
---
EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs (sdd1): VFS: Can't find ext4 filesystem

chrismurphy
21st April 2012, 11:21 PM
Well, there's not much information to go on, because it seems you no longer have the problem, or a reproducible condition. And you're making more changes, like kernel versions, at the same time.

If you're getting bad superblock messages on reboot, then I'm suspicious of the following:

a.) What time is it per the computer? Is your computer losing time or are you changing time zones? e2fsprogs currently gets pissy when the superblock date is in the future. So if your computer is in the past compared to the superblock, you may be getting a bogus error message.

b.) Like the original poster, are you certain you don't have any BIOS based RAID enabled? No GPT tools screw around with superblocks. Something else is. The fact partprobe is resolving some condition makes me wonder if something is messing with your MBR, like software RAID, and the kernel is starting out with GPT, then switching to MBR after a partprobe.

c.) What is the result of

fdisk -l /dev/sdd
I assume sdd is the problem disk? 3TB, so you have no choice but to use GPT if you want to use all 3TB.

d.) Have you done any SMART testing using smartmontools?

smartctl -a /dev/sdd
Post the results of that for starters.

Then later do:

smartctl -t long /dev/sdd
and then just leave the computer alone until the predicted completion time. You could still use the computer for casual tasks but you're better off just leaving it alone because testing is suspended unless the disk is idle.

Please post your results formatted properly using code tags. When you paste results into the forum reply to thread window, highlight the text, and click on the # button in the tool bar to add the code tags.

---------- Post added at 04:21 PM ---------- Previous post was at 04:16 PM ----------

And you know what? I think you should post these details in a whole new thread once you actually have something to go on. And a clear description of the problem. If you want you can reference this thread's URL as background. But this thread is marked as solved and there's zero evidence your problem and this one are related.

And plus, resurrecting it after 5 months is kinda inappropriate, and just clutters up both your problem as well as the past thread.

srs5694
21st April 2012, 11:25 PM
Chrismurphy has posted some good suggestions for diagnostics. Another is this: Post the RESULTS.txt file that you can get by running the Boot Info Script. (http://sourceforge.net/projects/bootinfoscript/) This script collects several basic technical details on partitioning, filesystem, and boot loader configuration in one output file. Although it doesn't sound like your boot loader is messed up, the partitioning and filesystem data may be valuable in isolating the cause of the problem.

bodo666
21st April 2012, 11:58 PM
thx guys, fully agreed. i will open new thread if i get stuck then, with all info requested here.
btw. no there is no raid ON in bios. only 2 controllers are:
- onboard intel sata, set to ahci (other choice was ide or raid), legacy mode disabled
- onboard jmicron pata (has on/off setting only in bios)

bodo666
22nd April 2012, 05:56 PM
:dance:

that was extremely simple in the end...:
File Systems
Partition Types
EFI GUID Partition support (NEW)
in kernel source.
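
That menu entry corresponds to the CONFIG_EFI_PARTITION symbol. A quick sketch for checking it in a kernel .config; the path in the usage note is the usual Gentoo location and is an assumption here, as is the helper name gpt_enabled:

```shell
# Hypothetical helper: report whether GPT support (CONFIG_EFI_PARTITION)
# is enabled in the kernel .config file passed as the first argument.
gpt_enabled() {
    if grep -q '^CONFIG_EFI_PARTITION=y' "$1"; then
        echo "GPT support enabled"
    else
        echo "GPT support missing"
    fi
}

# Usage (path assumed, adjust to where the kernel source lives):
#   gpt_enabled /usr/src/linux/.config
# For the running kernel, if it exposes its config:
#   zcat /proc/config.gz | grep CONFIG_EFI_PARTITION
```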

...yet weird:
- for 2 weeks my system was working without it :-)
- partprobe makes it work

thx very much for your responses. EOT

chrismurphy
22nd April 2012, 06:21 PM
You're compiling your own kernel?

---------- Post added at 11:21 AM ---------- Previous post was at 11:01 AM ----------

Aha. Gentoo Linux 2.6.39. Not Fedora, hence the confusion. And for whatever reason GPT isn't baked into that kernel by default it seems.

bodo666
22nd April 2012, 06:29 PM
YES, manual kernel config. for last 15 or so years and (rather) wouldn't have it any other way. i think it's still default for gentoo, too (no predefined .config)
problem is kernels are huge/full of options these days and last couple of years i am doing this quite infrequently and tend to forget my own golden rule ("in case of doubt always check the freaking kernel config first") and somehow I thought I had all partition types I ever used/would use proactively/just-in-case enabled, this was not true. also it didn't help that I had gpt disk working with GPT option disabled.

anyway hope it helps if somebody comes across this thread in the future.

thx again for yours and srs5694 responses.

manuelmongeg
12th June 2012, 07:15 AM
Very helpful thread. I just solved the no partitions issue by changing the SATA mode from "RAID On" to AHCI. Thanks guys.