Stop the flipflop.


glennzo
3rd March 2012, 09:07 PM
How can I stop /dev/sda from randomly becoming /dev/sdb upon reboot? This is keeping me from booting some of the 9-10 OSes installed on my desktop computer. Is there some BIOS setting I can change? Here's fdisk right now:

[root@phenom16 glenn>$ fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000001

Device      Boot      Start         End     Blocks  Id  System
/dev/sda1   *          2048   104859647   52428800   7  HPFS/NTFS/exFAT
/dev/sda2         104859648   209717247   52428800  83  Linux
/dev/sda3         209717248   314574847   52428800  83  Linux
/dev/sda4         314576894  1953523711  819473409   5  Extended
/dev/sda5         314576896  1363152895  524288000  83  Linux
/dev/sda6        1363154944  1468012543   52428800  83  Linux
/dev/sda7        1468014592  1572872191   52428800  83  Linux
/dev/sda8        1572874240  1677731839   52428800  83  Linux
/dev/sda9        1677733888  1782591487   52428800  83  Linux
/dev/sda10       1782593536  1887451135   52428800  83  Linux
/dev/sda11       1887453184  1949329407   30938112  83  Linux
/dev/sda12       1949331456  1953523711    2096128  82  Linux swap / Solaris

Disk /dev/sdb: 250.1 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders, total 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x2feb88e8

Device      Boot      Start         End     Blocks  Id  System
/dev/sdb1              2048   488396799  244197376   7  HPFS/NTFS/exFAT

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x06afd788

Device      Boot      Start         End     Blocks  Id  System
/dev/sdc1              2048   976773119  488385536  83  Linux


/dev/sda is SATA
/dev/sdb is USB
/dev/sdc is SATA

If I reboot this computer there's a good chance that the drives will flipflop and suddenly what is now /dev/sda will become /dev/sdb, or it might decide to become /dev/sdc. Who knows?

The system I'm posting from is on what is now /dev/sda2, Fedora 16. That boots no matter what the drive becomes. Everything in the boot menu works too unless the drives flipflop. Seems that then I have trouble with /dev/sda11, another Fedora 16. I think /dev/sda10 is OK too. If I remember correctly that is Mint 12. I never have a problem booting that one either. It's /dev/sda11 that is troublesome.

Maybe it's not a case of drive "flipflop" that is causing my hair loss but something about booting from /dev/sda11 or booting from higher numbered partitions that grub2 doesn't like?

This is becoming annoying. I even removed the 4th drive, an EIDE disk, to see if that was causing problems but it didn't make any difference.

jpollard
3rd March 2012, 09:25 PM
That is why UUID mount specifications got added.

Alternatively, if you set a volume label on each filesystem (make sure they are all unique), you can mount by label instead.

The problem is that it depends on how the devices are scanned. If all of them power up and are ready before the scan (not likely) then they will be consistent and in numerical order. Unfortunately, drives are different, and respond differently to scans.

I have 3 SATA drives: units 0, 1 and 2 (by the controller cables). Units 0 and 1 (both Samsung HD250HJs) originally came up as sda and sdb. When I added the third drive (a 2 TB WDC WD20EARS-00M), they suddenly showed up as sda and sdc, with the 2 TB drive as sdb.

And looking at /proc/scsi/scsi doesn't identify them properly either.
The drives show up as:

cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: SAMSUNG HD250HJ Rev: FH10
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: ATA Model: WDC WD20EARS-00M Rev: 51.0
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: SAMSUNG HD250HJ Rev: FH10
Type: Direct-Access ANSI SCSI revision: 05


So even the channel/id/lun identification is lost.

As a result, the only reliable way to identify the partitions on them is by UUID.

I'm still using legacy grub (there should be an equivalent with grub2), and I specify the boot and mount devices by UUID. The /etc/fstab entries:

UUID=83445c29-479e-4545-a625-f696ea854f70 / ext4 defaults 1 1
UUID=1e46656e-dbb7-4f72-947c-d7dc07a77d0a /boot ext4 defaults 1 2
UUID=8244ec1d-e06e-4523-af8c-105eb72a7da2 swap swap defaults 0 0
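
As a rough illustration of the UUID-based mounts above, here is a minimal shell sketch that rewrites /dev/sdXN entries of an fstab-style file into UUID= entries, using saved blkid output as the lookup table. The function name fstab_to_uuid is made up for this example, and it only prints the result to stdout; it never edits /etc/fstab itself.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab-style file with device names replaced
# by UUID= specifications.
#   $1 = file holding saved `blkid` output
#   $2 = fstab-style file to convert
# Assumes the blkid file is not empty. Rewritten lines are re-joined with
# single spaces; all other lines are printed unchanged.
fstab_to_uuid() {
    awk '
        NR == FNR {
            # blkid line looks like: /dev/sda2: LABEL="x" UUID="..." TYPE="ext4"
            dev = $1
            sub(/:$/, "", dev)
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^UUID="/) {
                    uuid = $i
                    gsub(/^UUID="|"$/, "", uuid)
                    map[dev] = uuid
                }
            }
            next
        }
        ($1 in map) { $1 = "UUID=" map[$1] }
        { print }
    ' "$1" "$2"
}
```

One possible way to use it (as root, so blkid can read every device) is to run blkid > /tmp/blkid.txt, then fstab_to_uuid /tmp/blkid.txt /etc/fstab, and review the output before copying anything into the real file.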


And the grub.conf file also has UUID specs like:

title Fedora 14 (2.6.35.14-106.fc14.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.35.14-106.fc14.x86_64 ro root=UUID=aebf0d86-4dbb-47e8-b90f-b494ca9c8050 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
initrd /initramfs-2.6.35.14-106.fc14.x86_64.img


The only downside is that UUIDs change when you recreate a filesystem (mkfs). The nice part is that you can set the UUID to whatever you prefer (at least for ext2/3/4 with tune2fs, and for swap partitions with swaplabel).
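
To make the tune2fs/swaplabel remark concrete, here is a small dry-run sketch. It only prints the command that would put a chosen UUID back on a freshly recreated filesystem, so that existing fstab and boot entries keep matching. The helper name uuid_cmd and the /dev/sdX1 placeholder are invented for the example. The -U option exists in both tune2fs and swaplabel, but check the man pages on your release, and expect to run the real command on an unmounted filesystem.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the command that assigns
# a given UUID to a filesystem of the given type.
#   $1 = filesystem type, $2 = device, $3 = UUID to assign
uuid_cmd() {
    case "$1" in
        ext2|ext3|ext4) echo "tune2fs -U $3 $2" ;;
        swap)           echo "swaplabel -U $3 $2" ;;
        *)              echo "unsupported type: $1" >&2; return 1 ;;
    esac
}

# Example (placeholder device, UUID taken from the fstab lines earlier):
#   uuid_cmd ext4 /dev/sdX1 83445c29-479e-4545-a625-f696ea854f70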

BTW, I do realize the grub.conf entry I showed is not for the current system - it is my backup safety net for Fedora 14 while I'm still fighting F16 bugs (mostly the HP printer now).

glennzo
3rd March 2012, 09:31 PM
If I read correctly there is no way to get these drives to sit still then.

marko
3rd March 2012, 10:06 PM
No, I think in general your UUIDs will be distinct per physical partition just like jpollard said.
If you have three devices and the middle one goes away (say you unmount and power down a USB disk) and you reboot, the first and third UUIDs will still point to the same filesystems as before, even if the kernel hands out different /dev/sdX names.

But if you want to be sure, run blkid and save the output to a file, then reboot in a way that has messed up the mapping in the past and run blkid again. The two outputs should have matching entries (some may be added or missing because devices were powered up or down between boots, but any device present in both should match).
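
The before/after comparison could be sketched roughly like this. The function names uuid_map and moved_devices are invented for the example, and the script assumes each snapshot is plain blkid output saved to a file. It lists every UUID whose device name differs between the two snapshots, which is exactly the "flipflop" being chased here.

```shell
#!/bin/sh
# Print "UUID device" pairs from a saved blkid listing, sorted by UUID.
uuid_map() {
    sed -n 's/^\([^:]*\):.* UUID="\([^"]*\)".*/\2 \1/p' "$1" | LC_ALL=C sort
}

# Report every filesystem whose device name differs between two snapshots.
#   $1 = blkid output saved before the reboot
#   $2 = blkid output saved after the reboot
moved_devices() {
    before=$(mktemp) && after=$(mktemp) || return 1
    uuid_map "$1" > "$before"
    uuid_map "$2" > "$after"
    # join pairs up lines with the same UUID; awk keeps only renamed devices.
    LC_ALL=C join "$before" "$after" |
        awk '$2 != $3 { print $1 ": " $2 " -> " $3 }'
    rm -f "$before" "$after"
}
```

Empty output from moved_devices means no filesystem present in both snapshots changed its device name. Devices that appear in only one snapshot are ignored.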

sonoran
3rd March 2012, 11:07 PM
Both legacy grub and grub2 can use the device.map file to map BIOS drives to OS devices.
The format is GRUB_DEVICE OS_FILE. In my legacy grub device.map the OS_FILE is specified
as /dev/sdx - I'm wondering if you could use UUIDs there, like /dev/disk/by-uuid/xxxx

Similar to jpollard's grub.conf, my arch menu.lst uses UUIDs, with a slightly different syntax:

title Arch Linux 0
root (hd0,0)
kernel /vmlinuz-linux root=/dev/disk/by-uuid/aad83825-7195-435d-8d73-64a41764e30a rootfstype=ext4 ro
initrd /initramfs-linux.img

I'm not sure grub can handle a device.map written that way but it might be worth a shot.
From info grub:
The reason why the grub shell gives you the device map file is that
it cannot guess the map between BIOS drives and OS devices correctly in
some environments. For example, if you exchange the boot sequence
between IDE and SCSI in your BIOS, it gets the order wrong.

glennzo
3rd March 2012, 11:54 PM
OK Gentlemen. Thank you for a ton of very interesting information. I appreciate your time and effort but I think I found the real issue for me, and I stand firm on my claim that the drives are moving around. /dev/sda11 is the only partition that will not boot reliably. Occasionally it will but for the most part no. That said, I've rebooted this box several times now and the blkid information remains the same regardless of whether I reboot or power down. And, regardless, /dev/sda11 probably won't boot. I've also looked at the boot menu in detail several times. While all entries have a line similar to this
linux /boot/vmlinuz-x.x.x.x root=UUID=rgt4645dh54uyblahblahblah
Fedora on /dev/sda11 has the following
linux /boot/vmlinuz-3.2.7-1.fc16.x86_64 root=/dev/sda11
If I change root=/dev/sda11 to root=/dev/sdb11 it boots. How can I make the change permanent?

For what it's worth I'm booting Fedora 16 on 2 different partitions, Fedora 15, Fedora 14, CentOS, Scientific, Arch and Windows 7. These are all on the same physical disk, /dev/sdb (apparently).

marko
4th March 2012, 12:59 AM
Can't you change that one Fedora that has the /dev/sda11-style root= setting to use the proper UUID for its partition? I would think you could boot to it, run blkid, take the UUID for "/" and put that in its grub file. Is that one Fedora so old that it didn't have UUID support? Only in that case would you not have that option.
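
For illustration only, the substitution described above could look like the following sketch. The function name root_to_uuid is made up, and the UUID is whatever blkid reports for that partition. It prints the edited text instead of touching the file. Keep in mind that a generated grub.cfg is normally overwritten the next time grub2-mkconfig runs, so a hand edit like this is likely to be a stopgap rather than the permanent fix.

```shell
#!/bin/sh
# Hypothetical helper: print a boot config with root=<device> replaced by
# root=UUID=<uuid> on linux/kernel lines only.
#   $1 = boot config file
#   $2 = device name exactly as it appears after root= (e.g. /dev/sda11)
#   $3 = filesystem UUID reported by blkid for that partition
root_to_uuid() {
    # The trailing group requires whitespace or end of line after the device
    # name, so /dev/sda1 does not accidentally match /dev/sda11.
    sed -E "/^[[:space:]]*(linux|kernel)[[:space:]]/ s#root=${2}([[:space:]]|\$)#root=UUID=${3}\\1#" "$1"
}
```

Something like root_to_uuid /boot/grub2/grub.cfg /dev/sda11 "$uuid" would then show the rewritten menu for review, assuming that path is where the menu lives on your install.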

glennzo
4th March 2012, 10:45 AM
This is a fairly recent install of Fedora 16, within the last 2 months.

Where do you propose I add the BLKID information? This is all GRUB2 stuff.

rtguille
5th March 2012, 01:21 AM
Divide your issue into two:

i1 - device number flip-flop (very funny thread name)

i2 - configuring Linux/Fedora to be immune to such scenarios.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

i1 - Device number change:

* This is expected to happen if you remove or insert a hard disk or other mass storage device.
If you do not add or remove devices, the device order should not change without a reason.

* Device order can also be changed by grub/grub2/lilo (for example, to boot a Windows on disk1 from disk0).
You mentioned that you have a very dense operating system population on your disks; check your
bootloaders and make sure that no remapping is taking place.

i2 - Configuring linux/fedora to support disk device order change.

* The previous comments seem to be adequate in this matter.