
Disk failing?



hiwe
26th May 2008, 10:32 AM
I regularly get error messages like these in /var/log/messages:


May 26 11:07:14 haka kernel: sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
May 26 11:07:14 haka kernel: sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
May 26 11:07:14 haka kernel: end_request: I/O error, dev sdb, sector 0
May 26 11:07:14 haka kernel: Buffer I/O error on device sdb, logical block 0


The sdb device reference is confusing me. This is an HP DL360 with hardware RAID (Smart Array P400i):


[root@haka]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: HL-DT-ST Model: CD-ROM GCR-8240N Rev: 2.03
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: COMPAQ Model: HSV110 (C)COMPAQ Rev: 3010
Type: RAID ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 00 Lun: 01
Vendor: COMPAQ Model: HSV110 (C)COMPAQ Rev: 2001
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 01 Lun: 00
Vendor: COMPAQ Model: HSV110 (C)COMPAQ Rev: 3010
Type: RAID ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 01 Lun: 01
Vendor: COMPAQ Model: HSV110 (C)COMPAQ Rev: 2001
Type: Direct-Access ANSI SCSI revision: 02



[root@haka]# cat /proc/driver/cciss/cciss0
cciss0: HP Smart Array P400i Controller
Board ID: 0x3235103c
Firmware Version: 2.08
IRQ: 2292
Logical drives: 1
Max sectors: 2048
Current Q depth: 0
Current # commands on controller: 0
Max Q depth since init: 17
Max # commands on controller since init: 27
Max SG entries since init: 31
Sequential access devices: 0

cciss/c0d0: 733.91GB RAID 5


In addition, I have a PCI card with a fibre link into a SAN. Here are the relevant lines from lspci:


06:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 01)
0b:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)


The output of mount is:


[root@haka]# mount
/dev/cciss/c0d0p3 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/cciss/c0d0p5 on /home type ext3 (rw)
/dev/cciss/c0d0p1 on /boot11 type ext3 (rw)
/dev/mapper/mpath0p1 on /backup type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)


The /backup directory is (should be) the SAN.

Which disk is actually sdb (and sda, sda1)?
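One way to approach this question is to take the SCSI address printed in the kernel message (the `2:0:1:1` in `sd 2:0:1:1: [sdb]`, read as host:channel:id:lun) and look it up in the /proc/scsi/scsi listing. The following is only a rough sketch of that lookup; the function name `parse_proc_scsi` is made up for illustration, and the sample text is a shortened copy of the listing above.

```python
import re

# Shortened copy of the /proc/scsi/scsi listing posted above.
PROC_SCSI = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: HL-DT-ST Model: CD-ROM GCR-8240N Rev: 2.03
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 01 Lun: 01
Vendor: COMPAQ Model: HSV110 (C)COMPAQ Rev: 2001
Type: Direct-Access ANSI SCSI revision: 02
"""

def parse_proc_scsi(text):
    """Return {"host:channel:id:lun": {vendor, model, rev, type}} (hypothetical helper)."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)", line)
        if m:
            # Drop leading zeros so the key matches the kernel's "2:0:1:1" form.
            current = ":".join(str(int(x)) for x in m.groups())
            devices[current] = {}
            continue
        if current is None:
            continue
        m = re.match(r"Vendor: (.*?)\s+Model: (.*?)\s+Rev: (.*)", line)
        if m:
            devices[current].update(vendor=m.group(1), model=m.group(2), rev=m.group(3))
            continue
        m = re.match(r"Type:\s+(.*?)\s+ANSI", line)
        if m:
            devices[current]["type"] = m.group(1)
    return devices

print(parse_proc_scsi(PROC_SCSI)["2:0:1:1"])
```

With the listing above, address 2:0:1:1 resolves to the COMPAQ HSV110 Direct-Access entry on host scsi2, not to anything on the cciss controller.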

notageek
26th May 2008, 11:09 AM
As you probably already know, the disks in an HP DL360 are seen as:



/dev/cciss/c0d0p3 on / type ext3 (rw)
/dev/cciss/c0d0p5 on /home type ext3 (rw)
/dev/cciss/c0d0p1 on /boot11 type ext3 (rw)

Where c0 is controller 0, d0 is disk 0, and pX is the partition number.
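As a small illustration of that naming scheme, here is a sketch that splits a cciss device path into its parts. The helper name `parse_cciss` is invented for this example. Note that, going by the /proc/driver/cciss/cciss0 output above, d0 is the single logical drive (the RAID 5 volume) that the controller presents, not one physical disk.

```python
import re

def parse_cciss(path):
    """Split /dev/cciss/cXdY[pZ] into controller, disk and partition numbers."""
    m = re.fullmatch(r"/dev/cciss/c(\d+)d(\d+)(?:p(\d+))?", path)
    if m is None:
        raise ValueError("not a cciss device path: " + path)
    controller, disk, partition = m.groups()
    return {
        "controller": int(controller),
        "disk": int(disk),
        # None means the path names the whole logical drive, not a partition.
        "partition": int(partition) if partition is not None else None,
    }

print(parse_cciss("/dev/cciss/c0d0p3"))
```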

Having said that, it is indeed confusing to see /dev/sdb in /var/log/messages. I suspect a USB thumb drive may be connected to the box?

Is there any chance you can run fdisk -ls on the box? It would be interesting to see its output.

hiwe
26th May 2008, 11:52 AM
No USB drive is connected; I have double-checked. I should also mention that everything seems to work fine. The only indication I have of something being wrong is the log. Here is the fdisk -ls output:

[root@haka]# fdisk -ls


Disk /dev/cciss/c0d0: 733.9 GB, 733910294528 bytes
255 heads, 63 sectors/track, 89226 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0004c284

Device Boot Start End Blocks Id System
/dev/cciss/c0d0p1 * 1 13 104391 83 Linux
/dev/cciss/c0d0p2 14 140 1020127+ 82 Linux swap / Solaris
/dev/cciss/c0d0p3 141 25636 204796620 83 Linux
/dev/cciss/c0d0p4 25637 89226 510786675 5 Extended
/dev/cciss/c0d0p5 25637 89226 510786643+ 83 Linux

Disk /dev/sda: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0007aec7

Device Boot Start End Blocks Id System
/dev/sda1 * 1 52216 419424988+ 83 Linux

Disk /dev/dm-0: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0007aec7

Device Boot Start End Blocks Id System
/dev/dm-0p1 * 1 52216 419424988+ 83 Linux

Disk /dev/dm-1: 429.4 GB, 429491188224 bytes
255 heads, 63 sectors/track, 52215 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-1 doesn't contain a valid partition table

notageek
26th May 2008, 11:59 AM
I hate device mapper!!

Well, at least we found /dev/sda, which according to the output seems to be used by device mapper as /dev/dm-0.

And my suspicion is that /dev/sdb is /dev/dm-1.

I presume this is a server, so perhaps it hasn't been rebooted in a while, in which case the boot messages may have scrolled out of the kernel log buffer. Otherwise, information on /dev/sdX can be found with dmesg|grep sd


Edit: This is rather strange, since all hard drives attached to this box should be connected to the cciss controller, and for a drive to be seen as /dev/sdX on this box means it's not connected to that controller. The best chance of identifying the drive is dmesg|grep sd; beyond that I really don't know what's going on. :D
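If grepping by eye gets tedious, the same information can be pulled out with a short script. This is only a sketch: the function name `sd_addresses` is invented here, and the sample lines are copied from the kernel messages posted in this thread. It collects which SCSI address (host:channel:id:lun) each sdX name was attached to.

```python
import re

# Lines in the format the kernel logs for sd devices, copied from this thread.
DMESG = """\
sd 2:0:0:1: [sda] 838860800 512-byte hardware sectors (429497 MB)
sd 2:0:0:1: [sda] Attached SCSI disk
sd 2:0:1:1: [sdb] 838860800 512-byte hardware sectors (429497 MB)
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
"""

def sd_addresses(text):
    """Map each sdX name to the first SCSI address it is logged with."""
    mapping = {}
    for m in re.finditer(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]", text):
        mapping.setdefault(m.group(2), m.group(1))
    return mapping

print(sd_addresses(DMESG))
```

On a live box the input would come from the output of dmesg or from /var/log/dmesg rather than from a pasted string.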

JEO
26th May 2008, 01:40 PM
fdisk doesn't list the partition device names very well (it guesses them). /dev/dm-1 is probably the first partition on the device mapper drive, and /dev/dm-0 is the whole device that holds the partition table. fdisk is trying to read /dev/dm-1 as if it contained a partition table, hence the error.
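The sizes in the fdisk output can be used as a sanity check on this. A minimal sketch, assuming fdisk's "Blocks" column is in 1024-byte units and that the trailing "+" marks one extra 512-byte sector (an odd sector count); all figures are copied from the fdisk listing above.

```python
SECTOR_BYTES = 512
FDISK_BLOCK_BYTES = 1024  # assumption: "Blocks" column counts 1 KiB units

dm0_bytes = 429496729600   # "Disk /dev/dm-0" (same size as /dev/sda)
dm1_bytes = 429491188224   # "Disk /dev/dm-1"
sda1_blocks = 419424988    # "/dev/sda1 ... 419424988+"
sda1_has_plus = True       # the "+" suffix: one extra half block

# Size of the first partition in bytes, including the odd trailing sector.
sda1_bytes = sda1_blocks * FDISK_BLOCK_BYTES + (SECTOR_BYTES if sda1_has_plus else 0)

print("partition 1 bytes:", sda1_bytes)
print("matches dm-1 size:", sda1_bytes == dm1_bytes)
print("sectors outside partition 1:", (dm0_bytes - dm1_bytes) // SECTOR_BYTES)
```

Under those assumptions the partition size works out to exactly the size reported for /dev/dm-1, which is consistent with dm-1 being partition 1 of dm-0.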

hiwe
26th May 2008, 02:20 PM
The dmesg command gives nothing more than repeats of:


Buffer I/O error on device sdb, logical block 104857584
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0

But /var/log/dmesg tells me more. Here are the important lines:


SCSI subsystem initialized
Driver 'sd' needs updating - please use bus_type methods
HP CISS Driver (v 3.6.14)
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 16 (level, low) -> IRQ 16
cciss0: <0x3230> at PCI 0000:06:00.0 IRQ 2292 using DAC
blocks= 1433418544 block_size= 512
heads=255, sectors=32, cylinders=175665

blocks= 1433418544 block_size= 512
heads=255, sectors=32, cylinders=175665

cciss/c0d0: p1 p2 p3 p4 < p5 >
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
libata version 3.00 loaded.
ata_piix 0000:00:1f.1: version 2.12
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1f.1 to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x500 irq 14
ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x508 irq 15

[...]

scsi: waiting for bus probes to complete ...
scsi 2:0:0:0: RAID COMPAQ HSV110 (C)COMPAQ 3010 PQ: 0 ANSI: 2
scsi 2:0:0:1: Direct-Access COMPAQ HSV110 (C)COMPAQ 2001 PQ: 0 ANSI: 2
scsi 2:0:1:0: RAID COMPAQ HSV110 (C)COMPAQ 3010 PQ: 0 ANSI: 2
scsi 2:0:1:1: Direct-Access COMPAQ HSV110 (C)COMPAQ 2001 PQ: 0 ANSI: 2
sd 2:0:0:1: [sda] 838860800 512-byte hardware sectors (429497 MB)
sd 2:0:0:1: [sda] Write Protect is off
sd 2:0:0:1: [sda] Mode Sense: 97 00 10 08
sd 2:0:0:1: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 2:0:0:1: [sda] 838860800 512-byte hardware sectors (429497 MB)
sd 2:0:0:1: [sda] Write Protect is off
sd 2:0:0:1: [sda] Mode Sense: 97 00 10 08
sd 2:0:0:1: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
sda: sda1
sd 2:0:0:1: [sda] Attached SCSI disk
sd 2:0:1:1: [sdb] 838860800 512-byte hardware sectors (429497 MB)
sd 2:0:1:1: [sdb] Test WP failed, assume Write Enabled
sd 2:0:1:1: [sdb] Asking for cache data failed
sd 2:0:1:1: [sdb] Assuming drive cache: write through
sd 2:0:1:1: [sdb] 838860800 512-byte hardware sectors (429497 MB)
sd 2:0:1:1: [sdb] Test WP failed, assume Write Enabled
sd 2:0:1:1: [sdb] Asking for cache data failed
sd 2:0:1:1: [sdb] Assuming drive cache: write through
sdb:<6>sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
Dev sdb: unable to read RDB block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
unable to read partition table
sd 2:0:1:1: [sdb] Attached SCSI disk
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 838860672
Buffer I/O error on device sdb, logical block 104857584
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 838860672
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 8
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: cciss/c0d0p3: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 49777913
ext3_orphan_cleanup: deleting unreferenced inode 48657779
ext3_orphan_cleanup: deleting unreferenced inode 30638085
ext3_orphan_cleanup: deleting unreferenced inode 30638157
ext3_orphan_cleanup: deleting unreferenced inode 19235808
ext3_orphan_cleanup: deleting unreferenced inode 19235811
ext3_orphan_cleanup: deleting unreferenced inode 19235806
ext3_orphan_cleanup: deleting unreferenced inode 25135128
ext3_orphan_cleanup: deleting unreferenced inode 19234931
ext3_orphan_cleanup: deleting unreferenced inode 36175873
EXT3-fs: cciss/c0d0p3: 10 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
audit(1210234407.749:2): selinux=0 auid=4294967295
scsi 0:0:0:0: Attached scsi generic sg0 type 5
scsi 2:0:0:0: Attached scsi generic sg1 type 12
sd 2:0:0:1: Attached scsi generic sg2 type 0
scsi 2:0:1:0: Attached scsi generic sg3 type 12
sd 2:0:1:1: Attached scsi generic sg4 type 0
Driver 'sr' needs updating - please use bus_type methods
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray

[...]
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 838860792
printk: 18 messages suppressed.
Buffer I/O error on device sdb, logical block 104857599
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 838860792
Buffer I/O error on device sdb, logical block 104857599
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 838860792

[...]

sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 0
rport-2:0-4: blocked FC remote port time out: removing rport
sd 2:0:1:1: [sdb] Device not ready: Sense Key : Not Ready [current]
sd 2:0:1:1: [sdb] Device not ready: Add. Sense: Logical unit not ready, initializing command required
end_request: I/O error, dev sdb, sector 280
device-mapper: multipath: Failing path 8:16.
EXT3 FS on cciss/c0d0p3, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on cciss/c0d0p5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on cciss/c0d0p1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1020116k swap on /dev/cciss/c0d0p2. Priority:-1 extents:1 across:1020116k



I'm no SCSI expert, but it seems like one device/disk on the SCSI bus (for some reason named sdb) has problems.

The HP DL360 has 6 physical disks. If one of them is failing, are we able to tell which one from the dmesg output above?
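Two of the numbers in the log can be decoded with simple arithmetic. The sketch below assumes the usual Linux numbering for sd disks (block major 8, 16 minors per disk) to translate the "Failing path 8:16" message, and assumes a 4096-byte buffer block size to relate the "sector" and "logical block" figures. The function names are made up for illustration.

```python
SD_MAJOR = 8           # assumption: first sd block major
MINORS_PER_DISK = 16   # assumption: 16 minor numbers reserved per sd disk
SECTOR_BYTES = 512

def sd_name(major, minor):
    """Translate a major:minor pair such as 8:16 into an sdX[N] name."""
    if major != SD_MAJOR:
        raise ValueError("only block major 8 is handled in this sketch")
    index, partition = divmod(minor, MINORS_PER_DISK)
    if index >= 26:
        raise ValueError("only sda..sdz are handled in this sketch")
    name = "sd" + chr(ord("a") + index)
    return name + (str(partition) if partition else "")

def logical_block(sector, block_bytes=4096):
    """Convert a 512-byte sector number to a buffer block number."""
    return sector * SECTOR_BYTES // block_bytes

print(sd_name(8, 16))            # the path device mapper reported as failing
print(logical_block(838860672))  # compare with "logical block 104857584"
print(logical_block(838860792))  # compare with "logical block 104857599"
```

If those assumptions hold, 8:16 is sdb, and the failing reads at sector 838860792 land in the last 4096-byte block of an 838860800-sector device.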

notageek
26th May 2008, 03:22 PM
"The HP DL360 have 6 physical disks. If one of them is failing, are we able to tell which one from the dmesg output above ?"

The dmesg output above only tells us that /dev/sdb is bad.

There's an HP RAID utility that lets you monitor the drives in the array without having to reboot the box, which I believe is your limitation. (The easiest way to detect a failing disk is to find it in the RAID BIOS, but for that you would have to reboot.)

I'm afraid you'll have to sift through "hpacucli help" (the command) to get the right result. My main reason for giving you the link to this utility is so that you can find the serial number of the failing drive and use it to identify the physical disk.

It can be found here:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&swItem=MTX-66b08e49c28f4bd49f4641ed80&jumpid=reg_R1002_USEN


Hope this helps.