Fedora Core 6, Dell PowerEdge, and random lockups



frbird400
2nd February 2007, 11:10 PM
I installed Fedora Core 6 on a new Dell PowerEdge 2900. The system had been stable for the past month. Last night, some services (samba, ssh) became unresponsive, yet the machine still responded to pings, nmap scans, etc. The messages logfile reported the following:

Feb 1 17:41:43 webserver kernel: list_del corruption. prev->next should be e52e0218, but was eb000000
Feb 1 17:41:43 webserver kernel: ------------[ cut here ]------------
Feb 1 17:41:43 webserver kernel: kernel BUG at lib/list_debug.c:65!
Feb 1 17:41:43 webserver kernel: invalid opcode: 0000 [#1]
Feb 1 17:41:43 webserver kernel: SMP
Feb 1 17:41:43 webserver kernel: last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Feb 1 17:41:43 webserver kernel: Modules linked in: nls_utf8 ppdev nfs fscache nfsd exportfs lockd nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport st sg bnx2 ide_cd serio_raw cdrom pcspkr dm_snapshot dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Feb 1 17:41:43 webserver kernel: CPU: 0
Feb 1 17:41:43 webserver kernel: EIP: 0060:[<c04e99ab>] Not tainted VLI
Feb 1 17:41:43 webserver kernel: EFLAGS: 00010296 (2.6.18-1.2868.fc6 #1)
Feb 1 17:41:43 webserver kernel: EIP is at list_del+0x23/0x6c
Feb 1 17:41:43 webserver kernel: eax: 00000048 ebx: e52e0218 ecx: c067e1d0 edx: 00000096
Feb 1 17:41:43 webserver kernel: esi: e52e01c8 edi: f7d08240 ebp: 00000282 esp: f7feff38
Feb 1 17:41:43 webserver kernel: ds: 007b es: 007b ss: 0068
Feb 1 17:41:43 webserver kernel: Process events/0 (pid: 14, ti=f7fef000 task=f7d29140 task.ti=f7fef000)
Feb 1 17:41:43 webserver kernel: Stack: c0641c4f e52e0218 eb000000 e52e01c0 c04bbb5c e52e01c0 e52e01c8 f7d08240
Feb 1 17:41:43 webserver kernel: c04bb7a3 c068b260 c068b264 c0433c18 00000246 f7d08240 f7d08260 c04bb6ec
Feb 1 17:41:43 webserver kernel: 00000000 f7d08260 f7d08240 f7d08258 00000000 c0434508 00000001 00000000
Feb 1 17:41:43 webserver kernel: Call Trace:
Feb 1 17:41:43 webserver kernel: [<c04bbb5c>] keyring_destroy+0x28/0x65
Feb 1 17:41:43 webserver kernel: [<c04bb7a3>] key_cleanup+0xb7/0xd0
Feb 1 17:41:43 webserver kernel: [<c0433c18>] run_workqueue+0x83/0xc5
Feb 1 17:41:43 webserver kernel: [<c0434508>] worker_thread+0xd9/0x10d
Feb 1 17:41:43 webserver kernel: [<c04369db>] kthread+0xc0/0xed
Feb 1 17:41:43 webserver kernel: [<c0404dab>] kernel_thread_helper+0x7/0x10
Feb 1 17:41:43 webserver kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Feb 1 17:41:43 webserver kernel: Leftover inexact backtrace:
Feb 1 17:41:43 webserver kernel: =======================
Feb 1 17:41:43 webserver kernel: Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 4f 1c 64 c0 e8 2b be f3 ff <0f> 0b 41 00 8c 1c 64 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04
Feb 1 17:41:43 webserver kernel: EIP: [<c04e99ab>] list_del+0x23/0x6c SS:ESP 0068:f7feff38
Feb 1 17:41:53 webserver kernel: <3>BUG: soft lockup detected on CPU#0!
Feb 1 17:41:53 webserver kernel: [<c04051db>] dump_trace+0x69/0x1af
Feb 1 17:41:53 webserver kernel: [<c0405339>] show_trace_log_lvl+0x18/0x2c
Feb 1 17:41:53 webserver kernel: [<c04058ed>] show_trace+0xf/0x11
Feb 1 17:41:53 webserver kernel: [<c04059ea>] dump_stack+0x15/0x17
Feb 1 17:41:53 webserver kernel: [<c044da6d>] softlockup_tick+0xad/0xc4
Feb 1 17:41:53 webserver kernel: [<c042e57a>] update_process_times+0x39/0x5c
Feb 1 17:41:53 webserver kernel: [<c0418914>] smp_apic_timer_interrupt+0x5c/0x64
Feb 1 17:41:53 webserver kernel: [<c0404ad3>] apic_timer_interrupt+0x1f/0x24
Feb 1 17:41:53 webserver kernel: DWARF2 unwinder stuck at apic_timer_interrupt+0x1f/0x24
Feb 1 17:41:53 webserver kernel: Leftover inexact backtrace:
Feb 1 17:41:53 webserver kernel: [<c06123ff>] __write_lock_failed+0xf/0x20
Feb 1 17:41:53 webserver kernel: [<c04e9891>] _raw_write_lock+0x5d/0x74
Feb 1 17:41:53 webserver kernel: [<c04bc073>] keyring_publish_name+0x2c/0x6d
Feb 1 17:41:53 webserver kernel: [<c04bc0c2>] keyring_instantiate+0xe/0x13
Feb 1 17:41:53 webserver kernel: [<c04bb010>] __key_instantiate_and_link+0x2f/0xa8
Feb 1 17:41:53 webserver kernel: [<c04bc283>] keyring_alloc+0x53/0x6a
Feb 1 17:41:53 webserver kernel: [<c04bd900>] alloc_uid_keyring+0x4c/0xb2
Feb 1 17:41:53 webserver kernel: [<c042e9f1>] alloc_uid+0x95/0x13c
Feb 1 17:41:53 webserver kernel: [<c0431850>] set_user+0xb/0x8e
Feb 1 17:41:53 webserver kernel: [<c043311b>] sys_setresuid+0x111/0x1dd
Feb 1 17:41:53 webserver kernel: [<c0404013>] syscall_call+0x7/0xb
Feb 1 17:41:53 webserver kernel: =======================
Feb 1 17:42:48 webserver kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
Feb 1 17:42:48 webserver kernel: printing eip:
Feb 1 17:42:48 webserver kernel: c04e6890
Feb 1 17:42:48 webserver kernel: *pde = 40349067
Feb 1 17:42:48 webserver kernel: Oops: 0000 [#2]
Feb 1 17:42:48 webserver kernel: SMP
Feb 1 17:42:48 webserver kernel: last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Feb 1 17:42:48 webserver kernel: Modules linked in: nls_utf8 ppdev nfs fscache nfsd exportfs lockd nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport st sg bnx2 ide_cd serio_raw cdrom pcspkr dm_snapshot dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Feb 1 17:42:48 webserver kernel: CPU: 1
Feb 1 17:42:48 webserver kernel: EIP: 0060:[<c04e6890>] Not tainted VLI
Feb 1 17:42:48 webserver kernel: EFLAGS: 00210282 (2.6.18-1.2868.fc6 #1)
Feb 1 17:42:48 webserver kernel: EIP is at __rb_rotate_left+0xc/0x50
Feb 1 17:42:48 webserver kernel: eax: e52e0448 ebx: e52e0448 ecx: e52e0b48 edx: 00000000
Feb 1 17:42:48 webserver kernel: esi: f621eec8 edi: c080f8c0 ebp: e52e0b48 esp: e7db9ed0
Feb 1 17:42:48 webserver kernel: ds: 007b es: 007b ss: 0068
Feb 1 17:42:48 webserver kernel: Process smbd (pid: 7505, ti=e7db9000 task=f7f372c0 task.ti=e7db9000)
Feb 1 17:42:48 webserver kernel: Stack: e52e01c8 e52e0448 e52e01c8 c04e69b0 c080f8c0 e52e01d0 00000000 e52e0ba0
Feb 1 17:42:48 webserver kernel: d6bd4840 c04bb4ef 00000200 c068b2a0 00000000 e52e0b40 e52e0b40 0000001d
Feb 1 17:42:48 webserver kernel: 00000200 f621d940 e7db9f50 c07fdf10 c04bc260 ffffffff f7f372c0 1f3f0000

frbird400
2nd February 2007, 11:11 PM
Feb 1 17:42:48 webserver kernel: Call Trace:
Feb 1 17:42:48 webserver kernel: [<c04e69b0>] rb_insert_color+0x8c/0xad
Feb 1 17:42:48 webserver kernel: [<c04bb4ef>] key_alloc+0x267/0x330
Feb 1 17:42:48 webserver kernel: [<c04bc260>] keyring_alloc+0x30/0x6a
Feb 1 17:42:48 webserver kernel: [<c04bd900>] alloc_uid_keyring+0x4c/0xb2
Feb 1 17:42:48 webserver kernel: [<c042e9f1>] alloc_uid+0x95/0x13c
Feb 1 17:42:48 webserver kernel: [<c0431850>] set_user+0xb/0x8e
Feb 1 17:42:49 webserver kernel: [<c043311b>] sys_setresuid+0x111/0x1dd
Feb 1 17:42:49 webserver kernel: [<c0404013>] syscall_call+0x7/0xb
Feb 1 17:42:49 webserver kernel: DWARF2 unwinder stuck at syscall_call+0x7/0xb
Feb 1 17:42:49 webserver kernel: Leftover inexact backtrace:
Feb 1 17:42:49 webserver kernel: =======================
Feb 1 17:42:49 webserver kernel: Code: 00 00 00 8b 45 04 83 c1 14 d3 e2 85 d0 75 05 09 d0 89 45 04 5a 89 f0 59 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 8b 50 04 89 c3 8b 30 <8b> 4a 08 83 e6 fc 85 c9 89 48 04 74 09 8b 01 83 e0 03 09 d8 89
Feb 1 17:42:49 webserver kernel: EIP: [<c04e6890>] __rb_rotate_left+0xc/0x50 SS:ESP 0068:e7db9ed0
Feb 1 17:42:58 webserver kernel: <3>BUG: soft lockup detected on CPU#1!
Feb 1 17:42:58 webserver kernel: [<c04051db>] dump_trace+0x69/0x1af
Feb 1 17:42:58 webserver kernel: [<c0405339>] show_trace_log_lvl+0x18/0x2c
Feb 1 17:42:58 webserver kernel: [<c04058ed>] show_trace+0xf/0x11
Feb 1 17:42:58 webserver kernel: [<c04059ea>] dump_stack+0x15/0x17
Feb 1 17:42:58 webserver kernel: [<c044da6d>] softlockup_tick+0xad/0xc4
Feb 1 17:42:58 webserver kernel: [<c042e57a>] update_process_times+0x39/0x5c
Feb 1 17:42:58 webserver kernel: [<c0418914>] smp_apic_timer_interrupt+0x5c/0x64
Feb 1 17:42:58 webserver kernel: [<c0404ad3>] apic_timer_interrupt+0x1f/0x24
Feb 1 17:42:58 webserver kernel: DWARF2 unwinder stuck at apic_timer_interrupt+0x1f/0x24
Feb 1 17:42:58 webserver kernel: Leftover inexact backtrace:
Feb 1 17:42:58 webserver kernel: [<c052007b>] acpi_bus_check_device+0x5d/0x63
Feb 1 17:42:58 webserver kernel: [<c04e97c7>] _raw_spin_lock+0x6f/0xdc
Feb 1 17:42:58 webserver kernel: [<c04bb45b>] key_alloc+0x1d3/0x330
Feb 1 17:42:58 webserver kernel: [<c04bc260>] keyring_alloc+0x30/0x6a
Feb 1 17:42:58 webserver kernel: [<c04bd900>] alloc_uid_keyring+0x4c/0xb2
Feb 1 17:42:58 webserver kernel: [<c042e9f1>] alloc_uid+0x95/0x13c
Feb 1 17:42:58 webserver kernel: [<c0431850>] set_user+0xb/0x8e
Feb 1 17:42:58 webserver kernel: [<c043311b>] sys_setresuid+0x111/0x1dd
Feb 1 17:42:58 webserver kernel: [<c0404013>] syscall_call+0x7/0xb
Feb 1 17:42:58 webserver kernel: =======================
Feb 1 17:43:13 webserver kernel: BUG: soft lockup detected on CPU#3!
Feb 1 17:43:13 webserver kernel: [<c04051db>] dump_trace+0x69/0x1af
Feb 1 17:43:13 webserver kernel: [<c0405339>] show_trace_log_lvl+0x18/0x2c
Feb 1 17:43:13 webserver kernel: [<c04058ed>] show_trace+0xf/0x11
Feb 1 17:43:13 webserver kernel: [<c04059ea>] dump_stack+0x15/0x17
Feb 1 17:43:13 webserver kernel: [<c044da6d>] softlockup_tick+0xad/0xc4
Feb 1 17:43:13 webserver kernel: [<c042e57a>] update_process_times+0x39/0x5c
Feb 1 17:43:13 webserver kernel: [<c0418914>] smp_apic_timer_interrupt+0x5c/0x64
Feb 1 17:43:13 webserver kernel: [<c0404ad3>] apic_timer_interrupt+0x1f/0x24
Feb 1 17:43:13 webserver kernel: DWARF2 unwinder stuck at apic_timer_interrupt+0x1f/0x24
Feb 1 17:43:13 webserver kernel: Leftover inexact backtrace:
Feb 1 17:43:13 webserver kernel: [<c052007b>] acpi_bus_check_device+0x5d/0x63
Feb 1 17:43:13 webserver kernel: [<c04e97c7>] _raw_spin_lock+0x6f/0xdc
Feb 1 17:43:13 webserver kernel: [<c04bb45b>] key_alloc+0x1d3/0x330
Feb 1 17:43:13 webserver kernel: [<c04bc260>] keyring_alloc+0x30/0x6a
Feb 1 17:43:13 webserver kernel: [<c04bd900>] alloc_uid_keyring+0x4c/0xb2
Feb 1 17:43:13 webserver kernel: [<c042e9f1>] alloc_uid+0x95/0x13c
Feb 1 17:43:13 webserver kernel: [<c0431850>] set_user+0xb/0x8e
Feb 1 17:43:13 webserver kernel: [<c043311b>] sys_setresuid+0x111/0x1dd
Feb 1 17:43:13 webserver kernel: [<c0404013>] syscall_call+0x7/0xb
Feb 1 17:43:13 webserver kernel: =======================


I rebooted and updated all packages, including the kernel to the latest version, hoping that would solve the problem. The same thing happened this morning, with the following in the logfile:



Feb 2 11:17:43 webserver kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb 2 11:17:43 webserver kernel: printing eip:
Feb 2 11:17:43 webserver kernel: c04f09db
Feb 2 11:17:43 webserver kernel: *pde = c47c2067
Feb 2 11:17:43 webserver kernel: Oops: 0000 [#1]
Feb 2 11:17:43 webserver kernel: SMP
Feb 2 11:17:43 webserver kernel: last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:06:00.0/0000:07:00.0/0000:08:00.0/0000:09:00.0/irq
Feb 2 11:17:43 webserver kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport st ata_piix libata pcspkr bnx2 ide_cd sg serio_raw cdrom dm_snapshot dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Feb 2 11:17:43 webserver kernel: CPU: 0
Feb 2 11:17:43 webserver kernel: EIP: 0060:[<c04f09db>] Not tainted VLI
Feb 2 11:17:43 webserver kernel: EFLAGS: 00010246 (2.6.19-1.2895.fc6 #1)
Feb 2 11:17:43 webserver kernel: EIP is at rb_erase+0xf6/0x22f
Feb 2 11:17:43 webserver kernel: eax: 00000001 ebx: 00000000 ecx: 00000000 edx: f636bec8
Feb 2 11:17:43 webserver kernel: esi: f636bec8 edi: ea34af48 ebp: c086b900 esp: f7feff44
Feb 2 11:17:43 webserver kernel: ds: 007b es: 007b ss: 0068
Feb 2 11:17:43 webserver kernel: Process events/0 (pid: 14, ti=f7fef000 task=f7d49110 task.ti=f7fef000)
Feb 2 11:17:43 webserver kernel: Stack: 00000001 ea34af40 ea34af48 f7d22540 00000282 c04c466b c06a3e40 c06a3e44
Feb 2 11:17:43 webserver kernel: c04368c7 00000282 f7d22540 f7d22560 c04c4624 00000000 f7d22560 f7d22540
Feb 2 11:17:43 webserver kernel: f7d22558 00000000 c0437284 00000001 00000000 00000000 00010000 00000000
Feb 2 11:17:43 webserver kernel: Call Trace:
Feb 2 11:17:43 webserver kernel: [<c04c466b>] key_cleanup+0x47/0xd0
Feb 2 11:17:43 webserver kernel: [<c04368c7>] run_workqueue+0x97/0xdd
Feb 2 11:17:43 webserver kernel: [<c0437284>] worker_thread+0xd9/0x10d
Feb 2 11:17:43 webserver kernel: [<c0439810>] kthread+0xc0/0xec
Feb 2 11:17:43 webserver kernel: [<c0404c03>] kernel_thread_helper+0x7/0x10
Feb 2 11:17:43 webserver kernel: =======================
Feb 2 11:17:43 webserver kernel: Code: 05 89 5a 08 eb 08 89 5a 04 eb 03 89 5d 00 83 3c 24 01 0f 85 46 01 00 00 e9 12 01 00 00 8b 4e 08 39 d9 0f 85 85 00 00 00 8b 4e 04 <8b> 01 a8 01 75 14 83 c8 01 89 ea 89 01 89 f0 83 26 fe e8 1e fd
Feb 2 11:17:43 webserver kernel: EIP: [<c04f09db>] rb_erase+0xf6/0x22f SS:ESP 0068:f7feff44



Can anybody make heads or tails of these messages? I am going to replace the memory to see if that helps. The logfile does not contain any messages immediately before the kernel messages, so it is difficult to pinpoint what is causing this. Any help would be GREATLY appreciated.
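
One way to look for a pattern in traces like these is to tally which kernel functions recur across the backtraces; in the logs above, for example, key_cleanup and key_alloc each show up in more than one oops. The following is only a minimal sketch: `count_frames` and the `FRAME` pattern are hypothetical names, not part of any Fedora or kernel tool, and it assumes frames are logged in the `[<address>] function+0xoffset/0xsize` form seen above.

```python
import re
from collections import Counter

# Assumed frame format, e.g. "[<c04bb7a3>] key_cleanup+0xb7/0xd0".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def count_frames(lines):
    """Count how often each kernel function appears in backtrace frames."""
    counts = Counter()
    for line in lines:
        match = FRAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the logs above; in practice, pass the lines
# of the saved messages logfile instead.
sample = [
    "Feb 1 17:41:43 webserver kernel: [<c04bbb5c>] keyring_destroy+0x28/0x65",
    "Feb 1 17:41:43 webserver kernel: [<c04bb7a3>] key_cleanup+0xb7/0xd0",
    "Feb 2 11:17:43 webserver kernel: [<c04c466b>] key_cleanup+0x47/0xd0",
    "Feb 2 11:17:43 webserver kernel: EIP is at rb_erase+0xf6/0x22f",
]

for name, hits in count_frames(sample).most_common():
    print(hits, name)
```

Functions that appear in every oops are a reasonable place to start searching bug trackers. A tally like this only narrows the search; it does not by itself distinguish a kernel bug from bad memory.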