View Full Version : System freezes need to know where to look to find errors



tmick
31st October 2007, 05:16 AM
Hi,
I am running Linux 2.6.22.9-91.fc7 #1 SMP Thu Sep 27 23:10:59 EDT 2007 i686 athlon i386 GNU/Linux with the Nvidia-96xx drivers, and the computer locks up badly enough that I have to power cycle it to get control back.
I have no idea why it is doing this, and even less of an idea what is causing it. I have run the dmesg command and can see nothing in there that points to a cause. Any pointers on where else to look for errors would be greatly appreciated.

stevea
31st October 2007, 06:25 AM
Of all the problems you can have, the unexpected system hang is about the worst.

Does the display lock or go black?
Does it fail when you use the vesa video driver?
How frequently does the error occur?

One thing worthy of suspicion is the disk (or the driver, or the interface cable, or ...). When these fail you may get no recorded messages, because the log cannot be written to a disk that is not responding. Memory failure will usually, but not always, get you a message on the screen.

If the problem occurs frequently (say, more often than every 12 hours) I'd try some test runs. Run memtest86+ overnight to wring out the memory. Perhaps boot into single-user mode and beat on the disk all night (while true; do dd if=/dev/sda of=/dev/null; done). Then try running the video with a basic vesa driver for a day.

If it only occurs occasionally I'd still try the above, but the result won't be definitive.
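
A minimal sketch of the overnight disk read test described above, assuming a POSIX shell. The function name read_soak and the log path are hypothetical and not something from this thread. It writes a timestamped line before and after each pass and calls sync, so that after a hard freeze the last line in the log hints at whether the hang came in the middle of a read.

```shell
#!/bin/sh
# read_soak SOURCE LOGFILE [PASSES]
# Reads SOURCE from start to end PASSES times (default: 1), discarding the
# data, and appends a timestamped line to LOGFILE before and after each pass.
# read_soak is a hypothetical helper name used only for illustration.
read_soak() {
    src=$1
    log=$2
    passes=${3:-1}
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "$(date '+%F %T') pass $i start" >> "$log"
        sync    # flush the log line to disk before the read begins
        # bs is given in bytes (1 MiB) so it works with GNU and BSD dd alike
        if dd if="$src" of=/dev/null bs=1048576 2>/dev/null; then
            echo "$(date '+%F %T') pass $i ok" >> "$log"
        else
            echo "$(date '+%F %T') pass $i FAILED" >> "$log"
        fi
        sync
        i=$((i + 1))
    done
}
```

It would be run as root from single-user mode, for example read_soak /dev/sda /root/read_soak.log 100. The device name is the one used in the post above and may differ on your machine. A "start" line with no matching "ok" or "FAILED" line after a freeze suggests, but does not prove, that the hang happened during disk I/O.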

tmick
31st October 2007, 06:59 AM
The display locks: the GUI is still displayed, but the keyboard and mouse are unresponsive and stay that way until I power cycle the machine. It occurs intermittently (of course). I have updated the kernel to 2.6.23.1-10.fc7 and will see if that helps any. Are there any log files that are specific to the kernel? I have the debug package installed for that, and the 96xx drivers too.
Interesting note: I ran the command memtest86+ and received "command not found".
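
On the kernel log question: dmesg only shows the in-memory ring buffer, which is lost on a power cycle. On a default Fedora install, syslog normally also writes kernel messages to /var/log/messages. That is an assumption about the syslog configuration, worth checking in /etc/syslog.conf or /etc/rsyslog.conf. If it holds, that file is where messages from just before a freeze would survive. As for memtest86+, it is a standalone image started from the boot loader menu rather than a shell command, which would explain the "command not found". A small sketch for pulling the most recent kernel lines out of a syslog-format file, with kernel_tail as a hypothetical helper name:

```shell
#!/bin/sh
# kernel_tail LOGFILE [COUNT]
# Prints the last COUNT (default: 50) lines of LOGFILE that syslog tagged
# as coming from the kernel, i.e. lines containing " kernel: ".
# kernel_tail is a hypothetical helper name used only for illustration.
kernel_tail() {
    grep ' kernel: ' "$1" | tail -n "${2:-50}"
}
```

After rebooting from a freeze, something like kernel_tail /var/log/messages 100 (as root) shows what the kernel logged last. If the hang stopped syslog from writing, the final messages may still be missing.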

howlie69
12th November 2007, 07:50 PM
I agree! For what it's worth, I have had the same problem with Fedora 6 and 8, and have just had to go back to 7 to get my computer to run. Some things I've tried: memtest; running tests of each disk; running the system on each of 4 disks with the others unmounted; running from a SATA drive attached to a PCI controller with onboard SATA disabled; running from onboard SATA with the PCI controller removed; switching PCI slots for the controller; running with everything not strictly necessary disabled in the BIOS; and running with a PCI network card instead of the onboard interface. Nothing but running Fedora 7 has worked so far. With 7 I can apparently run indefinitely. Unfortunately, this means I will eventually be running an unmaintained system. Perhaps a different release of Fedora would work for you?

tmick
13th November 2007, 12:05 AM
I think I have narrowed it down to Evolution hanging for some dumb reason. If I close Evolution and let the system run, it's fine. If I do an update of the system and shut down Evolution while that runs, it doesn't freeze. So I guess all I need to figure out is how to capture the failure in a log file and recreate it.
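
One possible way to capture it, sketched under the assumption that Evolution prints something useful to its terminal before hanging (which is not guaranteed): start it from a shell with stdout and stderr appended to a log file. The helper name run_logged and the log path are hypothetical.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND with its stdout and stderr appended to LOGFILE, bracketed by
# timestamped start and exit-status lines, and returns COMMAND's exit status.
# run_logged is a hypothetical helper name used only for illustration.
run_logged() {
    log=$1
    shift
    echo "=== $(date '+%F %T') starting: $*" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "=== $(date '+%F %T') exit status: $status" >> "$log"
    return "$status"
}
```

For example: run_logged "$HOME/evolution.log" evolution. If the whole machine freezes, the exit-status line will be absent and whatever was written last before the freeze is the part to look at, although output still sitting in buffers at that moment may be lost.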

thomwien
13th November 2007, 02:23 AM
Same problem here with FC8 on an x86_64 machine. The mouse and keyboard freeze, even though I can still see things running, such as GKrellM if I have that going at the time. It seems to have started with the last yum update; I'm not sure if it was the kernel or something else that updated.

foot596
13th November 2007, 08:25 AM
After upgrading FC7 to FC8, the PS/2 keyboard and mouse freeze in Xorg with either the nvidia or the nv driver. The OS continues working without input. A USB keyboard and mouse work if plugged in once the PS/2 inputs are dead.

[foot@gate ~]$ uname -a
Linux gate.foot.local 2.6.23.1-49.fc8 #1 SMP Thu Nov 8 22:14:09 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

tmick
28th November 2007, 05:32 AM
What happens if you run dmesg after rebooting?
Does it contain anything similar to this?
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
sd 5:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 2001664
Buffer I/O error on device sdc, logical block 250208
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
sd 5:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 2001872
Buffer I/O error on device sdc, logical block 250234
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
usb 3-5: reset high speed USB device using ehci_hcd and address 5
sd 5:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 2001872
Buffer I/O error on device sdc, logical block 250234

And what happens if you run "modprobe -r <module>" for the driver of that device?
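
To compare your own output against the dmesg excerpt above, a rough sketch that counts the USB reset lines and groups the I/O errors by device and sector. It assumes the message wording matches the lines quoted in this post, and summarize_io_errors is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_io_errors
# Reads kernel log text on stdin and prints the number of
# "reset high speed USB device" lines, followed by one line per distinct
# device/sector pair seen in "end_request: I/O error" lines (in no
# particular order). summarize_io_errors is a hypothetical helper name.
summarize_io_errors() {
    awk '
        /reset high speed USB device/ { resets++ }
        /end_request: I\/O error, dev / {
            # Locate the "dev" token so that a timestamp or syslog prefix
            # in front of the message does not shift the fields.
            for (i = 1; i + 3 <= NF; i++) {
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)
                    errors[dev " sector " $(i + 3)]++
                    break
                }
            }
        }
        END {
            printf "usb resets: %d\n", resets
            for (k in errors)
                printf "io error: dev %s x%d\n", k, errors[k]
        }
    '
}
```

Typical use would be dmesg | summarize_io_errors. Repeated errors on the same sector of one device point at that device, its cable, or its enclosure rather than at the system as a whole, though that is only a hint.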