I've been getting some weird, nasty X crashes for the last two weeks or so. Sometimes the X session crashes back to the login screen. Other times X hard-locks (sometimes I can SSH into the machine and see that Xorg is at 100% of a core, but it won't die with any signal, so I have to reboot the machine), or the whole machine hard-locks and requires a cold boot.
This seems to have started happening after a software update a couple of weeks ago, which included a kernel and nvidia kmod update.
Here is part of the messages file around the time of a crash:
Apr 12 11:55:00 pangea abrt[26059]: File '/usr/bin/Xorg' seems to be deleted
Apr 12 11:55:00 pangea gnome-session[2216]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.#012
Apr 12 11:55:00 pangea abrt[26059]: Saved core dump of pid 1513 (/usr/bin/Xorg) to /var/spool/abrt/ccpp-2012-04-12-11:55:00-1513 (2412544 bytes)
Apr 12 11:55:00 pangea abrtd: Directory 'ccpp-2012-04-12-11:55:00-1513' creation detected
Apr 12 11:55:00 pangea systemd-logind[1293]: Removed session 2.
Apr 12 11:55:01 pangea abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2012-04-05-10:51:37-1535
Apr 12 11:55:01 pangea abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2012-04-05-10:51:37-1535
Apr 12 11:55:01 pangea abrtd: Deleting problem directory ccpp-2012-04-12-11:55:00-1513 (dup of ccpp-2012-04-05-10:51:37-1535)
The file is still there:
# ls -la /usr/bin/Xorg
-rws--x--x 1 root root 1965144 Mar 8 15:33 /usr/bin/Xorg
# file /usr/bin/Xorg
/usr/bin/Xorg: setuid ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, stripped
Now I can see the directory /var/spool/abrt/ccpp-2012-04-05-10:51:37-1535 but this doesn't appear in my list of ABRT 'not submitted' or 'submitted' reports.
I noticed this as well:
Apr 11 13:17:36 pangea kernel: [ 7629.507303] NVRM: GPU at 0000:04:00.0 has fallen off the bus.
This happened before a second crash today... but not before the first crash at 11:55.
That doesn't sound good. It kind of makes me think it's a hardware problem, but it's very strange that this started happening shortly after a software update which included a kernel and nvidia-kmod update. Prior to that, FC16 was running happily for many months.
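In case it helps with comparing timings, a quick way to line up the NVRM messages against the Xorg core dumps is to pull both kinds of line out of the log in order. This is only a rough sketch: the function name crash_timeline is made up for illustration, and it assumes the log lives at /var/log/messages and that abrt writes its "Saved core dump" lines in the same format as the excerpt above.

```shell
# Print, in log order, the lines that mark an Xorg core dump or an NVRM
# (NVIDIA kernel module) message, so their timestamps can be compared.
# Assumption: the log path defaults to /var/log/messages; pass another
# path as the first argument if your syslog lives elsewhere.
crash_timeline() {
    logfile="${1:-/var/log/messages}"
    grep -E 'NVRM:|Saved core dump of pid [0-9]+ \(/usr/bin/Xorg\)' "$logfile"
}
```

Run as root with `crash_timeline /var/log/messages`. If an NVRM line shows up shortly before each core dump, that would point at the GPU or driver, whereas crashes with no NVRM line nearby (like the 11:55 one) would suggest there is more than one thing going on.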
Any thoughts?