Frequent "general protection" and "segfault" errors on FC5 x86_64



kurtruff
31st October 2006, 04:17 PM
I have an nForce3-based Athlon 64 that's been running FC5 beautifully for the last year. I had automatic yum updates turned off, and it spent all that time number crunching at a load average of about 1.0.

Two weeks ago I repurposed the machine, wiped the disks, and installed FC5 again. This time I've been applying yum updates.

I am now experiencing frequent "general protection" and "segfault" errors such as the following:


Oct 31 06:31:10 kernel: qmail-remote[27911]: segfault at 000000002a0bc2e0 rip 0000000000406992 rsp 00007fff2a0bc2a8 error 4
Oct 31 06:31:44 kernel: qmail-smtpd[28033] general protection rip:40a554 rsp:7fff5100dae0 error:0
Oct 31 06:46:48 kernel: qmail-smtpd[4251] general protection rip:40a554 rsp:7fff93cee7c0 error:0
Oct 31 06:57:16 kernel: qmail-smtpd[10775] general protection rip:40a554 rsp:7fff5af9fa80 error:0
Oct 31 06:57:33 kernel: qmail-smtpd[10895] general protection rip:40a554 rsp:7fffb1151c30 error:0
Oct 31 07:06:42 kernel: qmail-smtpd[16861] general protection rip:40a554 rsp:7fff4ef60a40 error:0
Oct 31 07:15:14 kernel: qmail-smtpd[21365] general protection rip:40a554 rsp:7fff3841df00 error:0
Oct 31 07:23:18 kernel: qmail-smtpd[25461] general protection rip:40a554 rsp:7fffe7a6b540 error:0
Oct 31 07:23:26 kernel: qmail-smtpd[25523] general protection rip:40a554 rsp:7fffbc7bb290 error:0
Oct 31 07:30:46 kernel: qmail-smtpd[30792] general protection rip:40a554 rsp:7fff13cae780 error:0
Oct 31 07:37:20 kernel: qmail-smtpd[1516] general protection rip:40a554 rsp:7ffff759f070 error:0
Oct 31 07:37:42 kernel: qmail-smtpd[1618] general protection rip:40a554 rsp:7fff9480b2e0 error:0
Oct 31 07:44:18 kernel: qmail-smtpd[4944] general protection rip:40a554 rsp:7fff5c6d81b0 error:0
Oct 31 07:51:20 kernel: qmail-smtpd[10721] general protection rip:40a554 rsp:7fffa9a6e540 error:0
Oct 31 07:57:35 kernel: qmail-smtpd[14198] general protection rip:40a554 rsp:7fffd45fe0e0 error:0
Oct 31 08:03:27 kernel: qmail-smtpd[17670] general protection rip:40a554 rsp:7fffbde88960 error:0
Oct 31 08:03:53 kernel: counter[17822]: segfault at 0000000000000000 rip 000000004873e61f rsp 00000000ffb6cfbc error 4
Oct 31 08:08:50 kernel: qmail-smtpd[20714] general protection rip:40a554 rsp:7fff1cc5c730 error:0
Oct 31 08:13:57 kernel: qmail-smtpd[23715] general protection rip:40a554 rsp:7fff24938410 error:0
Oct 31 08:18:58 kernel: qmail-smtpd[26487] general protection rip:40a554 rsp:7fff56920400 error:0
Oct 31 08:23:59 kernel: qmail-smtpd[29762] general protection rip:40a554 rsp:7fffb8b35610 error:0
Oct 31 08:29:00 kernel: qmail-smtpd[32695] general protection rip:40a554 rsp:7fffeb13ec10 error:0
Oct 31 08:34:00 kernel: qmail-smtpd[2920] general protection rip:40a554 rsp:7ffffc8e43c0 error:0


Though the machine is under pretty heavy load distributed among several applications (stock mysql, apache, dovecot, nfsd), I'm only receiving these errors from the above three binaries (qmail-smtpd, qmail-remote, and counter)... and even with those three they only appear occasionally --- qmail-smtpd is handling several mails per second and "only" dies every couple of minutes.
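To check which binaries are faulting and whether they fault at the same instruction each time, the kernel messages can be tallied by process name, fault type, and rip. Below is a minimal Python sketch, not a definitive tool: the function name `tally_faults` and the regular expression are illustrative assumptions, written only against the two message shapes shown in the log excerpt above. In that excerpt, every qmail-smtpd entry reports rip 40a554, and a tally like this makes that kind of repetition easy to see.

```python
import re
from collections import Counter

# Matches the two kernel message shapes seen in the log excerpt:
#   "name[pid]: segfault at ADDR rip RIP rsp RSP error N"
#   "name[pid] general protection rip:RIP rsp:RSP error:N"
# The pattern is an assumption based only on those lines.
FAULT_RE = re.compile(
    r"kernel: (?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:? "
    r"(?P<kind>segfault|general protection)"
    r".*?rip[: ](?P<rip>[0-9a-fA-F]+)"
)


def tally_faults(lines):
    """Count faults per (process name, fault kind, instruction pointer)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            # Parse rip as hex so zero-padded and unpadded forms compare equal.
            rip = int(match.group("rip"), 16)
            counts[(match.group("proc"), match.group("kind"), rip)] += 1
    return counts
```

As a usage sketch, passing an open log file (for example `/var/log/messages`, assuming that is where syslog writes kernel messages on this system) to `tally_faults` and printing `counts.most_common()` lists the most frequent fault sites first.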

(Sidenote: I'm not even sure what "counter" is... I can't find any binary named "counter" on my filesystem. This error appears sporadically also, averaging about once per hour but not at regular intervals.)

Thanks,
Kurt

---

Some further information:


# uname -a
Linux leikata 2.6.18-1.2200.fc5 #1 SMP Sat Oct 14 16:59:56 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux


# uptime
09:49:56 up 1 day, 8:11, 3 users, load average: 1.20, 0.79, 0.82


# lspci
00:00.0 Host bridge: nVidia Corporation nForce3 250Gb Host Bridge (rev a1)
00:01.0 ISA bridge: nVidia Corporation nForce3 250Gb LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation nForce 250Gb PCI System Management (rev a1)
00:02.0 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.2 USB Controller: nVidia Corporation nForce3 EHCI USB 2.0 Controller (rev a2)
00:08.0 IDE interface: nVidia Corporation CK8S Parallel ATA Controller (v2.5) (rev a2)
00:0b.0 PCI bridge: nVidia Corporation nForce3 250Gb AGP Host to PCI Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation nForce3 250Gb PCI-to-PCI Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 Pro Ultra TF
02:06.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 30)
02:07.0 RAID bus controller: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
02:0b.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)

clearer
1st November 2006, 10:13 PM
I assume that this is a production (or equivalent) computer, so wiping the hard drives just to see if it will work is out of the question, right?

Personally I have had a lot of problems with the kernels provided with FC6. Try compiling a kernel yourself with only the stuff you need included -- if this is not a dual-core machine (which I assume it isn't), make sure to disable multi-core features and SMP support. And if you have any custom drivers, remember that linux/config.h is no more, so if the compile complains that it can't find it, either just touch it (touch /path/to/kernel/source/linux/config.h) or comment out/delete the includes.

clearer
1st November 2006, 10:14 PM
Do any of these programmes ever crash, btw?

kurtruff
1st November 2006, 10:36 PM
Clearer -

Thanks for your response.


I assume that this is a production (or equivalent) computer, so wiping the hard drives just to see if it will work is out of the question, right?

This is a production machine, so, yes, I can't take it offline for hours.

Also I'm in the middle of building a similar machine with a similar software configuration, so in a few days I'll be able to do a fresh install. But I'm confused about how this will help --- my existing install is a fresh FC5 install, and (afaik) the only way it differs from a stock FC5 install is that I installed qmailrocks. When I install on the new machine, what do you suggest I should watch out for?


Personally I have had a lot of problems with the kernels provided with FC6. Try compiling a kernel yourself with only the stuff you need included -- if this is not a dual-core machine (which I assume it isn't), make sure to disable multi-core features and SMP support.

Cool, I'll try a custom kernel.

- Kurt

kurtruff
1st November 2006, 10:45 PM
Do any of these programmes ever crash, btw?

I'm not certain I understand your question --- each of the programs I mentioned dies repeatedly.

qmail-smtpd is running under tcpserver, which spawns it freshly for each connection. The rest of qmail is supervised and respawned as necessary, so, as of yet, this hasn't brought down the entire qmail system, only individual connections.

- Kurt

kurtruff
7th December 2006, 07:27 PM
Ok.. So, I've built a new machine with very different hardware (but still x86_64), installed FC5 and qmail, and.. I'm getting the same general protection and segfaults. Any ideas?

- Kurt