Hi,
I have a general Fedora problem on a very busy mail/web server. It works fine for months and then eventually hangs. I've had the problem over a few versions of Red Hat and Fedora, so it appears to be fairly generic. I had thought it was the result of a DoS attack or a hack, but this morning it happened while I still had a session logged on, so I could do some basic investigation.
Everything was hanging on process startup/initialisation; that is, every process that was started simply hung, so the system eventually dies. A ps reveals a growing list of processes, logins hang after the username and password are entered (because that again is a process startup), and it's not even possible to shut the server down. The only way to clear the issue is to cut the power to the server.
Does anyone understand how process startup on Linux works, what is likely to be happening, and which process may have died or what kernel issue may have occurred?
What diagnosis can I perform next time it happens?
Is there any way to detect or prevent the problem, apart from rebooting every few months in anticipation of it happening?
I'm guessing some process has died or some resource has run out, but nothing is reported in any of the logs that I can see, so it's a little hard to progress.
I saved the ps output to a file, but when I rebooted and the file system was recovered, the file had been removed.
Thanks in advance,
Duncan