Fedora Linux Support Community & Resources Center
  #1  
18th April 2007, 12:33 PM
nimbius
sockets are hanging

i have an interesting problem where after several minutes, the sockets on my FC6 x64 server start to hang in various states.

free indicates i have plenty of free memory
top indicates i'm only getting 20% load spikes, which is normal
netstat confirms active sockets are just sitting in a state of "SYN_RECV" or "ESTABLISHED"
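
To put numbers on the netstat observation, a small Python sketch can tally sockets per state. It assumes `netstat -tan`-style columns with the state in the sixth field. The sample text and the name `count_tcp_states` are made up for illustration; this is not output from the server in this thread.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP socket states from `netstat -tan`-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample using documentation-only addresses.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:25           198.51.100.7:40112      SYN_RECV
tcp        0      0 192.0.2.10:25           198.51.100.8:40113      SYN_RECV
tcp        0     52 192.0.2.10:22           198.51.100.9:51514      ESTABLISHED
"""

state_counts = count_tcp_states(SAMPLE)
```

Running this every minute or so and watching whether the SYN_RECV count keeps growing would show if half-open connections are piling up rather than completing.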

if i try to telnet or ssh into this server, i see the banner and upon entering my password the session hangs!

ssh hangs after this:
Code:
debug2: we sent a password packet, wait for reply
debug1: Authentication succeeded (password).
debug1: channel 0: new [client-session]
debug2: channel 0: send open
debug1: Entering interactive session.
The session eventually dies with a "connection reset by peer" error.

selinux is set to permissive

i have plenty of hard drive space

my interfaces eth0 and eth1 are not dropping packets at all, and have a clean bill of health overall.

telnetting to my postfix daemon gets me locked up in a session that doesn't respond (can't even quit!)

even my dovecot daemon hangs up.

this problem occurs on multiple switches (i've tried several to resolve it) and has plagued me for 4 weeks!
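
For anyone who wants to check the "daemon accepts the connection but never talks" symptom without sitting in a stuck telnet session, here is a rough Python sketch. It relies on SMTP, IMAP and SSH servers all sending a greeting first. The function name `probe_banner` and the timeout values are illustrative assumptions, not something used in this thread.

```python
import socket


def probe_banner(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect to host:port and wait for the server to speak first.

    Returns "banner" if any bytes arrive, "closed" if the peer closes
    without sending, "timeout" if nothing happens within `timeout`
    seconds, or "error" for refused or reset connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The socket keeps the timeout that was set for the connect.
            data = conn.recv(256)
            return "banner" if data else "closed"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Calling something like `probe_banner("127.0.0.1", 25)` from the box itself and again from another host would separate "the daemon is stuck" from "the network path is stuck": a "timeout" on both points at the server, while a "timeout" only from the remote side points at the network.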
__________________
The sage does not hoard. The more he helps others, the more he benefits himself, The more he gives to others, the more he gets himself. The Way of Heaven does one good but never does one harm. The Way of the sage is to act but not to compete.

--Lao Tzu
  #2  
18th April 2007, 04:18 PM
ibbo
Does dmesg give you any indication?

Also have a look in /var/log/messages and /var/log/secure for other scraps of info.

Ibbo
__________________
A Hangover Lasts A Day, But Our Drunken Memories Last A Lifetime
--
Linux user #349545
(GNU/Linux)iD8DBQBAzWjX+MZAIjBWXGURAmflAKCntuBbuKCWenpm XoA7LNydllVQOwCfdjyzXscddzQvlhBedAcD7qfKmHo==zx0H
  #3  
18th April 2007, 04:59 PM
nimbius
kernel is:
2.6.20-1.2933.fc6 #1 SMP

nope, nothing out of the ordinary except this:

Quote:
Apr 16 17:04:28 mail kernel: Calibrating delay using timer specific routine.. 6532.40 BogoMIPS (lpj=13064806)
Apr 17 13:16:25 mail kernel: Nvidia board detected. Ignoring ACPI timer override.
Apr 17 13:16:25 mail kernel
Apr 17 13:16:27 mail kernel: Calibrating delay using timer specific routine.. 5232.43 BogoMIPS (lpj=2616218)
Apr 17 13:16:28 mail kernel: Using local APIC timer interrupts.
Apr 17 13:16:28 mail kernel: Detected 12.557 MHz APIC timer.
Apr 17 13:16:28 mail kernel: Calibrating delay using timer specific routine.. 5216.59 BogoMIPS (lpj=2608296)
Apr 17 13:16:30 mail kernel: Disabling vsyscall due to use of PM timer
Apr 17 13:16:30 mail kernel: time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
  #4  
24th April 2007, 03:49 PM
nimbius
this condition is persisting!
crap crap crap! i should have been a plumber!
i can't figure out the solution, but it started in fedora core 5, and an upgrade to 6 (not a reinstall) has not helped any.


more tips and hints for anyone interested:

the issue affects local user terminal logins too, causing them to time out entirely. pam.d??

Postfix complains of dropped connections midway through EHLO and CONNECT, indicating that yes, people can reach my server, but my server may or may not respond to anything they say (or respond in a timely manner.)
switch is OK
interfaces are OK
cables are OK
network restart, postfix restart, saslauthd restart, and various other daemon restarts do not fix this issue
after reboot this condition randomly persists.
telinit to varied runlevels won't help.
kernel change doesn't help
no errors or warnings are logged by any process.

could this be a corrupt device driver? we had a power outage recently.

i'm beginning to doubt my Linux zen....
  #5  
24th April 2007, 04:30 PM
ibbo
It's definitely sounding like a hardware issue.

APIC makes me think this as I have had lots of APIC issues in the past that also killed my sockets. Do you have another NIC lying around you can stick in and play with?

Or is this coming from your router?

Ibbo
  #6  
24th April 2007, 05:09 PM
nimbius
hmmm...shouldn't apic throw warnings?

i have two nics, the infamous integrated GBnic's found on most HP Proliant servers. both eth0 and eth1 exhibit this (and you're correct, apic trouble would be a precursor for such behavior)

pci=noacpi has been specified as a boot parameter in the current kernel...here's to crossing ma' fingers!

Last edited by nimbius; 24th April 2007 at 05:39 PM.
  #7  
24th April 2007, 06:09 PM
nimbius
no good. acpi does not appear to be the issue
  #8  
25th April 2007, 10:52 AM
ibbo
"the infamous integrated GBnic's"

Could it be these cards that are the problem? Do you have a loose card you can stick in and test?

Ibbo
  #9  
25th April 2007, 05:37 PM
nimbius
i thought this as well; however, mii-tool, ethtool, and ifconfig all confirm the gbnics are functioning 100% normally, with no magical cisco flow control or anything at the switch.

I tried changing a setting in my resolv.conf. one of my interns specified a domain name (domain=). this might not be a good idea as the server operates in two domains (our intranet and our extranet.) I also removed from resolv.conf the primary nameserver, which was set as our internal resolver for our office machines, and replaced it with 0.0.0.0 (good old onboard caching nameserver.) the server, as one can imagine a mail server would, does a staggering number of resolutions.

as of this time...the system is functioning normally again.
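
For reference when auditing a change like this, a short Python sketch can list what a resolv.conf actually declares. It assumes the usual whitespace-separated directives (`nameserver`, `domain`, `search`) and comment lines starting with `#` or `;`. The sample file is hypothetical and uses documentation-only addresses; the real file from this server was never posted.

```python
def parse_resolv_conf(text: str) -> dict:
    """Collect nameserver, domain and search entries from resolv.conf text."""
    conf = {"nameserver": [], "domain": None, "search": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf["nameserver"].append(values[0])
        elif keyword == "domain" and values:
            conf["domain"] = values[0]
        elif keyword == "search":
            conf["search"] = values
    return conf


# Hypothetical "after" state: no domain line, local resolver listed first.
SAMPLE_RESOLV_CONF = """\
# edited by hand
nameserver 0.0.0.0
nameserver 192.0.2.53
"""

resolver_settings = parse_resolv_conf(SAMPLE_RESOLV_CONF)
```

On a live box the same function could be fed the contents of /etc/resolv.conf to confirm which resolver is tried first and whether a stray domain or search line is still present.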
  #10  
26th April 2007, 12:47 PM
ibbo
"as of this time...the system is functioning normally again."

Ah, the good old "solved itself". My favorite bit of sys admin.

Ibbo
  #11  
29th April 2007, 05:58 PM
nimbius
crap. never mind. i thought it was resolution, but that would have been too simple. resolution is perfect.

the condition occurs again if i try to transfer a large file through scp to the system, or if the system comes under heavy load (many connections.) connections time out, network services hang too. if however i'm logged in locally on a terminal, that terminal is OK. wtf?
  #12  
30th April 2007, 06:49 AM
nimbius
both onboard nic ports are broadcom BCM5721. downing eth1 (intranet) restores connectivity. i'm thinking this is an issue with broadcom nics and the latest kernel?

nics have been bonded for further testing...just one network now.
  #13  
19th May 2007, 12:37 PM
nimbius
ultimate issue for reference: with the two nics on separate subnets, fedora core 6 was not stable on this box. the system stopped responding to all network requests, either immediately or after a delay, and local tty logins would also hang.

system has been reverted to primary subnet, both nics bonded.
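
The thread never lists the actual addressing, so as a sketch with made-up values: Python's `ipaddress` module can confirm whether two interface addresses fall in the same subnet, which is a quick sanity check before and after a change like this. The helper name `same_subnet` and the addresses are illustrative assumptions.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str) -> bool:
    """Return True when two CIDR interface addresses share one network."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b


# Hypothetical addressing; the thread does not give the real values.
BEFORE = {"eth0": "192.0.2.10/24", "eth1": "10.0.0.10/24"}

split_across_subnets = not same_subnet(BEFORE["eth0"], BEFORE["eth1"])
```

This only checks the addressing. It says nothing about why the split setup misbehaved here, which the thread leaves unexplained.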