
FC6 gigabit network losing connection


mattgross
4th March 2007, 05:35 AM
Hi Everyone,

The following is a post I originally sent to the TLUG (www.tlug.jp) mailing list. I will post the replies from others as replies to it. My hope is that I can somehow get this problem resolved:

----------------------------------------------------------------------------------------

I am having a lot of problems with my Fedora Core 6 server's connection
going down when traffic increases. I bought a new switch (mine was quite
old) in hopes of fixing the problem; however, that seems to have
exacerbated it.

I have googled for a way to diagnose this problem and to see if anyone
else is having the same problem, but there is so much information on
networking and linux (most of it related to IP forwarding) that I am
overwhelmed and in need of some advice.

Can anyone point me in a good direction for diagnosing a home network
problem?

Thanks,
Matt

mattgross
4th March 2007, 05:41 AM
> > What Sigurd said about checking the "error" number on your ifconfig
> > output is a good start. Bear in mind that this could also have a simple
> > answer like a bad cable...
>
> Good point.
>
> Besides checking the logs, running ifconfig, and checking the cable, do:
> Check the connection to the router (i.e. ping the router)

when everything is working:

[root@fedora6 log]# ping 192.168.0.254
PING 192.168.0.254 (192.168.0.254) 56(84) bytes of data.
64 bytes from 192.168.0.254: icmp_seq=1 ttl=64 time=0.182 ms
64 bytes from 192.168.0.254: icmp_seq=2 ttl=64 time=0.176 ms
64 bytes from 192.168.0.254: icmp_seq=3 ttl=64 time=0.170 ms
64 bytes from 192.168.0.254: icmp_seq=4 ttl=64 time=0.212 ms
64 bytes from 192.168.0.254: icmp_seq=5 ttl=64 time=0.193 ms

--- 192.168.0.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.170/0.186/0.212/0.021 ms

when not working:

[root@fedora6 log]# ping 192.168.0.254
PING 192.168.0.254 (192.168.0.254) 56(84) bytes of data.
From 192.168.0.246 icmp_seq=2 Destination Host Unreachable
From 192.168.0.246 icmp_seq=3 Destination Host Unreachable
From 192.168.0.246 icmp_seq=6 Destination Host Unreachable
From 192.168.0.246 icmp_seq=8 Destination Host Unreachable
From 192.168.0.246 icmp_seq=9 Destination Host Unreachable
From 192.168.0.246 icmp_seq=12 Destination Host Unreachable
From 192.168.0.246 icmp_seq=13 Destination Host Unreachable

--- 192.168.0.254 ping statistics ---
13 packets transmitted, 0 received, +7 errors, 100% packet loss, time 12000ms, pipe 2
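To check the router from a script instead of by eye, you can pull the packet-loss figure out of ping's summary line. Here is a minimal sketch. It assumes the iputils summary format shown above, and the function name `ping_loss` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print only the
# packet-loss percentage from the summary line (iputils format).
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ping -c 5 192.168.0.254 | ping_loss` should print 0 for the "working" case above and 100 for the "not working" case.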


> Check your routing table for errors (i.e. netstat -nr)

when working:

[root@fedora6 log]# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         192.168.0.254   0.0.0.0         UG        0 0          0 eth1


when not working:

[root@fedora6 log]# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         192.168.0.254   0.0.0.0         UG        0 0          0 eth1
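The two routing tables above are identical, so the default route itself does not appear to vanish during an outage. You can automate that check while the problem is happening. Here is a minimal sketch. The function name is hypothetical, and it assumes the unwrapped `netstat -nr` format with one route per line:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the `netstat -nr` output on stdin has a
# default route (destination 0.0.0.0, G flag set) via gateway $1.
has_default_route() {
    awk -v gw="$1" '
        $1 == "0.0.0.0" && $2 == gw && $4 ~ /G/ { found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

Usage: `netstat -nr | has_default_route 192.168.0.254 && echo "default route present"`.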



> Check your arp table for errors (i.e. arp -a)

when working:

[root@fedora6 log]# arp -a
? (192.168.0.254) at 00:90:CC:42:71:58 [ether] on eth1


when not working:

[root@fedora6 log]# arp -a
? (192.168.0.254) at <incomplete> on eth1
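The `<incomplete>` entry is the notable difference between the two states. The routing table is unchanged, but the router's MAC address no longer resolves, meaning ARP requests are not being answered. That suggests the fault sits below the IP layer. To log this state over time, here is a minimal sketch. The function name is hypothetical, and it parses the `arp -a` format shown above:

```sh
#!/bin/sh
# Hypothetical helper: read `arp -a` output on stdin and print the MAC
# address recorded for IP $1, or "incomplete" if it is unresolved.
# Prints nothing if the IP is not in the table at all.
arp_state() {
    awk -v ip="($1)" '$2 == ip {
        if ($4 == "<incomplete>") print "incomplete"; else print $4
    }'
}
```

Usage: `arp -an | arp_state 192.168.0.254`. The `-n` flag skips hostname lookups.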

>
> If no problems above, then we need a better understanding of your
> machine. Here's a shotgun approach:
> Which SELinux setting are you using?
/etc/sysconfig/selinux:

SELINUX=disabled
SELINUXTYPE=targeted
SETLOCALDEFS=0

> Have you modified your SELinux configuration otherwise?

No

> Did you change anything recently? ;-) (e.g. changed a router setting,
> proxy setting, etc.)

I have not made any changes other than yum updates (maybe the problem is
here?)

> What applications are running on the server? (e.g. apache, bind,
> sendmail, squid, JBoss, etc.)

sendmail, apache, mysql, gtk-gnutella, aMSN, evolution, and many
others. This is a recent problem, though (it started within the last week or week and a half).

> Which kernel version are you running?

x86_64 2.6.19-1.2911.fc6

> What is the network card(s)/chipset(s) you are using?

I have a MSI K9N Diamond motherboard:

"On-board LAN: Yes, two Gigabit Ethernet controlled by the chipset
together with two Vitesse SimplyPHY chips (VSC8601) chips."
http://www.hardwaresecrets.com/article/400/3

> Are you using a wireless networking device in conjunction with a network
> interface card?

No wireless networking being used.

>
> Some of the questions above came from common fedora 6 issues that I've
> read. To answer your earlier questions, yes but I'd rather begin with
> the suggestions above before going there.

I find this to be a very strange problem. If I run "ifdown eth1 &&
ifup eth1", the connection comes back up for anywhere from 15 seconds to
a few minutes, but the length of time is inconsistent.
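To put numbers on "15 seconds to a few minutes", a loop can probe the router right after `ifup` and report how long it stayed reachable. Here is a minimal sketch. The function names, the `GATEWAY` variable and the 1-second ping timeout are assumptions. `probe_gateway` is kept separate so that it can be replaced:

```sh
#!/bin/sh
# Hypothetical probe: send one ping to the gateway, waiting at most 1 second.
probe_gateway() {
    ping -c 1 -W 1 "${GATEWAY:-192.168.0.254}" >/dev/null 2>&1
}

# Hypothetical helper: probe up to $1 times, sleeping $2 seconds between
# probes, and print how many consecutive probes succeeded before the
# first failure. Multiply by the interval for a rough uptime.
count_probes_until_drop() {
    max=$1 interval=$2 n=0
    while [ "$n" -lt "$max" ]; do
        probe_gateway || break
        n=$((n + 1))
        sleep "$interval"
    done
    echo "$n"
}
```

Usage (as root): `ifdown eth1 && ifup eth1 && count_probes_until_drop 600 1`. Repeating this a few times shows how widely the uptime varies.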

I ran tcpdump -XXvv and noticed a few things. There were a lot
of entries similar to the following, flagged "bad udp cksum" (or "incorrect" for TCP):

20:16:33.494112 IP (tos 0x0, ttl 64, id 47214, offset 0, flags [DF],
proto: UDP (17), length: 71) fedora6.48225 > wtk-ns03.usen.ad.jp.domain:
[bad udp cksum 29eb!] 36173+ PTR? 1.112.122.61.in-addr.arpa. (43)

20:16:34.406371 IP (tos 0x0, ttl 64, id 58126, offset 0, flags [DF],
proto: UDP (17), length: 66) fedora6.48227 > wtk-ns03.usen.ad.jp.domain:
[bad udp cksum e0cb!] 11102+ AAAA? pop.mail.yahoo.co.jp. (38)

20:16:34.604046 IP (tos 0x0, ttl 64, id 33916, offset 0, flags [DF],
proto: TCP (6), length: 72) fedora6.58821 > cs57.msg.dcn.yahoo.com.mmcc:
P, cksum 0x5c2d (incorrect (-> 0x2f95), 2906:2926(20) ack 3727 win 574
<nop,nop,timestamp 72999891 1894641182>

Is this normal?
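All three flagged packets above were sent by fedora6 itself. One common and harmless cause of this is transmit checksum offloading. The NIC fills in the checksum after tcpdump has captured the outgoing packet, so tcpdump on the sending host reports it as bad even though the packet on the wire may be correct. That is only a hypothesis, not a diagnosis. `ethtool -k eth1` shows whether offloading is enabled, and `ethtool -K eth1 tx off` (as root) turns it off for an experiment.

To check whether only locally sent packets are flagged, here is a minimal sketch. The function name is hypothetical. It assumes that the "src > dst" pair and the checksum flag appear on the same line of tcpdump -v output:

```sh
#!/bin/sh
# Hypothetical helper: read tcpdump -v text on stdin and print the distinct
# source hosts (port stripped) of packets flagged with a bad checksum.
bad_cksum_sources() {
    awk '/bad (udp|tcp) cksum|\(incorrect/ {
        for (i = 1; i < NF; i++) {
            if ($(i + 1) == ">") {
                src = $i
                sub(/\.[^.]*$/, "", src)   # drop the trailing .port
                print src
                break
            }
        }
    }' | sort -u
}
```

For example, `tcpdump -n -vv -i eth1 -c 200 | bad_cksum_sources` would fit the offloading explanation if it prints only the server's own address.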

This problem is so painful that I am debating doing a full reinstall in
hopes of fixing everything. Would you recommend or discourage
this?

I really appreciate everyone's input and help with this. Thanks again!

mattgross
4th March 2007, 05:43 AM
> > Hi Everyone,
> >
> > I am having a lot of problems with my Fedora Core 6 server connection
> > going down when traffic increases. I bought a new switch (mine is quite
> > old) in hopes of fixing the problem, however, that seems to have
> > exacerbated the problem.
> >
> It could be both an ethernet problem and a problem higher up. The
> suggestion to read logs is a good starting point.
>
> If it's an ethernet problem you should be able to see it in the output
> from ifconfig:
>
> sigurdur@ifconfig:~$ /sbin/ifconfig
> eth0 Link encap:Ethernet HWaddr 00:13:D3:C1:CF:A0
> inet addr:192.168.43.135 Bcast:192.168.43.255 Mask:255.255.255.0
> inet6 addr: fe80::213:d3ff:fec1:cfa0/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:9184658 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8403557 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:1260796022 (1.1 GiB) TX bytes:2957724591 (2.7 GiB)
> Interrupt:217 Base address:0x4000

here is mine:

[root@fedora6 ~]# ifconfig
eth1 Link encap:Ethernet HWaddr 00:16:17:B7:D9:03
inet addr:192.168.0.246 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::216:17ff:feb7:d903/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8165 errors:0 dropped:0 overruns:0 frame:0
TX packets:6885 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8445125 (8.0 MiB) TX bytes:656304 (640.9 KiB)
Interrupt:21 Base address:0x8000

I know it does not show many packets yet (I had just rebooted the server), but before the reboot it also showed errors:0.
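You don't have to re-run ifconfig by hand. The same counters should also be readable from sysfs, which 2.6 kernels expose under /sys/class/net/<interface>/statistics/. Here is a minimal sketch. The function name is hypothetical, and the optional second argument exists only so that it can be pointed at a test directory:

```sh
#!/bin/sh
# Hypothetical helper: print error/drop counters for interface $1.
# $2 optionally overrides the sysfs root (default /sys/class/net).
if_errors() {
    dir=${2:-/sys/class/net}/$1/statistics
    for c in rx_errors tx_errors rx_dropped tx_dropped; do
        # Report a missing or unreadable counter as "?" instead of failing.
        if [ -r "$dir/$c" ]; then
            printf '%s=%s\n' "$c" "$(cat "$dir/$c")"
        else
            printf '%s=?\n' "$c"
        fi
    done
}
```

Running `if_errors eth1` before and after an outage shows whether any of these counters move.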

>
> If "errors" increases it's at least an indication that this problem is
> ethernet related.
>

I replaced the Ethernet cables between my server and the hub, and between the hub and the firewall box, with new ones. I am still having the problem.

> Is the problem noticeable between computers on the same LAN, or only
> through a broad band router or something like that? If the former, try
> setting static arp entries on both servers to see if it's an arp related
> problem.
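On Linux, a static entry is added with `arp -s` as root and removed again with `arp -d`. Here is a minimal sketch using the router's address and MAC from the arp output earlier in the thread. The `pin_arp` function and the `DRY_RUN` switch are hypothetical; DRY_RUN=1 prints the command instead of running it:

```sh
#!/bin/sh
# Hypothetical helper: add a static ARP entry mapping IP $1 to MAC $2.
# With DRY_RUN=1 the command is printed instead of executed.
pin_arp() {
    # Accept only MACs of the form xx:xx:xx:xx:xx:xx (hex digits).
    if ! printf '%s\n' "$2" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "pin_arp: bad MAC address: $2" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "arp -s $1 $2"
    else
        arp -s "$1" "$2"
    fi
}
```

For example, run `pin_arp 192.168.0.254 00:90:CC:42:71:58`, and later `arp -d 192.168.0.254` to undo it.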

I don't think it is a problem with the ARP cache. After the problem happened, the ARP cache still showed a correct entry:

[root@fedora6 ~]# arp -a
? (192.168.0.254) at 00:90:CC:42:71:58 [ether] on eth1

Does the question mark above signify anything important?

I did notice that when I am having the problem, the following takes several seconds to complete:

[root@fedora6 ~]# arp -va
? (192.168.0.254) at <incomplete> on eth1
Entries: 1 Skipped: 0 Found: 1

I noticed the <incomplete> and tried to add the arp entry manually, but that did not fix my problem.

mattgross
4th March 2007, 05:44 AM
> >> Matt Gross wrote:
> >>
> >>> Hi Everyone,
> >>>
> >>> I am having a lot of problems with my Fedora Core 6 server connection
> >>
> [..]

<snip>

Thank you all for the great info. I am back with new developments:

The gigabit Ethernet card built into my laptop, which also runs
Fedora Core 6 and was recently updated with yum, stopped working
as well.

Last night I was fed up with my computers not working and decided to
simply reinstall the laptop from the Fedora Core 6 rescue disk. I
connected to one of the HTTP mirrors and downloaded everything during the
install, which means the network card on my laptop works when I boot
from the FC6 rescue CD-ROM. _BUT_ after the install, the network card
would not work! This is actually good, because it is more evidence pointing
towards a recent driver update, perhaps?

Back to my server, which had the original problem: I connected a USB
network card to it, and it is working flawlessly (as long as I
don't complain about the slow speed ;).

What do you think? Does this sound like a driver problem to you? I
haven't been experiencing any problems since I connected the slower
network card.

If this is a driver problem, how would I go about tracking down the
driver problem and getting it fixed?
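A first step in tracking down a driver regression is identifying which driver and which kernel are involved. The kernel exposes the bound driver as a symlink at /sys/class/net/<interface>/device/driver. `ethtool -i eth1` also reports the driver, along with its version. Here is a minimal sketch. The function name is hypothetical, and the optional second argument overrides the sysfs root for testing:

```sh
#!/bin/sh
# Hypothetical helper: print the kernel driver bound to interface $1 by
# resolving its sysfs "driver" symlink. $2 optionally overrides the root.
nic_driver() {
    link=${2:-/sys/class/net}/$1/device/driver
    [ -L "$link" ] || { echo "nic_driver: no driver link for $1" >&2; return 1; }
    basename "$(readlink "$link")"
}
```

You can compare `nic_driver eth1` and `uname -r` on the current kernel with the results after booting an older kernel from the GRUB menu. `rpm -q kernel` lists the installed kernels. This would show whether the problem follows the kernel update, which is useful detail for a bug report.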

Thanks again everyone!

Matt

<snip>

mattgross
4th March 2007, 05:45 AM
> > Can anyone point me in a good direction for diagnosing a home network
> > problem?
>
> Consider reading your log files for clues, and look for anything
> out of the ordinary (e.g. spurious events).
>

Tried that but did not find anything. Is there a way to turn on
additional logging for eth1?

Should I start collecting raw tcpdumps?
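Given the `<incomplete>` ARP entries, a capture limited to ARP traffic on eth1 would stay small enough to leave running. It would show whether requests for 192.168.0.254 go out and whether replies come back. Run `tcpdump -n -l -i eth1 arp > arp.txt` as root; the file name is only an example. Then summarize it with the minimal sketch below. The function name is hypothetical, and it matches the "who-has ... tell" and "... is-at ..." wording that tcpdump uses for ARP:

```sh
#!/bin/sh
# Hypothetical helper: count ARP requests for IP $1 and replies from it in
# `tcpdump -n arp` text output read on stdin.
arp_balance() {
    awk -v ip="$1" '
        index($0, "who-has " ip " ")   { req++ }
        index($0, " " ip " is-at ")    { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }'
}
```

After an outage, run `arp_balance 192.168.0.254 < arp.txt`. Many requests with few or no replies would suggest that the router's replies are not reaching the host, or are being lost in the NIC or driver.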

mattgross
5th March 2007, 01:01 PM
Just a quick update, everyone. I submitted this as a bug to the Red Hat Fedora Bugzilla:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=230992
