Fedora Linux Support Community & Resources Center

#1  knnniggett, 3rd November 2008, 08:26 PM
Total Lockup Under Heavy Network Load w/ BCM5704

I recently purchased a Tyan Transport GX28 server and proceeded to install Fedora 9 x64 on it.
Hardware items of interest are a Tyan S2881 motherboard with the latest BIOS, a single Opteron 280 CPU, and dual integrated Broadcom 5704 NICs.

The Fedora installation went without any problems, all hardware was properly detected, and things were looking up. However, I began to experience problems when I tried to copy several GB worth of files from a network location.

The issue is 100% repeatable and only occurs when I copy a large amount of data from a network location. At some random point during the operation, the machine either loses all network connectivity or locks up completely (i.e. no mouse or keyboard).

If I am patient enough (and the machine doesn't lock up), I am eventually presented with one or more of the following kernel error messages:

Code:
Kernel failure message 1:
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.

Kernel failure message 2:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:220 dev_watchdog+0xb3/0x122()
NETDEV WATCHDOG: eth0 (tg3): transmit timed out
Modules linked in: cifs nfs nls_utf8 nfsd lockd nfs_acl auth_rpcgss exportfs w83627hf lm85 hwmon_vid fuse sunrpc cpufreq_ondemand powernow_k8 freq_table loop dm_multipath ipv6 sr_mod cdrom sg tg3 amd_rng pata_amd pata_acpi ata_generic i2c_amd8111 i2c_amd756 pcspkr serio_raw i2c_core k8temp hwmon dm_snapshot dm_zero dm_mirror dm_log dm_mod shpchp sata_mv libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.26.6-79.fc9.x86_64 #1

Call Trace:
 <IRQ>  [<ffffffff81036d29>] warn_slowpath+0xae/0xd7
 [<ffffffff8102d65f>] ? default_wake_function+0xd/0xf
 [<ffffffff810299c1>] ? __wake_up_common+0x46/0x75
 [<ffffffff81137e5d>] ? __next_cpu+0x19/0x26
 [<ffffffff8102aa59>] ? find_busiest_group+0x2c8/0x736
 [<ffffffff8104f2c3>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8122d1a2>] ? dev_watchdog+0x0/0x122
 [<ffffffff8122d255>] dev_watchdog+0xb3/0x122
 [<ffffffff810401c0>] run_timer_softirq+0x192/0x20e
 [<ffffffff8103c21a>] __do_softirq+0x6d/0xe1
 [<ffffffff8100d51c>] call_softirq+0x1c/0x28
 [<ffffffff8100e7b4>] do_softirq+0x44/0x8b
 [<ffffffff8103bfeb>] irq_exit+0x3f/0x80
 [<ffffffff8101b449>] smp_apic_timer_interrupt+0x8c/0xa5
 [<ffffffff8100a000>] ? default_idle+0x0/0x4b
 [<ffffffff8100a000>] ? default_idle+0x0/0x4b
 [<ffffffff8100cf42>] apic_timer_interrupt+0x72/0x80
 <EOI>  [<ffffffff81020594>] ? native_safe_halt+0x6/0x8
 [<ffffffff812a0279>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff8100a02e>] ? default_idle+0x2e/0x4b
 [<ffffffff8100b1ca>] ? cpu_idle+0x92/0xda
 [<ffffffff81297df7>] ? start_secondary+0x169/0x16d

---[ end trace eb9835d0e9df6ec0 ]---

Kernel failure message 3:
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying

Kernel failure message 4:
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
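
If you want to check a saved log for the same symptoms quoted above, a minimal Python sketch like the following can count them. The patterns are copied from the messages in this post and are assumptions about what is worth matching, not a complete list of tg3 errors; the script name scan_tg3.py is hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns taken from the messages quoted in this thread; they are an
# assumption about which symptoms matter, not a complete list of tg3 errors.
PATTERNS = {
    "stop_block_timeout": re.compile(r"tg3_stop_block timed out, ofs=[0-9a-f]+"),
    "tx_watchdog": re.compile(r"NETDEV WATCHDOG: \S+ \(tg3\): transmit timed out"),
    "link_down": re.compile(r"tg3: \S+: Link is down\."),
    "nfs_not_responding": re.compile(r"nfs: server \S+ not responding"),
}


def scan_log(lines):
    """Count the lines that match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Pass a saved log file (e.g. produced with `dmesg > dmesg.txt`).
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for name, n in sorted(scan_log(log).items()):
                print(f"{name}: {n}")
    else:
        print("usage: scan_tg3.py LOGFILE", file=sys.stderr)
```

Usage, assuming the script was saved as scan_tg3.py: run `dmesg > dmesg.txt`, then `python3 scan_tg3.py dmesg.txt`.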

Thus far, I have done the following to troubleshoot this:
  • Performing the copy over SMB instead of NFS makes no difference.
  • Copying from a different machine on the network makes no difference.
  • I've reproduced the problem booting kernels 2.6.26.6-79 and 2.6.25-14.
  • Attaching to the network at 100Mbps FD or 1Gbps FD makes no difference.
  • Using eth1 instead of eth0 produces the same results.


Does anyone have any other suggestions?

dmesg | grep eth0
Code:
eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000Base-T Ethernet 00:e0:81:33:c8:0c
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[769f4000] dma_mask[64-bit]
ADDRCONF(NETDEV_UP): eth0: link is not ready
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
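
The probe line above already reports the bus the NIC sits on: (PCIX:100MHz:64-bit). Since the eventual fix involved the PCI-X bus A jumper, a small parser for that field could make it easy to confirm which bus clock the driver detected after changing a jumper. This is a minimal Python sketch that assumes the exact probe-line format quoted here; other kernel versions or PCI Express parts may print something different, in which case it returns None.

```python
import re
import sys
from typing import NamedTuple, Optional


class BusInfo(NamedTuple):
    iface: str
    bus_type: str
    clock_mhz: int
    width_bits: int


# Matches the bus summary in a tg3 probe line such as
# "eth0: Tigon3 [...] (PCIX:100MHz:64-bit) ...". Only the PCI/PCI-X form
# quoted in this thread is handled; anything else yields None.
PROBE_RE = re.compile(r"(\S+): Tigon3 .*\((PCIX|PCI):(\d+)MHz:(\d+)-bit\)")


def parse_probe_line(line: str) -> Optional[BusInfo]:
    """Extract interface name, bus type, clock and width from a probe line."""
    m = PROBE_RE.search(line)
    if m is None:
        return None
    iface, bus_type, mhz, width = m.groups()
    return BusInfo(iface, bus_type, int(mhz), int(width))


if __name__ == "__main__":
    # Optional: pass a saved dmesg log as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for line in log:
                info = parse_probe_line(line)
                if info is not None:
                    print(info)
```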

ifconfig
Code:
eth0      Link encap:Ethernet  HWaddr 00:E0:81:33:C8:0C  
          inet addr:192.168.1.9  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe33:c80c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19859 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17630 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:9143111 (8.7 MiB)  TX bytes:2279613 (2.1 MiB)
          Interrupt:24 

eth1      Link encap:Ethernet  HWaddr 00:E0:81:33:C8:0D  
          inet addr:192.168.1.8  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:25 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:4106 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4106 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:327306 (319.6 KiB)  TX bytes:327306 (319.6 KiB)
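
The ifconfig counters above show no errors or drops on eth0 at the time of the paste. To compare snapshots taken before and after a failed copy, a minimal Python sketch can pull the RX/TX counters out of this older net-tools ifconfig format. The function name is hypothetical, and newer ifconfig releases use a different layout; on Linux the same counters are also exposed under /sys/class/net/<iface>/statistics/.

```python
import re

# Matches counter lines such as
# "          RX packets:19859 errors:0 dropped:0 overruns:0 frame:0"
COUNTER_RE = re.compile(
    r"^\s+(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)"
)


def parse_ifconfig(text):
    """Return {iface: {"RX": {...}, "TX": {...}}} from old-style ifconfig output."""
    result = {}
    iface = None
    for line in text.splitlines():
        # An unindented, non-empty line starts a new interface stanza.
        if line and not line[0].isspace():
            iface = line.split()[0]
            result[iface] = {}
            continue
        m = COUNTER_RE.match(line)
        if m and iface is not None:
            direction, packets, errors, dropped, overruns = m.groups()
            result[iface][direction] = {
                "packets": int(packets),
                "errors": int(errors),
                "dropped": int(dropped),
                "overruns": int(overruns),
            }
    return result
```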
#2  knnniggett, 6th November 2008, 01:20 PM
Over the past few days, Google turned up several posts from other people receiving the "tg3_stop_block timed out" error. References to it appear in Red Hat Bugzilla, the HP forums, the Ubuntu forums, and elsewhere. It turns out this is a rather generic error message that does not necessarily point to one specific problem.

Most of the posts focused on varying problems with the tg3 driver itself. However, I discovered last night that my problem was not software related.

I did two things last night that have (so far) cleared up my issue:
1) I set the jumper on the motherboard that disables the integrated Adaptec U320 SCSI controller.
2) I set the jumper on the motherboard that lowers the speed of PCI-X bus A from 100MHz to 66MHz.

It turns out that the two NICs share the same bus as the SCSI controller. Since I am not using any SCSI devices, I had previously disabled the SCSI controller in the BIOS; however, that apparently was not enough.

Since I made the above two changes at the same time, I don't yet know if both are required. I will jumper PCI-X bus A back to 100MHz tonight and then post my results.
#3  knnniggett, 7th November 2008, 12:49 AM
I booted the machine after setting the PCI-X bus speed jumper back to 100MHz, and everything works fine, so disabling the SCSI controller with the jumper appears to be the change that matters. I've copied over 40GB worth of files via the NIC so far. I am confident this problem is resolved.

Anyway, if you stumble onto this thread looking for a resolution to a similar problem, I recommend you look at what else shares the PCI bus with the NIC, especially if you have a similar Tyan motherboard.
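
One way to follow that advice on Linux is to group PCI functions by bus number. sysfs names each function domain:bus:device.function under /sys/bus/pci/devices, so functions sharing the same domain and bus prefix sit on the same bus segment. The following is a minimal Python sketch under that assumption (its helper names are hypothetical); `lspci -tv` gives a similar tree view. On a board like this one, you would look for Ethernet functions (class 0x0200xx) sharing a bus with a SCSI controller (class 0x0100xx).

```python
import os
import re
from collections import defaultdict

SYSFS_PCI = "/sys/bus/pci/devices"

# sysfs entries are named "domain:bus:device.function", e.g. "0000:02:03.0".
BDF_RE = re.compile(r"^([0-9a-f]{4,}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def group_by_bus(addresses):
    """Group PCI addresses by their "domain:bus" prefix."""
    groups = defaultdict(list)
    for addr in addresses:
        m = BDF_RE.match(addr)
        if m is None:
            continue  # ignore names that are not PCI addresses
        groups[f"{m.group(1)}:{m.group(2)}"].append(addr)
    return {bus: sorted(devs) for bus, devs in sorted(groups.items())}


def read_class(addr, root=SYSFS_PCI):
    """Return the 24-bit class code, e.g. "0x020000" for an Ethernet controller."""
    try:
        with open(os.path.join(root, addr, "class")) as f:
            return f.read().strip()
    except OSError:
        return "unknown"


if __name__ == "__main__":
    # Only runs the listing where sysfs is available.
    if os.path.isdir(SYSFS_PCI):
        for bus, devs in group_by_bus(os.listdir(SYSFS_PCI)).items():
            print(f"bus {bus}")
            for addr in devs:
                print(f"    {addr}  class {read_class(addr)}")
```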

It should be no surprise, but Googling around a bit shows that I am not the only one who has run into this issue: http://www.centos.org/modules/newbb/...topic_id=15932