I recently purchased a Tyan Transport GX28 server and proceeded to install Fedora 9 x64 on it.
Hardware items of interest: a Tyan S2881 motherboard with the latest BIOS, a single Opteron 280 CPU, and dual integrated Broadcom 5704 NICs.
The Fedora installation went without any problems, all hardware was properly detected, and things were looking up. However, I started running into problems when I tried to copy several GB worth of files from a network location.
The issue is 100% repeatable and only occurs when I copy a large amount of data from a network location. At some random point during the transfer, the machine either loses all network connectivity or locks up completely (i.e. no mouse or keyboard).
If I am patient enough (and the machine doesn't lock up), I eventually get one or more of the following kernel error messages:
Kernel failure message 1:
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
Kernel failure message 2:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:220 dev_watchdog+0xb3/0x122()
NETDEV WATCHDOG: eth0 (tg3): transmit timed out
Modules linked in: cifs nfs nls_utf8 nfsd lockd nfs_acl auth_rpcgss exportfs w83627hf lm85 hwmon_vid fuse sunrpc cpufreq_ondemand powernow_k8 freq_table loop dm_multipath ipv6 sr_mod cdrom sg tg3 amd_rng pata_amd pata_acpi ata_generic i2c_amd8111 i2c_amd756 pcspkr serio_raw i2c_core k8temp hwmon dm_snapshot dm_zero dm_mirror dm_log dm_mod shpchp sata_mv libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.26.6-79.fc9.x86_64 #1
Call Trace:
<IRQ> [<ffffffff81036d29>] warn_slowpath+0xae/0xd7
[<ffffffff8102d65f>] ? default_wake_function+0xd/0xf
[<ffffffff810299c1>] ? __wake_up_common+0x46/0x75
[<ffffffff81137e5d>] ? __next_cpu+0x19/0x26
[<ffffffff8102aa59>] ? find_busiest_group+0x2c8/0x736
[<ffffffff8104f2c3>] ? getnstimeofday+0x3a/0x96
[<ffffffff8122d1a2>] ? dev_watchdog+0x0/0x122
[<ffffffff8122d255>] dev_watchdog+0xb3/0x122
[<ffffffff810401c0>] run_timer_softirq+0x192/0x20e
[<ffffffff8103c21a>] __do_softirq+0x6d/0xe1
[<ffffffff8100d51c>] call_softirq+0x1c/0x28
[<ffffffff8100e7b4>] do_softirq+0x44/0x8b
[<ffffffff8103bfeb>] irq_exit+0x3f/0x80
[<ffffffff8101b449>] smp_apic_timer_interrupt+0x8c/0xa5
[<ffffffff8100a000>] ? default_idle+0x0/0x4b
[<ffffffff8100a000>] ? default_idle+0x0/0x4b
[<ffffffff8100cf42>] apic_timer_interrupt+0x72/0x80
<EOI> [<ffffffff81020594>] ? native_safe_halt+0x6/0x8
[<ffffffff812a0279>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff8100a02e>] ? default_idle+0x2e/0x4b
[<ffffffff8100b1ca>] ? cpu_idle+0x92/0xda
[<ffffffff81297df7>] ? start_secondary+0x169/0x16d
---[ end trace eb9835d0e9df6ec0 ]---
Kernel failure message 3:
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
nfs: server 192.168.1.10 not responding, still trying
Kernel failure message 4:
tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
nfs: server 192.168.1.10 OK
Thus far I have done the following to troubleshoot this:
- Performing the copy operation using SMB, instead of NFS, makes no difference.
- Copying from a different machine on the network does not make a difference.
- I've reproduced the problem with both the 2.6.26.6-79 and 2.6.25-14 kernels.
- Connecting to the network at 100 Mbps FD or 1000 Mbps FD makes no difference.
- Using eth1, instead of eth0, produces the same results.
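To compare the runs listed above side by side, the failure signatures quoted earlier can be tallied from a saved kernel log. This is a rough sketch, not a diagnostic tool: the function names tg3_summary and count_matches are hypothetical, the grep patterns simply match the messages shown in this post, and it assumes kernel messages are logged to /var/log/messages (the usual Fedora syslog file), falling back to dmesg output when no file is given.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of the text in $2 match pattern $1.
# grep -c prints 0 (and exits non-zero) when nothing matches; only the count is used.
count_matches() {
    printf '%s\n' "$2" | grep -c -- "$1"
}

# Hypothetical helper: tally the tg3/NFS failure signatures seen in this post.
# Usage: tg3_summary [logfile]   (reads dmesg output when no file is given)
tg3_summary() {
    if [ -n "$1" ]; then
        log=$(cat "$1")
    else
        log=$(dmesg)
    fi

    echo "tx_timeouts=$(count_matches 'NETDEV WATCHDOG: .* transmit timed out' "$log")"
    echo "stop_block_timeouts=$(count_matches 'tg3_stop_block timed out' "$log")"
    echo "link_down=$(count_matches 'Link is down' "$log")"
    echo "nfs_not_responding=$(count_matches 'not responding' "$log")"
}
```

For example, after sourcing the file, running tg3_summary /var/log/messages after each copy attempt gives one set of counts per configuration (NFS vs. SMB, eth0 vs. eth1, 100 vs. 1000 Mbps).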
Anyone have any other suggestions?
Output of dmesg | grep eth0:
eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000Base-T Ethernet 00:e0:81:33:c8:0c
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[769f4000] dma_mask[64-bit]
ADDRCONF(NETDEV_UP): eth0: link is not ready
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
Output of ifconfig:
eth0 Link encap:Ethernet HWaddr 00:E0:81:33:C8:0C
inet addr:192.168.1.9 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::2e0:81ff:fe33:c80c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:19859 errors:0 dropped:0 overruns:0 frame:0
TX packets:17630 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9143111 (8.7 MiB) TX bytes:2279613 (2.1 MiB)
Interrupt:24
eth1 Link encap:Ethernet HWaddr 00:E0:81:33:C8:0D
inet addr:192.168.1.8 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:25
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:4106 errors:0 dropped:0 overruns:0 frame:0
TX packets:4106 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:327306 (319.6 KiB) TX bytes:327306 (319.6 KiB)