I posted the following as a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=493513
I've also tried building a 2.6.29 kernel with SMP disabled. I'm not sure what else to change.
Any ideas?
--------------------
Description of problem:
Simple disk activity causes high system cpu time. A simple sequential read, on
an idle system, causes a 25% system cpu load (according to top and vmstat).
Reading from, or writing to, more than 4 disks causes the cpu utilization to
hit 100% and the individual disk throughput to be reduced.
For example, doing a sequential read from 10 disks results in reads of about
19MB/s per drive when each yields 60MB/s alone.
When I boot from a RIPLinux CD (with a 2.6.18 x86_64 kernel), the same system
will easily read at full speed (60MB/s) from each drive with a total aggregate
read rate of over 600MB/s (for 10 drives). When I boot the RIPLinux CD I mount
the FC10 root and chroot to it, so the only difference is, or should be, the
kernel configuration.
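
To put the two kernels side by side, the aggregate numbers can be worked out from the per-drive figures quoted above. This is only a sketch of the arithmetic; the helper name is made up for illustration.

```python
def aggregate_mb_s(drives, per_drive_mb_s):
    """Total sequential read rate across all drives, in MB/s."""
    return drives * per_drive_mb_s

# Per-drive figures taken from the report: 19MB/s under the FC10 kernel,
# 60MB/s under the RIPLinux kernel, 10 drives in both cases.
observed = aggregate_mb_s(10, 19)
expected = aggregate_mb_s(10, 60)
ratio = observed / expected

print("FC10: %d MB/s, RIPLinux: %d MB/s, ratio: %.2f" % (observed, expected, ratio))
```

In other words, the FC10 kernel delivers roughly a third of the aggregate read rate the same hardware reaches under the RIPLinux kernel.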
I have another host, also running FC10, with a custom 2.6.28 kernel that was
built using the stock FC10 kernel .config file. It has the same high system
CPU problem, so I know the problem is not a .27 vs .28 kernel issue.
The only conclusion I can draw is that the FC10 kernels are configured in a way
that causes the problem I'm experiencing.
I have tried turning off acpi in the bios, booting with acpi=off, nohz=off, and
have tried all of the io schedulers with no significant change.
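
For reference, the active io scheduler for a disk can be checked through sysfs, where the scheduler in use is shown in square brackets. The sketch below assumes the device is sda; the function names are only illustrative.

```python
def parse_scheduler(text):
    """Split a sysfs scheduler line into (active, available).

    The kernel marks the active scheduler with square brackets,
    e.g. "noop anticipatory deadline [cfq]".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available

def read_scheduler(device="sda"):
    # "sda" is an assumption; substitute the disk under test.
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return parse_scheduler(f.read())
```

Writing one of the available names back to the same file as root switches the scheduler for that disk, which makes it easy to repeat the dd test under each one.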
Version-Release number of selected component (if applicable):
Kernel 2.6.27.19-170.2.35.fc10.x86_64
How reproducible:
Every time, every boot.
Steps to Reproduce:
1. dd if=/dev/sda of=/dev/null
2. watch vmstat to see high ~20+% system load
3. do the same with two dd's simultaneously to see 40% system load, 60% for
three dd's, and so on.
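
For anyone who wants a number rather than eyeballing vmstat, the system CPU share over an interval can be computed from two samples of the aggregate "cpu" line in /proc/stat. This is a minimal sketch, not how vmstat itself is implemented; the function names are hypothetical, and vmstat's "sy" column may also fold in irq and softirq time, so the two figures can differ slightly.

```python
import time

def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]

def system_percent(before, after):
    """Percentage of elapsed CPU time spent in system mode between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Sum only user, nice, system, idle, iowait, irq, softirq, steal;
    # guest time (if present) is already counted inside user.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[2] / total  # index 2 is the "system" counter

def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, 'interval' seconds apart (run while dd is active)."""
    def first_line():
        with open(path) as f:
            return f.readline()
    before = first_line()
    time.sleep(interval)
    return system_percent(before, first_line())
```

Calling measure() while one, two, and three dd's are running should show whether the system share really grows by about 20% per reader.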
Actual results:
20% system cpu usage reading from one drive
40% system cpu usage reading from two drives
... etc.
When reading from more than 4 drives simultaneously, the read rate from each
drive is reduced.
Expected results:
1) System load less than 20% per sequential read.
2) Ability to sequentially read from 10 drives simultaneously at full speed.
Additional info:
Both systems tested are nForce4 Athlon64 systems with 2GB RAM, one dual core
and one single core. All disks are SATA, connected to controllers attached via
PCI Express. The controller drivers used are sata_nv, sata_mv, and sata_sil24.
Resyncing a 10 drive raid5 array is extremely slow with the FC10 kernel,
~10MB/s (per drive), compared with the expected 60MB/s (per drive) with the
RIPLinux kernel.