Extremely poor performance crunching random numbers under PIV-FC5



BankHacker
18th May 2006, 03:14 PM
Hi, I have installed Fedora Core 5 (FC5) on a 3 GHz Pentium 4 with 4 GB of RAM and a SATA disk.

I have noticed that the CPU runs extremely slowly in certain situations, for example when generating random numbers. To benchmark this particular case I have written a small C program that generates 10 million random numbers and measures the time consumed.

Surprisingly, when the program is compiled with the -static flag it runs very fast, generating 10 million random numbers in well under half a second. When it is compiled without -static (that is, as a dynamically linked binary), performance becomes very poor: the same work takes between 30 and 45 seconds.

I have tested both builds under other Linux distributions, such as Debian, and both run fine there, finishing the job in about 0.4 seconds. I have also tested both programs under FC3 and I get the same results as under FC5. So I conclude that the problem only happens when running Fedora!

This is the C code I have used to do the tests:

### test-cpu-2.c ##################################################
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <string.h>

static inline void randomize(void) {
time_t seconds;

time(&seconds);
srand((unsigned int) seconds);
}

int main(int argc, char ** argv) {
int i, r, numero_ciclos, numero_ciclosM, t1, t2;
clock_t start, end;
char* buf;
time_t seconds;

// Seed the random number generator
randomize();


start = clock();
// Allocate 0.1 GB of memory
buf=malloc(100*1024*1024);
end = clock();
if (buf == NULL) {
perror("malloc");
return (1);
}
printf("Reservado 0.1 Gb de memoria en %.3f sec\n", (double)(end -
start)/CLOCKS_PER_SEC);


start = clock();
// Write to the whole 0.1 GB buffer
for(i=0; i<100*1024*1024; i++) {
buf[i]='0';
}
end = clock();
printf("Escritura sobre 0.1 Gb de memoria en %.3f sec\n", (double)(end -
start)/CLOCKS_PER_SEC);


numero_ciclos = 10000000; numero_ciclosM = numero_ciclos / 1E6;


start = clock();
for(i=0; i<numero_ciclos; i++) {
r = rand();
}
end = clock();
printf("%d M de rand() en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end -
start)/CLOCKS_PER_SEC, r);


start = clock();
for(i=0; i<numero_ciclos; i++) {
r = sqrt(i);
}
end = clock();
printf("%d M de sqrt(i) en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end -
start)/CLOCKS_PER_SEC, r);


start = clock();
for(i=0; i<numero_ciclos; i++) {
r = log(i);
}
end = clock();
printf("%d M de log(i) en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end -
start)/CLOCKS_PER_SEC, r);



start = clock();
for(i=0; i<numero_ciclos; i++) {
r = log10(i);
}
end = clock();
printf("%d M de log10(i) en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end -
start)/CLOCKS_PER_SEC, r);


#ifdef __linux__
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = random();
}
end = clock();
printf("LINUX: %d M de random() en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end -
start)/CLOCKS_PER_SEC, r);

// Seed the lrand48() generator (the local variable seconds was never set)
srand48((long) time(NULL));

start = clock();
for(i=0; i<numero_ciclos; i++) {
r = lrand48();
}
end = clock();
printf("LINUX: %d M de lrand48() en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end -
start)/CLOCKS_PER_SEC, r);
#else
#endif

return (0);
}
### test-cpu-2.c (the end) ########################################

First test:
gcc test-cpu-2.c -o static-test-cpu-2 -lm -static

Second test:
gcc test-cpu-2.c -o dynamic-test-cpu-2 -lm

This produces these files:
# file *-test-cpu-2

dynamic-test-cpu-2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), for GNU/Linux 2.2.5, not stripped


static-test-cpu-2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, statically linked, for GNU/Linux 2.2.5, not stripped

When running both, these are the results:

# ./static-test-cpu-2

Reservado 0.1 Gb de memoria en 0.000 sec
Escritura sobre 0.1 Gb de memoria en 0.410 sec
10 M de rand() en 0.230 sec (example.: 1705120472) <===============
10 M de sqrt(i) en 0.020 sec (example.: 3162)
10 M de log(i) en 0.050 sec (example.: 16)
10 M de log10(i) en 0.050 sec (example.: 6)
LINUX: 10 M de random() en 0.210 sec (example.: 1072609142) <======
LINUX: 10 M de lrand48() en 0.340 sec (example.: 1674848660) <=====

# ./dynamic-test-cpu-2

Reservado 0.1 Gb de memoria en 0.000 sec
Escritura sobre 0.1 Gb de memoria en 0.410 sec
10 M de rand() en 45.310 sec (example.: 661533760) <===============
10 M de sqrt(i) en 0.020 sec (example.: 3162)
10 M de log(i) en 0.050 sec (example.: 16)
10 M de log10(i) en 0.050 sec (example.: 6)
LINUX: 10 M de random() en 37.610 sec (example.: 1311921343) <=====
LINUX: 10 M de lrand48() en 30.490 sec (example.: 839680703) <=====


I am running the default FC5 kernel, in its SMP variant in order to use Hyper-Threading:
# uname -a
Linux obelix.breinestorm.net 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux

This is my CPU description:
# cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 1
cpu MHz : 2999.084
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid xtpr
bogomips : 6007.68

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 1
cpu MHz : 2999.084
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid xtpr
bogomips : 5997.58

I have tested both programs on other systems (not Pentium 4, but Opteron and Pentium III) running Fedora Core 3, 4 and 5 (and even MS Windows 2000 and XP), and none of them ever showed poor performance.

I have disabled SELinux and the results remain the same: poor performance with or without SELinux.

These are the linked libraries the dynamic version is using:
# ldd dynamic-test-cpu-2
linux-gate.so.1 => (0x003c1000)
libm.so.6 => /lib/libm.so.6 (0x00728000)
libc.so.6 => /lib/libc.so.6 (0x005ed000)
/lib/ld-linux.so.2 (0x005d0000)

Tracing both binaries with strace shows this difference around the rand() loop:
# strace -o /tmp/dump-dynamic ./dynamic-test-cpu-2

times({tms_utime=28, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 1718797100
times({tms_utime=4511, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 1718801583
write(1, "10 millones de rand() en 44.830 "..., 72) = 72

# strace -o /tmp/dump-static ./static-test-cpu-2

times({tms_utime=27, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 1718810164
times({tms_utime=50, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 1718810186
write(1, "10 millones de rand() en 0.230 s"..., 71) = 71



So I conclude that the problem appears with the combination of this Pentium 4 machine (3 GHz, 4 GB RAM, SATA), Fedora, and dynamic linking.

Any hint about what is happening, or word from anybody who shares the problem, would be very helpful.

Thanks in advance.

`,,`,,`
Juan Ignacio Perez Sacristan
webmaster@bankhacker.com
Linux, Perl, PHP, MySQL ... solutions.
http://www.bankhacker.com/
Zaragoza, Spain
`,,`,,`