 |
 |
 |
 |
| Programming & Packaging A place to discuss programming and packaging. |

4th October 2011, 04:01 PM
|
 |
Registered User
|
|
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

|
|
|
C++ speed tweaks.
Hello,
I have a rather slow C++ program than I have written.
It consists of there nested for loops that loop from -300 to 300:
Code:
int x,y,z;
int max = 1000;
int min = -max;
increment = 1;
for(x = min;x < max; x+=increment){
for(y = min; y < max; y+=increment){
for(z = min; z < 0; z+=increment){
**calculate stuff**
}}}
The problem is if I compile it with the -Ofast option then it takes 0.04 seconds. If I change max to 1001 it takes 14 seconds. Any idea why?
|

4th October 2011, 04:26 PM
|
|
Official Gnome 3 Sales Rep. (and Adminstrator)
|
|
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711

|
|
|
Re: C++ speed tweaks.
I wouldn't like to guess without seeing the inner loop. Are you iterating through arrays? If so, it may be a CPU cache issue.
Also look at the assembly output (gcc -S -o file.s file.cc) in both cases to see if the optimizer does different things.
What happens if max is user-input (so the compiler cannot see its value)?
Gareth
|

4th October 2011, 04:44 PM
|
 |
"Fixed" by (vague) request
|
|
Join Date: Oct 2005
Location: GMT+ 1
Posts: 2,950

|
|
|
Re: C++ speed tweaks.
What compiler are you using? I found no -Ofast option in gcc/g++. The closest thing is -O3 which optimizes a lot:
This is w/o:
Code:
real 0m13.169s
user 0m12.750s
sys 0m0.043s
and this is with:
Code:
real 0m0.005s
user 0m0.001s
sys 0m0.003s
I deleted the **calculate stuff** line, of course
Last edited by giulix; 4th October 2011 at 04:52 PM.
|

4th October 2011, 05:06 PM
|
 |
Registered User
|
|
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302

|
|
|
Re: C++ speed tweaks.
Desp;ite teh OPs note the three neves of iteration go from -1000 to +1000 with an increment of one, so there are 8 billion iterations of the inner loop.
Probably the best optimizations will involves examining the caluatio in the inner loop and pushing as much as possible to the more outter loops.
Also the algorithm MAY be parallelizable, but we can't say w/o seeing the calculation.
A compiler optimizers MAY apply some of these optimizations.
SOME compilers optimizers will completely discard the code it it has no output and no side effects. (therefore 0.04 seconds).
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe
Last edited by stevea; 4th October 2011 at 05:08 PM.
|

4th October 2011, 05:43 PM
|
|
Official Gnome 3 Sales Rep. (and Adminstrator)
|
|
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711

|
|
|
Re: C++ speed tweaks.
Quote:
Originally Posted by giulix
What compiler are you using? I found no -Ofast option in gcc/g++.
|
-Ofast was added in recent GCC versions.
---------- Post added at 05:43 PM ---------- Previous post was at 05:35 PM ----------
Quote:
Originally Posted by stevea
Desp;ite teh OPs note the three neves of iteration go from -1000 to +1000 with an increment of one, so there are 8 billion iterations of the inner loop.
[...]
SOME compilers optimizers will completely discard the code it it has no output and no side effects. (therefore 0.04 seconds).
|
Good catch. If your machine is really running at something like 500+ GHz, I want one. (GCC is actually quite good at optimizing these days, despite its reputation.)
Gareth
|

4th October 2011, 09:39 PM
|
 |
Registered User
|
|
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

|
|
|
Re: C++ speed tweaks.
Hi All,
Thanks for all the replies.
Okay full loop is as follows:
Code:
double max = 1000;
double min = -max;
int increment = 1;
double E_nu_x = 100;
double E_nu_y = 100;
double E_b = 100;
double p_b_x = 100;
double p_b_y = 100;
double p_b_z = 100;
double E_l = 100;
double p_l_x = 100;
double p_l_y = 100;
double p_l_z = 100;
double E_bbar = 100;
double p_bbar_x = 100;
double p_bbar_y = 100;
double p_bbar_z = 100;
double E_lbar = 100;
double p_lbar_x = 100;
double p_lbar_y = 100;
double p_lbar_z = 100;
double p_nubar_z;
double E_nu;
double M_w;
double M_t;
double E_nubar;
double M_wbar;
double M_tbar;
double p_nu_x = min;
double p_nu_y = min;
double p_nu_z = min;
double p_nubar_x;
double p_nubar_y;
for(p_nu_x = min; p_nu_x < max; p_nu_x+=increment){
p_nubar_x = E_nu_x - p_nu_x;
for(p_nu_y = min; p_nu_y < max; p_nu_y+=increment){
p_nubar_y = E_nu_y - p_nu_y;
for(p_nu_z = min; p_nu_z < 0; p_nu_z+=increment){
p_nubar_z = p_nu_z;
E_nu = sqrt(((p_nu_x)*(p_nu_x))
+ ((p_nu_y)*(p_nu_y))
+ ((p_nu_z)*(p_nu_z)));
M_w = sqrt(((E_l + E_nu)*(E_l + E_nu))
- ((p_l_x + p_nu_x)*(p_l_x + p_nu_x))
- ((p_l_y + p_nu_y)*(p_l_y + p_nu_y))
- ((p_l_z + p_nu_z)*(p_l_z + p_nu_z)));
M_t = sqrt(((E_b + E_l + E_nu)*(E_b + E_l + E_nu))
- ((p_b_x + p_l_x + p_nu_x)*(p_b_x + p_l_x + p_nu_x))
- ((p_b_y + p_l_y + p_nu_y)*(p_b_y + p_l_y + p_nu_y))
- ((p_b_z + p_l_z + p_nu_z)*(p_b_z + p_l_z + p_nu_z)));
E_nubar = sqrt(((p_nubar_x)*(p_nubar_x))
+ ((p_nubar_y)*(p_nubar_y))
+ ((p_nubar_z)*(p_nubar_z)));
M_wbar = sqrt(((E_lbar + E_nubar)*(E_lbar + E_nubar))
- ((p_lbar_x + p_nubar_x)*(p_lbar_x + p_nubar_x))
- ((p_lbar_y + p_nubar_y)*(p_lbar_y + p_nubar_y))
- ((p_lbar_z + p_nubar_z)*(p_lbar_z + p_nubar_z)));
M_tbar = sqrt(((E_bbar + E_lbar + E_nubar)*(E_bbar + E_lbar + E_nubar))
- ((p_bbar_x + p_lbar_x + p_nubar_x)*(p_bbar_x + p_lbar_x + p_nubar_x))
- ((p_bbar_y + p_lbar_y + p_nubar_y)*(p_bbar_y + p_lbar_y + p_nubar_y))
- ((p_bbar_z + p_lbar_z + p_nubar_z)*(p_bbar_z + p_lbar_z + p_nubar_z)));
///////////////
///////////////////// Flip
p_nu_z = -p_nu_z;
p_nubar_z = p_nu_z;
/*
E_nu = sqrt(((p_nu_x)*(p_nu_x))
+ ((p_nu_y)*(p_nu_y))
+ ((p_nu_z)*(p_nu_z)));
*/
M_w = sqrt(((E_l + E_nu)*(E_l + E_nu))
- ((p_l_x + p_nu_x)*(p_l_x + p_nu_x))
- ((p_l_y + p_nu_y)*(p_l_y + p_nu_y))
- ((p_l_z + p_nu_z)*(p_l_z + p_nu_z)));
M_t = sqrt(((E_b + E_l + E_nu)*(E_b + E_l + E_nu))
- ((p_b_x + p_l_x + p_nu_x)*(p_b_x + p_l_x + p_nu_x))
- ((p_b_y + p_l_y + p_nu_y)*(p_b_y + p_l_y + p_nu_y))
- ((p_b_z + p_l_z + p_nu_z)*(p_b_z + p_l_z + p_nu_z)));
/*
E_nubar = sqrt(((p_nubar_x)*(p_nubar_x))
+ ((p_nubar_y)*(p_nubar_y))
+ ((p_nubar_z)*(p_nubar_z)));
*/
M_wbar = sqrt(((E_lbar + E_nubar)*(E_lbar + E_nubar))
- ((p_lbar_x + p_nubar_x)*(p_lbar_x + p_nubar_x))
- ((p_lbar_y + p_nubar_y)*(p_lbar_y + p_nubar_y))
- ((p_lbar_z + p_nubar_z)*(p_lbar_z + p_nubar_z)));
M_tbar = sqrt(((E_bbar + E_lbar + E_nubar)*(E_bbar + E_lbar + E_nubar))
- ((p_bbar_x + p_lbar_x + p_nubar_x)*(p_bbar_x + p_lbar_x + p_nubar_x))
- ((p_bbar_y + p_lbar_y + p_nubar_y)*(p_bbar_y + p_lbar_y + p_nubar_y))
- ((p_bbar_z + p_lbar_z + p_nubar_z)*(p_bbar_z + p_lbar_z + p_nubar_z)));
//re flip so that for loop does not fail.
p_nu_z = -p_nu_z;
}
}
}
return 0;
}
I am not sure if any of the inner loop can be moved to the outer loop. You will notice that the final loop is split in two. This resulted in a speed increase. Splitting it further proved minimal increases with each split, example:
Code:
inner loop:
... i+=4){
function(i);
function(i+1);
function(i+2);
function(i+3);
}
Any suggestions welcome.
|

4th October 2011, 11:11 PM
|
|
Official Gnome 3 Sales Rep. (and Adminstrator)
|
|
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711

|
|
|
Re: C++ speed tweaks.
Yes, max=1000 is 0.01s, max=1001 jumps to 5s here. (I added "#include <cmath>", "using namespace std;" and "int main () {" to the start.)
However:
Code:
$ g++ -Wall -Wextra -Ofast maths.cc
maths.cc: In function ‘int main()’:
maths.cc:37:10: warning: variable ‘M_w’ set but not used [-Wunused-but-set-variable]
maths.cc:38:10: warning: variable ‘M_t’ set but not used [-Wunused-but-set-variable]
maths.cc:40:10: warning: variable ‘M_wbar’ set but not used [-Wunused-but-set-variable]
maths.cc:41:10: warning: variable ‘M_tbar’ set but not used [-Wunused-but-set-variable]
I added:
Code:
cout << M_w << M_t << M_wbar << M_tbar << endl;
before the return statement, and recompiled with the same command without warnings. ./a.out is taking about 3 minutes to run, so my guess is that without doing something with the results, GCC is indeed optimizing the loop to a NOOP. Why incrementing max makes such a sudden jump I don't know, maybe it passes a limit in the loop optimizer's flow analysis...
Gareth
---------- Post added at 11:11 PM ---------- Previous post was at 11:04 PM ----------
P.S.: M_w and M_t are reported as -NaN at the end. You might want to look into that.
|

5th October 2011, 09:34 AM
|
 |
Registered User
|
|
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883

|
|
|
Re: C++ speed tweaks.
Quote:
Originally Posted by Gareth Jones
Yes, max=1000 is 0.01s, max=1001 jumps to 5s here. (I added "#include <cmath>", "using namespace std;" and "int main () {" to the start.)
However:
Code:
$ g++ -Wall -Wextra -Ofast maths.cc
maths.cc: In function ‘int main()’:
maths.cc:37:10: warning: variable ‘M_w’ set but not used [-Wunused-but-set-variable]
maths.cc:38:10: warning: variable ‘M_t’ set but not used [-Wunused-but-set-variable]
maths.cc:40:10: warning: variable ‘M_wbar’ set but not used [-Wunused-but-set-variable]
maths.cc:41:10: warning: variable ‘M_tbar’ set but not used [-Wunused-but-set-variable]
I added:
Code:
cout << M_w << M_t << M_wbar << M_tbar << endl;
before the return statement, and recompiled with the same command without warnings. ./a.out is taking about 3 minutes to run, so my guess is that without doing something with the results, GCC is indeed optimizing the loop to a NOOP. Why incrementing max makes such a sudden jump I don't know, maybe it passes a limit in the loop optimizer's flow analysis...
Gareth
---------- Post added at 11:11 PM ---------- Previous post was at 11:04 PM ----------
P.S.: M_w and M_t are reported as -NaN at the end. You might want to look into that.
|
Thank you for the reply. Yes, M_w, M_t, M_wbar and M_tbar; should at the minute be set and not used (Thats the next slow down they need to be put in a vector!)
The input values (ones outside the for loop) are just random numbers at the minute so I expect that a lot of the values will be -NaN; Once real values are in place there should be some values that are not NaN.
I will also puts some if statements in to avoid sqrts where necessary. (i.e. where x is negative and I have sqrt(x)).
Also what do you mean by NOOP?
|

5th October 2011, 02:50 PM
|
|
Official Gnome 3 Sales Rep. (and Adminstrator)
|
|
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711

|
|
|
Re: C++ speed tweaks.
Quote:
Originally Posted by Adunaic
Also what do you mean by NOOP?
|
Sorry, "no operation". Technically it's a processor instruction that does nothing (except use a processor cycle and an instruction's space in memory).
In this case though, I just meant that because the results of the loop calculation weren't being used (until I added the cout statement), and the loop had no externally visible side-effects (apart from execution time), the optimizer dropped all or most of the calculation.
Outputting the results to the terminal forces the optimizer to keep the loop calculation, which makes the code run much slower, but only because before it wasn't really running at all.
As for the 1000 to 1001 max change, I guess that GCCs flow analysis has some upper limits with regard to loops, and once they're passed, the optimizer is less aggressive.
Gareth
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
Current GMT-time: 00:55 (Friday, 24-05-2013)
|
|
 |
 |
 |
 |
|
|