Fedora Linux Support Community & Resources Center
  #1  
Old 4th October 2011, 04:01 PM
Adunaic's Avatar
Adunaic Online
Registered User
 
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883
linuxfirefox
C++ speed tweaks.

Hello,

I have a rather slow C++ program than I have written.

It consists of there nested for loops that loop from -300 to 300:

Code:
int x,y,z;
int max = 1000;
int min = -max;
increment = 1;

  for(x = min;x < max; x+=increment){

    for(y = min; y < max; y+=increment){

      for(z = min; z < 0; z+=increment){

**calculate stuff**

}}}
The problem is if I compile it with the -Ofast option then it takes 0.04 seconds. If I change max to 1001 it takes 14 seconds. Any idea why?
Reply With Quote
  #2  
Old 4th October 2011, 04:26 PM
Gareth Jones Online
Official Gnome 3 Sales Rep. (and Adminstrator)
 
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711
linuxfirefox
Re: C++ speed tweaks.

I wouldn't like to guess without seeing the inner loop. Are you iterating through arrays? If so, it may be a CPU cache issue.

Also look at the assembly output (gcc -S -o file.s file.cc) in both cases to see if the optimizer does different things.

What happens if max is user-input (so the compiler cannot see its value)?

Gareth
Reply With Quote
  #3  
Old 4th October 2011, 04:44 PM
giulix's Avatar
giulix Offline
"Fixed" by (vague) request
 
Join Date: Oct 2005
Location: GMT+ 1
Posts: 2,950
linuxfirefox
Re: C++ speed tweaks.

What compiler are you using? I found no -Ofast option in gcc/g++. The closest thing is -O3 which optimizes a lot:

This is w/o:
Code:
real	0m13.169s
user	0m12.750s
sys	0m0.043s
and this is with:
Code:
real	0m0.005s
user	0m0.001s
sys	0m0.003s
I deleted the **calculate stuff** line, of course

Last edited by giulix; 4th October 2011 at 04:52 PM.
Reply With Quote
  #4  
Old 4th October 2011, 05:06 PM
stevea's Avatar
stevea Online
Registered User
 
Join Date: Apr 2006
Location: Ohio, USA
Posts: 8,302
linuxfedorafirefox
Re: C++ speed tweaks.

Desp;ite teh OPs note the three neves of iteration go from -1000 to +1000 with an increment of one, so there are 8 billion iterations of the inner loop.

Probably the best optimizations will involves examining the caluatio in the inner loop and pushing as much as possible to the more outter loops.

Also the algorithm MAY be parallelizable, but we can't say w/o seeing the calculation.


A compiler optimizers MAY apply some of these optimizations.
SOME compilers optimizers will completely discard the code it it has no output and no side effects. (therefore 0.04 seconds).
__________________
None are more hopelessly enslaved than those who falsely believe they are free.
Johann Wolfgang von Goethe

Last edited by stevea; 4th October 2011 at 05:08 PM.
Reply With Quote
  #5  
Old 4th October 2011, 05:43 PM
Gareth Jones Online
Official Gnome 3 Sales Rep. (and Adminstrator)
 
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711
linuxfirefox
Re: C++ speed tweaks.

Quote:
Originally Posted by giulix View Post
What compiler are you using? I found no -Ofast option in gcc/g++.
-Ofast was added in recent GCC versions.

---------- Post added at 05:43 PM ---------- Previous post was at 05:35 PM ----------

Quote:
Originally Posted by stevea View Post
Desp;ite teh OPs note the three neves of iteration go from -1000 to +1000 with an increment of one, so there are 8 billion iterations of the inner loop.

[...]

SOME compilers optimizers will completely discard the code it it has no output and no side effects. (therefore 0.04 seconds).
Good catch. If your machine is really running at something like 500+ GHz, I want one. (GCC is actually quite good at optimizing these days, despite its reputation.)

Gareth
Reply With Quote
  #6  
Old 4th October 2011, 09:39 PM
Adunaic's Avatar
Adunaic Online
Registered User
 
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883
linuxfirefox
Re: C++ speed tweaks.

Hi All,

Thanks for all the replies.

Okay full loop is as follows:

Code:
 double max = 1000;
  double min = -max;

  int increment = 1;

  double E_nu_x = 100;
  double E_nu_y = 100;

  double E_b = 100;  
  double p_b_x = 100;  
  double p_b_y = 100;  
  double p_b_z = 100;  

  double E_l = 100;  
  double p_l_x = 100;  
  double p_l_y = 100;  
  double p_l_z = 100;  

  double E_bbar = 100;  
  double p_bbar_x = 100;  
  double p_bbar_y = 100;  
  double p_bbar_z = 100;  

  double E_lbar = 100;  
  double p_lbar_x = 100;  
  double p_lbar_y = 100;  
  double p_lbar_z = 100;  

  double p_nubar_z;
  double E_nu;
  double M_w;
  double M_t;
  double E_nubar;
  double M_wbar;
  double M_tbar;

  double p_nu_x = min;
  double p_nu_y = min;
  double p_nu_z = min;

  double p_nubar_x;
  double p_nubar_y;

  for(p_nu_x = min; p_nu_x < max; p_nu_x+=increment){

    p_nubar_x = E_nu_x - p_nu_x;
    for(p_nu_y = min; p_nu_y < max; p_nu_y+=increment){

      p_nubar_y = E_nu_y - p_nu_y;
      for(p_nu_z = min; p_nu_z < 0; p_nu_z+=increment){

	p_nubar_z = p_nu_z;

	 E_nu = sqrt(((p_nu_x)*(p_nu_x))
			  + ((p_nu_y)*(p_nu_y))
			  + ((p_nu_z)*(p_nu_z)));
	
	 M_w = sqrt(((E_l + E_nu)*(E_l + E_nu)) 
		    - ((p_l_x + p_nu_x)*(p_l_x + p_nu_x))
		    - ((p_l_y + p_nu_y)*(p_l_y + p_nu_y))
		    - ((p_l_z + p_nu_z)*(p_l_z + p_nu_z)));
	
	 M_t = sqrt(((E_b + E_l + E_nu)*(E_b + E_l + E_nu))
		    - ((p_b_x + p_l_x + p_nu_x)*(p_b_x + p_l_x + p_nu_x))
		    - ((p_b_y + p_l_y + p_nu_y)*(p_b_y + p_l_y + p_nu_y))
		    - ((p_b_z + p_l_z + p_nu_z)*(p_b_z + p_l_z + p_nu_z)));
	 
	 E_nubar = sqrt(((p_nubar_x)*(p_nubar_x))
			+ ((p_nubar_y)*(p_nubar_y))
			+ ((p_nubar_z)*(p_nubar_z)));
	 
	 M_wbar = sqrt(((E_lbar + E_nubar)*(E_lbar + E_nubar)) 
		       - ((p_lbar_x + p_nubar_x)*(p_lbar_x + p_nubar_x))
		       - ((p_lbar_y + p_nubar_y)*(p_lbar_y + p_nubar_y))
		       - ((p_lbar_z + p_nubar_z)*(p_lbar_z + p_nubar_z)));
	 
	 M_tbar = sqrt(((E_bbar + E_lbar + E_nubar)*(E_bbar + E_lbar + E_nubar))
		       - ((p_bbar_x + p_lbar_x + p_nubar_x)*(p_bbar_x + p_lbar_x + p_nubar_x))
		       - ((p_bbar_y + p_lbar_y + p_nubar_y)*(p_bbar_y + p_lbar_y + p_nubar_y))
		       - ((p_bbar_z + p_lbar_z + p_nubar_z)*(p_bbar_z + p_lbar_z + p_nubar_z)));

	 ///////////////
	
	///////////////////// Flip
	p_nu_z = -p_nu_z;
	p_nubar_z = p_nu_z;
	/*
	 E_nu = sqrt(((p_nu_x)*(p_nu_x))
			  + ((p_nu_y)*(p_nu_y))
			  + ((p_nu_z)*(p_nu_z)));
	*/
	M_w = sqrt(((E_l + E_nu)*(E_l + E_nu)) 
		   - ((p_l_x + p_nu_x)*(p_l_x + p_nu_x))
		   - ((p_l_y + p_nu_y)*(p_l_y + p_nu_y))
		   - ((p_l_z + p_nu_z)*(p_l_z + p_nu_z)));
	
	M_t = sqrt(((E_b + E_l + E_nu)*(E_b + E_l + E_nu))
		   - ((p_b_x + p_l_x + p_nu_x)*(p_b_x + p_l_x + p_nu_x))
		   - ((p_b_y + p_l_y + p_nu_y)*(p_b_y + p_l_y + p_nu_y))
		   - ((p_b_z + p_l_z + p_nu_z)*(p_b_z + p_l_z + p_nu_z)));
	 /*
	 E_nubar = sqrt(((p_nubar_x)*(p_nubar_x))
			     + ((p_nubar_y)*(p_nubar_y))
			     + ((p_nubar_z)*(p_nubar_z)));
	 */
	M_wbar = sqrt(((E_lbar + E_nubar)*(E_lbar + E_nubar)) 
		      - ((p_lbar_x + p_nubar_x)*(p_lbar_x + p_nubar_x))
		      - ((p_lbar_y + p_nubar_y)*(p_lbar_y + p_nubar_y))
		      - ((p_lbar_z + p_nubar_z)*(p_lbar_z + p_nubar_z)));
	
	M_tbar = sqrt(((E_bbar + E_lbar + E_nubar)*(E_bbar + E_lbar + E_nubar))
		      - ((p_bbar_x + p_lbar_x + p_nubar_x)*(p_bbar_x + p_lbar_x + p_nubar_x))
		      - ((p_bbar_y + p_lbar_y + p_nubar_y)*(p_bbar_y + p_lbar_y + p_nubar_y))
		      - ((p_bbar_z + p_lbar_z + p_nubar_z)*(p_bbar_z + p_lbar_z + p_nubar_z)));

	 //re flip so that for loop does not fail.	
	 p_nu_z = -p_nu_z;
      }
    }
  }
  return 0;

}
I am not sure if any of the inner loop can be moved to the outer loop. You will notice that the final loop is split in two. This resulted in a speed increase. Splitting it further proved minimal increases with each split, example:

Code:
inner loop:
... i+=4){

function(i);
function(i+1);
function(i+2);
function(i+3);
}

Any suggestions welcome.
Reply With Quote
  #7  
Old 4th October 2011, 11:11 PM
Gareth Jones Online
Official Gnome 3 Sales Rep. (and Adminstrator)
 
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711
linuxfirefox
Re: C++ speed tweaks.

Yes, max=1000 is 0.01s, max=1001 jumps to 5s here. (I added "#include <cmath>", "using namespace std;" and "int main () {" to the start.)

However:
Code:
$ g++ -Wall -Wextra -Ofast maths.cc 
maths.cc: In function ‘int main()’:
maths.cc:37:10: warning: variable ‘M_w’ set but not used [-Wunused-but-set-variable]
maths.cc:38:10: warning: variable ‘M_t’ set but not used [-Wunused-but-set-variable]
maths.cc:40:10: warning: variable ‘M_wbar’ set but not used [-Wunused-but-set-variable]
maths.cc:41:10: warning: variable ‘M_tbar’ set but not used [-Wunused-but-set-variable]
I added:
Code:
cout << M_w << M_t << M_wbar << M_tbar << endl;
before the return statement, and recompiled with the same command without warnings. ./a.out is taking about 3 minutes to run, so my guess is that without doing something with the results, GCC is indeed optimizing the loop to a NOOP. Why incrementing max makes such a sudden jump I don't know, maybe it passes a limit in the loop optimizer's flow analysis...

Gareth

---------- Post added at 11:11 PM ---------- Previous post was at 11:04 PM ----------

P.S.: M_w and M_t are reported as -NaN at the end. You might want to look into that.
Reply With Quote
  #8  
Old 5th October 2011, 09:34 AM
Adunaic's Avatar
Adunaic Online
Registered User
 
Join Date: Mar 2009
Location: Lancaster, UK
Posts: 883
linuxfirefox
Re: C++ speed tweaks.

Quote:
Originally Posted by Gareth Jones View Post
Yes, max=1000 is 0.01s, max=1001 jumps to 5s here. (I added "#include <cmath>", "using namespace std;" and "int main () {" to the start.)

However:
Code:
$ g++ -Wall -Wextra -Ofast maths.cc 
maths.cc: In function ‘int main()’:
maths.cc:37:10: warning: variable ‘M_w’ set but not used [-Wunused-but-set-variable]
maths.cc:38:10: warning: variable ‘M_t’ set but not used [-Wunused-but-set-variable]
maths.cc:40:10: warning: variable ‘M_wbar’ set but not used [-Wunused-but-set-variable]
maths.cc:41:10: warning: variable ‘M_tbar’ set but not used [-Wunused-but-set-variable]
I added:
Code:
cout << M_w << M_t << M_wbar << M_tbar << endl;
before the return statement, and recompiled with the same command without warnings. ./a.out is taking about 3 minutes to run, so my guess is that without doing something with the results, GCC is indeed optimizing the loop to a NOOP. Why incrementing max makes such a sudden jump I don't know, maybe it passes a limit in the loop optimizer's flow analysis...

Gareth

---------- Post added at 11:11 PM ---------- Previous post was at 11:04 PM ----------

P.S.: M_w and M_t are reported as -NaN at the end. You might want to look into that.
Thank you for the reply. Yes, M_w, M_t, M_wbar and M_tbar; should at the minute be set and not used (Thats the next slow down they need to be put in a vector!)

The input values (ones outside the for loop) are just random numbers at the minute so I expect that a lot of the values will be -NaN; Once real values are in place there should be some values that are not NaN.

I will also puts some if statements in to avoid sqrts where necessary. (i.e. where x is negative and I have sqrt(x)).

Also what do you mean by NOOP?
Reply With Quote
  #9  
Old 5th October 2011, 02:50 PM
Gareth Jones Online
Official Gnome 3 Sales Rep. (and Adminstrator)
 
Join Date: Jul 2011
Location: Leamington Spa, UK
Age: 30
Posts: 1,711
linuxfirefox
Re: C++ speed tweaks.

Quote:
Originally Posted by Adunaic View Post
Also what do you mean by NOOP?
Sorry, "no operation". Technically it's a processor instruction that does nothing (except use a processor cycle and an instruction's space in memory).

In this case though, I just meant that because the results of the loop calculation weren't being used (until I added the cout statement), and the loop had no externally visible side-effects (apart from execution time), the optimizer dropped all or most of the calculation.

Outputting the results to the terminal forces the optimizer to keep the loop calculation, which makes the code run much slower, but only because before it wasn't really running at all.

As for the 1000 to 1001 max change, I guess that GCCs flow analysis has some upper limits with regard to loops, and once they're passed, the optimizer is less aggressive.

Gareth
Reply With Quote
Reply

Tags
speed, tweaks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
A few tweaks clay247 Suggestions & Feedback 22 18th February 2008 06:15 AM
Konqueror tweaks Coolerthanyou Using Fedora 4 12th November 2006 11:39 PM
Speed tips/tricks/tweaks for PPC installations Zakarth Mac Chat 0 7th November 2005 09:51 AM
FC3 startup speed tweaks, are there any? storlied EOL (End Of Life) Versions 10 2nd October 2005 01:18 AM
Linux DSL Tweaks? EvolutionR Servers & Networking 2 23rd November 2004 11:41 PM


Current GMT-time: 00:55 (Friday, 24-05-2013)

TopSubscribe to XML RSS for all Threads in all ForumsFedoraForumDotOrg Archive
logo

All trademarks, and forum posts in this site are property of their respective owner(s).
FedoraForum.org is privately owned and is not directly sponsored by the Fedora Project or Red Hat, Inc.

Privacy Policy | Term of Use | Posting Guidelines | Archive | Contact Us | Founding Members

Powered by vBulletin® Copyright ©2000 - 2012, vBulletin Solutions, Inc.

FedoraForum is Powered by RedHat