Week 3
Compiler
Compiler is like a translator. Human type things they better understand (more human friendly), than it will change it to what the machine understands. There is few steps on translating code. (pre-processing | compiling | assembling | linking).
Pre-prcessing is to prepare the human typed code for the next step. It can be removing your comments. compiling this is the step that the code is being translated to assembly language so it's one step closer to what machine can read. Assembling is taking those assembly language and making them in to machine code which computer can read and run. Linking, is other code or set of instructions and combine it with your code for the program to work.
Knowing how compilers improve your code lets you trust them to make your easy-to-read code run faster. This lets you focus on more important stuff, like picking the best ways to solve problems and how to store your data.The compiler we focus on is GCC.
GCC options:
-o flag: (the strictest of checks | a preset for which flag is on and off)
-O0
: No optimization-O1
to -O3
: Increasing levels of optimization for speed (might affect size)-Os
: Optimize for size -Ofast
: Very aggressive speed optimization-Og
: Optimize for debugging (avoid optimizations which convolute debugging)-f flag: (allows more control over specific optimizations)
-funroll_loops
: Unroll loops for faster execution (could increase size)-fno-unroll-loops
: Disable loop unrollingyou can check which optimization s are on and off by add (-Q --help=optimizers) :
gcc -O3 -Q --help=optimizers
Examples of Common Optimizations
Code Rewriting Optimizations
Rewriting code to take up less space and runs faster is the goal.
The compiler has a be-hide the sense version of the code. (compiler could rewrite your code)
Few technique of optimizations:
Strength Reduction:
Find the slow operations(expensive) in your code and replacing them with faster operations(cheaper).
int x; for (x=0; x < 10; x++) { printf("%d\n", x*6); }
- adding six everytime
- update the check flag to 60
int x; for (x=0; x < 60; x+=6) { printf("%d", x); }It remove the multiplication (improve speed). goal is to replace expensive operations with cheap once.
Hoisting:
Finding things that don't need to be in a loop out.
Hoisting I - Loop-Invariant Variable
int t, x; double c; t = readtemp(); /* current temp in farenheit */ for (x = 0; x < 200; x++) { c = (t-32)/1.8000 + 273.15; foo(x,c); }
C is a constant it doesn't have to be in the loop. moving it out will safe operations.
and will make this code using less power.
int t, x; double c; t = readtemp(); /* current temp in farenheit */ c = (t-32)/1.8000 + 273.15; for (x = 0; x < 200; x++) { foo(x,c); }
Hoisting II - Loop-Invariant Expression
Same goal here trying to move things that doesn't have to be in the loop out, so it will cost less.int t, x; t = readtemp(); /* current temp in farenheit */ for (x = 0; x < 200; x++) { foo(x,(t-32)/1.8000 + 273.15); }
There is a calculation that doesn't have to be in the loop. it doesn't depend other value and there is no risk of putting it outside the loop.
int t, x; double c; t = readtemp(); /* current temp in farenheit */ c = (t-32)/1.8000 + 273.15; for (x = 0; x < 200; x++) { foo(x,c); }
Hoisting III - Loop-Invariant Expression in Loop Condition
We target the loop condition. Look for values that can be hoist outside of the loop. Reduces the calculation.
int x, y; y = foo(); for (x=0; x < y*12; x++) { bar(x); }
Move y*12 outside of the loop.
int x, y, c; y = foo(); c = y * 12; for (x=0; x < c; x++) { bar(x); }
Pre-calculation of Constants
Compiler will see when and where code are being use, And in a safe condition it would update the code.
ff = (212-32)/100; /* factor for celcius-farenheit conversion */ conv = c * ff + 32;
Updated code
conv = c * 1.800 + 32;
Loop Unswitching
Look for less expensive condition and move it before the condition that cost more. Use the less expensive way to check do you go in. Or sometimes move the check outside the loop before going in.
Loop Unswitching I - Inner/Outer Swap
int foo(float ctl) { int x; for (x=0; x < 10000; x++) { if (ctl == 0) { bar(x); } else { qux(x); } } }
We are check in ctl is 0 or any other thing. we don't have to check everything in the loop. Moving the check before looping will make t only check once. This make the code cost a lot less.
int foo(float ctl) { int x; if (ctl == 0) { for (x=0; x < 10000; x++) { bar(x); } } else { for (x=0; x < 10000; x++) { qux(x); } } }
Loop Unswitching II - Inner/Outer Swap with Code Repetition
int foo(float ctl) { int x; for (x=0; x < 10000; x++) { bar(x); if (ctl == 0) { qux(x); } } }
Here we move the check before going in to the loop. However it take a little more space.
int foo(float ctl) { int x; if (ctl == 0) { for (x=0; x < 10000; x++) { bar(x); qux(x); } } else { for (x=0; x < 10000; x++) { bar(x); } } }
Loop Splitting
We look for splitting up the loop and remove the check flag(if possible).
int foo(float ctl) { int x; for (x=0; x < 10000; x++) { if (x < 3700) { bar(x); } else { qux(x); } } }
By splitting the loop in to 2 we removed the check flag in the loop.
int foo(float ctl) { int x; for (x=0; x < 3700; x++) { bar(x); } for (; x < 10000; x++) { qux(x); } }
Loop Interchange
Sometimes swapping the inner and outer loop can result in better performance.
int x, y, max=10000; long double a[[max]][max]; for (y=0; y < max; y++) { for (x=0; x<max; x++) { a[[x]][y]=foo(x,y); } }
Making the memory access to be sequential.
int x, y, max=10000; long double a[[max]][max]; for (x=0; x < max; x++) { for (y=0; y<max; y++) { a[[x]][y]=foo(x,y); } }
Loop Unrolling
Have index counter with multiple copies call if possible. Will results in less number of loop but takes more space.
Loop Unrolling I - Guaranteed-Multiple Iterations
int x, y; y=foo(); for (x = 0; x < y*2; x++) { bar(x); }
We call bar more than once and have x goes up but 2 every loop. Reduces the check by 2.
int x, y; y=foo(); for (x = 0; x < y*2; ) { bar(x++); bar(x++); }
Loop Unrolling II - Pairs-of-Iterations plus a Conditional Extra Iteration
We can add extra code after the loop to make sure the number of calls matches no matter if it's odd or even
int x, y; y=foo(); for (x=0; x < y; x++) { bar(x); }
Added an extra check to make sure the number of calls are good.
int x, y; y=foo(); for (x=0; x < y-1; ) { bar(x++); bar(x++); } if (x < y ) { bar(x++); }
Loop Unrolling III - Large Number of Iterations
This is just doing a super unrolling. Up the number of call in the loop. Make the loop much shorter.
int x, y; y=foo(); for (x=0; x < y; x++) { bar(x); }
Adding much more calls in the loop.
YOU CAN ONLY UNROLLED AS LONG AS IT GITS INTO CACHE.
int x, y; y=foo(); for (x=0; x < y-10; ) { bar(x++); bar(x++); bar(x++); bar(x++); bar(x++); bar(x++); bar(x++); bar(x++); bar(x++); bar(x++); } for (; x < y; ) { bar(x++); }
Inlining
This is like lamda. We remove the function and put the instruction in where it was.
int foo(int a, int b) { return a + b * 2; } int x, i; long ttl=0; for (x = 0; x < 10000; x++) { i = rand(); ttl += foo(x, i); }
Here it remove the function foo() and did the calculation in the loop.
int x, i; long ttl=0; for (x = 0; x < 10000; x++) { i = rand(); ttl += x + i*2; }
Common Subexpression Elimination
we try to group up mirror expression (safe time).
x = (a * c) - (a * c) / 12
update it to only do one multiplication.
t = (a * c) x = t - t/12
Jump Threading
Goal is to remove unnecessary checks.
int a, b; a=foo(); b=!a; if (a) { bar(); } if (b) { baz(); }
removing unnecessary check for if (b)
int a, b; a=foo(); if (a) { bar(); } else { baz(); }
Short-Circuit Evaluation
Short Circuit I - AND (&&)
If the first condition is false than the check is false there is no need to keep on checking.
if (a * b > c && d > e) { foo(); }
============================================
if (a * b > c) { if (d > e) { foo(); } }
Short Circuit II - OR (||)
If one of the condition is true than there are no need to check more. We can just skip it and go in to it.
if (a * b > c || d > e ) { foo(); }
============================================
// This makes more sense in assembler than C! if (a * b > c) { foo(); } elseif (d > e) { foo(); }
Test Reordering
Put the less expensive check before the costly once. If first condition is fails program won't check for the other once.
if (a * b > c && d > e) { foo(); }
Move the less expensive check at the front. b > e cost less than a * b > c.
if (d > e) { if (a * b > c) { foo(); } }
Dead Store Elimination
Having useless code in your program that never get use it just get overwritten. It add no value to the program.
Dead Store Elimination I - Simple Case
The first variable was never been used. we can remove it.
a = b * c + d / f; a = c / f * 60;
============================================
a = c / f * 60;
Dead Store Elimination II - Unused Initialized Value in Declaration
Stores unused initialized value
int a = 0; /* ...possibly many lines later, before a is read: */ a = foo();
We use the power to initialized the variable a to 0 but than we replace it with another value.
int a; a = foo();
Dead Store Elimination III - Special Case
Another dead stores when we rewrite a variable without using it when there is a special case.
int a; a = b * c + d / f; /* normal/default case */ if (foo()) { a = c / f * 60; /* special case! */ }
add condition check to make sure it only initialized once.
int a; if (foo()) { a = c / f * 60; } else { a = b * c + d / f; }
Machine-Code Level Optimization
if (condition) { action } following code
The comelier sometime would swap the code is the action is very unlikely going to happen(Block Rearrangement for Conditional Jumps). Like a error condition. Hopefully your code will run and have a very little chance of going in to the error condition check.
"In order to do this, the compiler needs to know that the condition is almost always false. The programmer can state this using Compiler Intrinsics such as the GCC __builtin_expect function, or the compiler can discover this by code analysis or PGO."
Instruction Selection
Debugging with Optimizations Enabled
Debuggers are good when the code still looks like what you wrote. compilers with high optimization will try to improve performance and update you code. At the end it might not look like the code you wrote. knowing what you are trying to do and what the code would do and using the right balance of the gcc flag will help you debug code more effectively.
Comments
Post a Comment