Week 3

Compiler

Compiler is like a translator. Human type things they better understand (more human friendly), than it will change it to what the machine understands. There is few steps on translating code. (pre-processing | compiling | assembling | linking).

Pre-prcessing is to prepare the human typed code for the next step. It can be removing your comments. compiling this is the step that the code is being translated to assembly language so it's one step closer to what machine can read. Assembling is taking those assembly language and making them in to machine code which computer can read and run. Linking, is other code or set of instructions and combine it with your code for the program to work.

Knowing how compilers improve your code lets you trust them to make your easy-to-read code run faster. This lets you focus on more important stuff, like picking the best ways to solve problems and how to store your data.

The compiler we focus on is GCC.

GCC options:

-o flag: (the strictest of checks | a preset for which flag is on and off)

  • -O0: No optimization
  • -O1 to -O3: Increasing levels of optimization for speed (might affect size)
  • -Os: Optimize for size
  • -Ofast: Very aggressive speed optimization
  • -Og: Optimize for debugging (avoid optimizations which convolute debugging)
  • -f flag: (allows more control over specific optimizations)

  • -funroll_loops: Unroll loops for faster execution (could increase size)
  • -fno-unroll-loops: Disable loop unrolling
  •  you can check which optimization s are on and off by add (-Q --help=optimizers) :

    gcc -O3 -Q --help=optimizers 

    Examples of Common Optimizations

  • Variety of Optimizations: There are many optimization techniques beyond the ones listed here, some even working on the object code level.
  • Side Effects: Optimizations often can't be applied to code with side effects like I/O or dependent calculations. Analyzing these dependencies can be challenging.
  • Visibility: The compiler needs to see all the code, which might not be possible with pre-compiled libraries or separate compilation units. Link Time Optimization can help.
  • Trade-offs: Balancing factors like speed vs code size. Usage patterns like loop counts can also affect optimization. Profile-Guided Optimization can help make choices.
  • Code Rewriting Optimizations

    Rewriting code to take up less space and runs faster is the goal.

    The compiler has a be-hide the sense version of the code. (compiler could rewrite your code)

    Few technique of optimizations:

    Strength Reduction:

     Find the slow operations(expensive) in your code and replacing them with faster operations(cheaper).

     int x;
     for (x=0; x < 10; x++) {
         printf("%d\n", x*6);
     }
    - adding six everytime
    - update the check flag to 60
     int x;
     for (x=0; x < 60; x+=6) {
         printf("%d", x);
     }
    It remove the multiplication (improve speed). goal is to replace expensive operations with cheap once.
     

    Hoisting:

     Finding things that don't need to be in a loop out.

    Hoisting I - Loop-Invariant Variable
     int t, x;
     double c;
     t = readtemp(); /* current temp in farenheit */
     for (x = 0; x < 200; x++) {
         c = (t-32)/1.8000 + 273.15;
         foo(x,c);
     }
     C is a constant it doesn't have to be in the loop. moving it out will safe operations.
     and will make this code using less power.
    int t, x;
     double c;
     t = readtemp(); /* current temp in farenheit */
     c = (t-32)/1.8000 + 273.15;
     for (x = 0; x < 200; x++) {
         foo(x,c);
     }
    Hoisting II - Loop-Invariant Expression
    Same goal here trying to move things that doesn't have to be in the loop out, so it will cost less.
     int t, x;
     t = readtemp(); /* current temp in farenheit */
     for (x = 0; x < 200; x++) {
         foo(x,(t-32)/1.8000 + 273.15);
     }

     There is a calculation that doesn't have to be in the loop. it doesn't depend other value and there is no risk of putting it outside the loop.

     int t, x;
     double c;
     t = readtemp(); /* current temp in farenheit */
     c = (t-32)/1.8000 + 273.15;
     for (x = 0; x < 200; x++) {
         foo(x,c);
     }
    Hoisting III - Loop-Invariant Expression in Loop Condition

    We target the loop condition. Look for values that can be hoist outside of the loop. Reduces the calculation.

    int x, y;
     y = foo();
     for (x=0; x < y*12; x++) {
         bar(x);
     }

     Move y*12 outside of the loop.

     int x, y, c;
     y = foo();
     c = y * 12;
     for (x=0; x < c; x++) {
         bar(x);
     }

    Pre-calculation of Constants

     Compiler will see when and where code are being use, And in a safe condition it would update the code.

     ff = (212-32)/100;   /* factor for celcius-farenheit conversion */
     conv = c * ff + 32;
    Updated code 
     conv = c * 1.800 + 32; 

    Loop Unswitching

    Look for less expensive condition and move it before the condition that cost more. Use the less expensive way to check do you go in. Or sometimes move the check outside the loop before going in.

    Loop Unswitching I - Inner/Outer Swap


     int foo(float ctl) {
         int x;
         for (x=0; x < 10000; x++) {
             if (ctl == 0) {
                 bar(x);
             } else {
                 qux(x);
             }
         }
     }

    We are check in ctl is 0 or any other thing. we don't have to check everything in the loop. Moving the check before looping will make t only check once. This make the code cost a lot less.

     int foo(float ctl) {
         int x;
         if (ctl == 0) {
             for (x=0; x < 10000; x++) {
                 bar(x);
             }
         } else {
             for (x=0; x < 10000; x++) {
                 qux(x);
             }
         }
    Loop Unswitching II - Inner/Outer Swap with Code Repetition
     int foo(float ctl) {
         int x;
         for (x=0; x < 10000; x++) {
             bar(x);
             if (ctl == 0) {
                 qux(x);
             }
         }
     }

     Here we move the check before going in to the loop. However it take a little more space.

     int foo(float ctl) {
         int x;
         if (ctl == 0) {
             for (x=0; x < 10000; x++) {
                 bar(x);
                 qux(x);
             }
         } else {
             for (x=0; x < 10000; x++) {
                 bar(x);
             }
         }
     }

    Loop Splitting

     We look for splitting up the loop and remove the check flag(if possible).

     int foo(float ctl) {
         int x;
         for (x=0; x < 10000; x++) {
             if (x < 3700) {
                 bar(x);
             } else {
                 qux(x);
             }
         }
     }

     By splitting the loop in to 2 we removed the check flag in the loop.

     int foo(float ctl) {
         int x;
         for (x=0; x < 3700; x++) {
                 bar(x);
         }
         for (; x < 10000; x++) {
                 qux(x);
         }
     }

    Loop Interchange

     Sometimes swapping the inner and outer loop can result in better performance.

     int x, y, max=10000;
     long double a[[max]][max];
     for (y=0; y < max; y++) {
         for (x=0; x<max; x++) {
             a[[x]][y]=foo(x,y);
         }
     }

    Making the memory access to be sequential.

     int x, y, max=10000;
     long double a[[max]][max];
     for (x=0; x < max; x++) {
         for (y=0; y<max; y++) {
             a[[x]][y]=foo(x,y);
         }
     }

    Loop Unrolling

     Have index counter with multiple copies call if possible. Will results in less number of loop but takes more space.

    Loop Unrolling I - Guaranteed-Multiple Iterations
     int x, y;
     y=foo();
     for (x = 0; x < y*2; x++) {
         bar(x);
     }

      We call bar more than once and have x goes up but 2 every loop. Reduces the check by 2.

     int x, y;
     y=foo();
     for (x = 0; x < y*2; ) {
         bar(x++);
         bar(x++);
     } 
    Loop Unrolling II - Pairs-of-Iterations plus a Conditional Extra Iteration

     We can add extra code after the loop to make sure the number of calls matches no matter if it's odd or even

     int x, y;
     y=foo();
     for (x=0; x < y; x++) {
         bar(x);
     }

     Added an extra check to make sure the number of calls are good.

     int x, y;
     y=foo();
     for (x=0; x < y-1; ) {
         bar(x++);
         bar(x++);
     }
     if (x < y ) {
         bar(x++);
     }
    Loop Unrolling III - Large Number of Iterations

    This is just doing a super unrolling. Up the number of call in the loop. Make the loop much shorter.

     int x, y;
     y=foo();
     for (x=0; x < y; x++) {
         bar(x);
     }

     Adding much more calls in the loop.

    YOU CAN ONLY UNROLLED AS LONG AS IT GITS INTO CACHE.

     int x, y;
     y=foo();
     for (x=0; x < y-10; ) {
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
         bar(x++);
     }
     for (; x < y; ) {
         bar(x++);
     } 

    Inlining

     This is like lamda. We remove the function and put the instruction in where it was.

     int foo(int a, int b) {
         return a + b * 2;
     }
     int x, i;
     long ttl=0;
     for (x = 0; x < 10000; x++) {
         i = rand();
         ttl += foo(x, i);
     }

     Here it remove the function foo() and did the calculation in the loop.

     int x, i;
     long ttl=0;
     for (x = 0; x < 10000; x++) {
         i = rand();
         ttl += x + i*2;
     }


    Common Subexpression Elimination

     we try to group up mirror expression (safe time).

     x = (a * c) - (a * c) / 12

     update it to only do one multiplication.

     t = (a * c)
     x = t - t/12

    Jump Threading

    Goal is to remove unnecessary checks.

     int a, b;
     a=foo();
     b=!a;
     if (a) {
         bar();
     }
     if (b) {
         baz();
     }

     removing unnecessary check for if (b)

     int a, b;
     a=foo();
     if (a) {
         bar();
     } else {
         baz();
     }

    Short-Circuit Evaluation

    Short Circuit I - AND (&&)

    If the first condition is false than the check is false there is no need to keep on checking.

     if (a * b > c && d > e) {
         foo();
     }
     ============================================
     if (a * b > c) {
         if (d > e) {
             foo();
         }
     }

    Short Circuit II - OR (||)

    If one of the condition is true than there are no need to check more. We can just skip it and go in to it.

     if (a * b > c || d > e ) {
         foo();
     }

    ============================================

     // This makes more sense in assembler than C!
     if (a * b > c) {
         foo();
     } elseif (d > e) {
         foo();
     }

    Test Reordering

     Put the less expensive check before the costly once. If first condition is fails program won't check for the other once.

     if (a * b > c && d > e) {
         foo();
     }

    Move the less expensive check at the front. b > e cost less than a * b > c.

     if (d > e) {
         if (a * b > c) {
             foo();
         }
     }

    Dead Store Elimination

     Having useless code in your program that never get use it just get overwritten. It add no value to the program.

    Dead Store Elimination I - Simple Case

     The first variable was never been used. we can remove it.

     a = b * c + d / f;
     a = c / f * 60;

     ============================================

     a = c / f * 60;
    Dead Store Elimination II - Unused Initialized Value in Declaration

    Stores unused initialized value

     int a = 0;
     /* ...possibly many lines later, before a is read: */
     a = foo();

      We use the power to initialized the variable a to 0 but than we replace it with another value.

     int a;
     a = foo();
    Dead Store Elimination III - Special Case

     Another dead stores when we rewrite a variable without using it when there is a special case.

     int a;
     a = b * c + d / f;     /* normal/default case */
     if (foo()) {
         a = c / f * 60;    /* special case! */
     }

     add condition check to make sure it only initialized once.

     int a;
     if (foo()) {
         a = c / f * 60;
     } else {
         a = b * c + d / f;
     }

    Machine-Code Level Optimization

     if (condition) {
       action
      }
      following code

     The comelier sometime would swap the code is the action is very unlikely going to happen(Block Rearrangement for Conditional Jumps). Like a error condition. Hopefully your code will run and have a very little chance of going in to the error condition check.

     "In order to do this, the compiler needs to know that the condition is almost always false. The programmer can state this using Compiler Intrinsics such as the GCC __builtin_expect function, or the compiler can discover this by code analysis or PGO."

    Instruction Selection

    If there is two ways to do the same thing. It will pick the one that cost less and faster. Picking the better one to do the job. Different platform has different kind of ways to do things and they might have different space and size. We have tell compiler which cpu to target.

    It is like the big(o) but every cpu has it's own best way to do something.

    Debugging with Optimizations Enabled

     Debuggers are good when the code still looks like what you wrote. compilers with high optimization will try to improve performance and update you code. At the end it might not look like the code you wrote. knowing what you are trying to do and what the code would do and using the right balance of the gcc flag will help you debug code more effectively.

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

    Comments

    Popular posts from this blog

    Lab 1

    Project Stage 1

    Project Stage 2