Predication

This article explores why common if statements can significantly slow down execution time and how to counteract this. To better understand the origins of this phenomenon, a basic understanding of branch prediction is required.

To improve the execution speed of code, the CPU uses what is known as pipelining to maximize pipeline utilization. However, this effect can be compromised by so-called pipeline flushes. To understand this issue in more detail, here is a brief example: The following program counts the number of numbers greater than 50. Since the numbers are generated using a random function and modulo 101, they are always in the range of 0 and 100. For this reason, the counter in the if condition should be increased in about half of the loop passes.

C
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
  int counter = 0;
  for (int i = 0; i < 100000; i++)
  {
    if (rand() % 101 > 50)
    {
      counter++;
    }
  }
  printf("%d", counter);
  return 0;
}

Why is this slow now? The CPU tries to predict future program lines with the help of the so-called Branch Prediction and to execute them ahead of time. Since the next random number is not yet known, the CPU has to guess whether the counter will be incremented or not. Based on this decision, new instructions are fed to the pipeline. However, if the prediction is wrong, a pipeline flush is triggered, causing the pipeline to take a few clocks to clear all previous instructions. Only then can the correct instruction be executed. However, since this is true only in 50% of the cases, practically no runtime gain can be achieved by Branch Prediction. On the contrary, the execution time is even slowed down by the flushes. The 50% is also referred to as selectivity.

How to prevent this behavior?

The best way to avoid wrong branch predictions is not to use branches in the first place. For this purpose, the boolean if-condition can be cast into an integer. This has either the value 0 or 1, which can be added directly to the variable counter.

C
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
  int counter = 0;
  for (int i = 0; i < 100000; i++)
  {
    counter += (int)(rand() % 101 > 50);
  }
  printf("%d", counter);
  return 0;
}

In this way, the execution speed is independent of the selectivity.

Comparison of runtimes from Branched to Predicated

A comparison of the runtimes clearly shows how selectivity affects performance. A selectivity of 50% clearly shows that branch prediction is the most ineffective. An implementation with predication, on the other hand, remains completely unaffected by selectivity.

One Reply to “Predication”

Leave a Reply

Your email address will not be published. Required fields are marked *