Parallel C++
In this post, I explore various methods to achieve parallelism in C++. I will keep updating it to include more methods as I learn about them.
Introduction
Let’s start with a very simple example that adds all the numbers in a vector. I will use the C++ Standard Template Library (STL) and start with a serial implementation. I will use the time command to measure the execution time of the program.
I create a vector of ones with \(2^{30}=1{,}073{,}741{,}824\) (approximately a billion) elements. For context, each int typically takes 4 bytes, so this vector takes \(2^{30}\times 4\) bytes, which is 4 GiB of memory. I add them, serially, as follows (saved as source/1_serial_sum.cpp):
Let’s break down the code:
- Lines 2-3: I include the necessary headers. <vector> is for the vector container and <numeric> is for the std::reduce function.
- Line 6: I define a vector of size \(2^{30}\) filled with ones.
- Line 7: I use std::reduce to sum the elements of the vector. I pass the function the begin and end iterators of the vector, along with an init value of 0.
I compile and run the program below, measuring the execution time with the time command:
real 8.36
user 7.14
sys 1.93
All three values are in seconds. The real time is the actual elapsed (wall-clock) time, the user time is the CPU time spent in user mode, and the sys time is the CPU time spent in kernel mode [1].
[1] For more details, see https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
Next, I introduce parallelism by passing an execution policy to the std::reduce function, as follows (saved as source/1_parallel_sum.cpp):
I have changed two lines in the code:
- Line 5: I added the <execution> header.
- Line 8: I added the std::execution::par policy to the std::reduce function.
Note that <execution> requires a C++17-compliant compiler, and with GCC I need to link with the -ltbb flag to use Intel’s Threading Building Blocks (TBB) for parallelism. For reference, I check which compilers support parallel algorithms and execution policies on the webpage Compiler support for C++17. Here is the line relating to <execution> on GCC:

Figure: <execution> compiler support
In a Linux environment (Debian or Ubuntu), I install TBB using the package manager, with the following command:
sudo apt-get install libtbb-dev
Now, I compile and run the parallel version of the program:
real 2.32
user 12.27
sys 1.27
The elapsed time drops from 8.36 s to 2.32 s, roughly a 3.6x speedup over the serial version, which demonstrates the benefit of parallelism for large datasets. The user time is summed over all threads, which is why it (12.27 s) now exceeds the real time. Note that parallelism has overhead costs, so for small datasets the serial version may perform better. It also has pitfalls, such as race conditions, which can lead to incorrect results if not handled properly. I will explore these issues in future updates.