# Using MPI with C

This tutorial is adapted from the CU Boulder Research Computing documentation, which is also licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Parallel programs enable users to fully utilize the multi-node structure of supercomputing clusters. Message Passing Interface (MPI) is a standard that allows many processes, possibly running on different nodes of a cluster, to communicate with each other. In this tutorial we will use the Intel C++ Compiler, GCC, Intel MPI, and Open MPI to create a multiprocess "hello world" program in C++. This tutorial assumes the user has experience with both the Linux terminal and C++.

__Helpful MPI Tutorial:__

- [http://mpitutorial.com/tutorials/](http://mpitutorial.com/tutorials/)

## Setup and “Hello, World”

We must first load MPI into our environment. Begin by loading your choice of C++ compiler and its corresponding MPI library, using the commands in the tab that matches your compiler:

(tabset-ref-mpi-c-comp)=
`````{tab-set}
:sync-group: tabset-mpi-c-comp

````{tab-item} GNU C++ Compiler
:sync: mpi-c-comp-gnu

```bash
module load gcc
module load openmpi
```
````

````{tab-item} Intel C/C++ Compiler
:sync: mpi-c-comp-gnu-intel

```bash
module load intel
module load impi
```
````
`````

This should prepare your environment with all the necessary tools to compile and run your MPI code. Let’s now begin to construct our C++ file. In this tutorial, we will name our code file `hello_world_mpi.cpp`.

Open `hello_world_mpi.cpp` and begin by including the C standard I/O header `<stdio.h>` and the MPI header `<mpi.h>`, and by constructing the main function of the C++ code:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    return 0;
}
```

Now let’s set up several MPI calls to parallelize our code. In this "Hello World" tutorial we’ll be using the following four functions:

(tabset-ref-mpi-c-func)=
````{tab-set}
:sync-group: tabset-mpi-c-func

```{tab-item} MPI_Init()
:sync: mpi-c-func-init

The `MPI_Init()` function initializes the MPI environment. It takes the addresses of the C++ command line arguments `argc` and `argv`.
```

```{tab-item} MPI_Comm_size()
:sync: mpi-c-func-comm-size

The `MPI_Comm_size()` function reports the total number of processes in a communicator. It takes the communicator (for example `MPI_COMM_WORLD`) and the memory address of an integer variable, in which it stores that count.
```

```{tab-item} MPI_Comm_rank()
:sync: mpi-c-func-comm-rank

The `MPI_Comm_rank()` function reports the rank (process ID) of the process that called it. It takes the communicator and the memory address of an integer variable, in which it stores that rank.
```

```{tab-item} MPI_Finalize()
:sync: mpi-c-func-finalize

The `MPI_Finalize()` function cleans up the MPI environment and ends MPI communications.
```
````

These four functions should be enough to get our parallel "Hello World" program running. We will begin by creating two variables, `process_Rank` and `size_Of_Cluster`, to store an identifier for each of the parallel processes and the number of processes running in the cluster, respectively. We will also call `MPI_Init`, which initializes the MPI environment:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster;

    MPI_Init(&argc, &argv);

    return 0;
}
```

Let's now obtain some information about our cluster of processes and print it out for the user.
We will use the functions `MPI_Comm_size()` and `MPI_Comm_rank()` to obtain the number of processes and the rank of the calling process, respectively:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    printf("Hello World from process %d of %d\n", process_Rank, size_Of_Cluster);

    return 0;
}
```

Lastly, let's close the environment using `MPI_Finalize()`:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    printf("Hello World from process %d of %d\n", process_Rank, size_Of_Cluster);

    MPI_Finalize();
    return 0;
}
```

Now the code is complete and ready to be compiled. Because this is an MPI program, we have to use the compiler wrapper provided by the MPI library. Be sure to use the command that matches the compiler and MPI library you have loaded.

(tabset-ref-mpi-c-compile)=
`````{tab-set}
:sync-group: tabset-mpi-c

````{tab-item} Open MPI
:sync: mpi-c-openmpi

```bash
mpic++ hello_world_mpi.cpp -o hello_world_mpi.exe
```
````

````{tab-item} Intel MPI
:sync: mpi-c-intelmpi

```bash
mpiicpc hello_world_mpi.cpp -o hello_world_mpi.exe
```
````
`````

This will produce an executable we can pass to the cluster as a job. To execute MPI-compiled code, a special launcher command must be used:

```bash
mpirun -np 4 ./hello_world_mpi.exe
```

The flag `-np` specifies the number of processes to launch when executing the program.

In your job script, load the same compiler and MPI library you used above to compile the program, and submit the job with Slurm to execute the application.
Your job script should look something like this:

(tabset-ref-mpi-c-batch)=
`````{tab-set}
:sync-group: tabset-mpi-c

````{tab-item} Open MPI
:sync: mpi-c-openmpi

```bash
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --partition cpu.std
#SBATCH --qos normal
#SBATCH --constraint ib
#SBATCH --time 00:01:00
#SBATCH --output parallel_hello_world.out

module purge
module load gcc
module load openmpi

mpirun -np 4 ./hello_world_mpi.exe
```
````

````{tab-item} Intel MPI
:sync: mpi-c-intelmpi

```bash
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --partition cpu.std
#SBATCH --qos normal
#SBATCH --constraint ib
#SBATCH --time 00:01:00
#SBATCH --output parallel_hello_world.out

module purge
module load intel
module load impi

mpirun -np 4 ./hello_world_mpi.exe
```
````
`````

````{note}
The output file should look something like this:

```
Hello World from process 3 of 4
Hello World from process 2 of 4
Hello World from process 1 of 4
Hello World from process 0 of 4
```

The order of the lines may differ between runs, because the processes run independently.
````

Source: Dartmouth College Intro to MPI Guide

## MPI Barriers and Synchronization

Like many other parallel programming utilities, MPI provides synchronization, which is essential for coordinating processes and ensuring that certain sections of code are reached at certain points. `MPI_Barrier` is a collective call that blocks each process at that line of code until all processes in the communicator have reached it. `MPI_Barrier` is called as follows:

```c++
MPI_Barrier(MPI_Comm comm);
```

To get a handle on barriers, let’s modify our "Hello World" program so that it prints out each process in order of rank.
Starting with our "Hello World" code from the previous section, begin by nesting our print statement in a loop:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    for(int i = 0; i < size_Of_Cluster; i++){
        printf("Hello World from process %d of %d\n", process_Rank, size_Of_Cluster);
    }

    MPI_Finalize();
    return 0;
}
```

Next, let’s add a conditional statement in the loop so that a process prints only when the loop iteration matches its rank:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    for(int i = 0; i < size_Of_Cluster; i++){
        if(i == process_Rank){
            printf("Hello World from process %d of %d\n", process_Rank, size_Of_Cluster);
        }
    }

    MPI_Finalize();
    return 0;
}
```

Lastly, call the barrier function in the loop. This ensures that all processes are synchronized at the end of every iteration, so only one rank prints per iteration:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    for(int i = 0; i < size_Of_Cluster; i++){
        if(i == process_Rank){
            printf("Hello World from process %d of %d\n", process_Rank, size_Of_Cluster);
        }
        // Every process waits here until all processes finish this iteration.
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Compiling and running this code with 4 processes will typically result in this output (MPI does not strictly guarantee the order in which output from different processes reaches the terminal):

```
Hello World from process 0 of 4
Hello World from process 1 of 4
Hello World from process 2 of 4
Hello World from process 3 of 4
```

## Message Passing

Message passing is the primary utility in the MPI application interface that allows processes to communicate with each other. In this tutorial, we will learn the basics of message passing between 2 processes.
Message passing in MPI is handled by the following functions and their arguments:

```c++
MPI_Send(void* message, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
MPI_Recv(void* data, int count, MPI_Datatype datatype, int from, int tag, MPI_Comm comm, MPI_Status* status);
```

The arguments are as follows:

*MPI_Send*

```c++
void* message;          //Address of the message you are sending.
int count;              //Number of elements being sent through the address.
MPI_Datatype datatype;  //The MPI-specific data type being passed through the address.
int dest;               //Rank of the destination process.
int tag;                //Message tag.
MPI_Comm comm;          //The MPI communicator handle.
```

*MPI_Recv*

```c++
void* data;             //Address of the buffer that will hold the message you are receiving.
int count;              //Maximum number of elements that can be received into the address.
MPI_Datatype datatype;  //The MPI-specific data type being passed through the address.
int from;               //Rank of the sending process.
int tag;                //Message tag.
MPI_Comm comm;          //The MPI communicator handle.
MPI_Status* status;     //Status object (or MPI_STATUS_IGNORE if it is not needed).
```

Let’s implement message passing in an example:

### Example

We will create a two-process program that passes the number 42 from one process to another. We will use our "Hello World" program as a starting point. Let’s begin by creating a variable to store some information.

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster, message_Item;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    MPI_Finalize();
    return 0;
}
```

Now create `if` and `else if` conditionals that select which process calls `MPI_Send()` and which calls `MPI_Recv()`. In this example we want process 0 to send a message containing the integer 42 to process 1.
```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster, message_Item;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    if(process_Rank == 0){
        message_Item = 42;
        printf("Sending message containing: %d\n", message_Item);
    }
    else if(process_Rank == 1){
        // Placeholder: message_Item is not set on this rank until MPI_Recv is added.
        printf("Received message containing: %d\n", message_Item);
    }

    MPI_Finalize();
    return 0;
}
```

Lastly, we must call `MPI_Send()` and `MPI_Recv()`. We will pass the following parameters into the functions:

```c++
MPI_Send(
    &message_Item,      //Address of the message we are sending.
    1,                  //Number of elements handled by that address.
    MPI_INT,            //MPI_TYPE of the message we are sending.
    1,                  //Rank of receiving process
    1,                  //Message Tag
    MPI_COMM_WORLD      //MPI Communicator
);

MPI_Recv(
    &message_Item,      //Address of the message we are receiving.
    1,                  //Number of elements handled by that address.
    MPI_INT,            //MPI_TYPE of the message we are receiving.
    0,                  //Rank of sending process
    1,                  //Message Tag
    MPI_COMM_WORLD,     //MPI Communicator
    MPI_STATUS_IGNORE   //MPI Status Object
);
```

Let’s implement these functions in our code:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Cluster, message_Item;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    if(process_Rank == 0){
        message_Item = 42;
        MPI_Send(&message_Item, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
        printf("Message Sent: %d\n", message_Item);
    }
    else if(process_Rank == 1){
        MPI_Recv(&message_Item, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Message Received: %d\n", message_Item);
    }

    MPI_Finalize();
    return 0;
}
```

Compiling and running our code with 2 processes will result in the following output:

```
Message Sent: 42
Message Received: 42
```

## Group Operators: Scatter and Gather

Group operators, which MPI calls collective operations, are very useful in MPI.
They allow large amounts of data to be distributed from a root process to all other processes, or data from all processes to be collected at one process. These operators can eliminate a surprising amount of boilerplate code through two functions:

__MPI_Scatter__:

```c++
void* send_Var;          //Address of the variable that will be scattered (only significant at the root).
int send_Count;          //Number of elements that will be sent to each process.
MPI_Datatype send_Type;  //MPI Datatype of the data that is scattered.
void* recv_Var;          //Address of the variable that will store the scattered data.
int recv_Count;          //Number of data elements that will be received per process.
MPI_Datatype recv_Type;  //MPI Datatype of the data that will be received.
int root_Process;        //The rank of the process that will scatter the information.
MPI_Comm comm;           //The MPI Communicator.
```

__MPI_Gather__:

```c++
void* send_Var;          //Address of the variable that will be sent.
int send_Count;          //Number of data elements that will be sent.
MPI_Datatype send_Type;  //MPI Datatype of the data that is sent.
void* recv_Var;          //Address of the variable that will store the received data (only significant at the root).
int recv_Count;          //Number of data elements per process that will be received.
MPI_Datatype recv_Type;  //MPI Datatype of the data that will be received.
int root_Process;        //The rank of the process that will gather the information.
MPI_Comm comm;           //The MPI Communicator.
```

To get a better grasp on these functions, let’s create a program that uses the scatter function. Note that the gather function (not shown in the example) works similarly, and is essentially the converse of the scatter function. Further examples that use the gather function can be found in the MPI tutorial linked at the beginning of this document.

### Example

We will create a program that scatters one element of a data array to each process. Specifically, this code will scatter the four elements of an array to four different processes.
We will start with a basic C++ main function along with variables to store the process rank and the number of processes.

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Comm;

    return 0;
}
```

Now let’s set up the MPI environment using `MPI_Init`, `MPI_Comm_size`, `MPI_Comm_rank`, and `MPI_Finalize`:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Comm);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    MPI_Finalize();
    return 0;
}
```

Next, let’s create an array named `distro_Array` to store four numbers. We will also create a variable called `scattered_Data` to receive the scattered data.

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Comm;
    int distro_Array[4] = {39, 72, 129, 42};
    int scattered_Data;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Comm);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    MPI_Finalize();
    return 0;
}
```

Now we will begin using group operators. We will use the scatter operator to distribute `distro_Array` into `scattered_Data`. Let’s take a look at the parameters we will use in this function:

```c++
MPI_Scatter(
    distro_Array,       //Address of the array we are scattering from.
    1,                  //Number of items we are sending to each process.
    MPI_INT,            //MPI Datatype of the scattering array.
    &scattered_Data,    //Address of the variable that receives the scattered data.
    1,                  //Amount of data each process will receive.
    MPI_INT,            //MPI Datatype of the receiving variable.
    0,                  //Rank of the process that will distribute the data.
    MPI_COMM_WORLD      //MPI Communicator.
);
```

Let’s see this implemented in code.
We will also write a print statement following the scatter call:

```c++
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv){
    int process_Rank, size_Of_Comm;
    int distro_Array[4] = {39, 72, 129, 42};
    int scattered_Data;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Comm);
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    // Rank 0 sends one element of distro_Array to each process (including itself).
    MPI_Scatter(distro_Array, 1, MPI_INT, &scattered_Data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process has received: %d \n", scattered_Data);

    MPI_Finalize();
    return 0;
}
```

This example assumes exactly four processes (for example, `mpirun -np 4`), because `distro_Array` holds one element per process. Running it will print the four numbers in the distro array, each received by a different process (note that the output lines are not necessarily in rank order):

```
Process has received: 39
Process has received: 72
Process has received: 129
Process has received: 42
```