Reduction Operations

Example 1.5 Numerical Integration Using MPI Reduction Operation

So far, we have used point-to-point blocking and nonblocking communication routines to perform numerical integration. In this example, we turn our focus to collective communication routines. Unlike point-to-point communications with which a message is passed between one processor and another, a collective communication routine performs a one (processor) to all (processors) , all-to-one, or all-to-all communications. Broadcasting is a typical example of a one-to-all communication while a gather operation is representative of an all-to-one communication.

Two collective communication routines are introduced in this example. First, we make the program more general by allowing the number of intervals, n, for each sub-range to be defined through run-time input. To avoid repetition, it is read on the master processor only. To make n available on all processors, it must then be copied to all processes. This can be accomplished with a broadcast operation, MPI_Bcast. In previous examples, after the local integral sums have been computed by all participating processes, they are individually sent to the master who sums them to obtain the final sum. In this example, a collective reduction routine, MPI_Reduce, will be used to perform the same task. As you will see, they are more compact and convenient to use than point-to-point communication routines, they are also expected to be more efficient and less prone to errors.

Example 1.5 Fortran Code

```      Program Example1_5
c##################################################################################
c#                                                                                #
c# This is an MPI example on parallel integration to demonstrate the use of:      #
c#                                                                                #
c# * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize                         #
c# * MPI_Reduce                                                                   #
c# * MPI_SUM                                                                      #
c#                                                                                #
c# Scientific Computing and Visualization                                         #
c# Boston University                                                              #
c# 1998                                                                           #
c#                                                                                #
c##################################################################################
implicit none
integer n, p, i, j, proc, ierr, master, myid, tag, comm
real h, a, b, integral, pi, ai, my_int, integral_sum
include "mpif.h"  ! brings in pre-defined MPI constants, ...
integer status(MPI_STATUS_SIZE)  ! size defined in mpif.h
data master/0/    ! processor 0 collects integral sums from other processors

comm = MPI_COMM_WORLD
call MPI_Init(ierr)                        ! starts MPI
call MPI_Comm_rank(comm, myid, ierr)      ! get current proc ID
call MPI_Comm_size(comm, p, ierr)          ! get number of procs

pi = acos(-1.0)   !  = 3.14159...
a = 0.0           ! lower limit of integration
b = pi/2.         ! upper limit of integration
tag = 123         ! message tag

if(myid .eq. master) then
print *,'The requested number of processors =',p
print *,'enter number of increments within each process'
endif
c**Broadcast "n" to all processes defined by "comm"
call MPI_Bcast(         ! Broadcast "n" to all procs
&    n, 1, MPI_INTEGER,       ! Buffer, size, data type
&    master, comm, ierr)       ! source of message

h = (b-a)/n/p     ! length of increment
ai = a + myid*n*h ! lower limit of integration for partition myid
my_int = integral(ai, h, n)

write(*,"('Process ',i2,' has the partial sum of',f10.6)")
&          myid,my_int

call MPI_Reduce(             ! a collective reduction operation
&      my_int,      ! message to send
&      integral_sum, 1, MPI_REAL,     ! Triplet of receive buffer, size, data type
&      MPI_SUM,      ! Reduction operator
&      master,
&      comm, ierr)

if(myid .eq. master) then
print *,'The Integral =',integral_sum
endif

call MPI_Finalize(ierr)                             ! let MPI finish up ...

end
real function integral(ai, h, n)
implicit none
integer n, j
real h, ai, aij

integral = 0.0                ! initialize integral
do j=0,n-1                    ! sum integrals
aij = ai +(j+0.5)*h         ! abscissa mid-point
integral = integral + cos(aij)*h
enddo

return
end
```

Example 1.5 C code

```#include <mpi.h>
#include <math.h>
#include <stdio.h>
float fct(float x)
{
return cos(x);
}
/* Prototype */
float integral(float ai, float h, int n);
int main(int argc, char* argv[])
{
/*###############################################################################
#                                                                              #
# This is an MPI example on parallel integration to demonstrate the use of:    #
#                                                                              #
# * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize                       #
# * MPI_Reduce                                                                 #
# * MPI_SUM                                                                    #
#                                                                              #
# Scientific Computing and Visualization                                       #
# Boston University                                                            #
# 1998                                                                         #
#                                                                              #
###############################################################################*/

int n, p, myid, tag, proc, ierr;
float h, integral_sum, a, b, ai, pi, my_int;
char line[10];
int master = 0;  /* processor performing total sum */
MPI_Comm comm;

comm = MPI_COMM_WORLD;
ierr = MPI_Init(&argc,&argv);            /* starts MPI */
MPI_Comm_rank(comm, &myid);             /* get current process id */
MPI_Comm_size(comm, &p);                /* get number of processes */

pi = acos(-1.0);  /* = 3.14159... */
a = 0.;           /* lower limit of integration */
b = pi*1./2.;     /* upper limit of integration */

if(myid == master) {
printf("The requested number of processors is %dn",p);
printf("Enter number of increments within each processn");
(void) fgets(line, sizeof(line), stdin);
(void) sscanf(line, "%d", &);
}
/* Broadcast "n" to all processes */
MPI_Bcast(
&n, 1, MPI_INT,
master, comm);        /* message source */

h = (b-a)/n/p;    /* length of increment */
ai = a + myid*n*h;  /* lower limit of integration for partition myid */
my_int = integral(ai, h, n);   /* 0<=myid<=p-1 */

printf("Process %d has the partial result of %fn", myid,my_int);

MPI_Reduce(
&my_int,      /* send buffer */
&integral_sum, 1, MPI_FLOAT,    /* triplet of receive buffer, size, data type */
MPI_SUM,    /* the reduction operation is summation */
master, comm);

if(myid == 0) {
printf("The result =%fn",integral_sum);
}
MPI_Finalize();                       /* let MPI finish up ... */
}
float integral(float ai, float h, int n)
{
int j;
float aij, integ;

integ = 0.0;                 /* initialize */
for (j=0;j<j++) {          /* sum integrals */
aij = ai + (j+0.5)*h;      /* mid-point */
integ += cos(aij)*h;
}
return integ;
}
```

Discussions

All collective communication routines are implemented so that the call to the routine must be invoked on all processes. For MPI_Bcast, the source of the broadcast is master. On this process, MPI_Bcast acts as a send. On all other processes where myid is not master, it automatically knows that it must act as a receive.

Once my_int is computed on all processes, calling the collective reduction subroutine MPI_Reduce will take care of collecting my_int from all processes followed by a reduction (summation) operation. Reduction operation can be one of two types: pre-defined or user-defined. Pre-defined reduction operations are built in to MPI and include, for example : MPI_SUM for summing; MPI_MAX and MPI_MIN for finding extremums; MPI_MAXLOC and MPI_MINLOC for finding extremums and their corresponding locations. In order to use MPI_Reduce, the reduction operation must satisfy the associative rule. If it also satisfies the commutative rule, the reduction operation may gain in efficiency. For example, summation, or MPI_SUM, is a reduction operator that is both associative and commutative. Subtraction, which is not on the list of pre-defined reduction operations, is neither associative nor commutative. User-defined reduction operations must also satisfy the associative rule and optionally the commutative rule. For further details, see MPI: The Complete Reference.