#### Example 1.5 Numerical Integration Using MPI Reduction Operation

So far, we have used point-to-point blocking and nonblocking communication routines to perform numerical integration. In this example, we turn our focus to collective communication routines. Unlike point-to-point communication, in which a message is passed between one processor and another, a collective communication routine performs a one-to-all, all-to-one, or all-to-all communication among the processors. Broadcasting is a typical example of a one-to-all communication, while a gather operation is representative of an all-to-one communication.
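
To make the all-to-one pattern concrete, the sketch below gathers one value from every process into an array on a designated root process. It is a minimal, self-contained illustration written for this discussion; the variable names, the stand-in value, and the choice of rank 0 as the root are ours and are not part of Example 1.5.

```
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  int myid, p, i, master = 0;
  float my_val, *vals = NULL;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Comm_size(MPI_COMM_WORLD, &p);

  my_val = (float) myid;                 /* stand-in for a per-process result */
  if (myid == master)
    vals = (float *) malloc(p * sizeof(float));   /* only the root needs a receive buffer */

  /* all-to-one: every rank sends one float; the root receives p of them */
  MPI_Gather(&my_val, 1, MPI_FLOAT,
             vals,    1, MPI_FLOAT,
             master, MPI_COMM_WORLD);

  if (myid == master) {
    for (i = 0; i < p; i++)
      printf("value from process %d: %f\n", i, vals[i]);
    free(vals);
  }
  MPI_Finalize();
  return 0;
}
```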

Two collective communication routines are introduced in this example. First, we make the program more general by allowing the number of intervals, *n*, for each sub-range to be defined through run-time input. To avoid repetition, it is read on the *master* processor only. To make *n* available on all processes, it is then copied to every process with a broadcast operation, **MPI_Bcast**. In previous examples, after the local integral sums have been computed by all participating processes, they are individually sent to the *master*, which sums them to obtain the final result. In this example, a collective reduction routine, **MPI_Reduce**, is used to perform the same task. As you will see, collective routines are more compact and convenient to use than point-to-point communication routines; they are also expected to be more efficient and less prone to error.

#### Example 1.5 Fortran Code

```
      Program Example1_5
c#######################################################################
c#                                                                     #
c#  This is an MPI example on parallel integration to demonstrate the  #
c#  use of:                                                            #
c#                                                                     #
c#  * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize             #
c#  * MPI_Reduce                                                       #
c#  * MPI_SUM                                                          #
c#                                                                     #
c#  Dr. Kadin Tseng                                                    #
c#  Scientific Computing and Visualization                             #
c#  Boston University                                                  #
c#  1998                                                               #
c#                                                                     #
c#######################################################################
      implicit none
      integer n, p, i, j, proc, ierr, master, myid, tag, comm
      real h, a, b, integral, pi, ai, my_int, integral_sum
      include "mpif.h"  ! brings in pre-defined MPI constants, ...
      integer status(MPI_STATUS_SIZE)  ! size defined in mpif.h
      data master/0/  ! processor 0 collects integral sums from other processors

      comm = MPI_COMM_WORLD
      call MPI_Init(ierr)                   ! starts MPI
      call MPI_Comm_rank(comm, myid, ierr)  ! get current proc ID
      call MPI_Comm_size(comm, p, ierr)     ! get number of procs

      pi = acos(-1.0)   ! = 3.14159...
      a = 0.0           ! lower limit of integration
      b = pi/2.         ! upper limit of integration
      tag = 123         ! message tag

      if(myid .eq. master) then
        print *,'The requested number of processors =',p
        print *,'enter number of increments within each process'
        read(*,*)n
      endif

c     Broadcast "n" to all processes defined by "comm"
      call MPI_Bcast(           ! Broadcast "n" to all procs
     &     n, 1, MPI_INTEGER,   ! Buffer, size, data type
     &     master, comm, ierr)  ! source of message

      h = (b-a)/n/p       ! length of increment
      ai = a + myid*n*h   ! lower limit of integration for partition myid
      my_int = integral(ai, h, n)

      write(*,"('Process ',i2,' has the partial sum of',f10.6)")
     &      myid,my_int

      call MPI_Reduce(               ! a collective reduction operation
     &     my_int,                   ! message to send
     &     integral_sum, 1, MPI_REAL,  ! receive buffer, size, data type
     &     MPI_SUM,                  ! reduction operator
     &     master,
     &     comm, ierr)

      if(myid .eq. master) then
        print *,'The Integral =',integral_sum
      endif

      call MPI_Finalize(ierr)  ! let MPI finish up ...
      end

      real function integral(ai, h, n)
      implicit none
      integer n, j
      real h, ai, aij

      integral = 0.0          ! initialize integral
      do j=0,n-1              ! sum integrals
        aij = ai + (j+0.5)*h  ! abscissa mid-point
        integral = integral + cos(aij)*h
      enddo
      return
      end
```

#### Example 1.5 C code

```
#include <mpi.h>
#include <math.h>
#include <stdio.h>
float fct(float x)
{
return cos(x);
}
/* Prototype */
float integral(float ai, float h, int n);
int main(int argc, char* argv[])
{
/*######################################################################
# #
# This is an MPI example on parallel integration to demonstrate the #
# use of: #
# #
# * MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize #
# * MPI_Reduce #
# * MPI_SUM #
# #
# Dr. Kadin Tseng #
# Scientific Computing and Visualization #
# Boston University #
# 1998 #
# #
#####################################################################*/
  int n, p, myid, tag, proc, ierr;
  float h, integral_sum, a, b, ai, pi, my_int;
  char line[10];
  int master = 0;   /* processor performing total sum */
  MPI_Comm comm;

  comm = MPI_COMM_WORLD;
  ierr = MPI_Init(&argc,&argv);   /* starts MPI */
  MPI_Comm_rank(comm, &myid);     /* get current process id */
  MPI_Comm_size(comm, &p);        /* get number of processes */

  pi = acos(-1.0);   /* = 3.14159... */
  a = 0.;            /* lower limit of integration */
  b = pi*1./2.;      /* upper limit of integration */

  if(myid == master) {
    printf("The requested number of processors is %d\n",p);
    printf("Enter number of increments within each process\n");
    (void) fgets(line, sizeof(line), stdin);
    (void) sscanf(line, "%d", &n);
  }

  /* Broadcast "n" to all processes */
  MPI_Bcast(
      &n, 1, MPI_INT,   /* buffer, size, data type */
      master, comm);    /* message source */

  h = (b-a)/n/p;        /* length of increment */
  ai = a + myid*n*h;    /* lower limit of integration for partition myid */
  my_int = integral(ai, h, n);   /* 0<=myid<=p-1 */

  printf("Process %d has the partial result of %f\n", myid,my_int);

  MPI_Reduce(
      &my_int,                       /* send buffer */
      &integral_sum, 1, MPI_FLOAT,   /* triplet of receive buffer, size, data type */
      MPI_SUM,                       /* the reduction operation is summation */
      master, comm);

  if(myid == 0) {
    printf("The result =%f\n",integral_sum);
  }

  MPI_Finalize();   /* let MPI finish up ... */
  return 0;
}

float integral(float ai, float h, int n)
{
  int j;
  float aij, integ;

  integ = 0.0;                /* initialize */
  for (j=0;j<n;j++) {         /* sum integrals */
    aij = ai + (j+0.5)*h;     /* mid-point */
    integ += cos(aij)*h;
  }
  return integ;
}
```
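
With a typical MPI installation, the C version can be compiled with the MPI wrapper compiler and launched through the MPI job launcher, for example `mpicc example1_5.c -o example1_5` followed by `mpirun -np 4 ./example1_5`; the source file name here is a placeholder, and the exact command names depend on the local MPI implementation.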

#### Discussions

A collective communication routine must be called by every process in the communicator. For **MPI_Bcast**, the source (root) of the broadcast is *master*; on that process the call acts as a send, while on every other process, where *myid* is not *master*, the same call acts as a receive.
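
To make these send/receive roles concrete, the sketch below re-implements a one-integer broadcast using only the point-to-point routines of the earlier examples. The helper name and the arbitrary tag are ours, and a real **MPI_Bcast** is typically implemented more efficiently (for instance with a tree of messages); this is only a rough functional equivalent.

```
#include <mpi.h>

/* Hypothetical helper: broadcast one integer from "root" using only
   point-to-point calls.  Functionally similar to
   MPI_Bcast(value, 1, MPI_INT, root, comm), but without the optimized
   communication pattern a real MPI_Bcast may use. */
void naive_bcast_int(int *value, int root, MPI_Comm comm)
{
  int myid, p, proc;
  int tag = 999;                      /* tag chosen arbitrarily */

  MPI_Comm_rank(comm, &myid);
  MPI_Comm_size(comm, &p);

  if (myid == root) {                 /* root sends to every other rank */
    for (proc = 0; proc < p; proc++)
      if (proc != root)
        MPI_Send(value, 1, MPI_INT, proc, tag, comm);
  } else {                            /* all other ranks receive */
    MPI_Recv(value, 1, MPI_INT, root, tag, comm, MPI_STATUS_IGNORE);
  }
}
```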

Once *my_int* has been computed on all processes, a single call to the collective reduction routine **MPI_Reduce** collects *my_int* from every process and applies the reduction (here, summation) operation. A reduction operation can be one of two types: pre-defined or user-defined. Pre-defined reduction operations are built into MPI and include, for example, **MPI_SUM** for summing; **MPI_MAX** and **MPI_MIN** for finding extrema; and **MPI_MAXLOC** and **MPI_MINLOC** for finding extrema together with their locations. To be used with **MPI_Reduce**, a reduction operation must be associative; if it is also commutative, the implementation may gain in efficiency. For example, summation (**MPI_SUM**) is both associative and commutative. Subtraction, which is not among the pre-defined reduction operations, is neither associative nor commutative. User-defined reduction operations must likewise be associative and may optionally be declared commutative. For further details, see MPI: The Complete Reference.
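
As a small illustration of the location-returning operators mentioned above, the sketch below uses **MPI_MAXLOC** with the pre-defined pair type **MPI_FLOAT_INT** to find the largest of the per-process values and the rank that owns it. The stand-in value (the rank number itself) and the overall program structure are ours, chosen only to keep the example self-contained.

```
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int myid, master = 0;
  struct { float value; int rank; } in, out;   /* layout matching MPI_FLOAT_INT */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  in.value = (float) myid;   /* stand-in for a per-process partial sum */
  in.rank  = myid;           /* index carried along with the value */

  /* MPI_MAXLOC returns both the maximum value and its associated index */
  MPI_Reduce(&in, &out, 1, MPI_FLOAT_INT, MPI_MAXLOC, master, MPI_COMM_WORLD);

  if (myid == master)
    printf("largest value %f came from process %d\n", out.value, out.rank);

  MPI_Finalize();
  return 0;
}
```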
