This page imported from: /afs/bu.edu/cwis/webuser/web/s/c/scv/documentation/tutorials/MPI/alliance/apply/solvers/sor_parallel.html

*Parallel Successive Over Relaxation Red-black Scheme*

###

A parallel implementation of the Successive Over Relaxation Red-black Scheme

(based on a Parallel Jacobi Implementation)

as applied to the Laplace equation is included below. Note that:

A parallel implementation of the Successive Over Relaxation Red-black Scheme

(based on a Parallel Jacobi Implementation)

as applied to the Laplace equation is included below. Note that:

- F90 is used.
- System size, m, is determined at run time.
- MPI cartesian topology is used.
- Boundary conditions are handled by subroutine bc.
- The parallel Jacobi code uses subroutine neighbors to provide

adjoining process numbers. In this implementation, since cartesian

topology is used, MPI_Cart_shift is used to provide the same

information. - A red-black scheme is used to speed in-processor convergence.
- Subroutine update_bc_2 updates the blue cells of current and adjoining

processes simultaneously by MPI routine that

pairs send and receive, MPI_Sendrecv, for subsequent iteration. - Subroutine update_bc_1 may be used in place of update_bc_2 as

an alternative message passing method - Subroutine printmesh may be used to print local solution for

tiny cases (like 4×4) - Pointer arrays c, n, e, w, and s point to various parts of the

solution space, u. They are used to avoid unnecessary memory usage

as well as to improve readability. - MPI_Allreduce is used to collect global error from

all participating processes to determine whether further iteration

is required. This is somewhat costly to do

in every iteration. Using the fact that the number of iterations is

proportional to the grid size,*m*, and the assumption that

we focus on*m*in the hundredths, MPI_Allreduce is called

once every 100 iterations. (In the interests of clarity, we

opt to use this simplified criteria). As we said earlier in the

parallel Jacobi implementation, there is a small price to pay by

calling MPI_Allreduce infrequently. If the solution error threshold

is reached inside the 100 iterations, the solution marches on unabated

until the 100 count is reached and hence unnecessary computation is

performed. However, with slight modifications to the testing criteia,

wasteful computing cycles may be minimized. At least for this problem,

the savings in MPI_Allreduce calls far outweigh the penalty. - The effect of MPI_Allreduce call is significantly less noticeable

on the SGI Origin2000 shared-memory multiprocessor than on, say, a

Linux Pentium cluster due to better communications on the Origin. - This scheme is considerably more rapid in convergent rate than the

Jacobi Scheme.

```
PROGRAM sor_cart
USE types_module
USE sor_module
TYPE (redblack) :: c, n, e, w, s
INTEGER, PARAMETER :: period=0, ndim=1
INTEGER :: grid_comm, me, iv, coord, dims
LOGICAL, PARAMETER :: reorder = .true.
CALL MPI_Init(ierr) ! starts MPI
CALL MPI_Comm_rank(MPI_COMM_WORLD, k, ierr) ! get current process id
CALL MPI_Comm_size(MPI_COMM_WORLD, p, ierr) ! get # procs from env or
! command line
if(k == 0) then
write(*,*)'Enter matrix size, m :'
read(*,*)m
endif
CALL MPI_Bcast(m, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
rhoj = 1.0d0 - pi*pi*0.5/(m+2)**2
rhojsq = rhoj*rhoj
mp = m/p
ALLOCATE (vnew(m/2,mp/2), v(0:m+1,0:mp+1))
CALL cpu_time(start_time) ! start timer, measured in seconds
! create 1D cartesian topology for matrix
dims = p
CALL MPI_Cart_create(MPI_COMM_WORLD, ndim, dims, &
period, reorder, grid_comm, ierr)
CALL MPI_Comm_rank(grid_comm, me, ierr)
CALL MPI_Cart_coords(grid_comm, me, ndim, coord, ierr)
iv = coord
CALL bc(v, m, mp, iv, p) ! set up boundary conditions
CALL MPI_Cart_shift(grid_comm, 0, 1, below, above, ierr)
c = news(v, m, mp, 0, 0) ! i+0,j+0 center
n = news(v, m, mp, 0, 1) ! i+0,j+1 north
e = news(v, m, mp, 1, 0) ! i+1,j+0 east
w = news(v, m, mp,-1, 0) ! i-1,j+0 west
s = news(v, m, mp, 0,-1) ! i+0,j-1 south
omega = 1.0d0
CALL update_u(c%red, n%red, e%red, w%red, s%red, &
vnew, m, mp, omega, delr) ! update red
CALL update_bc_2( v, m, mp, iv, below, above)
omega = 1.0d0/(1.0d0 - 0.50d0*rhojsq)
CALL update_u(c%black, n%black, e%black, w%black, s%black, &
vnew, m, mp, omega, delb) ! update black
CALL update_bc_2( v, m, mp, iv, below, above)
DO WHILE (gdel > tol)
iter = iter + 1 ! increment iteration counter
omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
CALL update_u(c%red, n%red, e%red, w%red, s%red, &
vnew, m, mp, omega, delr) ! update red
CALL update_bc_2( v, m, mp, iv, below, above)
omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
CALL update_u(c%black, n%black, e%black, w%black, s%black, &
vnew, m, mp, omega, delb) ! update black
del = (delr + delb)*4.d0
IF(MOD(iter,100)==0) THEN
del = (delr + delb)*4.d0
CALL MPI_Allreduce( del, gdel, 1, MPI_DOUBLE_PRECISION, MPI_MAX, &
MPI_COMM_WORLD, ierr ) ! find global max error
IF(k == 0) WRITE(*,'(i5,3d13.5)')iter,del,gdel,omega
ENDIF
ENDDO
CALL cpu_time(end_time) ! stop timer
IF(k == 0) THEN
PRINT *,'#######################################'
PRINT *,'Total cpu time =',end_time - start_time,' x',p
PRINT *,'Stopped at iteration =',iter
PRINT *,'The maximum error =',del
write(40,"(3i5)")m,mp,p
ENDIF
WRITE(41+k,"(6d13.4)")v
DEALLOCATE (vnew, v)
CALL MPI_Finalize(ierr)
END PROGRAM sor_cart
```