
Parallel Successive Over-Relaxation Red-Black Scheme


A parallel implementation of the Successive Over-Relaxation (SOR) red-black
scheme, based on the parallel Jacobi implementation and applied to the
Laplace equation, is included below. Note that:

  • F90 is used.
  • System size, m, is determined at run time.
  • MPI cartesian topology is used.
  • Boundary conditions are handled by subroutine bc.
  • The parallel Jacobi code uses a subroutine, neighbors, to provide the
    ranks of adjoining processes. In this implementation, since a cartesian
    topology is used, MPI_Cart_shift provides the same information
    (returning MPI_PROC_NULL at the domain edges).

  • A red-black scheme is used to speed up in-process convergence: grid
    points are colored like a checkerboard, and each half-sweep updates all
    points of one color using the latest values of the other color.
  • Subroutine update_bc_2 exchanges the boundary (ghost) cells of the
    current and adjoining processes using MPI_Sendrecv, the MPI routine
    that pairs a send with a receive, so the values are in place for the
    subsequent sweep; a sketch of a possible implementation follows this
    list.

  • Subroutine update_bc_1 may be used in place of update_bc_2 as an
    alternative message-passing method; a sketch of it also follows this
    list.

  • Subroutine printmesh may be used to print the local solution for
    tiny cases (such as 4×4).

  • Pointer arrays c, n, e, w, and s point to various parts of the
    solution space, u. They are used to avoid unnecessary memory usage
    as well as to improve readability; an illustrative sketch of the
    pointer idea follows this list.

  • MPI_Allreduce is used to collect the global error from all
    participating processes to determine whether further iteration is
    required. Doing this in every iteration is somewhat costly. Using the
    fact that the number of iterations needed is proportional to the grid
    size, m, and assuming that we focus on m in the hundreds,
    MPI_Allreduce is called only once every 100 iterations. (In the
    interest of clarity, we opt for this simplified criterion.) As noted
    in the parallel Jacobi implementation, there is a small price to pay
    for calling MPI_Allreduce infrequently: if the error threshold is
    reached partway through a 100-iteration block, the solution marches on
    unabated until the count reaches 100, so some unnecessary computation
    is performed. With slight modifications to the testing criterion,
    these wasted cycles could be reduced. At least for this problem, the
    savings in MPI_Allreduce calls far outweigh the penalty.

  • The effect of the MPI_Allreduce call is significantly less noticeable
    on the SGI Origin2000 shared-memory multiprocessor than on, say, a
    Linux Pentium cluster, owing to the Origin's faster communications.

  • This scheme converges considerably faster than the Jacobi scheme.
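
The helper subroutines (bc, update_u, update_bc_1, update_bc_2, news,
printmesh) are not listed on this page. To indicate what the MPI_Sendrecv
exchange looks like, here is a minimal sketch of a possible update_bc_2.
The argument list matches the calls in the program below, but the body is
an assumption; the actual routine resides in sor_module. Because
MPI_Sendrecv pairs each send with a receive in a single call, the exchange
cannot deadlock, and transfers involving the MPI_PROC_NULL neighbors that
MPI_Cart_shift returns at the domain edges are silently skipped.

SUBROUTINE update_bc_2(v, m, mp, iv, below, above)
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER, INTENT(IN) :: m, mp, iv, below, above  ! iv unused in this sketch
  DOUBLE PRECISION, INTENT(INOUT) :: v(0:m+1,0:mp+1)
  INTEGER :: status(MPI_STATUS_SIZE), ierr

  ! MPI_COMM_WORLD is assumed here; passing or storing the cartesian
  ! communicator grid_comm would be the more careful choice.
  ! send my last interior line v(:,mp) up; receive ghost line v(:,0)
  CALL MPI_Sendrecv(v(0,mp), m+2, MPI_DOUBLE_PRECISION, above, 0, &
                    v(0,0),  m+2, MPI_DOUBLE_PRECISION, below, 0, &
                    MPI_COMM_WORLD, status, ierr)
  ! send my first interior line v(:,1) down; receive ghost line v(:,mp+1)
  CALL MPI_Sendrecv(v(0,1),    m+2, MPI_DOUBLE_PRECISION, below, 1, &
                    v(0,mp+1), m+2, MPI_DOUBLE_PRECISION, above, 1, &
                    MPI_COMM_WORLD, status, ierr)
END SUBROUTINE update_bc_2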

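The alternative update_bc_1 can be built from separate blocking MPI_Send
and MPI_Recv calls. The usual trick, sketched below under the same
assumptions as the previous sketch, is to order the calls by process
parity so that every blocking send faces a matching receive and the chain
of exchanges cannot deadlock:

SUBROUTINE update_bc_1(v, m, mp, iv, below, above)
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER, INTENT(IN) :: m, mp, iv, below, above
  DOUBLE PRECISION, INTENT(INOUT) :: v(0:m+1,0:mp+1)
  INTEGER :: status(MPI_STATUS_SIZE), ierr

  IF (MOD(iv,2) == 0) THEN     ! even-numbered processes send first ...
    CALL MPI_Send(v(0,mp), m+2, MPI_DOUBLE_PRECISION, above, 0, &
                  MPI_COMM_WORLD, ierr)
    CALL MPI_Recv(v(0,0),  m+2, MPI_DOUBLE_PRECISION, below, 0, &
                  MPI_COMM_WORLD, status, ierr)
    CALL MPI_Send(v(0,1),    m+2, MPI_DOUBLE_PRECISION, below, 1, &
                  MPI_COMM_WORLD, ierr)
    CALL MPI_Recv(v(0,mp+1), m+2, MPI_DOUBLE_PRECISION, above, 1, &
                  MPI_COMM_WORLD, status, ierr)
  ELSE                         ! ... odd-numbered processes receive first
    CALL MPI_Recv(v(0,0),  m+2, MPI_DOUBLE_PRECISION, below, 0, &
                  MPI_COMM_WORLD, status, ierr)
    CALL MPI_Send(v(0,mp), m+2, MPI_DOUBLE_PRECISION, above, 0, &
                  MPI_COMM_WORLD, ierr)
    CALL MPI_Recv(v(0,mp+1), m+2, MPI_DOUBLE_PRECISION, above, 1, &
                  MPI_COMM_WORLD, status, ierr)
    CALL MPI_Send(v(0,1),    m+2, MPI_DOUBLE_PRECISION, below, 1, &
                  MPI_COMM_WORLD, ierr)
  ENDIF
END SUBROUTINE update_bc_1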

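The pointer idea behind news can be illustrated without the red/black
split. In the self-contained sketch below (illustrative only; the actual
redblack type and news function additionally partition each view into its
red and black checkerboard halves), five pointers alias shifted windows of
the same array, so a sweep over the interior needs no explicit index
arithmetic:

PROGRAM pointer_views
  IMPLICIT NONE
  INTEGER, PARAMETER :: m = 4, mp = 4
  DOUBLE PRECISION, ALLOCATABLE, TARGET :: v(:,:)
  DOUBLE PRECISION, POINTER :: c(:,:), n(:,:), e(:,:), w(:,:), s(:,:)
  DOUBLE PRECISION :: omega

  ALLOCATE (v(0:m+1,0:mp+1))
  v = 0.0d0
  v(:,0) = 1.0d0                 ! a sample boundary condition
  c => v(1:m,   1:mp  )          ! u(i  ,j  )  center
  e => v(2:m+1, 1:mp  )          ! u(i+1,j  )  east
  w => v(0:m-1, 1:mp  )          ! u(i-1,j  )  west
  n => v(1:m,   2:mp+1)          ! u(i  ,j+1)  north
  s => v(1:m,   0:mp-1)          ! u(i  ,j-1)  south

  omega = 1.0d0                  ! omega = 1 reduces SOR to Jacobi
  ! one sweep of the interior; the right-hand side is evaluated in full
  ! before the assignment, so this is a Jacobi-style update
  c = (1.0d0-omega)*c + omega*0.25d0*(n+e+w+s)
  PRINT *, c
  DEALLOCATE (v)
END PROGRAM pointer_views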

PROGRAM sor_cart
  USE types_module
  USE sor_module
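  ! MPI definitions and the shared variables used below (k, p, m, mp,
  ! iter, tol, rhoj, etc.) are assumed to come from the two modules
  ! above, which are not listed on this page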
  INTEGER, PARAMETER :: ndim=1
  TYPE (redblack) :: c, n, e, w, s
  INTEGER :: grid_comm, me, iv, coord, dims(ndim)
  LOGICAL, PARAMETER :: period(ndim) = (/ .false. /)  ! non-periodic grid
  LOGICAL, PARAMETER :: reorder = .true.
  
  CALL MPI_Init(ierr)                            ! starts MPI
  CALL MPI_Comm_rank(MPI_COMM_WORLD, k, ierr)    ! get current process id
  CALL MPI_Comm_size(MPI_COMM_WORLD, p, ierr)    ! get # procs from env or
                                                 ! command line
  if(k == 0) then
    write(*,*)'Enter matrix size, m :'
    read(*,*)m
  endif
  CALL MPI_Bcast(m, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  
  rhoj = 1.0d0 - pi*pi*0.5/(m+2)**2
  rhojsq = rhoj*rhoj
  mp = m/p
  ALLOCATE (vnew(m/2,mp/2), v(0:m+1,0:mp+1))
  
  CALL cpu_time(start_time)    ! start timer, measured in seconds
  
  ! create 1D cartesian topology for matrix
  dims(1) = p
  CALL MPI_Cart_create(MPI_COMM_WORLD, ndim, dims,  &
            period, reorder, grid_comm, ierr)
  CALL MPI_Comm_rank(grid_comm, me, ierr)
  CALL MPI_Cart_coords(grid_comm, me, ndim, coord, ierr)
  iv = coord
  CALL bc(v, m, mp, iv, p)     ! set up boundary conditions
  
  CALL MPI_Cart_shift(grid_comm, 0, 1, below, above, ierr)
  
  c = news(v, m, mp, 0, 0)         ! i+0,j+0  center
  n = news(v, m, mp, 0, 1)         ! i+0,j+1  north
  e = news(v, m, mp, 1, 0)         ! i+1,j+0  east
  w = news(v, m, mp,-1, 0)         ! i-1,j+0  west
  s = news(v, m, mp, 0,-1)         ! i+0,j-1  south
  
  omega = 1.0d0
  CALL update_u(c%red, n%red, e%red, w%red, s%red, &
                vnew, m, mp, omega, delr)   ! update red
  CALL update_bc_2( v, m, mp, iv, below, above)
  omega = 1.0d0/(1.0d0 - 0.50d0*rhojsq)
  CALL update_u(c%black, n%black, e%black, w%black, s%black, &
                vnew, m, mp, omega, delb)   ! update black
  CALL update_bc_2( v, m, mp, iv, below, above)
  iter = 0                      ! iteration counter
  gdel = 1.0d0                  ! ensures at least one pass through the loop
  DO WHILE (gdel > tol)
    iter = iter + 1    ! increment iteration counter
    omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
    CALL update_u(c%red, n%red, e%red, w%red, s%red, &
                vnew, m, mp, omega, delr)   ! update red
    CALL update_bc_2( v, m, mp, iv, below, above)
    omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
    CALL update_u(c%black, n%black, e%black, w%black, s%black, &
                vnew, m, mp, omega, delb)   ! update black
    CALL update_bc_2( v, m, mp, iv, below, above)
    IF(MOD(iter,100)==0) THEN
      del = (delr + delb)*4.d0
      CALL MPI_Allreduce( del, gdel, 1, MPI_DOUBLE_PRECISION, MPI_MAX,   &
             MPI_COMM_WORLD, ierr )  ! find global max error
      IF(k == 0) WRITE(*,'(i5,3d13.5)')iter,del,gdel,omega
    ENDIF
  ENDDO
  
  CALL cpu_time(end_time)          ! stop timer
  
  IF(k == 0) THEN
    PRINT *,'#######################################'
    PRINT *,'Total cpu time =',end_time - start_time,' x',p
    PRINT *,'Stopped at iteration =',iter
    PRINT *,'The maximum error =',del
    write(40,"(3i5)")m,mp,p
  ENDIF
  WRITE(41+k,"(6d13.4)")v
  DEALLOCATE (vnew, v)
  
  CALL MPI_Finalize(ierr)
  
END PROGRAM sor_cart
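
With types_module and sor_module supplied, the program builds and runs
like any other MPI Fortran code. A typical invocation might be (the
wrapper, the launcher, and the module file names all vary by
installation):

  mpif90 -o sor_cart types_module.f90 sor_module.f90 sor_cart.f90
  mpirun -np 4 ./sor_cart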