Parallel Processing with MATLAB
Many free and commercial software packages are devoted to parallel Matlab
computations. We will discuss in detail three non-commercial packages that
are available on the SCV Linux Cluster. We will also describe, only briefly,
two commercial packages that are not available on SCV; they are included for
informational purposes because of their potential prominence in the field
of parallel Matlab computing.
We first review some parallel computing terminology used on this page:
- SPMD
This is the acronym for Single Program Multiple Data.
It is a parallel programming method that runs the same program
with different data on multiprocessors. All packages discussed here
are based on this method.
- Rank number
A rank number is a processor’s identification number.
In a program, the user typically uses this number
to determine what each processor should do, for example which
data to generate or which data to use.
- Task Parallelism
This refers to problems that are highly parallel, such that
each processor performs its own computation with little or no
communication among processors. When communication is required, it
usually happens at the beginning (to fork data to all processors) and
at the end (to join data from all processors).
It is typified by the parallelization of a for-loop, in which
the work load of the loop is distributed among multiple processors. Common
examples of task parallelism include Monte Carlo methods
and the processing of multiple images.
- Data Parallelism
This type of parallelization is best for applications that
involve primarily large array or matrix operations.
- Interprocessor Communications
This refers to data transfer, i.e., message passing, among
processors. Message passing is best kept to a minimum, especially on
distributed memory multiprocessor clusters, because of its relative
slowness.
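As a minimal sketch of how the SPMD, rank number, and task parallelism ideas above fit together, the following plain Matlab code splits the iterations of a for-loop among processes by rank. The names `nprocs` and `my_rank` are assumptions made for illustration; in practice they would be supplied by whichever package launches the Matlab processes. Here the ranks are simulated one after another in a single session so that the sketch runs on its own.

```matlab
% Sketch: divide N loop iterations among nprocs processes by rank.
% nprocs and my_rank are hypothetical names; a real package would supply them.
N = 10;                       % total number of loop iterations (tasks)
nprocs = 4;                   % number of processes (assumed)
partial = zeros(1, nprocs);   % one partial result per rank

for my_rank = 0:nprocs-1      % simulate each rank in turn
    % Cyclic distribution: rank r handles iterations r+1, r+1+nprocs, ...
    my_iters = (my_rank + 1):nprocs:N;
    s = 0;
    for i = my_iters
        s = s + i^2;          % stand-in for the real per-iteration work
    end
    partial(my_rank + 1) = s; % in a real run, each rank holds only its own s
end

% "Join" step: combine the partial results (done by one process in a real run)
total = sum(partial);
```

In an actual SPMD run there is no outer loop over ranks: every process executes the same script, and only the value of its rank number differs.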
- Parallel batch script, referred to here as SCVmatlab (available on SCV Cluster)
This is a shell script that launches multiple copies
of Matlab. So that each copy of Matlab can perform
work distinct from the others, the script creates a separate
subdirectory for each process. The subdirectory keeps that process’s data
separate from the other processes’ data and gives the process a unique identity
(or rank number) by which it can be referenced. With a global file system accessible
by all processes, “communication” among processes is done through
file I/O. This method is adequate for programs that require light
interprocessor communication. For programs requiring moderate to heavy
communication, file I/O would be too slow to be practical.
With this package, the user’s parallel
implementation is expected to be based on the SPMD (Single Program Multiple
Data) programming paradigm: the same program runs on all processors, and
the unique rank number (i.e., processor ID) that the script provides to each
copy of Matlab determines which task or data that copy works on.
A parallel program that operates in this manner is also referred to as a
task parallel program.
- MatlabMPI (available on SCV Cluster)
This package is modeled after MPI: a very small library
of MPI-like functions, implemented in the MATLAB language, lets users
perform interprocessor communications and other tasks.
Like the preceding package, it requires a global file system that
is accessible by all participating processors. Communication is achieved through
file I/O. Hence, this package is best suited to programs with limited
communication. It falls in the task parallel category.
- pMatlab (available on SCV Cluster)
This package is built on top of MatlabMPI. Its primary goal is
to make migrating from a serial to a parallel program as
easy as possible. It is most suitable for programs that
perform primarily matrix (or array) operations, that is, for
data parallelism.
After the user specifies how the matrices should be decomposed
(or mapped) among the processors, matrix operations proceed in
parallel using MATLAB functions that have been overloaded to accept the
mapping. Because this package is based on MatlabMPI, it is also limited
to applications with little or no communication.
- Distributed Computing Toolbox (not available on SCV Cluster)
This package is developed by MathWorks, the developer of Matlab. Unlike
the preceding three, it does not use file I/O for
communications and hence should not have the communication
performance issues that those three have. Like pMatlab, it uses function
overloading to greatly reduce the parallel porting effort for matrix-based
Matlab programs. It supports both the task and data parallel paradigms.
- Star-P (not available on SCV Cluster)
This package is developed by Interactive Supercomputing, Inc. Like
MathWorks’ Distributed Computing Toolbox (DCT), it enables the programmer
to convert serial Matlab code to parallel with very little effort. Like DCT, it supports
both the data parallel and task parallel paradigms. In addition, it
supports popular parallel libraries such as ScaLAPACK,
the parallel version of the LAPACK linear algebra package.
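The file-based "communication" that the batch script and MatlabMPI rely on can be illustrated with a short sketch. This is not the code of either package: the file-name pattern, the variable names, and the use of `tempdir` in place of a shared global file system are all assumptions made for illustration. Each simulated rank saves its partial result to its own file, and rank 0 then loads every file and combines the results.

```matlab
% Sketch of message passing through files on a shared file system.
% All names here (msg_dir, msg_file, nprocs, my_rank) are hypothetical.
nprocs = 3;
msg_dir = tempdir;   % stand-in for a directory visible to all processes

% "Send" phase: every rank writes its partial result to its own file.
for my_rank = 0:nprocs-1                 % simulate each rank in turn
    result = (my_rank + 1) * 10;         % stand-in for real work
    msg_file = fullfile(msg_dir, sprintf('partial_rank%d.mat', my_rank));
    save(msg_file, 'result');
end

% "Receive" phase: rank 0 reads each file and accumulates the results.
total = 0;
for src = 0:nprocs-1
    msg_file = fullfile(msg_dir, sprintf('partial_rank%d.mat', src));
    data = load(msg_file);               % struct with a field named 'result'
    total = total + data.result;
    delete(msg_file);                    % clean up the message file
end
```

In a real multi-process run the receiver would also have to wait until each file has been completely written before loading it. That waiting, together with the disk access itself, is why file I/O suits only programs with light communication.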
In the following, we discuss the operational procedures of the three
packages available at SCV through an example. The same example is used
for all three packages in order to highlight the programming differences.
A table summarizing the differences and the pros and cons is
included at the end to help you determine which package may be more
suitable for your applications.