
Matrix Transpose

Example: Matrix Transpose

Demonstrates the use of virtual topologies.

We will demonstrate the use of virtual topologies through a matrix transposition example.
While the underlying operation of a matrix transpose is mathematically very simple,
it is not at all trivial to program it efficiently in parallel
by conventional procedures.
More importantly here, the example shows how a virtual
topology can be used effectively to assist us in the programming process.

The algorithm we will follow relies on a somewhat little-known fact: a matrix transpose
can be performed in the following manner:

  1. Select p and q such that the total number of processes is
    nprocs = p x q. For simplicity, we will assume that p and q evenly
    divide n and m, respectively.

  2. Partition the n x m matrix into a (blocked) p x q matrix whose elements
    are themselves matrices of size (n/p) x (m/q).
    The elements of these sub-matrices may be defined directly,
    computed, or read in from a file.

  3. Perform a transpose on each of these sub-matrices. These transposes are performed
    serially, since each entire sub-matrix resides locally on a single process; no
    interprocess communication is required. (A short serial sketch follows this list.)

  4. Perform a transpose on the p x q block matrix; this means that the contents
    of each locally transposed block must be communicated from grid location “p,q” to location
    “q,p” via message passing. This operation involves
    interprocess communication. On completion, the final transposed matrix is
    obtained. (An MPI sketch of this step appears at the end of this page.)

  5. Note that in practice, the previous step is often not necessary. If
    you need to access the element (or sub-matrix) “p,q” of the transposed
    matrix, all you need to do is access the element “q,p”, which has already
    been transposed locally. Depending on what comes next in the calculation,
    unnecessary message passing may be avoided.
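
To make step 3 concrete, here is a minimal serial sketch in C. The local block
sizes NLOC and MLOC are hypothetical choices (3 x 2, matching the example
discussed below); the local transpose involves no communication at all.

    #include <stdio.h>

    /* Hypothetical local block sizes: each process owns one (n/p) x (m/q) block. */
    #define NLOC 3   /* n/p: local rows    */
    #define MLOC 2   /* m/q: local columns */

    int main(void)
    {
        double a[NLOC][MLOC], at[MLOC][NLOC];

        /* Fill the local block with arbitrary values. */
        for (int i = 0; i < NLOC; i++)
            for (int j = 0; j < MLOC; j++)
                a[i][j] = 10.0 * i + j;

        /* Step 3: transpose the local block; purely serial, no message passing. */
        for (int i = 0; i < NLOC; i++)
            for (int j = 0; j < MLOC; j++)
                at[j][i] = a[i][j];

        /* Print the transposed local block. */
        for (int i = 0; i < MLOC; i++) {
            for (int j = 0; j < NLOC; j++)
                printf("%6.1f", at[i][j]);
            printf("\n");
        }
        return 0;
    }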
As an example, we define a 9 x 4 matrix and use 6 processes. Next, we
map it onto a 3 x 2 virtual Cartesian grid, i.e., p=3, q=2.
Coincidentally, each element of this Cartesian grid is in turn a matrix of size
3 x 2, as illustrated in the figure below. In the physical grid, each square
box represents one entry of the matrix. The pair of indices, “i,j”, on the first
row gives the global Cartesian coordinates, while “(p)” is
the process associated with the virtual grid, allocated by calling
MPI_Cart_create or MPI_Comm_split;
aij on the second row is the value of the matrix element.
On the right of the figure, the 3 x 2 virtual grid is depicted.
Each box in this grid represents one process and contains one 3 x 2 submatrix.
Finally, another communicator is created for the transposed virtual grid,
which has dimensions 2 x 3. The element at “1,0” of the
transposed virtual grid, for instance, stores the value sent by the element
at “0,1” of the virtual grid.

[Figure: the physical 9 x 4 grid (left) and the 3 x 2 virtual process grid (right)]
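
Below is a hedged sketch, in C, of how the two virtual grids and the exchange of
step 4 might be set up for this example. The communicator names grid and tgrid,
the block layout, and the use of MPI_Sendrecv are illustrative assumptions, not
part of the original example; as noted above, MPI_Comm_split could be used in
place of MPI_Cart_create to build the grids.

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical sizes matching the example: a 9 x 4 matrix on 6 processes,
     * p = 3, q = 2, so each process owns one 3 x 2 block. Run with exactly
     * 6 processes. */
    #define P 3      /* grid rows          */
    #define Q 2      /* grid columns       */
    #define NLOC 3   /* n/p: local rows    */
    #define MLOC 2   /* m/q: local columns */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* One communicator for the 3 x 2 virtual grid and one for its 2 x 3
         * transpose; reorder = 0 keeps the MPI_COMM_WORLD rank numbering. */
        int dims[2] = { P, Q }, tdims[2] = { Q, P }, periods[2] = { 0, 0 };
        MPI_Comm grid, tgrid;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
        MPI_Cart_create(MPI_COMM_WORLD, 2, tdims, periods, 0, &tgrid);

        int coords[2], tcoords[2];
        MPI_Cart_coords(grid, rank, 2, coords);    /* my block sits at (p,q)         */
        MPI_Cart_coords(tgrid, rank, 2, tcoords);  /* my slot in the transposed grid */

        /* Define and locally transpose my 3 x 2 block (steps 2 and 3). */
        double a[NLOC][MLOC], at[MLOC][NLOC], recvbuf[MLOC][NLOC];
        for (int i = 0; i < NLOC; i++)
            for (int j = 0; j < MLOC; j++)
                a[i][j] = 100.0 * rank + 10.0 * i + j;
        for (int i = 0; i < NLOC; i++)
            for (int j = 0; j < MLOC; j++)
                at[j][i] = a[i][j];

        /* Step 4: my transposed block goes to the owner of position (q,p) of
         * the transposed grid, and the block I must store comes from the owner
         * of position (tcoords[1], tcoords[0]) of the original grid. */
        int to[2]   = { coords[1],  coords[0]  };
        int from[2] = { tcoords[1], tcoords[0] };
        int dest, src;
        MPI_Cart_rank(tgrid, to, &dest);
        MPI_Cart_rank(grid, from, &src);

        MPI_Sendrecv(at,      NLOC * MLOC, MPI_DOUBLE, dest, 0,
                     recvbuf, NLOC * MLOC, MPI_DOUBLE, src,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* recvbuf now holds this process's 2 x 3 block of the transposed matrix. */
        if (rank == 0)
            printf("rank 0 now holds block (%d,%d) of the transpose\n",
                   tcoords[0], tcoords[1]);

        MPI_Comm_free(&grid);
        MPI_Comm_free(&tgrid);
        MPI_Finalize();
        return 0;
    }

Built with mpicc and run on exactly 6 processes (for example, mpirun -np 6
./transpose), each rank ends up holding one 2 x 3 block of the transposed
matrix. Because reorder is 0, the ranks of grid, tgrid, and MPI_COMM_WORLD
coincide, which is what allows the Sendrecv partners to be looked up with
MPI_Cart_rank on either communicator.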