• Matrix multiply example

    Demonstrated below are several ways to multiply two N x N matrices (an operation requiring on the order of N^3 operations). Starting the matlabpool has a cost, and so does distributing an array to the workers. Consequently, a parallel operation is worthwhile only if the arrays are large enough for the parallel speedup to offset this overhead. Note that the overhead is incurred only once, while the distributed arrays may be used many times. Note also that there is a communication cost whenever a distributed array needs to be transferred among the workers or to and from the MATLAB workspace.
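
    The costs described above can be measured separately. The following is a minimal sketch, assuming a pool of workers is already open; the matrix size N and all variable names are illustrative choices, not values from the original slides.

```matlab
% Sketch only: assumes a pool of workers is already open.
N = 2000;                          % illustrative size (an assumption)
a = rand(N);
b = rand(N);

tic; c = a*b; tClient = toc;       % multiply in the MATLAB workspace

tic;
A = distributed(a);                % copy a to the workers
B = distributed(b);                % copy b to the workers
tDistribute = toc;                 % cost of distributing the arrays

tic; C = A*B; tWorkers = toc;      % multiply carried out on the workers

tic; c2 = gather(C); tGather = toc;  % bring the result back to the workspace

fprintf('workspace multiply : %8.3f s\n', tClient);
fprintf('distribute a and b : %8.3f s\n', tDistribute);
fprintf('distributed multiply: %8.3f s\n', tWorkers);
fprintf('gather result      : %8.3f s\n', tGather);

% Both products should agree to rounding error.
relErr = norm(c - c2, 'fro') / norm(c, 'fro');
```

    Comparing tDistribute and tGather against tWorkers for a few values of N gives a rough idea of where the parallel version starts to pay off on a given machine.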

    On a shared-memory multicore PC, or on a multicore node of a cluster such as the SCC, multithreading is turned on by default. Consequently, what may appear to be a non-parallel matrix multiply (the first matrix multiply) may actually be computed with multiple threads (or cores). This multithreaded matrix multiply scales very well, as do the distributed matrix multiply operations that follow. Multithreading is also referred to as implicit parallelism.
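
    One way to see implicit parallelism at work is to time the same multiply with the default number of computational threads and then with a single thread. This is a small sketch using maxNumCompThreads; the size N is an illustrative assumption, and the timings will depend on the machine.

```matlab
N = 2000;                          % illustrative size (an assumption)
a = rand(N);
b = rand(N);

nDefault = maxNumCompThreads;      % thread count currently in use

tic; c = a*b; tMulti = toc;        % multiply with the default thread count

maxNumCompThreads(1);              % restrict MATLAB to one computational thread
tic; c1 = a*b; tSingle = toc;      % same multiply, single-threaded

maxNumCompThreads(nDefault);       % restore the original setting

fprintf('%d thread(s): %.3f s\n', nDefault, tMulti);
fprintf('1 thread    : %.3f s\n', tSingle);
```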

    There are multiple ways to distribute arrays. You can distribute an array among the workers directly from the MATLAB workspace with distributed (see the slide above) or from within the spmd environment with codistributed (see the slide below). Arrays created with distributed are accessible directly from the MATLAB workspace and hence may be more convenient for some applications. However, these arrays are always distributed along the right-most dimension. For example, a distributed two-dimensional array is always distributed by columns. An array may, on the other hand, be codistributed along any valid dimension; if none is specified, it defaults to the right-most dimension.
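
    The two routes can be compared on a small matrix. The sketch below assumes a pool of workers is already open; the variable names are illustrative. It uses codistributor1d to choose the distribution dimension and getLocalPart to inspect the piece held by each worker.

```matlab
% Sketch only: assumes a pool of workers is already open.
N = 8;
a = magic(N);

% From the MATLAB workspace: the distribution dimension is chosen for you.
A = distributed(a);

% From within spmd: the distribution dimension can be chosen explicitly.
spmd
    Brow = codistributed(a, codistributor1d(1));   % split by rows
    Bcol = codistributed(a, codistributor1d(2));   % split by columns

    rowPart = size(getLocalPart(Brow));   % this worker holds some rows, all columns
    colPart = size(getLocalPart(Bcol));   % this worker holds all rows, some columns
end

% rowPart and colPart are Composite objects; index by worker number.
disp(rowPart{1});
disp(colPart{1});
```

    Outside the spmd block, Brow and Bcol can be used from the MATLAB workspace like any other distributed array, for example with gather.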

    MATLAB functions such as zeros and rand have been overloaded to create distributed arrays directly. Building an array in distributed form with one of these constructor functions not only saves memory, because the full array never has to exist in the MATLAB workspace; it may be more efficient as well.
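
    A brief sketch of the overloaded constructors follows. It assumes a pool of workers is already open and uses the 'distributed' option form of the constructors; older releases are understood to spell the same calls as distributed.zeros(N) and distributed.rand(N), so treat the exact syntax as release-dependent. The size N is illustrative.

```matlab
% Sketch only: assumes a pool of workers is already open.
N = 1000;                          % illustrative size (an assumption)

% Created directly on the workers; no N x N array is built in the workspace.
Z = zeros(N, 'distributed');
R = rand(N, 'distributed');

% Inside spmd, a codistributor selects the distribution dimension.
spmd
    Rrow = rand(N, codistributor1d(1));   % random matrix distributed by rows
end
```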

    According to the above timing data, distributing a by row and b by column yields the most efficient matrix multiply operation. This is a direct consequence of how a matrix product is formed: each entry of the product is the dot product of a row of a with a column of b. For other applications, the optimal choice of matrix distribution may be less obvious unless you are familiar with the underlying algorithm. The linear algebraic system solver, shown next, is just such an example.
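
    The row-by-column arrangement can be set up as in the sketch below. It assumes a pool of workers is already open; the size N and the variable names are illustrative, and the timing it prints is not meant to reproduce the data on the slide.

```matlab
% Sketch only: assumes a pool of workers is already open.
N = 2000;                          % illustrative size (an assumption)
a = rand(N);
b = rand(N);

spmd
    A = codistributed(a, codistributor1d(1));   % a distributed by rows
    B = codistributed(b, codistributor1d(2));   % b distributed by columns
    tic;
    C = A*B;                                    % parallel matrix multiply
    t = toc;                                    % elapsed time on this worker
end

c = gather(C);                                  % result back in the workspace
fprintf('worker 1 multiply time: %.3f s\n', t{1});

% Check against the workspace product.
relErr = norm(c - a*b, 'fro') / norm(a*b, 'fro');
```

    Swapping the arguments of codistributor1d makes it possible to time the other row and column combinations in the same way.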

  • Linear algebraic system example ( Ax = b )

    As with matrix multiply, the linear solve (Ax = b) is performed in parallel because the matrices involved are distributed.
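
    A minimal sketch of such a solve is shown below. It assumes a pool of workers is already open; the test system is an illustrative choice (a random matrix made diagonally dominant so that it is well conditioned), not the one used on the slide.

```matlab
% Sketch only: assumes a pool of workers is already open.
N = 2000;                          % illustrative size (an assumption)
a = rand(N) + N*eye(N);            % diagonally dominant, hence well conditioned
xTrue = ones(N, 1);                % known solution for checking
b = a*xTrue;

A = distributed(a);                % distribute the matrix
B = distributed(b);                % distribute the right-hand side

tic;
X = A\B;                           % backslash runs on the workers
tSolve = toc;

x = gather(X);                     % solution back in the workspace
relErr = norm(x - xTrue) / norm(xTrue);
fprintf('solve time %.3f s, relative error %.2e\n', tSolve, relErr);
```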
