A parallel job is:
A single task running concurrently on multiple workers that may communicate with each other. On the SCC, this results in one batch job with multiple processors running in parallel. This is also known as a data-parallel job.
Examples of a parallel job include many linear algebra applications: matrix multiplication, solvers for linear systems of equations, and eigensolvers. Some of these applications run efficiently in parallel while others do not, depending on the underlying algorithms and operations. This category also includes jobs that mix serial and parallel processing. The important PCT utilities we will cover here are parfor, drange, and spmd.
If the loop has N iterations and the number of workers is m, then parfor distributes the loop index range into m contiguous chunks. The nominal chunk size is floor(N/m). If N is not divisible by m, parfor distributes the remaining iterations as evenly as possible among the m workers.
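The chunking rule described above can be sketched in plain serial MATLAB. This is only an illustration of the arithmetic, not parfor's internal code. The values N = 10 and m = 4 are made up for the example, and giving the leftover iterations to the lowest-numbered workers is an assumption.

```matlab
% Illustrative sketch of a contiguous split of N iterations over m workers.
% N, m, and the "extras go to the first workers" policy are assumptions.
N = 10;                       % number of loop iterations
m = 4;                        % number of workers

base  = floor(N / m);         % nominal chunk size
extra = mod(N, m);            % iterations left over when m does not divide N

counts = base * ones(1, m);               % every worker gets the nominal size
counts(1:extra) = counts(1:extra) + 1;    % first 'extra' workers take one more

last  = cumsum(counts);       % last index owned by each worker
first = last - counts + 1;    % first index owned by each worker

for w = 1:m
    fprintf('worker %d: iterations %d to %d\n', w, first(w), last(w));
end
```

With these values the chunk sizes differ by at most one, which is what "as evenly as possible" amounts to in this sketch.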
For a full description of parfor usage rules, please see the parfor documentation.
parfor can perform global reduction operations across the workers, as long as the operation is associative, as is the case with the plus (+) operator. The above example computes the sum of an arithmetic series. When completed, the parfor loop yields the total global sum, s, of the series. This contrasts with the subsequent two examples, in which drange and spmd are used to compute the sum of the same series. In those two examples, the sum computed in the for-loop is only the local sum on each worker, so the PCT utility gplus is used to compute the total sum.
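A minimal sketch of such a reduction, assuming the series is 1 + 2 + ... + 10 (the upper limit n = 10 is chosen here for illustration):

```matlab
% Sum an arithmetic series with a parfor reduction.
% n = 10 is an assumed value for this sketch.
n = 10;
s = 0;                 % reduction variable, initialized on the client
parfor i = 1:n
    s = s + i;         % reduction with the associative + operator
end
disp(s)                % global sum of the series
```

Because s appears only in the form s = s + expr, parfor treats it as a reduction variable and combines the workers' partial results into a single value on the client.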
As mentioned above, the minus (-) operator is not associative. The example s = s - i yields nondeterministic results: it may return a different wrong answer on repeated runs on one computer, yet consistently return the correct answer (-55) on another computer or operating system.
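Why a non-associative operator is risky can be seen without any parallel machinery. The following serial sketch simulates two workers that each accumulate a partial result for 1 through 10 and then combines the partials in two ways. The two-way split and the variable names are assumptions for illustration; this does not reproduce how parfor actually schedules or combines iterations.

```matlab
% Serial simulation of a chunked "s = s - i" reduction (illustrative only).
serial = 0;
for i = 1:10
    serial = serial - i;      % reference answer, computed in order
end

p1 = 0;
for i = 1:5                   % simulated worker 1
    p1 = p1 - i;
end
p2 = 0;
for i = 6:10                  % simulated worker 2
    p2 = p2 - i;
end

combinedMinus = p1 - p2;      % combining partials with minus: wrong
combinedPlus  = p1 + p2;      % combining partials with plus: matches serial

fprintf('serial = %d, minus-combined = %d, plus-combined = %d\n', ...
        serial, combinedMinus, combinedPlus);
```

The result depends on how the partial results are grouped and combined, which is exactly what an associative operator guarantees does not matter.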
drange distributes the loop index in exactly the same manner as parfor. However, the plus (+) operator accumulates only the respective worker's local sum. Consequently, the local sums of all workers must be added, via gplus, to yield the global sum.
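A minimal sketch of the drange version, again assuming the series 1 through 10. The for-drange loop must sit inside an spmd block, and the names n, total, and result are chosen here for illustration.

```matlab
% Sum a series with for-drange inside spmd, then combine with gplus.
% n = 10 is an assumed value for this sketch.
n = 10;
spmd
    s = 0;                    % local partial sum on this worker
    for i = drange(1:n)       % each worker runs only its share of 1:n
        s = s + i;
    end
    total = gplus(s);         % add the local sums across all workers
end
result = total{1};            % total is a Composite; read worker 1's copy
disp(result)
```

After the spmd block, s still holds one local value per worker, while total holds the same global sum on every worker.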
For spmd, a workload distribution algorithm may be formulated as a function of the labindex and numlabs query functions. While the workload can be distributed in any way the programmer chooses, for the sake of comparison we use prange to provide the same distribution pattern as used by parfor and drange.
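A sketch of an spmd version in which each worker derives its own index range from labindex and numlabs. The signature of prange is not shown in this section, so the contiguous split is written inline here as a stand-in. The split policy (lowest-numbered workers take the leftover iterations) and the variable names are assumptions.

```matlab
% Sum a series in spmd with a hand-written contiguous split.
% The inline split stands in for a helper such as prange (assumption).
n = 10;
spmd
    base  = floor(n / numlabs);     % nominal chunk size
    extra = mod(n, numlabs);        % leftover iterations

    % Workers 1..extra each take one additional iteration (assumed policy).
    first = (labindex - 1) * base + min(labindex - 1, extra) + 1;
    last  = first + base - 1 + (labindex <= extra);

    s = 0;                          % local partial sum on this worker
    for i = first:last              % empty range if this worker has no work
        s = s + i;
    end
    total = gplus(s);               % global sum across all workers
end
result = total{1};                  % read the global sum on the client
disp(result)
```

Any other mapping from labindex to a set of indices would work equally well, provided every index is assigned to exactly one worker.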
In summary, parfor and drange are for-loop based, and their workload distribution is determined implicitly by a fixed algorithm. With spmd, on the other hand, it is up to the programmer to define a workload distribution scheme, such as prange. The labindex and numlabs functions are very useful for implementing such a scheme. Overall, parfor is very versatile: it not only distributes the workload automatically but also supports features such as global reduction, which drange does not. However, parfor does not work within the spmd environment, whereas drange offers a simple parallel for-loop that does work inside spmd, which may be handy if you have other spmd-based operations.