There are three methods to submit a parallel MATLAB batch job on the Shared Computing Cluster, depending on the number of processors needed and whether the tasks for the job are mutually independent:
- Method 1: Batch submission procedure for up to 12 processors
- Method 2: Batch submission procedure for more than 12 processors (currently unavailable)
- Method 3: Batch submission procedure for embarrassingly parallel codes
The parallel efficiency of many applications peaks at 4 to 16 processors. This method is well suited to submitting batch jobs for such runs. In addition, this procedure guarantees that all communication among the workers stays local to the node and is therefore more efficient, especially for communication-bound applications. Being on the same node also means that multithreading may be used, which provides additional parallelism (see the previous section, Implicit Parallelism). This procedure uses the standard qsub batch command. A batch submission with qsub requires a batch script. RCS provides a single_node_batch script to run your parallel MATLAB applications. Because it is not a system script, you will need to download and save it. Don’t forget to enable the script’s execute attribute:
scc1% chmod +x single_node_batch
With the script saved and its execute attribute set, a batch job may be launched as follows:
scc1% qsub -pe omp 12 single_node_batch "n=300;m=200; runLocal" localOutput
In this example,
- 12 processors are requested.
- the line break is an artifact of the webpage line width. You should enter it as one line.
- runLocal.m is used to open matlabpool and run an application program, matmulExample. Both are part of the downloadable zip file:
% runLocal.m: script m-file to open matlabpool and run user app
matlabpool open local   % use local config
matmulExample(n, m)     % example user app
matlabpool close
- the MATLAB output is directed to localOutput.
- -pe omp 4, for example, may be used to request 4 processors of a node.
- -l h_rt=24:00:00 may be added, before or after -pe …, to raise the walltime limit from the default of 2 hours to 24 hours.
- See Technical Summary for additional choices available on the SCC.
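The options listed above can be combined in a small wrapper. The following is a minimal sketch and is not part of the RCS download: the function name build_qsub_cmd and its argument order are hypothetical, and it only prints the command (a dry run) rather than calling qsub. It uses only the -pe omp and -l h_rt options described above and assumes single_node_batch is in the current directory.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of the RCS download):
# prints the qsub command for a single-node MATLAB batch job.
#   $1 = number of processors (1 to 12 for Method 1)
#   $2 = walltime limit as HH:MM:SS
#   $3 = MATLAB command string, e.g. "n=300;m=200; runLocal"
#   $4 = output file name
build_qsub_cmd() {
  # Method 1 is limited to the processors of one node (up to 12 here).
  if [ "$1" -lt 1 ] || [ "$1" -gt 12 ]; then
    echo "error: Method 1 supports 1 to 12 processors" >&2
    return 1
  fi
  printf 'qsub -pe omp %s -l h_rt=%s single_node_batch "%s" %s\n' \
    "$1" "$2" "$3" "$4"
}

# Example: request 4 processors with a 24-hour walltime limit.
build_qsub_cmd 4 24:00:00 "n=300;m=200; runLocal" localOutput
```

Once the printed line looks right, it can be pasted at the scc1% prompt. Printing first, instead of submitting directly, makes it easy to check the quoting of the MATLAB command string.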
If your application needs more than twelve processors, MATLAB worker licenses are required. A special process therefore handles the license request automatically in the background, without any user action. The steps a user needs to follow are described below.
- A batch script, called pctBatch, is required. This script is very similar to that of Method 1. What differentiates the two methods is the configuration with which matlabpool is invoked. Like single_node_batch, this script is not installed system-wide. You will need to download and save it, then enable the execute attribute with
scc1% chmod +x pctBatch
- runSGE.m is used to open matlabpool and run an application program, matmulExample. These files are part of the downloadable zip file.
% runSGE.m: wrapper script to open matlabpool and run user app
matlabpool('SGE', N)   % N is passed at runtime
matmulExample(n, m)    % example user app
matlabpool close
- Submit the job with pctBatch:
scc1% pctBatch "n=3000,m=3000,N=16,runSGE" myoutput
The input parameters n and m are passed on to the user application code, while N is the worker pool size. The job’s default runtime is 2 hours. This is set by the system and cannot be changed.
- Unlike a job submitted directly with qsub, this job may take a while to appear in the batch queue, where it can be tracked with qstat. In the meantime, you can confirm that the process is registered with this Unix command:
scc1% ps -aux | grep runSGE
- In addition to the MATLAB workspace output redirected to myoutput, a MATLAB file, JobX.mpiexe.out, is also generated. This file contains miscellaneous information such as the splash screen as well as the processors assigned to the job.
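One practical detail of the ps check above is that the grep process itself usually shows up in the listing, because its own command line contains runSGE. The sketch below filters that line out. The function name find_job_lines is hypothetical, and the sample process lines are made up for illustration; real ps output has more columns.

```shell
#!/bin/sh
# Hypothetical helper: reads a process listing on stdin and prints the
# lines that mention the given name, dropping the grep process itself.
#   $1 = name to look for, e.g. runSGE
find_job_lines() {
  grep -- "$1" | grep -v grep
}

# On the cluster this would be used as:
#   ps -aux | find_job_lines runSGE
# Demonstration with made-up process lines:
printf '%s\n' \
  'user 101 matlab -r n=3000,m=3000,N=16,runSGE' \
  'user 102 grep runSGE' | find_job_lines runSGE
```

If the function prints nothing, no matching process is registered.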
Practical considerations for Methods 1 and 2
The purpose of the two procedures is clear: Method 1 is for up to 12 cores and Method 2 is for larger numbers of workers. In practice, however, Method 1 often turns out to be the better choice for a variety of reasons, despite offering fewer cores than Method 2:
- Many non-embarrassingly parallel codes scale well for a small number of cores but reach the point of diminishing returns beyond a certain, code-dependent number. For these codes, Method 1 is highly suitable.
- Inter-node communication generally has higher latency than on-node communication, especially since MATLAB supports only 1 Gigabit Ethernet for inter-node communication. For communication-bound applications, this makes Method 2 more expensive.
- The processor pool from which Method 2 draws has the slowest clock speed (i.e., the 2.4 GHz AMD Opteron 2216HE). This may offset the advantage of running with more processors under Method 2.
(See Technical Summary for the different processors available on the SCC.)
- Submitting jobs with Method 2 for more than 12 processors means multiple nodes will be used. This reduces the opportunity for multithreaded computation.
- For some codes, Method 2 may be chosen because of memory requirements. However, with the “-l memory=96G” qsub switch, a job submitted with Method 1 has access to more memory per processor than one submitted with Method 2.
- The wait time in the queue may be longer with Method 2 because the SGE configuration requires one Distributed Computing Server (DCS) license for each worker. If the needed number of free DCS licenses (there are 64 in total) is not available, your batch job has to wait even if processors are free. Method 1, on the other hand, waits only for free processors, not DCS licenses.
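The diminishing-returns point in the first consideration can be illustrated with Amdahl's law, a standard simplified model that this page does not itself use: if a fraction p of the run time can be parallelized, the ideal speedup on N workers is 1 / ((1 - p) + p / N). The sketch below evaluates that formula. The function name amdahl_speedup and the sample fraction 0.9 are assumptions chosen for illustration, and the model ignores communication cost, clock speed, and license wait time.

```shell
#!/bin/sh
# Ideal speedup under Amdahl's law (simplified model, see assumptions above).
#   $1 = parallelizable fraction p, between 0 and 1
#   $2 = number of workers N
amdahl_speedup() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

amdahl_speedup 0.9 12   # prints 5.71
amdahl_speedup 0.9 64   # prints 8.77
```

Under these assumptions, going from 12 to 64 workers raises the ideal speedup only from about 5.7 to about 8.8, before any of the inter-node costs listed above are counted.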
Jobs that fall into this category may be submitted much more effectively via an RCS-developed script. With this method, no knowledge or use of the Parallel Computing Toolbox is required, and all tasks are submitted as serial jobs. On a heavily loaded batch system, the wait time for serial jobs may be significantly shorter.
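The idea of submitting independent tasks as separate serial jobs can be sketched as follows. This is not the RCS-developed script, whose name and interface are not given here. Both print_serial_submissions and run_task.sh are hypothetical names, with run_task.sh standing in for a user script that takes a task index. The sketch is a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry run: print one serial qsub command per independent task.
# run_task.sh is a hypothetical user script that takes a task index.
#   $1 = number of independent tasks
print_serial_submissions() {
  i=1
  while [ "$i" -le "$1" ]; do
    echo "qsub run_task.sh $i"
    i=$((i + 1))
  done
}

# Example: three independent tasks give three serial submissions.
print_serial_submissions 3
```

Because each task is its own serial job, the scheduler can start any of them as soon as a single processor becomes free.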