For MATLAB work such as code development, GUI use, and other graphical rendering, an interactive MATLAB window is the natural and preferred mode of operation. Other applications, such as long production runs that do not require interaction, are best run in the background, a mode commonly known as batch. Batch jobs are typically managed by a batch scheduler. The scheduler for the SCC is Open Grid Scheduler (OGS). Users submit batch jobs with a job submission command, and the batch scheduler and the operating system handle the rest. The general procedure for batch submission and handling is described in the Shared Computing Center’s Running Jobs page.

Batch System Overview

  • Batch jobs are submitted to the batch scheduler via qsub
    scc1$ qsub [qsub options] user-script [arg1 ...]

    In the command above, user-script is a user-supplied shell script that specifies the operations to perform, and qsub options are the scheduler options you want to set.

  • This page uses the words processor, core, thread and slot interchangeably to denote what computer hardware vendors call a processor core.
  • A user can submit as many jobs as needed. However, no more than 256 processors requested by the same user can be in the run state at any given time.
  • The default wall clock limit is 12 hours. Specify a different limit with
    scc1$ qsub -l h_rt=HH:MM:SS . . .
  • Serial batch jobs can run for up to 720 wall clock hours (30 days)
    scc1$ qsub -l h_rt=720:00:00 . . .
  • Multicore batch jobs using the “omp” parallel environment can run for 720 hours
    scc1$ qsub -pe omp 16 -l h_rt=720:00:00 . . .
  • Technical Summary lists available SCC compute nodes with details.

Essential Batch Commands

  1. Use qsub to submit batch jobs. For example
    scc1$ qsub ./mbatch

    The mbatch batch script is

    #!/bin/csh
    matlab -nodisplay -singleCompThread -r "n=4, rand(n), exit"
  2. Use qstat to query the batch queue status. To list only your own jobs, add the -u option
    scc1$ qstat -u yourID
    job-ID  prior   name       user         state submit/start at     queue . . .
    ----------------------------------------------------------------------
     477578 0.00000 mbatch     yourID      qw    05/01/2013 08:50:06

    In the output above, qw indicates that the job is waiting in the queue. A running job would have a state of r.

  3. To kill a job already in the queue (running or waiting)
    scc1$ qdel 477578
  4. Two output files are generated for the job: errors are reported in mbatch.e477578, while output goes to mbatch.o477578 (for MATLAB jobs, this includes the MATLAB splash screen and anything written to the command window). See the Running Jobs page to find out how to request notification when the batch job completes.
  5. The batch scheduler has built-in system defaults, such as the 12-hour wall clock limit. You can define your own qsub defaults so that you do not have to specify them each time. On occasion, you may wish to control the execution order of your batch jobs. See Advanced Topics for instructions.
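
Because errors and regular output land in separate files, a quick check of the error file shows whether a finished job reported any problems. The following is a minimal sketch, assuming a POSIX shell and the file naming shown above (script name, then .e, then the job ID); the helper name job_had_errors is hypothetical and not part of the batch system.

```shell
#!/bin/sh
# job_had_errors NAME JOBID
# Succeeds (exit status 0) if the error file NAME.eJOBID exists and is
# non-empty. Hypothetical helper for illustration only; it does not
# contact the scheduler.
job_had_errors() {
    [ -s "$1.e$2" ]
}

# Example using the job from the qstat listing above.
if job_had_errors mbatch 477578; then
    echo "errors reported, see mbatch.e477578"
else
    echo "no error output found for job 477578"
fi
```

Run this from the directory that holds the job's output files. An empty error file does not guarantee correct results; it only means nothing was written to the error stream.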

Types of MATLAB Batch Jobs

Depending on the application, MATLAB batch jobs generally fall into one of the following categories:

Serial MATLAB Batch Jobs

This is the most common and basic batch job type for applications that require only one processor or core.

scc1$ qsub ./mbatch

The sample batch script, mbatch, is

#!/bin/csh
matlab -nodisplay -singleCompThread -r "n=4, rand(n), exit"

The text between the pair of double quotes above acts, essentially, as input to a MATLAB command window and can contain any supported MATLAB commands: define a variable ( n=4 ); run built-in MATLAB functions ( rand, exit ); or, though not shown in the sample script above, call your own function or script m-file (e.g., myfct.m; do not include the .m extension). Ending with the MATLAB exit command is recommended so that MATLAB quits properly and the batch job ends. Because this is a single-processor job, the -singleCompThread flag disables MATLAB multithreading (i.e., multi-core use; Intel hyperthreading is turned off system-wide).

Optionally, give mbatch the execute attribute so that it can be run directly from your window session. This is useful for testing mbatch interactively before submitting it to batch with qsub.

scc1$ chmod +x mbatch
scc1$ ./mbatch

Embarrassingly Parallel MATLAB Batch Jobs

When a group of independent serial jobs, such as Monte Carlo simulations or parametric studies, is submitted to batch and the jobs run concurrently, they effectively run in parallel. This is one of the easiest and most efficient ways to gain parallel performance. To this end, the -t option, known as a Job Array, makes it simple to run an application multiple times with a single qsub command.

scc1$ qsub  -t 3-9:2 ./epbatch

The relationship among the job array’s environment variables can be expressed, with the help of MATLAB colon notation, as

[$SGE_TASK_ID] = $SGE_TASK_FIRST : $SGE_TASK_STEPSIZE : $SGE_TASK_LAST
               = 3:2:9  
               = [3,5,7,9]

The values of $SGE_TASK_ID are computed automatically and made available to your script by the batch scheduler. In this example, the 4 unique values of $SGE_TASK_ID are assigned to the 4 respective tasks, each of which runs as a separate batch job. The epbatch script below shows one use of $SGE_TASK_ID:

#!/bin/csh
matlab -nodisplay -singleCompThread -r "n=2*$SGE_TASK_ID, rand(n); exit"

If the step size is omitted, it defaults to 1, which suits most applications

scc1$ qsub  -t 1-9 . . .
[$SGE_TASK_ID] = 1:1:9 = [1,2,3,4,5,6,7,8,9]
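
The task IDs implied by a given -t range can be previewed before anything is submitted. The following is a minimal sketch, assuming a POSIX shell; the helper name task_ids is hypothetical and is not part of qsub. It simply reproduces the first:step:last rule described above.

```shell
#!/bin/sh
# task_ids FIRST LAST [STEP]
# Print, one per line, the $SGE_TASK_ID values that
# "qsub -t FIRST-LAST:STEP" would be expected to generate.
# Hypothetical helper for illustration; it does not call the scheduler.
task_ids() {
    first=$1
    last=$2
    step=${3:-1}            # step size defaults to 1 when omitted
    id=$first
    while [ "$id" -le "$last" ]; do
        printf '%s\n' "$id"
        id=$((id + step))
    done
}

task_ids 3 9 2              # corresponds to the -t 3-9:2 example
```

Calling task_ids 1 9 lists the nine IDs of the -t 1-9 example in the same way.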

Additional examples of $SGE_TASK_ID usage

  1. Use $SGE_TASK_ID to save the output of each task to its own file, so that tasks do not overwrite one another
    scc1$ qsub -t 1-3 epbatch2

    Batch script epbatch2:

    #!/bin/csh
    # This is a sample script for running qsub array jobs
    # Used with qsub -t, which provides the env var $SGE_TASK_ID
    matlab -nodisplay -singleCompThread -r "myApp($SGE_TASK_ID), exit"

    MATLAB function myApp.m:

    function myApp(task)
    % Sample user app running embarrassingly parallel tasks
    % task -- expects $SGE_TASK_ID (an index)
    % To keep tasks from overwriting each other's output, save
    % each task's output to a file whose name includes the task index
    n=2*task      % size of square random matrix
    A = rand(n)   % computes random matrix
    filnam=['output.' num2str(task) '.mat']
    save(filnam, 'A')  % saves A in a mat file
  2. If you need two indices, you can use the MATLAB ind2sub utility to map a linear index to a pair of 2D indices. For example, to cover a 3×4 grid of 2D indices, you could submit an array job:
    scc1$ qsub -t 1-12 . . .

    This launches 12 jobs with their respective $SGE_TASK_ID = 1, 2, 3, . . ., 12. Next, in myApp.m (where the input argument task holds the value of $SGE_TASK_ID) . . .

    [i, j] = ind2sub([3 4], task); % returns the row i & column j
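
To see which (row, column) pair each task would receive, the column-major mapping that ind2sub applies to a 2D size can be reproduced with integer arithmetic. The following is a minimal sketch, assuming a POSIX shell; the helper name ind2sub_2d is hypothetical and only mimics the two-output form of ind2sub for a matrix with the given number of rows.

```shell
#!/bin/sh
# ind2sub_2d ROWS K
# Print "row col" for the 1-based linear index K of a matrix with ROWS
# rows, counting down the columns first (column-major order).
# Hypothetical helper for illustration only.
ind2sub_2d() {
    rows=$1
    k=$2
    row=$(( (k - 1) % rows + 1 ))
    col=$(( (k - 1) / rows + 1 ))
    printf '%s %s\n' "$row" "$col"
}

# Mapping for the 3x4 example: task IDs 1 through 12.
k=1
while [ "$k" -le 12 ]; do
    echo "task $k -> $(ind2sub_2d 3 "$k")"
    k=$((k + 1))
done
```

Tasks 1 to 3 fill the first column, tasks 4 to 6 the second, and so on, so task 12 maps to row 3, column 4.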

More details . . .


Parallel MATLAB Batch Jobs With PCT

Generally, this type of batch job requires the MATLAB Parallel Computing Toolbox. More details . . .

MATLAB Standalone Batch Jobs

If you run MATLAB batch jobs frequently (especially many at a time), we recommend compiling your application into a standalone executable. The executable runs directly on the host system (such as the SCC) rather than in the MATLAB environment, so no MATLAB license is required. This prevents your jobs from being aborted when MATLAB licenses are not available.

MATLAB Multithreaded Batch Jobs

Many MATLAB vector operations, especially level-3 (O(n^3)) linear algebra operations such as matrix-matrix multiplication, can take advantage of multithreading if the amount of computation is significant. If you are certain that your MATLAB code can take advantage of implicit parallelism, remove the -singleCompThread flag from mbatch, then submit the job to a multiprocessor queue. Note that this type of parallel operation does not require the Parallel Computing Toolbox. For batch processing, implicit parallel computation comes in two cases: using a whole node (SCC nodes have 12 or 16 cores) versus using a partial node.

  1. Sample batch script for using a whole node, wnbatch:
    #!/bin/csh
    matlab -nodisplay -r "A=rand(10000); B=A*A'; exit"
    scc1$ qsub -pe omp 16 ./wnbatch

    In this example, 16 cores are assigned to the job, but MATLAB R2013a uses at most 12 of them. Given this limit, submitting the job with “-pe omp 12” makes better use of system resources.

  2. Sample batch script for using a partial node, pnbatch:
    #!/bin/csh
    matlab -nodisplay -r "$NSLOTS, maxNumCompThreads($NSLOTS); A=rand(10000); B=A*A'; exit"
    scc1$ qsub -pe omp 7 -V  ./pnbatch

    The -V flag causes all environment variables to be passed to pnbatch. The environment variable $NSLOTS is set to 7 when qsub is invoked with -pe omp 7.

    Calling maxNumCompThreads triggers a warning that the function may be removed in a future release. As of MATLAB R2014b, this function is still available.
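
Because $NSLOTS is set by the batch system, a script like pnbatch has no thread count to work with when it is tested in a window session. The following is a minimal sketch of one way to handle that, assuming a POSIX shell rather than the csh used in the samples above; the function name build_matlab_cmd is hypothetical, and falling back to 1 thread is an assumption chosen for illustration.

```shell
#!/bin/sh
# build_matlab_cmd
# Print the MATLAB command string used by pnbatch, taking the thread
# count from $NSLOTS when the scheduler has set it and falling back to
# 1 otherwise (assumed default for interactive testing).
build_matlab_cmd() {
    threads=${NSLOTS:-1}
    printf '%s\n' "maxNumCompThreads($threads); A=rand(10000); B=A*A'; exit"
}

# Show the command that would be run; once it looks right, pass the
# string to MATLAB instead:
#   matlab -nodisplay -r "$(build_matlab_cmd)"
echo "matlab -nodisplay -r \"$(build_matlab_cmd)\""
```

Inside a job submitted with -pe omp 7, the printed command would request 7 threads; in a window session it would request 1.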