
How to Submit a Batch Job

Non-interactive batch jobs are submitted with the qsub command. The general form of the command is:

scc % qsub [options] command [arguments]

For example, to submit the printenv command to the batch system, execute:

scc % qsub -b y printenv
Your job #jobID ("printenv") has been submitted

The option -b y tells the batch system that the command that follows is a binary executable. The output message of the qsub command prints the job ID, which you can use to monitor the job's status within the queue. While the job is running, the batch system creates stdout and stderr files in the job's working directory, named after the job with an extension ending in the job's number: for the above example, printenv.o#jobID and printenv.e#jobID. The first contains the output of the command; the second lists any errors that occurred while the job was running.
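For example, you can check on the job with the standard SGE qstat command and then inspect the output files once the job completes (the #jobID placeholder stands for the actual job number printed by qsub):

```shell
# List your own pending and running jobs
scc % qstat -u $USER

# After the job finishes, examine its stdout and stderr files
scc % cat printenv.o#jobID
scc % cat printenv.e#jobID
```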

When a program requires arguments or you pass additional options to the batch system, it quickly becomes convenient to save them in a script file and submit the script as an argument to the qsub command. For example, the following script script.sh will execute a simple MATLAB job:

#!/bin/bash
 
# program name or command and its options and arguments
matlab -nodisplay -singleCompThread -r "n=4, rand; exit"

Note: To be processed correctly, the script must contain a blank line at the end of the file.

To submit this script.sh file to the batch system, execute:

scc % qsub script.sh
Your job #jobID ("script.sh") has been submitted

For additional MATLAB batch procedures, please see Running MATLAB Batch on the SCC.

General job submission directives

There are a number of directives (options) that the user can pass to the batch system. These directives can either be provided as arguments to the qsub command or embedded in the job script. In a script file, lines containing these directives begin with the symbols #$. Here is an example:

#!/bin/bash

#$ -N myjob     # Give job a name
#$ -j y         # Merge the error and output streams into a single file

printenv

Below is a list of the most commonly used directives:

General Directives
Directive Description
-l h_rt=hh:mm:ss Hard run time limit in hh:mm:ss format. The default is 12 hours.
-P project_name Project to which this job is to be assigned. This directive is mandatory for all users associated with any Med.Campus project.
-N job_name Specifies the job name. The default is the script or command name.
-o outputfile File name for the stdout output of the job.
-e errfile File name for the stderr output of the job.
-j y Merge the error and output stream files into a single file.
-m b|e|a|s|n Controls when the batch system sends email to you. The possible values are: when the job begins (b), ends (e), is aborted (a), is suspended (s), or never (n, the default).
-M user_email Overrides the default email address used to send the job report.
-V All current environment variables should be exported to the batch job.
-v env=value Set the runtime environment variable env to value.
-hold_jid job_list Sets up a job dependency list. job_list is a comma-separated list of job IDs and/or job names that must complete before this job can run. See Advanced Batch System Usage for more information.
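Several of these directives can be combined in a single job script. The sketch below is illustrative only: the project name, job name, and email address are placeholders, not required values.

```shell
#!/bin/bash

#$ -P my_project        # project assignment (placeholder; mandatory for Med.Campus users)
#$ -N env_report        # job name (placeholder)
#$ -l h_rt=01:00:00     # hard run time limit of 1 hour
#$ -j y                 # merge the error stream into the output file
#$ -m e                 # send email when the job ends
#$ -M user@example.com  # placeholder address for the job report

printenv
```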

Resource usage and limits

SGE allows a job to request the specific SCC resources necessary for a successful run, including a node with large memory, multiple CPUs, a specific queue, or a node with a specific architecture. The Technical Summary contains the hardware configuration for all SCC nodes. The Advanced Batch System Usage page contains examples of running jobs that require a parallel environment (OMP, MPI, GPU).

The following table lists the most commonly used options for requesting resources available on the SCC:

Directives to request SCC resources
Directive Description
-l h_rt=hh:mm:ss Hard run time limit in hh:mm:ss format. The default is 12 hours.
-l mem_total=#G Request a node that has at least this amount of memory. Current possible choices include 94G, 125G, and 252G (504G is for Med.Campus users only).
-l cpu_arch=ARCH Select a processor architecture (sandybridge, nehalem). See Technical Summary for all available choices.
-l cpu_type=TYPE Select a processor type (E5-2670, E5-2680, X5570, X5650, X5670, X5675). See Technical Summary for all available choices.
-l gpus=G/C Requests a node with GPUs. G/C specifies the number of GPUs per CPU requested and should be expressed as a decimal number. See Advanced Batch System Usage for more information.
-l gpu_type=GPUMODEL Current choices for GPUMODEL are M2050, M2070 and K40m.
-pe omp N Request multiple slots for shared-memory applications (OpenMP, pthreads). This option can also be used to reserve a larger amount of memory for the application. N can vary from 1 to 16.
-pe mpi_#_tasks_per_node N Select multiple nodes for an MPI job. The number of tasks per node can be 4, 8, 12, or 16, and N must be a multiple of this value. See Advanced Batch System Usage for more information.
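Combining these options on the qsub command line might look like the following. The script name and the specific values chosen here are illustrative, not requirements:

```shell
# Request 8 slots for an OpenMP job on a node with at least 94 GB of memory
scc % qsub -pe omp 8 -l mem_total=94G script.sh

# Request a node with one GPU per requested CPU, of a specific model
scc % qsub -l gpus=1 -l gpu_type=M2070 script.sh
```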

The following table summarizes the wall-clock runtime limits for different jobs based on their type:

Run time limits for shared nodes
Type of job Time limit on shared nodes
Single processor job 720 hours (30 days)
OMP 720 hours (30 days)
MPI 120 hours (5 days)
GPU 48 hours (2 days)
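As a sanity check on the hh:mm:ss format used with -l h_rt, the small shell snippet below (an illustrative helper, not part of the batch system) converts a wall-clock limit into days, confirming that 720:00:00 corresponds to the 30-day limit above:

```shell
#!/bin/bash
# Split an SGE h_rt value (hh:mm:ss) into fields and convert hours to days.
hrt="720:00:00"
IFS=: read -r h m s <<< "$hrt"
days=$(( h / 24 ))
echo "$hrt is $days days"
```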