
How to Submit a Batch Job

Non-interactive batch jobs are submitted with the qsub command. The general form of the command is:

scc % qsub [options] command [arguments]

For example, to submit the printenv command to the batch system, execute:

scc % qsub -b y printenv
Your job #jobID ("printenv") has been submitted

The -b y option tells the batch system that the command is an executable to be run directly, not a job script. qsub prints the job ID, which you can use to monitor the job’s status in the queue. While the job runs, the batch system writes its standard output and standard error to two files in the job’s working directory. The files are named after the job, with the suffix .o or .e followed by the job ID; for the example above, printenv.o#jobID and printenv.e#jobID. The first file contains the output of the command and the second contains any error messages produced while the job was running.
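Because the output file names follow a fixed pattern, a wrapper script can work them out from the qsub message. The sketch below assumes the standard Grid Engine message format (Your job 12345 ("printenv") has been submitted); parse_job_id and output_files are hypothetical helper names, not SGE commands:

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from a qsub message such as
#   Your job 12345 ("printenv") has been submitted
parse_job_id() {
    # The job ID is the third whitespace-separated field of the message.
    awk '{print $3}' <<< "$1"
}

# Hypothetical helper: print the default stdout and stderr file names
# (<job name>.o<job ID> and <job name>.e<job ID>) for a job.
output_files() {
    local name=$1 id=$2
    echo "${name}.o${id} ${name}.e${id}"
}

# Usage (requires qsub, so shown as comments):
#   msg=$(qsub -b y printenv)
#   id=$(parse_job_id "$msg")
#   output_files printenv "$id"
```

Grid Engine's qsub also accepts -terse, which prints only the job ID and makes the parsing step unnecessary; check man qsub on your system to confirm it is available.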

When a program needs arguments, or when you want to pass additional options to the batch system, it is convenient to put the commands in a script file and submit that script with qsub. For example, the following script, script.sh, runs a simple MATLAB job:

#!/bin/bash
 
# program name or command and its options and arguments
matlab -nodisplay -singleCompThread -r "n=4, rand; exit"
Note: The last line of the script must end with a newline character, otherwise it may not be processed. Ending the file with a blank line guarantees this.
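A missing final newline is easy to overlook, especially when a script is produced by another program. The following sketch, using a hypothetical helper named ensure_trailing_newline, appends a newline only when the file does not already end with one:

```shell
#!/bin/bash
# Hypothetical helper: make sure a job script ends with a newline so that
# the batch system reads its last line.
ensure_trailing_newline() {
    local file=$1
    # $(tail -c1 ...) is empty when the last byte is a newline, because
    # command substitution strips trailing newlines. Empty files are skipped.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        echo >> "$file"   # append the missing newline
    fi
}

# Usage:
#   ensure_trailing_newline script.sh
#   qsub script.sh
```

Running the helper twice is harmless: the second call sees the newline and leaves the file unchanged.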

To submit this script.sh file to the batch system, execute:

scc % qsub script.sh
Your job #jobID ("script.sh") has been submitted

For other batch script examples, please see Batch Script Examples.

Software Versions and the Module Command

The default versions of many applications (such as MATLAB, R, Python, and the gcc compiler) are often older releases. To use newer versions, load them with the module command (see Modules). When a bash job script uses the module command, its first line must include the -l option, which starts bash as a login shell so that the module command is set up correctly:

#!/bin/bash -l
 
# Specify the version of MATLAB to be used
module load matlab/2016b

# program name or command and its options and arguments
matlab -nodisplay -singleCompThread -r "n=4, rand; exit"
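If a script that uses module is accidentally started without -l, the module command may simply be undefined. The sketch below makes that failure explicit; load_module_or_fail is a hypothetical helper, and matlab/2016b is the module from the example above:

```shell
#!/bin/bash -l
# Hypothetical helper: load a module, failing loudly if the module command is
# unavailable (for example, when the script was not started with "bash -l").
load_module_or_fail() {
    if ! type module >/dev/null 2>&1; then
        echo "error: 'module' command not found; use '#!/bin/bash -l'" >&2
        return 1
    fi
    module load "$1"
}

# Usage inside a job script:
#   load_module_or_fail matlab/2016b || exit 1
#   matlab -nodisplay -singleCompThread -r "n=4, rand; exit"
```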

General Job Submission Directives

Users can pass a number of directives (options) to the batch system, either as arguments to the qsub command or embedded in the job script. In a script file, lines containing directives begin with #$, as in this example:

#!/bin/bash

#$ -l h_rt=24:00:00   # Specify the hard time limit for the job
#$ -N myjob           # Give job a name
#$ -j y               # Merge the error and output streams into a single file

printenv
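Since embedded #$ lines carry the same options as the qsub command line, it can help to list what a script requests before submitting it. A minimal sketch with a hypothetical helper list_directives, assuming directives start at the beginning of the line and contain no # character of their own:

```shell
#!/bin/bash
# Hypothetical helper: list the "#$" directives embedded in a job script,
# one per line, with any trailing "# comment" removed.
list_directives() {
    # First sed: keep only lines starting with "#$" and drop that prefix.
    # Second sed: strip a trailing comment and the whitespace before it.
    sed -n 's/^#\$[[:space:]]*//p' "$1" | sed 's/[[:space:]]*#.*$//'
}

# Usage:
#   list_directives script.sh
```

For the example script above, this prints -l h_rt=24:00:00, -N myjob, and -j y on separate lines.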

The most commonly used directives are listed below:

General Directives
Directive Description
-l h_rt=hh:mm:ss Hard run time limit in hh:mm:ss format. The default is 12 hours.
-P project_name Project to which this job is to be assigned. This directive is mandatory for all users associated with any Med.Campus project.
-N job_name Specifies the job name. The default is the script or command name.
-o outputfile File name for the stdout output of the job.
-e errfile File name for the stderr output of the job.
-j y Merge the error and output stream files into a single file.
-m b|e|a|s|n Controls when the batch system sends email to you. Possible values: b (when the job begins), e (ends), a (is aborted), s (is suspended), or n (never; this is the default).
-M user_email Overrides the default email address to which the job report is sent.
-V Export all current environment variables to the batch job.
-v env=value Set the runtime environment variable env to value.
-hold_jid job_list Set up a job dependency list. job_list is a comma-separated list of job IDs and/or job names that must complete before this job can run. See Advanced Batch System Usage for more information.
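The -hold_jid directive is typically combined with capturing the job ID of an earlier submission. A sketch, assuming qsub supports -terse (print only the job ID) and using hypothetical script names step1.sh and step2.sh; the QSUB variable exists only so the function can be dry-run with a stand-in command:

```shell
#!/bin/bash
# Sketch: submit two job scripts so that the second starts only after the
# first has completed. QSUB defaults to qsub but can be overridden.
QSUB=${QSUB:-qsub}

submit_pipeline() {
    local first=$1 second=$2 id
    # -terse makes qsub print only the job ID of the submitted job.
    id=$("$QSUB" -terse "$first") || return 1
    # The second job is held until the job with ID $id completes.
    "$QSUB" -terse -hold_jid "$id" "$second"
}

# Usage:
#   submit_pipeline step1.sh step2.sh
```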

Resource Usage and Limits

Sun Grid Engine (SGE) lets a job request the specific SCC resources it needs to run successfully, such as a node with a large amount of memory, multiple CPUs, a specific queue, or a specific processor architecture. The Technical Summary lists the hardware configuration of all SCC nodes. The Advanced Batch System Usage page contains examples of jobs that require parallel environments (OMP, MPI, GPU).

The following table lists the most commonly used options to request resources available on the SCC:

Directives to request SCC resources
Directive Description
-l h_rt=hh:mm:ss Hard run time limit in hh:mm:ss format. The default is 12 hours.
-l mem_total=#G Request a node that has at least this amount of memory. Current possible choices include 94G, 125G, 252G, 504G, and 1000G.
-l mem_per_core=#G Request a node that has at least this amount of memory per core. Current possible choices include 3G, 4G, 8G, 12G, 16G, 18G, and 28G.
-pe omp N Request multiple slots for shared-memory applications (OpenMP, pthreads). This option can also be used to reserve more memory for the application. N can be any value from 1 to 28, or 36.
-pe mpi_#_tasks_per_node N Select multiple nodes for an MPI job. The number of tasks per node (#) can be 4, 8, 12, 16, or 28, and N must be a multiple of this value. See Running Parallel Batch Jobs for more information.
-t N Submit an array job with N tasks. N can be up to 75,000. For more information, see Array Jobs.
-l cpu_arch=ARCH Select a processor architecture (broadwell, haswell, ivybridge, …). See Technical Summary for all available choices.
-l cpu_type=TYPE Select a processor type (X5650, X5670, X5675, etc.). See Technical Summary for all available choices.
-l gpus=G/C Request a node with GPUs. G/C is the number of GPUs requested per CPU, expressed as a decimal number. See GPU Computing for more information.
-l gpu_type=GPUMODEL Select a GPU model. Current choices for GPUMODEL are M2070, K40m, and P100.
-l gpu_c=CAPABILITY Specify the minimum GPU compute capability. Current choices for CAPABILITY are 2.0, 3.5, and 6.0.
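As an illustration of the -pe omp directive, a shared-memory job can size its thread pool from the NSLOTS variable, which the batch system sets to the number of slots granted (see SGE Environment Variables). A sketch; the program name is hypothetical and the directive values are only examples:

```shell
#!/bin/bash
# Request 4 slots on a single node for a shared-memory (OpenMP) program.
#$ -pe omp 4
# Hard run time limit.
#$ -l h_rt=12:00:00

# Use as many threads as slots were granted. NSLOTS is set by the batch
# system; the fallback of 1 only applies when the script runs outside SGE.
export OMP_NUM_THREADS=${NSLOTS:-1}
echo "Running with $OMP_NUM_THREADS threads"

# ./my_openmp_program   # hypothetical program name
```

Deriving the thread count from NSLOTS keeps the script correct if the -pe omp value is later changed on the qsub command line.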

The following table summarizes the wall-clock runtime limits for different jobs based on their type:

Runtime limits for shared nodes
Type of the job Time limit on shared nodes
Single processor job 720 hours (30 days)
OMP 720 hours (30 days)
MPI 120 hours (5 days)
GPU 48 hours (2 days)
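Before submitting, a requested h_rt value can be checked against these limits. A sketch with hypothetical helpers hrt_to_seconds and within_limit; the job type names (single, omp, mpi, gpu) are labels chosen for this example, and the limits are copied from the table above, so update them if the table changes:

```shell
#!/bin/bash
# Hypothetical helper: convert an h_rt value (hh:mm:ss) to seconds.
hrt_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical helper: succeed if the requested h_rt is within the
# shared-node limit for the given job type (limits in hours).
within_limit() {
    local type=$1 hrt=$2 max_hours
    case $type in
        single|omp) max_hours=720 ;;
        mpi)        max_hours=120 ;;
        gpu)        max_hours=48 ;;
        *) echo "unknown job type: $type" >&2; return 2 ;;
    esac
    [ "$(hrt_to_seconds "$hrt")" -le $(( max_hours * 3600 )) ]
}

# Usage:
#   within_limit gpu 36:00:00 && echo "OK to submit"
```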

SGE Environment Variables

When the job is scheduled to run, a number of environment variables are set and may be used by the program:

Batch System Environment
Environment Variable Description
JOB_ID Current job ID
JOB_NAME Current job name
NSLOTS The number of slots requested by a job
HOSTNAME Name of execution host
SGE_TASK_ID Array Job task index number
SGE_TASK_STEPSIZE The step size of the array job specification
SGE_TASK_FIRST The index number of the first array job task
SGE_TASK_LAST The index number of the last array job task
TMPDIR The absolute path to the job’s temporary working directory
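A common use of SGE_TASK_ID is to let each task of an array job (submitted with -t) select its own input. A sketch, assuming a hypothetical list file with one input path per line; task_input is a hypothetical helper, and a missing or non-numeric SGE_TASK_ID is treated as an error:

```shell
#!/bin/bash
# Hypothetical helper: print the line of the list file ($1) whose number
# matches this array task's index.
task_input() {
    local list=$1
    case ${SGE_TASK_ID:-} in
        ''|*[!0-9]*)
            echo "SGE_TASK_ID is not a task index; is this an array job?" >&2
            return 1 ;;
    esac
    sed -n "${SGE_TASK_ID}p" "$list"
}

# Usage inside a job script submitted with, e.g., "qsub -t 1-10 job.sh":
#   INPUT=$(task_input inputs.txt) || exit 1
#   cp "$INPUT" "$TMPDIR"/   # optionally work on a copy in the job's temp directory
```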