How to Submit a Batch Job

Non-interactive batch jobs are submitted with the qsub command. The general form of the command is:

scc % qsub [options] command [arguments]

For example, to submit the printenv command to the batch system, execute:

scc % qsub -b y printenv
Your job #jobID ("printenv") has been submitted

The option -b y tells the batch system that the following command is a binary executable. The qsub command prints the job ID, which you can use to monitor the job’s status within the queue. While the job is running, the batch system creates stdout and stderr files, which by default are placed in the job’s working directory. These files are named after the job, with an extension that ends in the job number; for the above example they are printenv.o#jobID and printenv.e#jobID. The first contains the output of the command, and the second lists any warnings and errors that occurred while the job was running.
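
After the job completes, you can view these files directly from the command line (here #jobID stands for the actual job number reported by qsub):

scc % cat printenv.o#jobID
scc % cat printenv.e#jobID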

When a program requires arguments, or additional directives need to be passed to the batch system, it is convenient to save them in a script file and submit the script as an argument to the qsub command. For example, the following script script.sh will execute a simple Python job:

#!/bin/bash -l
 
# program name or command and its options and arguments
python myscript.py

Note: To be processed correctly, the script must contain a blank line at the end of the file.

To submit this script.sh file to the batch system, execute:

scc % qsub script.sh
Your job #jobID ("script.sh") has been submitted
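
If the program needs command-line arguments, they can be listed after the script name on the qsub line, and SGE passes them to the script as the usual shell positional parameters ($1, $2, ...). A minimal sketch (the input file name below is only a placeholder):

#!/bin/bash -l

# first command-line argument passed on the qsub line
python myscript.py "$1"

which could then be submitted as:

scc % qsub script.sh input.dat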

For other batch script examples, please see Batch Script Examples.

Software Versions and the Module Command

To access software packages on the SCC you need to use the module command. For example, even though there is a system version of Python, it is very old and does not contain any popular packages. To get access to newer versions of the software, please use Modules. When a module command is used in a bash script, the first line of the script must contain the “-l” option to ensure proper handling of the module command:

#!/bin/bash -l
 
# Specify the version of MATLAB to be used
module load matlab/2021b

# program name or command and its options and arguments
matlab -nodisplay -nodesktop -singleCompThread -batch "n=4, rand; exit"
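
To find out which versions of a package are installed before loading one in a script, you can query the module system interactively, for example:

scc % module avail matlab     # list the installed matlab modules
scc % module list             # show the modules loaded in the current session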

General Job Submission Directives

There are a number of directives (options) that the user can pass to the batch system. These directives can either be provided as arguments to the qsub command or embedded in the job script. In a script file the lines containing these directives begin with the symbols #$ – here is an example:

#!/bin/bash -l

#$ -P myproject       # Specify the SCC project name you want to use
#$ -l h_rt=12:00:00   # Specify the hard time limit for the job
#$ -N myjob           # Give job a name
#$ -j y               # Merge the error and output streams into a single file


module load python3/3.13.8
python myscript.py
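
The same directives can instead be passed on the qsub command line; for example, the script above could be submitted without the embedded #$ lines as:

scc % qsub -P myproject -l h_rt=12:00:00 -N myjob -j y script.sh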

Below is a list of some of the most commonly used directives:

General Directives
Directive Description
-l h_rt=hh:mm:ss Hard run time limit in hh:mm:ss format. The default is 12 hours.
-P project_name Project to which this job is to be assigned. This directive is mandatory for all users associated with any Med.Campus project.
-N job_name Specifies the job name. The default is the script or command name.
-o outputfile File name for the stdout output of the job.
-e errfile File name for the stderr output of the job.
-j y Merge the error and output stream files into a single file.
-m b|e|a|s|n Controls when the batch system sends email to you. The possible values are: when the job begins (b), ends (e), is aborted (a), is suspended (s), or never (n), which is the default.
-M user_email Overrides the default email address used to send the job report.
-V Export all current environment variables to the batch job.
-v env=value Set the runtime environment variable env to value.
-hold_jid job_list Set up a job dependency list. job_list is a comma-separated list of job IDs and/or job names which must complete before this job can run. See Advanced Batch System Usage for more information.
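
For example, the following job header asks for an email when the job ends or is aborted, and holds the job until a previously submitted job named prepjob has completed (the project name, job names, email address, and program are placeholders):

#!/bin/bash -l

#$ -P myproject        # SCC project name (placeholder)
#$ -N analysis         # job name
#$ -m ea               # send email when the job ends (e) or is aborted (a)
#$ -M user@bu.edu      # address for the job report (placeholder)
#$ -hold_jid prepjob   # wait for the job named prepjob to finish first

./run_analysis.sh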

Resource Usage and Limits

The Sun Grid Engine (SGE) allows a job to request specific SCC resources necessary for a successful run, including a node with large memory, multiple CPUs, a specific queue, or a node with a specific architecture. The Technical Summary contains hardware configuration for all SCC nodes. The Advanced Batch System Usage page contains examples of running jobs which require parallel environments (OMP, MPI, GPU).

The following table lists the most commonly used options to request resources available on the SCC:

Directives to request SCC resources
Directive Description
-l h_rt=hh:mm:ss Hard run time limit in hh:mm:ss format. The default is 12 hours.
-l mem_per_core=#G Request a node that has at least this amount of memory per core. Recommended choices are: 3G, 4G, 6G, 8G, 12G, 16G, 18G and 28G
-pe omp N Request multiple slots for Shared Memory applications (OpenMP, pthread). This option can also be used to reserve a larger amount of memory for the application. N can vary. Currently, to request multiple cores on SCC’s shared nodes, we recommend selecting 1-4, 8, 16, 28, or 36 cores.
-pe mpi_#_tasks_per_node N Select multiple nodes for an MPI job. Number of tasks can be 4, 8, 12, 16, or 28 and N must be a multiple of this value. See Running Parallel Batch Jobs for more information.
-t N Submit an Array Job with N tasks. N can be up to 75,000. For more information, see Array Jobs.
-l cpu_arch=ARCH Select a processor architecture (broadwell, ivybridge, cascadelake…). See the Technical Summary for all available choices.
-l cpu_type=TYPE Select a processor type (X5670, X5675, Gold-6132 etc.) See the Technical Summary for all available choices.
-l gpus=G Requests a node with GPUs. G is the number of GPUs. See GPU Computing for more information.
-l gpu_type=GPUMODEL See below on running the qgpus command for current values. See GPU Computing for more information.
-l gpu_c=CAPABILITY Specify the minimum GPU capability. See below on running the qgpus command for current values.
-l gpu_memory=#G Request a node with a GPU that has at least the specified amount of memory in gigabytes. See below on running the qgpus command for current values.
-l avx Request a node that supports AVX and newer CPU instructions. A small number of modules require support for these instructions.
-l avx2 Request a node that supports AVX2 and newer CPU instructions. A small number of modules require support for these instructions.
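
For example, the following job header requests 8 cores for a shared-memory (OpenMP) program and a node with at least 8 GB of memory per core, and uses the NSLOTS variable (see SGE Environment Variables below) to match the number of threads to the allocated slots (the project and program names are placeholders):

#!/bin/bash -l

#$ -P myproject          # SCC project name (placeholder)
#$ -l h_rt=24:00:00      # hard run time limit
#$ -pe omp 8             # request 8 slots for a shared-memory job
#$ -l mem_per_core=8G    # at least 8 GB of memory per core

export OMP_NUM_THREADS=$NSLOTS
./my_openmp_program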

Using the qgpus command

qgpus is a utility that shows the GPUs currently installed on the SCC and their availability. To see the list of GPU types, just run the command without any arguments. Adding the “-s” flag will show only those GPUs that are part of the shared queues.

scc % qgpus
gpu_type  total  in_use  available
--------  -----  ------  ---------
A100          5      0      5
A100-80G     24     17      7
A40          68     15     48
A6000        77     27     50
H200         16     16      0
...etc...

Run the command with the “-v” flag to see the GPU compute capability, GPU memory, the number of CPU cores installed on the GPU nodes, and the queue assignment. This can also be combined with the “-s” flag to limit the results to the shared queues. An additional flag, “-q queuename”, can be specified to view the GPU configuration for a particular queue.

scc % qgpus -v
host      gpu_type  gpu_c  gpu_mem  cpu_   cpu_    gpu_   gpu_    gpu_   queue_list
                                    total  in_use  total  in_use  avail
--------  --------  -----  -------  -----  ------  -----  ------  -----  ------------------------------
scc-212   A100      8.0    80G      32     16      4      3       1      a100
scc-211   A40       8.6    48G      32     18      4      4       0      a40
scc-a03   H200      9.0    144G     32     17      4      4       0      h200
scc-a04   H200      9.0    144G     32     13      4      4       0      h200
...etc...
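
The values reported by qgpus can be used directly in the GPU directives listed above. For example, the following job header requests a single GPU of a specific model (the project name and script are placeholders; the GPU type, capability, and memory values should be chosen from the qgpus output):

#!/bin/bash -l

#$ -P myproject          # SCC project name (placeholder)
#$ -l h_rt=12:00:00      # hard run time limit
#$ -l gpus=1             # request one GPU
#$ -l gpu_type=A40       # GPU model, as reported by qgpus
# alternatively, request by minimum capability and memory rather than model:
# #$ -l gpu_c=8.0
# #$ -l gpu_memory=48G

module load python3/3.13.8
python train.py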

The following table summarizes the wall-clock runtime limits for different jobs based on their type:

Run time limits for shared nodes
Type of the job Time limit on shared nodes
Single processor job 720 hours (30 days)
OMP 720 hours (30 days)
MPI 120 hours (5 days)
GPU 48 hours (2 days)
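
For example, a long-running serial or OMP job can request the full limit explicitly with the h_rt directive:

#$ -l h_rt=720:00:00   # maximum run time allowed on shared nodes for serial and OMP jobs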

SGE Environment Variables

When the job is scheduled to run, a number of environment variables are set and may be used by the program:

Batch System Environment
Environment Variable Description
JOB_ID Current job ID
JOB_NAME Current job name
NSLOTS The number of slots requested by a job
HOSTNAME Name of execution host
SGE_TASK_ID Array Job task index number
SGE_TASK_STEPSIZE The step size of the array job specification
SGE_TASK_FIRST The index number of the first array job task
SGE_TASK_LAST The index number of the last array job task
TMPDIR The absolute path to the job’s temporary working directory
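
These variables can be used in the job script like any other shell variables. For example, a simple array job script might use SGE_TASK_ID to select a different input file for each task (the script name and input file naming scheme are placeholders):

#!/bin/bash -l

#$ -N myarrayjob
#$ -t 1-10              # submit 10 tasks; SGE_TASK_ID takes the values 1 through 10

echo "Task $SGE_TASK_ID of job $JOB_ID running on $HOSTNAME"

# each task processes its own input file
python myscript.py input.$SGE_TASK_ID.dat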