Contents
- How to Submit a Batch Job
- Software Versions and the Module Command
- General Job Submission Directives
- Resource Usage and Limits
- SGE Environment Variables
How to Submit a Batch Job
Non-interactive batch jobs are submitted with the qsub command. The general form of the command is:
scc % qsub [options] command [arguments]
For example, to submit the printenv command to the batch system, execute:
scc % qsub -b y printenv
Your job #jobID ("printenv") has been submitted
The option -b y tells the batch system that the following command is a binary executable. The output message of the qsub command will print the job ID, which you can use to monitor the job's status within the queue. While the job is running, the batch system creates stdout and stderr files, which by default are placed in the job's working directory. These files are named after the job's name, with an extension ending in the job's number: for the above example, printenv.o#jobID and printenv.e#jobID. The first file contains the output of the command and the second contains any warnings and errors that occurred while the job was running.
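For example, assuming the job ID printed by qsub was 1234567 (a placeholder), you could monitor the job in the queue and then inspect its output files with standard SGE commands:
scc % qstat -u $USER          # list your jobs still in the queue
scc % cat printenv.o1234567   # stdout of the finished job
scc % cat printenv.e1234567   # stderr (warnings and errors), if any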
When a program requires arguments, or additional directives need to be passed to the batch system, it is useful to save them in a script file and submit this script as an argument to the qsub command. For example, the following script, script.sh, will execute a simple Python job:
#!/bin/bash -l
# program name or command and its options and arguments
python myscript.py
To submit this script.sh file to the batch system, execute:
scc % qsub script.sh
Your job #jobID ("script.sh") has been submitted
For other batch script examples, please see Batch Script Examples.
Software Versions and the Module Command
To access software packages on the SCC you need to use the module command. For example, even though there is a system version of Python, it is very old and does not contain any popular packages. To get access to newer versions of the software, please use Modules. When a module command is used in a bash script, the first line of the script must contain the "-l" option to ensure proper handling of the module command:
#!/bin/bash -l
# Specify the version of MATLAB to be used
module load matlab/2021b
# program name or command and its options and arguments
matlab -nodisplay -nodesktop -singleCompThread -batch "n=4, rand; exit"
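Before loading a module in a script, you can check which versions are available from the command line. A brief sketch using standard module subcommands (the python3 versions shown are illustrative):
scc % module avail python3        # list available versions of the python3 module
scc % module load python3/3.8.10  # load a specific version
scc % module list                 # show the modules currently loaded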
General Job Submission Directives
There are a number of directives (options) that the user can pass to the batch system. These directives can either be provided as arguments to the qsub command or embedded in the job script. In a script file, the lines containing these directives begin with the symbols #$. Here is an example:
#!/bin/bash -l
#$ -P myproject # Specify the SCC project name you want to use
#$ -l h_rt=12:00:00 # Specify the hard time limit for the job
#$ -N myjob # Give job a name
#$ -j y # Merge the error and output streams into a single file
module load python3/3.8.10
python myscript.py
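The same directives can instead be passed as arguments on the qsub command line; for example, the script above could equivalently be submitted without its embedded #$ lines (a sketch using the placeholder project name myproject):
scc % qsub -P myproject -l h_rt=12:00:00 -N myjob -j y script.sh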
Below is a list of some of the most commonly used directives:
General Directives

| Directive | Description |
|---|---|
| -l h_rt=hh:mm:ss | Hard run time limit in hh:mm:ss format. The default is 12 hours. |
| -P project_name | Project to which this job is to be assigned. This directive is mandatory for all users associated with any Med.Campus project. |
| -N job_name | Specifies the job name. The default is the script or command name. |
| -o outputfile | File name for the stdout output of the job. |
| -e errfile | File name for the stderr output of the job. |
| -j y | Merge the error and output stream files into a single file. |
| -m b\|e\|a\|s\|n | Controls when the batch system sends email to you. The possible values are: when the job begins (b), ends (e), is aborted (a), is suspended (s), or never (n), which is the default. |
| -M user_email | Overrides the default email address used to send the job report. |
| -V | Export all current environment variables to the batch job. |
| -v env=value | Set the runtime environment variable env to value. |
| -hold_jid job_list | Set up a job dependency list. job_list is a comma-separated list of job IDs and/or job names that must complete before this job can run. See Advanced Batch System Usage for more information. |
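As an illustration, several of these directives can be combined in one script, for example to request email notification and make the job wait for another job to finish. This is a sketch: the project name, email address, and the job name preprocess are placeholders.
#!/bin/bash -l
#$ -P myproject          # SCC project name (placeholder)
#$ -N analysis           # job name
#$ -j y                  # merge stderr into the stdout file
#$ -m ea                 # send email when the job ends (e) or is aborted (a)
#$ -M user@bu.edu        # email address for the job report (placeholder)
#$ -hold_jid preprocess  # run only after the job named "preprocess" completes

python myscript.py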
Resource Usage and Limits
The Sun Grid Engine (SGE) allows a job to request specific SCC resources necessary for a successful run, including a node with large memory, multiple CPUs, a specific queue, or a node with a specific architecture. The Technical Summary contains the hardware configuration of all SCC nodes. The Advanced Batch System Usage page contains examples of running jobs that require parallel environments (OMP, MPI, GPU).
The following table lists the most commonly used options to request resources available on the SCC:
Directives to request SCC resources

| Directive | Description |
|---|---|
| -l h_rt=hh:mm:ss | Hard run time limit in hh:mm:ss format. The default is 12 hours. |
| -l mem_per_core=#G | Request a node that has at least this amount of memory per core. Recommended choices are 3G, 4G, 6G, 8G, 12G, 16G, 18G, and 28G. |
| -pe omp N | Request multiple slots for shared-memory applications (OpenMP, pthread). This option can also be used to reserve a larger amount of memory for the application. N can vary; currently, to request multiple cores on the SCC's shared nodes, we recommend selecting 1-4, 8, 16, 28, or 36 cores. |
| -pe mpi_#_tasks_per_node N | Select multiple nodes for an MPI job. The number of tasks per node can be 4, 8, 12, 16, or 28, and N must be a multiple of this value. See Running Parallel Batch Jobs for more information. |
| -t N | Submit an Array Job with N tasks. N can be up to 75,000. For more information see Array Jobs. |
| -l cpu_arch=ARCH | Select a processor architecture (broadwell, ivybridge, cascadelake, etc.). See the Technical Summary for all available choices. |
| -l cpu_type=TYPE | Select a processor type (X5670, X5675, Gold-6132, etc.). See the Technical Summary for all available choices. |
| -l gpus=G | Request a node with GPUs. G is the number of GPUs. See GPU Computing for more information. |
| -l gpu_type=GPUMODEL | Request a specific GPU model. To see the current list of available GPU models, run the qgpus command. See GPU Computing for more information. |
| -l gpu_c=CAPABILITY | Specify the minimum GPU capability. Current choices for CAPABILITY are 3.5, 5.0, 6.0, 7.0, and 8.6. |
| -l gpu_memory=#G | Request a node with a GPU that has 12G, 16G, 24G, 32G, or 48G of memory. |
| -l avx | Request a node that supports AVX and newer CPU instructions. A small number of modules require support for these instructions. |
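As an illustration, the resource directives above can be combined in a job script, for example for a shared-memory (OpenMP) run. This is a sketch: the core count, memory request, and program name are placeholders, not recommendations for any particular code.
#!/bin/bash -l
#$ -P myproject           # SCC project name (placeholder)
#$ -l h_rt=24:00:00       # 24-hour hard run time limit
#$ -pe omp 8              # request 8 slots on a shared node
#$ -l mem_per_core=8G     # request at least 8 GB of memory per core

# NSLOTS is set by the batch system to the number of requested slots
export OMP_NUM_THREADS=$NSLOTS
./my_openmp_program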
The following table summarizes the wall-clock runtime limits for different jobs based on their type:
Run time limits for shared nodes

| Type of job | Time limit on shared nodes |
|---|---|
| Single-processor job | 720 hours (30 days) |
| OMP | 720 hours (30 days) |
| MPI | 120 hours (5 days) |
| GPU | 48 hours (2 days) |
SGE Environment Variables
When the job is scheduled to run, a number of environment variables are set and may be used by the program:
Batch System Environment

| Environment Variable | Description |
|---|---|
| JOB_ID | Current job ID |
| JOB_NAME | Current job name |
| NSLOTS | The number of slots requested by the job |
| HOSTNAME | Name of the execution host |
| SGE_TASK_ID | Array Job task index number |
| SGE_TASK_STEPSIZE | The step size of the array job specification |
| SGE_TASK_FIRST | The index number of the first array job task |
| SGE_TASK_LAST | The index number of the last array job task |
| TMPDIR | The absolute path to the job's temporary working directory |
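For example, an array job script can use these variables to report where each task runs and to pick its own input. This is a sketch: the task range and the input file naming scheme are placeholders.
#!/bin/bash -l
#$ -P myproject   # SCC project name (placeholder)
#$ -N myarray     # job name
#$ -t 1-10        # submit an array job with 10 tasks

echo "Task $SGE_TASK_ID of job $JOB_ID is running on $HOSTNAME"
# Each task processes its own input file, selected by the task index
python myscript.py input.$SGE_TASK_ID.txt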