Job dependency control
Out of necessity or for better throughput, an application may spawn a series of batch jobs which may be required to run in a specific order. For these applications, the job dependency can be controlled, in general, with the
- Example 1. All batch jobs in the group need to run in a specific sequence.
scc1% qsub -N job1 script1 scc1% qsub -N job2 -hold_jid job1 script2 scc1% qsub -N job3 -hold_jid job2 script3
- Example 2. A designated job must wait until the remaining jobs in the group have completed (aka post-processing).
In this example,
lastjobwon’t start until
scc1% qsub -N job1 script1 scc1% qsub -N job2 script2 scc1% qsub -N job3 script3 scc % qsub -N lastjob -hold_jid "job*" script4
In both examples, the use of the “
-N” option to assign job names makes job identification easier as the names of the referenced jobs are known a priori; this is especially helpful in the second example because a wild card (*) can be used effectively. Note that the above procedures are also applicable to parallel batch jobs (i.e. with the
-pe switch). As an alternative to the manual job-by-job submission shown above (for conceptual demonstration), incorporating all the steps into a script is more practical. For a complete example, visit Running Multiple Batch Jobs With qsub Array Job Option.
Submitting Array Jobs
If you submit many jobs at the same time that are largely identical, you should submit them as array jobs. Array jobs are easier to manage, faster to submit, and they greatly reduce the load on the scheduler. An array job executes multiple independent copies of the same job script. These multiple copies are referred to as “tasks” and are scheduled independently as resources become available, i.e. the tasks are not scheduled all at once. The number of tasks to be executed is set using the
-t start-end[:step] option
qsub command, where
start is the index of the first task (it has to be 1 or more, it can not be 0),
end is the index number of the last task, and
step is an optional step size (step size defaults to 1 if unspecified). Here’s an example of using this command:
scc % qsub -t 1-25 myscript.sh
The above command will submit an array job consisting of 25 tasks, numbered from 1 to 25. Since the step size was not specified, the default step size of 1 will be used. Each task will independently execute the
myscript.sh job file. The batch system sets the
SGE_TASK_ID environment variable, which can be used inside the script to pass the task ID to the program. Below is an example of how you can utilize that environment variable in the job file:
#!/bin/bash -l # Specify that we will be running an Array job with 25 tasks numbered 1-25 #$ -t 1-25 # Request 1 core for my job #$ -pe omp 1 # Give a name to my job #$ -N my_array_job # Join the output and error streams #$ -j y # Run my R script and give it the $SGE_TASK_ID environment # variable as a command-line argument Rscript myRfile.R $SGE_TASK_ID
There is a more advanced example code here as well.
When running the
qstat -u USER_ID command to check on the status of the Array job, the running tasks will be listed as separate lines, each with the corresponding task ID visible under the “ja-task-ID” column. All the remaining tasks that are not running, i.e. are queued, will be listed as a single line, their task IDs will be aggregated also under the “ja-task-ID” column (see the code block below, at the far right).
[aaly@scc1]$ qstat -u aaly job-ID prior name user state submit/start at queue slots ja-task-ID ----------------------------------------------------------------------------------------------------------------- 4960245 0.11489 my_array_j aaly r 02/25/2021 20:31:37 firstname.lastname@example.org 14 1 4960245 0.11012 my_array_j aaly r 02/25/2021 20:31:37 email@example.com 14 2 4960245 0.10774 my_array_j aaly r 02/25/2021 20:31:37 firstname.lastname@example.org 14 3 4960245 0.10630 my_array_j aaly r 02/25/2021 20:31:37 email@example.com 14 4 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 firstname.lastname@example.org 14 5 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 email@example.com 14 6 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 firstname.lastname@example.org 14 7 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 email@example.com 14 8 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 firstname.lastname@example.org 14 9 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 email@example.com 14 10 4960245 0.10535 my_array_j aaly r 02/25/2021 20:31:37 firstname.lastname@example.org 14 11 4960245 0.00000 my_array_j aaly qw 02/25/2021 20:29:33 14 12-25:1
In the above output of
qstat the submitted array job is composed of 25 tasks. Tasks 1-11 are running, and each is listed as a separate entry. Tasks 12-25 are still in the queue waiting to be scheduled; they appear as a single entry.
Notice all the tasks carry the same job “name”; here they are called
my_array_job (there’s no room to display the entire name in the column, so it is truncated). It is important to give a distinct name to your Array jobs. In addition, it is not recommended to assign a name to the output file using
-o option in the qsub file. This is because all the tasks will write to the same file.
In the event that an array job was submitted by mistake, simply delete all tasks with:
scc1% qdel job_ID
Once the array job has finished, there will an output file for each task that has run. We can execute the
ls command to see them as below:
[aaly@scc1]$ ls myscript.sh my_array_job.o4960245.12 my_array_job.o4960245.16 my_array_job.o4960245.2 my_array_job.o4960245.23 my_array_job.o4960245.4 my_array_job.o4960245.8 my_array_job.o4960245.1 my_array_job.o4960245.13 my_array_job.o4960245.17 my_array_job.o4960245.20 my_array_job.o4960245.24 my_array_job.o4960245.5 my_array_job.o4960245.9 my_array_job.o4960245.10 my_array_job.o4960245.14 my_array_job.o4960245.18 my_array_job.o4960245.21 my_array_job.o4960245.25 my_array_job.o4960245.6 myRfile.R my_array_job.o4960245.11 my_array_job.o4960245.15 my_array_job.o4960245.19 my_array_job.o4960245.22 my_array_job.o4960245.3 my_array_job.o4960245.7
By running the
ls command we see the output files of the array job listed. The output file naming format is
job_name.o<job_id>.<task_id>. In the above example, the job name is
my_array_job, the job id is
4960245, and the task id is between 1 and 25. Thus, we see the corresponding output file for each task in the Array job.
Customize qsub with .sge_request
With the current batch scheduler, a user may create a
.sge_request file (in their home directory) to customize preferred or frequently used
qsub settings. These settings will be in effect for all subsequent
qsh) batch submissions. Here is what the
.sge_request file might look like:
# .sge_request file must reside in home directory # # Send me mail when job gets aborted or ends normally -m ae # I want email sent to this email address (instead of default BU email) -M email@example.com # I have multiple projects, I want jobs charged to this project -P projectname
Your batch script,
my_batch_script, need not include those options already defined in
.sge_request. However, if you do, they will take precedence over those in
.sge_request. Furthermore, any option that appears on the
qsub command line input will supersede that which is in
my_batch_script. Here is an example
scc1% qsub -m a -l h_rt=48:00:00 my_batch_script
The above batch job will only send email if the job is aborted, not if it ends normally as the
.sge_request indicated. The combination of the other options specified in the
.sge_request and on the command line will also take effect.