Jobs using 1 to 4 processors that require approximately ten minutes or less of CPU time may run interactively on the login node, katana.bu.edu. All other jobs must be submitted to the batch system for execution on the compute nodes. Procedures for running jobs vary with the type of application. Instructions for several important types of serial and multiprocessing jobs, such as MPI and OpenMP jobs, are demonstrated below.
Batch system technical summary
- The batch system is Sun Grid Engine (manpage, User’s Guide).
Common commands are: qsh, qrsh, qsub, qstat, and qdel.
Caution: Many Sun Grid Engine commands have identical names to PBS commands. However, the input to the corresponding commands or their behavior may differ.
- This page uses the word processor to mean what vendors call a processor core. The SGE manpages also use the words job slot or simply slot to refer to the same concept.
- The cluster consists of nodes of several different CPU and memory configurations but the group of nodes assigned to any particular multi-node job will always be identically configured. The -l cpu_type=… and -l memory=… options can be used to select specific CPU and memory configurations.
- Since the nodes have multiple processors it is possible for several jobs to run on the same node. The scheduling policy ensures that the CPUs are not oversubscribed but the node memory, scratch space, and network bandwidth will be shared by all the jobs running on the node. To eliminate the possibility of resource contention between jobs running on the same node it is recommended that you use a “parallel environment” specification which allocates whole nodes for your job.
- Nodes are assigned to jobs at runtime and are not known a priori. At runtime the list of nodes assigned to the job is contained in the file specified by the $PE_HOSTFILE environment variable.
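The hostfile format can be examined with a short script. The following is a minimal sketch; since $PE_HOSTFILE only exists inside a running job, the file contents are simulated here and the node names are hypothetical:

```shell
# Sketch: listing the nodes SGE assigned to a job. In a real batch job,
# $PE_HOSTFILE is set by SGE; here we simulate its contents (one line per
# node: hostname, slot count, queue, processor range) with a temp file.
PE_HOSTFILE=$(mktemp)
printf 'node042 4 all.q@node042 UNDEFINED\nnode043 4 all.q@node043 UNDEFINED\n' > "$PE_HOSTFILE"

# Print each assigned host and its slot count
awk '{ print $1, "slots:", $2 }' "$PE_HOSTFILE"

# Total slots across all nodes (equals $NSLOTS in a real job)
total=$(awk '{ t += $2 } END { print t }' "$PE_HOSTFILE")
echo "total slots: $total"
rm -f "$PE_HOSTFILE"
```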
- A user can submit as many jobs as desired. However, no more than 64 processors of a single user can be in the run state simultaneously.
- The maximum run time limit is generally 24 hours (-l h_rt=24:00:00). However, we are now allowing a limited number of jobs per user to run up to 72 hours. A user can request up to 4 processors (as 4 single processor jobs or one 4 processor job for example) with a run time limit of 72 hours; we currently have 12 slots for this purpose among all users. The default limit is 2 hours if you do not specify a higher limit.
- See the Technical Summary section for additional information.
- The types of batch jobs you can run are described in the sections below.
Submitting a batch job
Batch jobs are submitted with the qsub command. The general form of the command is:
katana:~ % qsub [qsub options] command [arg1 ...]
In general, command is a user supplied shell script. Table 1 describes the most important qsub options.
| Option | Description |
|---|---|
| -l h_rt=HH:MM:SS | Hard run time limit (aka wallclock limit). Default is 2:00:00 (2 hours). |
| -l cpu_type=PROCESSOR_NAME | Select a processor type (see the parallel-environment table below). Options are 2216HE, 2218HE, E5450, X5570, and X5670. |
| -l memory=96G | Select large-memory (96 GB per node) X5570 nodes. |
| -pe parallel_environment N | Request use of more than 1 processor. N is the number of processors desired (2 – 32) and parallel_environment specifies how they are allocated. See Table 2 below for parallel_environment choices. |
| -b y | Tells qsub that “command” is a binary executable rather than a shell script (see example 1). |
| -e errorfile | Where stderr of the job should go. Defaults to a file called “command.e<job-ID>” in the directory that was current when qsub was run. |
| -o outputfile | Where stdout of the job should go. Defaults to a file called “command.o<job-ID>” in the directory that was current when qsub was run. |
| -j y | Causes the error stream to be merged with the output stream. |
| -m b\|e\|a\|s\|n | Controls when the batch system sends mail to you: when the job begins (b), ends (e), is aborted (a), is suspended (s), or never (n). The default is ‘e’. |
| -hold_jid job_list | Set up a job dependency list. job_list is a comma-separated list of job IDs and/or job names which must complete before this job can run. See examples here. |
| -N name | Gives the job a name. Defaults to the basename of “command”. |
| -v env_var=value | Set the runtime environment variable env_var to value. |
| -V | Use the current values of all user-defined environment variables at runtime. Without this flag, only a small number of system-defined environment variables are set at runtime. |
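As a sketch of how several of these options combine on one commandline (the job name, log file, and script name below are hypothetical examples):

```shell
# Sketch: assembling a qsub commandline that combines common options:
# a 24-hour runtime limit, 4 processors via the omp PE, a job name,
# merged error/output streams, and an explicit output file.
opts="-l h_rt=24:00:00 -pe omp 4 -N myjob -j y -o myjob.log"
cmd="qsub $opts mybatch"
echo "$cmd"
```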
Note that most qsub options can be included in the batch script instead of on the commandline by using a special form of comment: #$ <option>
An exception to this rule is when -b y is in effect. See the “Types of Applications” section below for more details.
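A minimal sketch of the embedded-option form (the runtime limit and job name are arbitrary examples):

```shell
#!/bin/sh
# Sketch: qsub options embedded in the batch script itself via "#$" comments.
# To the shell these are ordinary comments; qsub reads them as options.
#$ -l h_rt=12:00:00
#$ -N myjob
#$ -j y
msg="job body runs here"
echo "$msg"
```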
The -pe (parallel environment) qsub option is used to request use of more than one processor. The -pe option takes two arguments: the name of a parallel environment and a number N specifying the number of processors required by the job. Table 2 below lists each of the supported parallel environments, along with their intended purpose, processor allocation rule, and restrictions, if any, on N.
Several PEs are available for MPI jobs. The only difference between the ones whose purpose is labeled MPI is the way in which processors are allocated amongst the available nodes.
- The generic “mpi” PE only uses nodes with 4 processors and may select nodes that are shared with other jobs. We recommend using it only for jobs in which N is not a multiple of 4.
- For MPI jobs in which N is a multiple of 4 but not a multiple of 8, we recommend the “mpi_4_tasks_per_node” PE, which allocates whole 4-processor nodes.
- For MPI jobs in which N is a multiple of 8, we recommend the “mpi_8_tasks_per_node” PE, which allocates whole 8-processor nodes. In this case you can also use “mpi_?_tasks_per_node”, which will select either 4-processor nodes or 8-processor nodes (all of the same type), whichever type is available soonest.
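The recommendations above amount to a simple rule; a sketch of choosing a PE from N:

```shell
# Sketch: choosing an MPI parallel environment from the processor count N,
# following the recommendations above: multiples of 8 use whole 8-processor
# nodes, other multiples of 4 use whole 4-processor nodes, and anything
# else falls back to the generic mpi PE.
N=24
if [ $((N % 8)) -eq 0 ]; then
    pe=mpi_8_tasks_per_node
elif [ $((N % 4)) -eq 0 ]; then
    pe=mpi_4_tasks_per_node
else
    pe=mpi
fi
echo "qsub -pe $pe $N myscript"
```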
The PEs labeled “multi-threaded MPI” are intended to be used by hybrid OpenMP-MPI applications. In all cases the second argument N to the -pe qsub option should be the total number of processors required by the job while the argument passed to the -np mpirun option should be N / threads_per_task.
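For example, a hybrid job requesting -pe mpi_4_procs_per_task 12 runs 12 / 4 = 3 MPI tasks. A sketch of the arithmetic (NSLOTS is simulated here, since SGE only sets it at runtime):

```shell
# Sketch: computing the mpirun -np argument for a hybrid OpenMP-MPI job.
NSLOTS=12            # set by SGE from "-pe mpi_4_procs_per_task 12" (simulated)
OMP_NUM_THREADS=4    # threads per MPI task (the "K" in mpi_K_procs_per_task)
NTASKS=$((NSLOTS / OMP_NUM_THREADS))
echo "mpirun -np $NTASKS my_program"
```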
If your application can run on multiple nodes but doesn’t use MPI you will need a specialized PE. Send mail to firstname.lastname@example.org and we’ll create an appropriate PE for you.
| parallel-environment | Purpose | Allocation Rule | Values of N | Values of cpu_type |
|---|---|---|---|---|
| (no PE specified) | single-threaded jobs | one processor on a multiprocessor node | 1 | 2216HE, … |
| omp | OpenMP and other single-node multiprocessor jobs | all N requested processors on a single 4-processor node (node may be shared with other jobs unless N is 4) | 1 – 4 | 2216HE, 2218HE, X5670 |
| omp | OpenMP and other single-node multiprocessor jobs | all 8 requested processors on a single 8-processor node | 8 | E5450, X5570, X5670 |
| omp | OpenMP and other single-node multiprocessor jobs | all 12 requested processors on a single 12-processor node | 12 | X5670 |
| mpi | MPI | fewest possible 4-processor nodes (nodes may be shared with other jobs) | 1 – 32 | 2218HE |
| mpi_4_tasks_per_node | MPI | whole 4-processor nodes | 4, 8, 12, 16, 20, … | … |
| mpi_8_tasks_per_node | MPI | whole 8-processor nodes | 8, 16, 24, 32 | E5450, X5670 |
| mpi_12_tasks_per_node | MPI | whole 12-processor nodes | 12, 24, 36 | X5670 |
| mpi_4_procs_per_task | multi-threaded MPI | whole 4-processor nodes, each used by a single MPI task | 4, 8, 12, 16, 20, … | … |
| mpi_8_procs_per_task | multi-threaded MPI | whole 8-processor nodes, each used by a single MPI task | 8, 16, 24, 32 | E5450, X5570, X5670 |
| mpi_12_procs_per_task | multi-threaded MPI | whole 12-processor nodes, each used by a single MPI task | 12, 24, 36 | X5670 |
- Running a serial program without a batch script
katana:~ % qsub -b y a.out
No command script is necessary with -b y, which tells qsub to expect a binary executable such as a.out. This job will use the default runtime limit of 2 hours. To run for, say, 24 hours, add the -l h_rt=24:00:00 option:
katana:~ % qsub -l h_rt=24:00:00 -b y a.out
- Running a serial program (including MATLAB) with a batch script.
- A sample batch script, mybatch, looks like this
#!/bin/csh
a.out < myinput > myoutput
For MATLAB applications, the script takes this form
#!/bin/csh
matlab -nodisplay -singleCompThread -r "n=4, magic(n), exit"
where strings enclosed in double quotes (“) are valid MATLAB commands, including your own application m-files (without the .m suffix). The MATLAB exit command is required to quit MATLAB and end the batch job. The -singleCompThread must be present to suppress multithreading for a serial job.
- Don’t forget to enable the execute attribute of mybatch!
katana:~ % chmod +x mybatch
- Submit the batch job.
katana:~ % qsub mybatch
- Use qstat to query the status of job
katana:~ % qstat -u kadin
job-ID  prior    name     user   state  submit/start at      queue . . .
----------------------------------------------------------------------
477578  0.00000  mybatch  kadin  qw     03/16/2010 08:50:06
- Two output files will be generated in connection with the job: all errors are reported in mybatch.e477578, while output goes to mybatch.o477578 (for MATLAB jobs this includes the MATLAB splash screen and anything that goes to the command window). By default, an email will also be sent to your SCV-issued email address notifying you that the job has been processed. You may forward these mails to your designated email address via the .forward file in your katana home directory.
- Note that the “-singleCompThread” command flag ensures that MATLAB will not try to use multiple threads (in a single-processor job) for implicit parallel processing, which many MATLAB vector commands and many level-3 linear algebra operations would otherwise trigger. If you know that your MATLAB code can take advantage of implicit parallel processing, remove “-singleCompThread” from mybatch and submit the job as follows:
katana:~ % qsub -pe omp 4 mybatch
- Running multiple serial jobs with a single batch script.
For example, you can run up to 4 separate serial jobs (or other alternatives) on a 4-core node. See Technical Summary for choices.
katana:~ % qsub -l h_rt=6:00:00 -pe omp 4 myscript
where myscript is:
#!/bin/sh
prog1 < myinput1 > myoutput1 &
prog2 < myinput2 > myoutput2 &
prog3 < myinput3 > myoutput3 &
prog4 < myinput4 > myoutput4 &
wait
Running 4 instances of MATLAB concurrently is a typical use. The advantage of this procedure is its simplicity. The maximum number of processors one can use on a single node is 12, via “-pe omp 12”. For MATLAB applications, see Running Multiple Independent MATLAB Tasks.
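The key to this script is launching each program in the background with & and then blocking on wait; a self-contained sketch, with short sleeps standing in for the real programs:

```shell
# Sketch: the background-and-wait pattern from myscript above, with sleeps
# in place of prog1..prog4. Without the final "wait", the batch script
# would exit immediately and the background processes would be killed.
sleep 1 &
sleep 1 &
sleep 1 &
sleep 1 &
wait
status=$?
echo "all four background jobs finished (status $status)"
```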
Read this page to learn how to control the order of execution of multiple batch jobs.
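One common mechanism is the -hold_jid option from Table 1. A sketch of a two-stage chain follows; the script and job names are hypothetical, and the commands are shown as strings rather than executed:

```shell
# Sketch: chaining two batch jobs so stage2 starts only after stage1
# completes, using -N to name the first job and -hold_jid to wait on it.
submit1="qsub -N stage1 stage1.sh"
submit2="qsub -hold_jid stage1 -N stage2 stage2.sh"
echo "$submit1"
echo "$submit2"
```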
The omp Parallel Environment is used to run OpenMP applications. There are two ways to define the number of processors required by an OpenMP application: it can either be compiled into the executable a.out (through the invocation of the OpenMP library function omp_set_num_threads in the source code) or determined at runtime via the OMP_NUM_THREADS environment variable.
In the first case the job can be submitted with:
katana:~ % qsub -pe omp 4 -b y a.out
In the second case, the environment variable setting can be specified with qsub's -v option:
katana:~ % qsub -v OMP_NUM_THREADS=4 -pe omp 4 -b y a.out
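The -v option simply makes the variable visible in the job's environment; a sketch showing that a child process (standing in here for the OpenMP binary) sees the exported value:

```shell
# Sketch: OMP_NUM_THREADS set in the job environment is read by the
# program at startup; a child shell stands in for the OpenMP executable.
export OMP_NUM_THREADS=4
seen=$(sh -c 'echo "$OMP_NUM_THREADS"')
echo "threads visible to child: $seen"
```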
MPI jobs should be submitted with:
katana:~ % qsub myscript
where myscript is an appropriately customized version of the following batch script:
#!/bin/sh
#
# Example SGE script for running mpi jobs
#
# Submit job with the command: qsub myscript
#
# Note: A line of the form "#$ qsub_option" is interpreted
# by qsub as if "qsub_option" was passed to qsub on
# the commandline.
#
# Set the hard runtime (aka wallclock) limit for this job,
# default is 2 hours. Format: -l h_rt=HH:MM:SS
#
#$ -l h_rt=2:00:00
#
# Merge stderr into the stdout file, to reduce clutter.
#
#$ -j y
#
# Invoke the mpi Parallel Environment for N processors.
# There is no default value for N, it must be specified.
#
#$ -pe mpi 4
#
# end of qsub options

# The system supports multiple implementations of MPI.
# This variable is used by the mpirun command to set up the proper
# runtime environment for the job. The allowed values are "openmpi"
# (the default) and "mpich." The runtime setting should
# match the setting in effect when the program was compiled.
#
export MPI_IMPLEMENTATION=openmpi

# By default, the script is executed in the directory from which
# it was submitted with qsub. You might want to change directories
# before invoking mpirun...
#
cd SOMEWHERE

# Invoke mpirun.
# Note: $NSLOTS is set by SGE to the number of processors
# requested by the "-pe mpi N" option.
#
mpirun -np $NSLOTS mpi_program arg1 arg2 ...
In this example, the executable mpi_program must have been previously compiled with MPI_IMPLEMENTATION set to openmpi (which is the system default). See the programming section for information about compiling MPI applications.
You can override the SGE batch resource parameters such as number of processors and walltime limit, pre-defined in myscript, as follows:
katana:~ % qsub -pe mpi 8 -l h_rt=24:00:00 myscript
Hybrid OpenMP-MPI jobs should be submitted with:
katana:~ % qsub myscript
where myscript is an appropriately customized version of the following batch script:
#!/bin/sh
#
# Example SGE script for running hybrid OpenMP-MPI jobs
#
# Submit job with the command: qsub myscript
#
# Note: A line of the form "#$ qsub_option" is interpreted
# by qsub as if "qsub_option" was passed to qsub on
# the commandline.
#
# Set the hard runtime (aka wallclock) limit for this job,
# default is 2 hours. Format: -l h_rt=HH:MM:SS
#
#$ -l h_rt=2:00:00
#
# Merge stderr into the stdout file, to reduce clutter.
#
#$ -j y
#
# Invoke the mpi_K_procs_per_task Parallel Environment for N processors.
# Here "K" is 4, 8, or 12, and must be specified.
# There is no default value for N, it must be specified.
#
#$ -pe mpi_4_procs_per_task 12
#
# Specify the number of threads per task, this should match "K" above.
#$ -v OMP_NUM_THREADS=4
#
# end of qsub options

# The system supports multiple implementations of MPI.
# This variable is used by the mpirun command to set up the proper
# runtime environment for the job. The allowed values are "openmpi"
# (the default) and "mpich." The runtime setting should
# match the setting in effect when the program was compiled.
#
export MPI_IMPLEMENTATION=openmpi

# By default, the script is executed in the directory from which
# it was submitted with qsub. You might want to change directories
# before invoking mpirun...
#
cd SOMEWHERE

# Invoke mpirun.
# The argument to the -np option should be N / K.
#
mpirun -np 3 openmp-mpi_program arg1 arg2 ...
You can “check out” one or more processors for interactive use; examples include running MATLAB or an interactive debugging session. Interactive batch jobs are limited to 4 processors on the same node, with a default 2-hour wallclock limit. The time limit can be raised to at most 24 hours. If X-Window display is required, use qsh.
katana:~ % qrsh [-l h_rt=HH:MM:SS . . .]
The above command gives you a login shell on one of the batch nodes. The optional argument, among others, specifies the run time limit for the shell; the default (i.e., if you do not specify a run time limit) is 2 hours. Note that the square brackets ( [ ] ) mean that the enclosed item is optional; do not type the brackets. Shown below is an example that requests a 4-hour run time limit:
katana:~ % qrsh -l h_rt=4:00:00
In the qrsh-launched interactive window, a MATLAB session must be launched with “matlab -nojvm -nodisplay -nosplash”. A MATLAB session started without these options will fail to respond to any of the exit commands, such as “exit”, “quit”, “Ctrl c”, or “Ctrl d”.
katana:~ % qrsh -pe omp 4
You can also request up to four processors with the “-pe” option shown above. The “omp” selection should always be used, even for MPI applications, to ensure that all allocated processors are on the same node.
katana:~ % qsh [-l h_rt=HH:MM:SS -pe omp N . . .]
If X-Window display is needed (such as for a GUI-based debugger or MATLAB applications), then qsh should be used. Once an interactive batch job is launched, an X window will appear and you can then run, for example, a MATLAB session from this new window. Please note the following when intending to “check out” multiple processors via qsh:
- MPI applications do not work with qsh.
- OpenMP applications work with qsh.
- Always select the “omp” Parallel Environment (-pe omp N).
- To submit a batch job
katana:~ % qsub . . .
- To query the status of batch jobs
katana:~ % qstat . . .
If qstat reports a job status of “Eqw”, the job has encountered an error.
katana:~ % qstat -u kadin
job-ID  prior    name  user   state  submit/start at . . .
---------------------------------------------------------------------
423848  1.10000  g03   kadin  Eqw    02/22/2010 10:30:49
You can get more details about the error with
katana:~ % qstat -j 423848
- To delete a batch job from the system
katana:~ % qdel . . .
- To modify characteristics of previously submitted jobs
katana:~ % qalter . . .
- To hold back submitted jobs from execution
katana:~ % qhold . . .
- To release previously held jobs
katana:~ % qrls . . .
- To charge a batch job to a project
A batch job is normally charged to the user’s default project. If the user works on a single project or if the charge should be levied against the default project, no user action is required. On the other hand, users working on multiple projects may, at times, need to charge a batch job to a non-default project. Note that the charging procedure varies among all SCV machines (See FAQ, Project Accounting). Please consult the respective machine’s runningjobs webpage for the correct procedure. Described below is the charging procedure for the Katana Cluster.
katana:~ % newgrp project-name
After this command, project-name will be charged for this and subsequent batch jobs (submitted from this window, or shell) until the next newgrp command is executed. The charging of batch jobs submitted from another (non-descendant) window is not affected.
- To find out the projects of which you are a member
katana:~ % groups
my_default_project my_second_project my_third_project . . .
The first project on the list is always the default; the default project can be changed.
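A sketch of extracting the default project from the groups output (actual group names vary per user):

```shell
# Sketch: the default project is the first group printed by groups(1);
# awk picks off the first field.
default_project=$(groups | awk '{ print $1 }')
echo "default project: $default_project"
```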