Defining Memory Requirements for Grid Jobs

Setting Grid Job Memory Requirements

When submitting a qsub or qlogin request, use flags to request the amount of RAM you expect to need. This ensures your job is dispatched to a machine with that much RAM free, rather than to one that is already overextended, where your job could run out of memory. You can also use hard and soft limits to gently or forcibly kill your own program if it exceeds the expected amount of memory.

If you don’t specify limits, a default s_vmem value of 2G will be set.

Please use the following qsub options to request the memory you need and to set soft and/or hard limits:

-l mem_free=MEM_NEEDED
-l h_vmem=MEM_MAX
-l s_vmem=MEM_MAX

where MEM_NEEDED is the amount of memory (in megabytes, M, or gigabytes, G) that your job will require, and MEM_MAX is the upper bound on the amount of memory your job is allowed to use. mem_free and swap_free are requestable complexes, so you can request, in your job script, that your job only be handed to nodes with more than a given amount of memory free.

Keep in mind that the soft limit s_vmem will pass a SIGINT or SIGUSR1 to the program when it hits that memory usage, and the hard limit h_vmem will outright kill the program when it hits that memory usage. NOTE: The hard limit may prematurely kill programs that like to probe all available memory when they first start up, such as MATLAB and all MPI programs! You can solve this with MATLAB by setting a ulimit, but with MPI, you should just use soft limits and make sure that your program respects them.
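For instance, a batch script can trap the soft-limit signal and shut down cleanly. This is a minimal sketch, assuming the signal delivered is SIGUSR1 (one of the two mentioned above); the checkpoint message stands in for whatever cleanup your job actually needs:

```shell
#!/bin/bash
# Hypothetical sketch: trap the soft-limit signal so the job can
# clean up before the hard limit kills it outright.
trap 'echo "soft vmem limit reached, checkpointing"; exit 0' USR1

# ... the job's long-running work would go here ...
```

A job that handles the soft-limit signal this way never needs to hit the hard limit at all.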

For example, if your job will require 4 GB of memory, and you want it to receive a signal at 4G of usage and to be killed forcibly at 6G if it does not respond to the signal, you could type:

qsub -cwd -l mem_free=4G,s_vmem=4G,h_vmem=6G batch.sh

(Remember that you can put the -l options in your .sh script rather than specifying them on the command line.)
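For instance, the example above could be written entirely inside batch.sh using #$ directive lines, which qsub reads as embedded options (a sketch; the job's actual commands are placeholders):

```shell
#!/bin/bash
#$ -cwd
#$ -l mem_free=4G,s_vmem=4G,h_vmem=6G

# ... your job's actual commands go here ...
```

With the options embedded this way, the submission reduces to qsub batch.sh.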

To see within a shell (interactive or batch) what limits are in place, run ulimit. In the example below, note that the virtual memory limit of 2097152 kbytes reflects the default s_vmem of 2G:

ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257184
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) 2097152
file locks                      (-x) unlimited 

Parallel Jobs

For parallel jobs (like for MPI) the grid will allocate resources on a per-slot basis.

For example, if you use -l s_vmem=4G and -pe openmpi 4, it will set a separate 4 GB soft limit on each of the 4 processes, whether or not they are on the same physical machine. The total memory allocation and limit for the job will be 16 GB.

For the threaded parallel environment, all processes run on the same machine, so the effective memory limit on that machine is the s_vmem size multiplied by the number of slots requested. If no single machine has that much memory available at the time, the job will wait; if no machine has that much in total, the job will be rejected with the message "error: no suitable queues".
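The per-slot arithmetic from the openmpi example above can be sketched as follows (the slot count and per-slot limit are taken from that example):

```shell
# qsub -pe openmpi 4 -l s_vmem=4G batch.sh sets a separate 4G
# soft limit on each slot, so the job-wide total is:
slots=4
per_slot_gb=4
echo "total soft limit: $((slots * per_slot_gb))G"   # prints: total soft limit: 16G
```

The same multiplication applies to the threaded parallel environment, except that the whole total must fit on one machine.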

Checking available memory

You can use qstat -F to check for available resources. If you include -F but leave out the list of resources, it will list values for all resources. To see the three memory resources described here, plus the total amount of RAM detected, for bme.q:

$ qstat -F mem_free,s_vmem,h_vmem,mem_total -q bme.q
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
bme.q@bme-compsim-1.bu.edu     BIP   0/20/20        0.01     linux-x64
hl:mem_total=62.809G
hl:mem_free=61.985G
hc:s_vmem=24.000G
qf:h_vmem=infinity…

For bme-compsim-1, the total amount of RAM detected is about 63 GB. The amount of memory actually available according to the operating system is about 62 GB. The grid is reserving some memory for each of the 20 used job slots, so the available s_vmem is down to 24 GB. h_vmem is only used for setting an upper limit on usage for a job and isn’t tracked, so it always shows infinity.

The two-letter code at the start of each resource entry shows how the grid came up with that value. For the first letter:

  • g – global setting
  • q – queue setting
  • h – host setting

And the second letter:

  • l – detected load value (via the OS)
  • c – consumable value (tracked by the grid)
  • f – fixed limit

For more information on resources, see man qstat, particularly the “Full Format” section.

To see the resources defined by the grid for a particular host, look at the complex_values section of the qconf -se HOST output, like:

$ qconf -se bme-compsim-1 | grep complex_values
complex_values        s_vmem=64G

So, the grid knows bme-compsim-1 has 64 GB RAM total, and will deduct its allocations from that.