- CPU architecture and processor type
- Graphics Processing Units (GPUs)
- Scratch Space
The amount of time your job takes to run on the compute node(s) is represented by a runtime resource. The default runtime for interactive batch and batch jobs is 12 hours. You can request a longer or shorter runtime by supplying the -l h_rt=hh:mm:ss option to your batch job. Measuring how long your code takes to run, and tailoring your job workflow and runtime requests accordingly, plays a key role in ensuring that your jobs have access to the greatest number of SCC resources. For example, almost all of the Buy-In nodes in the SCC will only run shared batch jobs that request 12 hours or less of runtime. By identifying jobs that fit within this runtime window and requesting the appropriate runtime resource, you may greatly increase the resources available to run your batch job.
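As an illustration, a job expected to finish within two hours could be submitted as follows (the script name is a placeholder):

```shell
# Request a 2-hour runtime; jobs requesting 12 hours or less can also
# run as shared jobs on most Buy-In nodes.
qsub -l h_rt=02:00:00 myscript.sh

# The same request can instead be embedded in the job script itself:
#$ -l h_rt=02:00:00
```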
The SCC contains a variety of Intel and AMD CPU architectures and models. The Technical Summary contains more details on the hardware that comprises each node in the cluster. Throughout our documentation we use the word processor to mean what computer hardware vendors call a processor core. The SGE batch system manpages also use the words job slot, or simply slot, to refer to the same concept. The number of processors your job requires depends on how well your code can utilize multiple processors for its computations. Another reason for requesting multiple processors is to reserve a larger share of the memory on a node whose remaining resources are shared with other jobs. You can also reserve a specific CPU architecture and type for your job, although in general this is not necessary. Details on how to reserve a number of processors using the -pe option, and a type/model of processor, can be found on the Submitting your Batch Job page.
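For instance, a multithreaded program that can make use of four cores might be submitted as follows (the script name is a placeholder):

```shell
# Reserve 4 slots on a single node for a shared-memory (e.g. OpenMP) job.
qsub -pe omp 4 myscript.sh
```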
There are three sets of nodes that incorporate GPUs and are available to SCC users. The first set includes 20 nodes (scc-ha1..scc-he2 and scc-ja1..scc-je2); each of these nodes has 8 NVIDIA Tesla M2070 GPU cards with 6 GB of memory. The second set includes 24 nodes (scc-e* and scc-f*) with 3 NVIDIA Tesla M2050 GPU cards, each with 3 GB of memory, on each node. The third set includes 2 nodes (scc-sc1 and scc-sc2) with 2 NVIDIA Tesla K40m GPU cards with 12 GB of memory. The second and third sets of nodes are part of the SCF Buy-In program, so access to these nodes by general users is limited, based on the needs of the groups who own them.
For more details about GPU computing on the SCC, refer to the GPU Computing on the SCF page.
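The exact resource syntax is covered on that page; as a rough sketch, assuming the batch system defines a gpus resource for these nodes, a request might look like:

```shell
# Sketch only: the "gpus" resource name is an assumption here; see the
# GPU Computing on the SCF page for the exact syntax and current values.
qsub -l gpus=1 myscript.sh
```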
The SCC comprises three types of networking architecture. The majority of nodes utilize a 1 Gigabit network interface for communication with our shared filesystems and other machines. A smaller set of nodes utilize a 10 Gigabit interface for jobs that require faster network access. For batch jobs that can utilize multiple nodes via MPI (Message Passing Interface), we offer an additional, faster Infiniband-based network architecture to aid inter-node communication. Our current Infiniband configurations support 40 Gbps (QDR) and 56 Gbps (FDR) speeds.
Memory options to the batch system are only enforced when the job is dispatched to the node. Once the job has been dispatched, the batch system cannot enforce any limits to the amount of memory the job uses on the node. Therefore each user is expected to follow “fair share” guidelines when submitting jobs to the cluster.
The memory on each node of the SCC is shared by all the jobs running on that node. Therefore a single-processor job should not use more than the amount of memory available per core (Memory_total / N_cores, where Memory_total is the total memory on the node and N_cores is the number of cores). For example, on the nodes with 48 GB of memory and 12 cores, if the node is fully utilized, a single-processor job is expected to use no more than 4 GB of memory. See the Technical Summary for the list of nodes and the memory available on each of them.
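The per-core budget is a simple division; a minimal sketch, using the node figures from the example above:

```shell
#!/bin/sh
# Per-core memory budget for a node with 48 GB of RAM and 12 cores.
mem_total_gb=48
n_cores=12
echo $((mem_total_gb / n_cores))   # prints 4
```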
If your job requires more memory, you can request nodes with 96, 128, or 256 Gigabytes of memory; users affiliated with Medical Campus projects can also request a node with 512 GB. You can also request more slots to reserve a larger amount of memory. For example, if a job needs 50 GB of memory, it can ask for a whole node with at least 96 GB of memory using the options -l mem_total=94G -pe omp 16, or it can reserve a node with 256 GB of memory and ask for 4 slots: -l mem_total=254G -pe omp 4.
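Inside a batch script, the first of these requests can be written as directives (the program name is a placeholder):

```shell
#!/bin/bash
#$ -l mem_total=94G   # land on a node with at least ~96 GB of RAM
#$ -pe omp 16         # reserve 16 slots, i.e. the whole node

./my_memory_hungry_program   # placeholder for the actual computation
```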
There is a local scratch directory on each of the SCC nodes. This can be used as an additional (temporary) storage area. The files stored in the scratch directories are not backed up and will be deleted by the system after 31 days. To access a node's local scratch space, simply refer to /scratch. You can access a specific node's scratch space from any other node on the system by specifying its full network path, /net/scc-xx#/scratch, where xx is a two-letter string and # is generally a single-digit number (see the Technical Summary for the full list of node names):
scc% cd /net/scc-fc3/scratch
While a job is running, reading from and writing to Project Disk Space is slower than to local storage, so you might want to save intermediate results in storage local to the compute node. When the job is dispatched, a local temporary directory is created and its path is stored in the environment variable $TMPDIR. Note, however, that any data written to the $TMPDIR directory is automatically deleted when the job finishes. The application can instead create its own sub-directory in the local scratch space and use it for temporary storage. Such a directory remains accessible for up to 31 days, though users are encouraged to remove any files they create in the scratch space as soon as they are no longer needed, to free up resources for other users.
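The pattern above can be sketched in a batch script (all file and directory names are placeholders):

```shell
#!/bin/bash
# Sketch: stage work through node-local storage, then copy results back.
# $TMPDIR is created by the batch system and removed when the job ends.

cp /project/myproject/input.dat $TMPDIR/    # placeholder input path
cd $TMPDIR
./my_program input.dat > output.dat         # placeholder computation
cp output.dat /project/myproject/results/   # copy results out before the job ends
```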