Modern GPUs (graphics processing units) can perform general-purpose computations in applications traditionally handled by CPUs. GPU computing is rapidly becoming a standard for data-parallel heterogeneous computing software in science and engineering, and many existing applications have been adapted to make effective use of multi-threaded GPUs.

In some cases, GPUs can deliver speedups of more than 100 times, and a single GPU job can greatly outperform a multi-processor CPU job, as timing comparisons of serial, MPI, and GPU matrix multiplication codes demonstrate.

GPU Resources

There are currently three sets of GPU-equipped nodes available to SCF users. All three are part of the Shared Computing Cluster (SCC).

The first set includes 20 nodes (scc-ha1..scc-he2 and scc-ja1..scc-je2). Each of these nodes has Intel Xeon X5675 processors with 12 cores running at 3.07 GHz and 48 GB of memory. Each node also has 8 NVIDIA Tesla M2070 GPU cards, each with 6 GB of memory.

The second set includes 24 nodes (scc-e* and scc-f*). Each of these nodes has Intel Xeon X5650 processors with 12 cores running at 2.66 GHz and 48 GB of memory. Each node also has 3 NVIDIA Tesla M2050 GPU cards, each with 3 GB of memory. These nodes are part of the SCF buy-in program, so access for general users is somewhat limited, based on the needs of the group that purchased them.

The third set includes 2 nodes (scc-sc1 and scc-sc2). Each of these nodes has Intel Xeon E5-2650 v2 processors with 16 cores, 886 GB of scratch space, a 1 Gb Ethernet connection, and FDR InfiniBand. Each node also has 2 NVIDIA Tesla K40m GPU cards, each with 12 GB of memory and compute capability 3.5. These nodes are part of the SCF buy-in program, so access for general users is somewhat limited, based on the needs of the group that purchased them.

For more details on nodes available on the SCC, please visit the Technical Summary page.

Running on the GPU Nodes

Access to GPU-enabled nodes is via the batch system (qsub/qsh); direct login to these nodes is not permitted. The GPU-enabled nodes support all of the standard batch options in addition to the following GPU-specific options (the -l gpus=G/C option is required).

GPU Batch Option            Description
-l gpus=G/C                 G is the number of GPUs per node and C is the number of CPUs per node. G/C should be expressed as a decimal number (e.g., 1.5). In the case of a repeating decimal such as one third, truncate it to .333.
-l gpu_type=GPUMODEL        Current options for GPUMODEL are M2050, M2070, and K40m.
-l gpu_memory=#G            #G is the minimum amount of memory required per GPU. The M2050 has 3 GB, the M2070 has 6 GB, and the K40m has 12 GB of memory.
-l gpu_compute_capability=# The minimum compute capability required of the GPU. GPUs on the shared nodes have compute capability 2.0; the buy-in nodes with K40m cards have compute capability 3.5.

Below are some examples of requesting GPU resources.

Interactive Batch

To request an xterm with access to 1 GPU for 4 hours (this command requires X to be running on your local machine):

scc1% qsh -V -l h_rt=4:00:00 -l gpus=1
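
Once the xterm opens, you are logged in to the GPU node assigned to your job. A quick way to confirm that a GPU is visible is the standard NVIDIA utility shown below; this is a minimal sketch that assumes the NVIDIA driver tools are on the GPU node's default path, and the node name in the prompt is only an example:

scc-he2% nvidia-smi

This lists the GPUs on the node along with their memory usage and current utilization.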

Non-interactive Batch Job

To submit a batch job with access to 3 GPUs and 1 CPU for 2 hours:

scc1% qsub -l h_rt=2:00:00 -l gpus=3 your_batch_script

To submit a batch job with access to 1 GPU/node and 8 CPUs/node on 4 nodes:

scc1% qsub -l gpus=.125 -pe mpi_8_tasks_per_node 32 your_batch_script
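
The GPU resource requests can also be embedded in the batch script itself as #$ directives, so they do not have to be repeated on the qsub command line. Below is a minimal sketch of such a script; the h_rt value, GPU options, module name, and executable are placeholders to adjust for your own job (check module avail for the modules actually installed):

#!/bin/bash
#$ -l h_rt=2:00:00        # hard run-time limit of 2 hours
#$ -l gpus=1              # 1 GPU per 1 CPU
#$ -l gpu_type=K40m       # optional: run only on nodes with K40m cards
#$ -V                     # export the submission environment to the job

# Load a CUDA toolkit module and run the GPU executable
# (both names below are placeholders)
module load cuda
./my_gpu_program

The script is then submitted in the usual way, for example: scc1% qsub my_gpu_job.sh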

Software with GPU Acceleration

As GPU computing is still a fairly new paradigm, it is not yet supported by all programming languages, and application support remains limited. We strive to provide the most up-to-date information, but this will be an evolving process. Languages and packages are divided into the three categories listed below.

Languages and software packages we have successfully tested for GPU support:

OpenACC C/C++
OpenACC Fortran
MATLAB (Parallel Computing Toolbox)
R (various packages)
Java (requires loading the jcuda module; see the module commands sketched after these lists)

Languages and packages which can support GPUs but which we have not yet verified on the SCF/SCC:

Maple (CUDA Package)

Packages which have announced that GPU support is forthcoming in later versions:

PyCUDA Python (exists now but is not yet available on the SCF)
Accelrys CHARMm
Gaussian and GaussView
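
Before compiling or running GPU code on the SCC, load the relevant environment modules. The commands below are a minimal sketch: jcuda is the module named in the list above, while the CUDA toolkit module name and version are assumptions that you should verify with module avail:

scc1% module avail cuda      # list the installed CUDA toolkit versions
scc1% module load cuda       # load a CUDA toolkit (exact name/version may differ)
scc1% module load jcuda      # needed for GPU support from Java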


Review the points below to see whether your application is a good candidate for GPU computing.

CPUs are great for task parallelism:

  • High performance per single thread execution
  • Fast caches used for data reuse
  • Complex control logic

GPUs are superb for data parallelism:

  • High throughput on parallel calculations
  • Arithmetic intensity: Lots of processor cores to perform simple math calculations
  • Fast access to local and shared memory

Ideal applications for general programming on GPUs:

  • Large data sets with minimal dependencies between data elements
  • High parallelism in computation
  • High number of arithmetic operations

Physical modeling, data analysis, computational engineering, and matrix algebra are just a few examples of applications that can benefit greatly from GPU computation.

GPU Consulting

SCV staff scientific programmers can help you with your questions concerning GPU programming. Please contact us at