Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs.

In some cases, GPUs can deliver a speedup of more than 100 times. A single GPU job can even greatly outperform a multi-processor CPU job. See this chart for a timing comparison of matrix multiplication between serial, MPI, and GPU codes.

### GPU Resources

There are currently 3 sets of nodes that incorporate GPUs and are available to SCC users.

The first set includes 18 nodes. Each of these nodes has an Intel Xeon X5675 CPU with 12 cores running at 3.07 GHz and 48 GB of memory. Each node also has 8 NVIDIA Tesla M2070 GPU cards with 6 GB of memory each.

The second set includes 2 nodes. Each of these nodes has E5-2650v2 processors with 16 cores. Each node also has 2 NVIDIA Tesla K40m GPU cards with 12 GB of memory each and a compute capability of 3.5.

The third set includes 4 nodes. Each of these nodes has E5-2680 v4 processors with 28 cores and 849 GB of scratch space. Each node also has 2 NVIDIA Tesla P100 GPU cards with 12 GB of memory each and a compute capability of 6.0.

For more details on nodes available on the SCC, please visit the Technical Summary page.

### Running on the GPU Nodes

Access to GPU-enabled nodes is via the batch system (qsub/qsh). Direct login to these nodes is not permitted. The GPU-enabled nodes support all of the standard batch options in addition to the following GPU-specific options. The -l gpus=G/C option is required.

GPU Batch Option | Description |
---|---|
-l gpus=G/C | G is the number of GPUs per node. C is the number of CPUs per node. G/C should be expressed as a decimal number (e.g. 1.5). In the case of a repeating decimal like one third, truncate it to .333. |
-l gpu_type=GPUMODEL | Current options for GPUMODEL are M2070, K40m and P100. |
-l gpu_memory=#G | #G represents the minimum amount of memory required per GPU. The M2070 has 6 GB; the K40m and P100 both have 12 GB of memory. |
-l gpu_compute_capability=# | GPU compute capability. M2070 cards have a compute capability of 2.0, K40m cards have 3.5, and P100 cards have 6.0. |
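The G/C value for -l gpus can be worked out mechanically. The following is a minimal sketch, not an SCC-provided tool: `gpus_ratio` is a hypothetical helper name, and the function assumes bash and positive integer arguments. It truncates rather than rounds, as described for repeating decimals, and drops trailing zeros so that 1 GPU with 4 CPUs gives 0.25.

```shell
#!/bin/bash
# gpus_ratio G C  (hypothetical helper, not part of the batch system)
# Prints G/C truncated, not rounded, to three decimal places.
gpus_ratio() {
    local g=$1 c=$2
    # Integer division in thousandths truncates automatically: 1/3 -> 333.
    local thousandths=$(( g * 1000 / c ))
    local whole=$(( thousandths / 1000 ))
    local frac
    frac=$(printf '%03d' $(( thousandths % 1000 )))
    # Remove trailing zeros from the fractional part (250 -> 25, 000 -> empty).
    while [ "${frac%0}" != "$frac" ]; do
        frac=${frac%0}
    done
    if [ -z "$frac" ]; then
        echo "$whole"
    else
        echo "$whole.$frac"
    fi
}

gpus_ratio 1 4   # prints 0.25
gpus_ratio 1 3   # prints 0.333
gpus_ratio 3 2   # prints 1.5
gpus_ratio 1 1   # prints 1
```

Integer arithmetic is used here instead of a floating-point `printf '%.3f'` because the latter rounds (2/3 would become 0.667), whereas the option description asks for truncation.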

Below are some examples of requesting GPU resources.

#### Interactive Batch

To request an interactive session with access to 1 GPU (any type) for 12 hours:

`scc1% qrsh -l gpus=1`

To request an interactive session with access to 1 GPU with compute capability of at least 3.5 (which includes K40m and P100) and 4 CPUs:

`scc1% qrsh -l gpus=0.25 -l gpu_c=3.5 -pe omp 4`
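To see which GPU models a given minimum would admit, the figures listed on this page can be filtered locally. This is only an illustrative sketch: the scheduler performs its own matching, `gpu_models` and `matching_models` are hypothetical names, and the table is copied by hand from this page, so it will go stale if the hardware changes.

```shell
#!/bin/bash
# Model table copied from this page: name, memory in GB, compute capability.
gpu_models() {
    cat <<'EOF'
M2070 6 2.0
K40m 12 3.5
P100 12 6.0
EOF
}

# matching_models MIN_MEM_GB MIN_CC  (hypothetical helper)
# Prints every model whose memory and compute capability meet both minimums.
matching_models() {
    gpu_models | awk -v mem="$1" -v cc="$2" '$2 >= mem && $3 >= cc { print $1 }'
}

matching_models 0 3.5    # prints K40m and P100, one per line
matching_models 12 6.0   # prints P100
```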

#### Non-interactive Batch Job

To submit a batch job with access to 1 GPU (compute capability of at least 3.5) and 1 CPU:

`scc1% qsub -l gpus=1 -l gpu_c=3.5 your_batch_script`

See this example of a script for submitting a batch job.
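A batch script along these lines might look like the sketch below. It assumes the batch system honors Grid Engine-style `#$` directive lines inside the script; if it does not, pass the same options on the qsub command line as in the example above. The function name `list_gpus` and the program name `./my_gpu_program` are placeholders, not part of the SCC environment.

```shell
#!/bin/bash -l
# Sketch of a GPU batch script (assumption: "#$" lines are read by the
# scheduler as qsub options; the shell itself treats them as comments).
#$ -l gpus=1
#$ -l gpu_c=3.5

# Print the name and memory of each GPU visible to the job,
# or a short message if they cannot be queried.
list_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
            || echo "nvidia-smi could not query the GPUs"
    else
        echo "nvidia-smi not found; is this a GPU node?"
    fi
}

echo "Job running on host: $(uname -n)"
list_gpus

# Replace the line below with the real GPU program (placeholder name).
# ./my_gpu_program
```

Logging the host name and the visible GPUs at the top of the job output makes it easier to confirm afterwards that the job landed on the hardware that was requested.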

### Software with GPU Acceleration

Because GPU computing remains a fairly new paradigm, it is not yet supported by all programming languages, and application support is particularly limited. We strive to provide the most up-to-date information, but this will be an evolving process. We are dividing languages and packages into three categories listed below.

#### Languages and software packages we have successfully tested for GPU support:

- CUDA C/C++
- CUDA Fortran
- OpenACC C/C++
- OpenACC Fortran
- MATLAB (Parallel Computing Toolbox)
- R (various packages)
- Java (requires loading the jcuda module)

### CPU vs. GPU

Use the comparison below to judge whether your application is a good candidate for conversion to GPUs.

CPUs are great for task parallelism:

- High performance per single thread execution
- Fast caches used for data reuse
- Complex control logic

GPUs are superb for data parallelism:

- High throughput on parallel calculations
- Arithmetic intensity: Lots of processor cores to perform simple math calculations
- Fast access to local and shared memory

Ideal applications for general programming on GPUs:

- Large data sets with minimal dependencies between data elements
- High parallelism in computation
- High number of arithmetic operations

Physical modeling, data analysis, computational engineering, and matrix algebra are just a few examples of applications that might greatly benefit from GPU computations.

### GPU Consulting

RCS staff scientific programmers can help you with your questions concerning GPU programming. Please contact us at help@scc.bu.edu.