Programming for GPUs using OpenACC in C/C++
OpenACC is a directives-based API for code parallelization with GPUs. In contrast, OpenMP is the API for shared-memory parallel processing with CPUs. For applications, programmers insert OpenACC directives before specific code sections, typically with loops, to engage the GPUs. This approach enables existing — especially legacy — codes to be parallelized without extensive rewriting using Nvidia’s CUDA programming language for GPU. However, while a code parallelized with OpenACC directives may require significantly less effort than its equivalent CUDA counterpart, it may also result in less stellar computational performance. For many large existing codes, rewriting them with CUDA is impractical if not impossible. For those cases, OpenACC offers a pragmatic alternative.
What you need to know or do on the SCC
- To use OpenACC, compile your C (or C++) code with a more recent version of the Portland Group compiler, pgcc (pgCC for C++) (the current system default is version 8.0 which does not support OpenACC). This version is accessible at /usr/local/apps/pgi-13.2/linux86-64/13.2/bin.
- Alternatively, to make this your default compiler, add the following to your shell script
- For C shell users (csh or tcsh), add this to your shell script
- For Bourne shell users (sh or bash), add this to your shell script
To make the above changes to your shell script effective for your current SCC session, don’t forget to source your-shell-script. After this, you can proceed with compilation. For example:
scc1% pgcc -o mycode -acc -Minfo mycode.c
In the above, -acc turns on the OpenACC feature while -Minfo returns additional information on the compilation. For details, see this.
- Provided below are detailed specifications for the SCC nodes with GPUs using PGI’s pgaccelinfo command
(See the Technical Summary for additional information on the nodes.)
- To submit your code (with OpenACC directives) to a SCC node with GPUs
scc1% qsub -l gpus=1 -b y mycode
In the above, 1 GPU (and in the absence of multiprocessor request, 1 CPU) is requested.
Additional details will be provided in the near future.