Historically, “multiprocessor programming” has referred to parallel programming with multiple CPUs. With the advent of general-purpose GPU (GPGPU) computing, we interpret the term more broadly to include both CPUs and GPUs. Multiprocessing with CPUs is covered in the sections immediately below; multiprocessing with GPUs is covered after that.

Multiprocessing with CPUs

The use of multiple CPUs to achieve parallel speedup has been practiced for decades. Mature enabling paradigms and their associated software are well understood and broadly adopted. The most commonly used paradigms, both supported on the SCC, are the Message Passing Interface (MPI) for distributed-memory systems and OpenMP for shared-memory, thread-based systems. These paradigms support the common languages C, C++, and Fortran, for which you can build executables with the SCC-provided GNU and PGI families of compilers. Please visit the Compilers page for more details, such as how to optimize the performance of your code.

Compile a program parallelized with OpenMP directives

  • For GNU compilers, use the -fopenmp compiler flag to activate the OpenMP paradigm.
    scc1% gfortran myprogram.f              <-- OpenMP not turned on
    scc1% gfortran -fopenmp myprogram.f     <-- OpenMP turned on
    scc1% gfortran -fopenmp myprogram.f90   <-- OpenMP turned on
    scc1% gcc -fopenmp myprogram.c          <-- OpenMP turned on
    scc1% g++ -fopenmp myprogram.C          <-- OpenMP turned on

    The default executable name is a.out; use -o my-executable to assign a name of your choice. Whenever possible, use -O3 for the highest level of code optimization. See Compilers for more options.

  • For PGI compilers, use the -mp compiler flag to activate the OpenMP paradigm.
    scc1% pgfortran myprogram.f             <-- OpenMP not turned on
    scc1% pgfortran -mp myprogram.f         <-- OpenMP turned on
    scc1% pgfortran -mp myprogram.f90
    scc1% pgcc -mp myprogram.c
    scc1% pgCC -mp myprogram.C

    The default executable name is a.out; use -o my-executable to assign a name of your choice. Whenever possible, use -O3 for the highest level of code optimization. See Compilers for more options.
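    As a quick way to try the flags above, here is a minimal sketch of an OpenMP program (a hypothetical file, say omp_sum.c, not part of the SCC software). The #ifdef _OPENMP guard lets the same source compile and run correctly whether or not OpenMP is turned on:

    ```c
    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>      /* only available when compiled with -fopenmp / -mp */
    #endif

    /* Sum 1..n with an OpenMP reduction. Without -fopenmp (GNU) or
     * -mp (PGI), the pragma is ignored and the loop runs serially. */
    long sum_to(int n) {
        long total = 0;
        #pragma omp parallel for reduction(+:total)
        for (int i = 1; i <= n; i++)
            total += i;
        return total;
    }

    int main(void) {
    #ifdef _OPENMP
        printf("OpenMP enabled, max threads = %d\n", omp_get_max_threads());
    #else
        printf("OpenMP not enabled; running serially\n");
    #endif
        printf("sum 1..100 = %ld\n", sum_to(100));
        return 0;
    }
    ```

    Compiled as, for example, gfortran-style with gcc -fopenmp omp_sum.c -o omp_sum, the program honors OMP_NUM_THREADS at run time; built without the flag, it still produces the same sum.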

    Running OpenMP jobs

    For program development and debugging purposes, short OpenMP jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor; jobs exceeding these limits will be terminated automatically by the system. All other jobs (> 10 minutes and/or > 4 threads) should run in batch. (See the Running Jobs page.)

    1. Run executable a.out on a login node:
      scc1% setenv OMP_NUM_THREADS 2      <-- set thread count (for tcsh users)
      scc1$ export OMP_NUM_THREADS=2      <-- set thread count (for bash users)
      scc1% ./a.out
    2. Run executable a.out on a compute node in batch:
      scc1% setenv OMP_NUM_THREADS 2      <-- set thread count (for tcsh users)
      scc1$ export OMP_NUM_THREADS=2      <-- set thread count (for bash users)
      scc1% qsub -pe omp 2 -V -b y ./a.out

Compile a program parallelized with MPI

Compiling an MPI-enabled program requires the directory paths from which the compiler can find the necessary header file (e.g., mpi.h) and the MPI library. For ease of compilation, this additional information is built into the wrapper scripts mpif77, mpif90, mpicc, and mpiCC for the respective languages they serve: Fortran 77, Fortran 90/95/03, C, and C++. By default, these wrappers are linked to the GNU compilers; for example, mpicc points to gcc, while mpif90 points to gfortran. Switching to the PGI compilers is accomplished by setting the environment variable MPI_COMPILER. Note that an undefined (unset) MPI_COMPILER points the wrappers to their respective GNU compilers.

  • To make the MPI wrappers compile with GNU compilers:

    Step 1. The MPI wrappers will use GNU compilers if either MPI_COMPILER is unset or set as gnu.

    scc1% setenv MPI_COMPILER gnu       <-- select gnu compilers (for tcsh users)
    scc1$ export MPI_COMPILER=gnu       <-- select gnu compilers (for bash users)

    If you have never used the PGI compilers, the step above is redundant, since the GNU compilers are the wrappers' default.

    Step 2. Compile with MPI wrappers.

    scc1% mpif77 myprogram.f
    scc1% mpif90 myprogram.f90
    scc1% mpicc myprogram.c
    scc1% mpiCC myprogram.C
  • To make MPI wrappers compile with PGI compilers:

    Step 1. Setting MPI_COMPILER to pgi makes the wrappers compile with PGI compilers.

    scc1% setenv MPI_COMPILER pgi    <-- select PGI compilers (for tcsh users)
    scc1$ export MPI_COMPILER=pgi    <-- select PGI compilers (for bash users)

    Step 2. Compile with MPI wrappers.

    scc1% mpif77 myprogram.f
    scc1% mpif90 myprogram.f90
    scc1% mpicc myprogram.c
    scc1% mpiCC myprogram.C
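    For reference, a minimal MPI program (a hypothetical file, say mpi_hello.c, not part of the SCC software) that the wrappers above can build might look like the following sketch. The __has_include guard is only there so the file also compiles as a serial stub on a machine without MPI headers; when built with mpicc, the real MPI path is used:

    ```c
    #include <stdio.h>
    #ifdef __has_include
    #  if __has_include(<mpi.h>)
    #    include <mpi.h>
    #    define HAVE_MPI 1
    #  endif
    #endif

    /* Report this process's rank and the total number of MPI ranks.
     * Falls back to "rank 0 of 1" when no MPI installation is present. */
    void hello(int *argc, char ***argv, int *rank, int *size) {
    #ifdef HAVE_MPI
        MPI_Init(argc, argv);                   /* argc/argv may be NULL (MPI-2+) */
        MPI_Comm_rank(MPI_COMM_WORLD, rank);
        MPI_Comm_size(MPI_COMM_WORLD, size);
        MPI_Finalize();
    #else
        (void)argc; (void)argv;
        *rank = 0;                              /* serial stub: one process */
        *size = 1;
    #endif
    }

    int main(int argc, char **argv) {
        int rank, size;
        hello(&argc, &argv, &rank, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        return 0;
    }
    ```

    Built with mpicc mpi_hello.c and launched with mpirun -np 4 ./a.out, each of the 4 processes prints its own rank.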

    Running MPI jobs

    For program development and debugging purposes, short MPI jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. All other MPI jobs (> 10 minutes and/or > 4 processes) should run in batch. (See the Running Jobs page.)

    1. Run MPI executable a.out on a login node:
      scc1% mpirun -np 4 ./a.out
    2. Run executable a.out on a compute node in batch:
      scc1% qsub -pe mpi_4_tasks_per_node 4 -b y "mpirun -np 4 ./a.out"
  • Notes

    • If you always use the GNU family of compilers, none of the MPI_COMPILER settings described above are needed, because the MPI wrappers point to the GNU compilers by default.
    • On the other hand, if you always use the PGI compilers for MPI compilation, you can permanently set MPI_COMPILER to PGI in your .cshrc or .bashrc shell script.
    • If you are using the PGI compilers, it is important to read the page on PGI compilers’ impact on job performance and portability.
    • For MPI options, please consult the specific wrapper manpage. For example,
      scc1% mpiman mpicc

    Compiling 32-bit executables

    By default, all compilers on the SCC Cluster produce 64-bit executables. To build 32-bit executables, you need an additional compiler flag:

    • Examples for building 32-bit GNU executables
      scc1% gcc -m32 -fopenmp myexample.c       <-- for OpenMP code
      scc1% mpicc -m32 myexample.c              <-- set MPI_COMPILER to gnu
    • Examples for building 32-bit PGI executables
      scc1% pgcc -tp p7-32 -mp myexample.c     <-- for OpenMP code
      scc1% mpicc -tp p7-32 myexample.c        <-- set MPI_COMPILER to pgi

Multiprocessing with GPUs

Modern GPUs (graphics processing units) can perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel heterogeneous computing software in science and engineering, and many existing applications have been adapted to make effective use of multi-threaded GPUs. (See GPU Computing for details.)