At present, the SCC provides two families of compilers: GNU and PGI (Portland Group Inc.). The GNU compilers are designed to generate executables that are portable, albeit not optimally fast, across computer architectures, including those available on the SCC: Intel Sandy Bridge, Intel Nehalem, and AMD Opteron. By contrast, the PGI compilers by default strive to produce executables with optimal performance on a specific architecture, so users who need better performance may consider them. Because the SCC has multiple processor architectures, care is needed when submitting batch jobs that run architecture-dependent executables. Be sure to specify the processor class on which the job should run; otherwise the batch scheduler may send it to a “wrong” node (i.e., one of a different architecture), which may result in a runtime error (see -l cpu_arch=ARCH in Table 1 of Running Jobs). If performance is not essential, compiling for cross-platform execution (with the GNU compilers, or with the PGI compilers and the -tp=x64 switch; see below) is recommended; such an executable will work on all SCC nodes.

Source code compiled with pgcc (or pgCC, pgfortran) on a login node will, by default, run optimally on compute nodes of the same processor architecture. For example, if the code is compiled with pgcc (without specifying a target architecture) on the Intel Sandy Bridge-based scc1/scc2/geo/scc4, it will run best on compute nodes of the same Sandy Bridge architecture but may fail to run on Nehalem-based nodes. Conversely, if the code is compiled on the Nehalem-based Tanto (this login node is for CNS users only), the executable will run well on an SCC Nehalem node and may also run, less than optimally, on a Sandy Bridge node. The difference reflects the backward compatibility that Intel and AMD maintain across processor generations: code compiled for the older Nehalem class runs on the newer Sandy Bridge class, but generally not the other way around, because code built for Sandy Bridge may use instructions (such as the AVX extensions) that Nehalem processors lack. In that case you will receive an “Illegal Instruction” message from the system, which is admittedly not very informative.
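Because the “Illegal Instruction” message does not say what went wrong, it can help to check whether the node you are on advertises AVX, the instruction-set extension introduced with Sandy Bridge and absent on Nehalem. The following is a minimal sketch, assuming a Linux node that lists CPU flags in /proc/cpuinfo; the function names has_avx and report_avx are hypothetical and not an SCC-provided tool.

```shell
#!/bin/sh
# Hypothetical helper (not an SCC tool): succeeds if the current node
# lists the "avx" CPU flag in /proc/cpuinfo. Assumes a Linux node.
# Sandy Bridge processors report "avx"; Nehalem/Westmere processors do not.
has_avx() {
    grep -qw avx /proc/cpuinfo 2>/dev/null
}

# Print a one-line summary of what the flag implies for a Sandy Bridge build.
report_avx() {
    if has_avx; then
        echo "avx: yes (code built for Sandy Bridge should run here)"
    else
        echo "avx: no (code built for Sandy Bridge may stop with Illegal Instruction)"
    fi
}

report_avx
```

Running this at the start of a batch script records, in the job output, which kind of node the scheduler actually chose.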

Since the batch scheduler has no knowledge of how an executable was built, code compiled for a specific processor class may be dispatched to a node of a different architecture, resulting in the runtime error described above. This can be avoided by adding -l cpu_arch=ARCH to the qsub command line or to the batch script. Alternatively, the PGI compilers can compile for cross-platform runs (i.e., across processor classes). Executables compiled this way will run on any Intel or AMD processor class, but their efficiency may be less than optimal, especially if the code is compiled with a high level of optimization such as -fast.
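For the batch-script route mentioned above, here is a minimal sketch. It assumes the SCC scheduler reads qsub options from script lines beginning with #$ (the Grid Engine convention); the script name myjob.sh is hypothetical, and myprog stands for an executable built on a Sandy Bridge login node.

```shell
#!/bin/sh
# myjob.sh (hypothetical name): batch script for an executable built
# with pgcc on a Sandy Bridge login node without -tp (or with -tp=sandybridge).
# Assumption: the scheduler reads "#$" lines as qsub options (Grid Engine style),
# so this line has the same effect as "qsub -l cpu_arch=sandybridge".
#$ -l cpu_arch=sandybridge

./myprog
```

The script would then be submitted with scc1% qsub myjob.sh, and no -l option is needed on the command line. Keeping the request in the script means it cannot be forgotten on resubmission.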

The PGI compilers can serve two conflicting objectives:

  • Compile for optimal efficiency (on a specific processor)

    The primary objective of the PGI compilers is to generate an executable that runs optimally on a specific processor. If this is your goal, a PGI compiler is appropriate. There are two ways to achieve it:

    1. By default, the PGI compiler generates an executable that runs optimally on the processor on which you perform the compilation. On the scc1, scc2, geo, and scc4 login nodes, that is the E5 (Sandy Bridge) class of processors; on Tanto, it is the Nehalem class (see Technical Summary).
    2. Whether you compile on a Sandy Bridge or a Nehalem login node, you can explicitly specify the target architecture for which the code is to be compiled (see the -tp=target section below). This also comes in handy if you need to compile code on a compute node. For example, you may compile code on a Nehalem node to generate executables for the Sandy Bridge architecture.
  • Compile for ease of use (cross-platform portability)

    If your objective is to build an executable that will run on many processor platforms, then the GNU compilers should probably be used. The PGI compilers can also compile for cross-platform use with a special switch (see the -tp=target section). Note that with either compiler, portability may cost some performance, especially when aggressive optimization such as -fast is used. How much performance is lost depends on the code.

The -tp=target switch

This switch may be used, on a login or a compute node, to specify the processor for which the executable is compiled and optimized. For example, code is typically compiled on a login node (other than Tanto) with Sandy Bridge processors (listed as “sandybridge” in the CPU Architecture column of the Technical Summary). Hence, if no -tp is specified, the executable is optimized, by default, for the Sandy Bridge class. With -tp=nehalem, the executable is built for the Nehalem class of processors (listed as “nehalem” in the CPU Architecture column of the Technical Summary), even though it was compiled on a Sandy Bridge processor. We recommend always using -tp=target with the PGI compilers. Relevant target choices on the SCC are:

  • -tp=x64
    This switch produces an executable that is compatible with all Intel and AMD families of processors. This makes the executable portable, but its efficiency may suffer.
    scc1% pgcc -o myprog myprog.c myutils.c -fast -tp=x64

    Since myprog has been compiled to run on all types of processors, no special CPU architecture request is needed for qsub.

    scc1% qsub -b y myprog
  • -tp=sandybridge
    This switch produces an executable for the Intel Sandy Bridge family of processors. The Xeon E5-XXXX processors listed in the Technical Summary belong to this class. Executables generated with this switch must always run on this class of processors.
    scc1% pgcc -o myprog myprog.c -tp=sandybridge -fast
    scc1% qsub -l cpu_arch=sandybridge -b y myprog
  • -tp=nehalem
    This switch produces an executable for the Intel Nehalem family of processors, including its Westmere successors. The X5560, X5570, X5650, and X5675 processors listed in the Technical Summary belong to this family. Executables generated with this switch must always run on this class of processors.
    scc1% pgcc -o myprog myprog.c myutils.c -fast -tp=nehalem

    To run a batch job on this class of processors:

    scc1% qsub -l cpu_arch=nehalem -b y myprog

    As a side note, if your application needs GPUs, you must compile your code with -tp=nehalem. This is because, at present, all nodes that have GPU hardware are of the Nehalem architecture. See GPU Computing for additional details.

  • Default (no -tp)
    In the absence of -tp=target, the PGI compiler compiles for the processor class of the node on which the code is compiled, and jobs must run on that same processor class. For example, if the code is compiled on the scc1 login node without a -tp target, it is compiled, by default, for the Intel Sandy Bridge class of processors:
    scc1% pgcc -o myprog myprog.c -fast

    You must then ensure that the batch scheduler sends your job to a Sandy Bridge node:

    scc1% qsub -l cpu_arch=sandybridge -b y myprog
  • For more options, see the compiler help:
    scc1% pgcc -help
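The rule running through the list above is that the -tp target chosen at compile time and the cpu_arch requested at submission must agree, with -tp=x64 needing no request at all. The sketch below encodes that pairing in a hypothetical shell function, pgi_commands, which only prints the two commands for a given target and does not run them; the -fast flag and program names mirror the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a matching pgcc/qsub pair.
# Usage: pgi_commands TARGET PROGRAM SOURCE...
pgi_commands() {
    if [ $# -lt 2 ]; then
        echo "usage: pgi_commands TARGET PROGRAM SOURCE..." >&2
        return 1
    fi
    target=$1
    prog=$2
    shift 2
    case "$target" in
        x64)
            # Portable build: any node will do, so no cpu_arch request.
            arch_opt="" ;;
        sandybridge|nehalem)
            # Architecture-specific build: pin the job to the same class.
            arch_opt="-l cpu_arch=$target " ;;
        *)
            echo "unsupported target: $target" >&2
            return 1 ;;
    esac
    echo "pgcc -o $prog $* -fast -tp=$target"
    echo "qsub ${arch_opt}-b y $prog"
}

pgi_commands nehalem myprog myprog.c myutils.c
```

The last line prints the same pair of commands shown in the -tp=nehalem entry. Because the target is typed only once, the compile line and the submit line cannot drift apart.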

Compiling MPI programs with PGI compilers

For MPI programs, the MPI wrappers such as mpicc and mpif90 make compiling easier because they automatically add the include-file and library paths required for MPI compilation. On the SCC, these wrappers invoke the corresponding GNU compilers by default. You can instruct the wrappers to use the PGI compilers instead by setting the MPI_COMPILER environment variable as follows:

scc1% setenv MPI_COMPILER pgi

To make this setting permanent, add the statement above to the .cshrc in your home directory; bash users should instead add export MPI_COMPILER=pgi to their .bashrc, since setenv is a csh command. Because the wrappers now call the PGI compilers, the -tp=target switch discussed above (and the matching cpu_arch request at submission) applies here as well. To go back to the GNU compilers:

scc1% unsetenv MPI_COMPILER