The login and compute nodes on the SCC consist of a variety of CPU architectures. The available compilers have a number of different optimization options that can attempt to take advantage of specific features of the CPU architectures to produce faster programs. The great majority of compute nodes use Intel processors. There are additionally a small number of compute nodes that use AMD processors with the Bulldozer and Epyc architectures, although access to these is limited to particular groups. The complete list of compute nodes, CPU model, and CPU architecture available on the SCC can be found on the Technical Summary page.
One key way CPU architectures are distinguished from one another is through instructions that use additional processor hardware for specialized computations. These include single instruction, multiple data (SIMD) instruction sets, which allow the same operation to be applied to multiple pieces of data in parallel. For scientific computing, the SIMD instruction sets of main interest are generally those that apply to floating-point calculations; using SIMD instructions in compiled code can sometimes improve program performance by factors of 2-10x. On the SCC, however, a program compiled with options that optimize performance for a newer CPU architecture may be unable to run on older compute nodes.
On the SCC, the instruction sets that require attention when compiling are AVX, AVX2, and AVX-512. The AVX and AVX2 instructions, for example, can perform operations on up to eight 32-bit floating point numbers simultaneously in a single step. The AVX-512 instructions are more complicated, as Intel has added several extensions with successive CPU generations. For more details, see this Wikipedia article. In the table below, AVX-512 refers to the set introduced with the Skylake architecture, which is common to all CPUs that support any version of the AVX-512 instruction set.
SIMD Instructions by CPU Architecture
When using optimization options with compilers on the SCC, care must be taken that the compiled program contains only instructions compatible with the architecture of the compute node that will execute it. For some floating-point intensive codes, compiling in support for particular SIMD instructions can dramatically improve program performance. The following table shows SIMD support by CPU architecture.
| Intel Architecture | SSE4.2 | AVX | AVX2 | FMA4 | AVX-512 |
|---|---|---|---|---|---|
| Sandybridge | Yes | Yes | No | No | No |
| Ivybridge | Yes | Yes | No | No | No |
| Haswell | Yes | Yes | Yes | No | No |
| Broadwell | Yes | Yes | Yes | No | No |
| Skylake, Cascadelake, and Icelake | Yes | Yes | Yes | No | Yes |
| Sapphire, Emerald, and Granite Rapids | Yes | Yes | Yes | No | Yes |
| AMD Architecture | SSE4.2 | AVX | AVX2 | FMA4 | AVX-512 |
|---|---|---|---|---|---|
| Naples | Yes | Yes | Yes | No | No |
| Rome | Yes | Yes | Yes | No | No |
| Milan | Yes | Yes | Yes | No | No |
| Genoa | Yes | Yes | Yes | No | Yes |
When a compiler is instructed to emit SIMD instructions, the compute node that runs the resulting program must have CPU support for those instructions. The SCC queue system has options that allow for the specification of CPU architecture; a batch job's resource requests must match the compiler options used, or the program may fail with illegal-instruction errors on nodes that lack the required support.
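For example, a batch script can pair the architecture requested from the queue with the matching compiler flag. This is an illustrative sketch only: the `cpu_arch` resource name below is a placeholder, and the exact resource strings supported should be checked against the SCC queue documentation.

```shell
#!/bin/bash -l
# Illustrative batch script -- "cpu_arch=broadwell" is a placeholder
# resource request; consult the SCC queue documentation for the exact
# resource strings available.
#
#$ -N simd_job
#$ -l cpu_arch=broadwell   # request a node whose CPU matches the build

# Build for the requested architecture only (GCC naming convention),
# then run on the matching node:
gcc -O3 -march=broadwell myprog.c -o myprog
./myprog
```

Keeping the `-march` value and the queue resource request in the same script makes it harder for the two to drift out of sync.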
