The login and compute nodes on the SCC consist of a variety of CPU architectures. The available compilers have a number of different optimization options that can attempt to take advantage of specific features of the CPU architectures to produce faster programs. The great majority of compute nodes use Intel processors. There are additionally a small number of compute nodes that use AMD processors with the Bulldozer and Epyc architectures, although access to these is limited to particular groups. The complete list of compute nodes, CPU model, and CPU architecture available on the SCC can be found on the Technical Summary page.
One key way CPU architectures are distinguished from one another is through instructions that use additional processor hardware for specialized computations. These include single instruction, multiple data (SIMD) instruction sets, which allow the same operation to be applied to multiple pieces of data in parallel. For scientific computing, the SIMD instruction sets of main interest are generally those that apply to floating-point calculations; using SIMD instructions in compiled code can sometimes improve program performance by factors of 2-10x. On the SCC, however, a program compiled with options that optimize performance for a newer CPU architecture may be unable to run on older compute nodes.
On the SCC, the instruction sets that require attention when compiling are AVX, AVX2, and AVX-512. The AVX and AVX2 instructions, for example, can perform operations on up to eight 32-bit floating point numbers simultaneously in a single step. The AVX-512 instructions are more complicated, as Intel has added several extensions with successive CPU generations. For more details, see this Wikipedia article. In the table below, AVX-512 refers to the set introduced with the Skylake architecture, which is common to all CPUs that support any version of the AVX-512 instruction set.
SIMD Instructions by CPU Architecture
When using optimization options with compilers on the SCC, care must be taken that the compiled program contains only instructions compatible with the architecture of the compute node that will execute it. For some floating-point intensive codes, compiling in support for particular SIMD instructions can dramatically improve program performance. The following table shows SIMD support by CPU architecture.
| Intel Architecture | SSE4.2 | AVX | AVX2 | FMA4 | AVX-512 |
|---|---|---|---|---|---|
| Sandybridge | Yes | Yes | No | No | No |
| Ivybridge | Yes | Yes | No | No | No |
| Haswell | Yes | Yes | Yes | No | No |
| Broadwell | Yes | Yes | Yes | No | No |
| Skylake, Cascadelake, and Icelake | Yes | Yes | Yes | No | Yes |
| Sapphire, Emerald, and Granite Rapids | Yes | Yes | Yes | No | Yes |
| AMD Architecture | SSE4.2 | AVX | AVX2 | FMA4 | AVX-512 |
|---|---|---|---|---|---|
| Naples | Yes | Yes | Yes | No | No |
| Rome | Yes | Yes | Yes | No | No |
| Milan | Yes | Yes | Yes | No | No |
| Genoa | Yes | Yes | Yes | No | Yes |
When a compiler is instructed to emit SIMD instructions, the compute node that runs the resulting program must have CPU support for those instructions. The SCC queue system has options that allow for the specification of CPU architecture; a batch job's resource requests must match the compiler options used, or the program may fail with illegal-instruction errors on nodes that lack the required support.
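For example, a batch script can pair the architecture requested from the queue with the matching compiler flag. This is an illustrative sketch only: the `cpu_arch` resource name below is a placeholder, and the exact resource strings supported should be checked against the SCC queue documentation.

```shell
#!/bin/bash -l
# Illustrative batch script -- "cpu_arch=broadwell" is a placeholder
# resource request; consult the SCC queue documentation for the exact
# resource strings available.
#
#$ -N simd_job
#$ -l cpu_arch=broadwell   # request a node whose CPU matches the build

# Build for the requested architecture only (GCC naming convention),
# then run on the matching node:
gcc -O3 -march=broadwell myprog.c -o myprog
./myprog
```

Keeping the `-march` value and the queue resource request in the same script makes it harder for the two to drift out of sync.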
