{"id":107602,"date":"2017-05-31T15:05:38","date_gmt":"2017-05-31T19:05:38","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=107602"},"modified":"2023-09-27T09:27:09","modified_gmt":"2023-09-27T13:27:09","slug":"pgi-compiler-flags","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/pgi-compiler-flags\/","title":{"rendered":"PGI Compiler Flags"},"content":{"rendered":"<p>The PGI compiler family is produced by <a href=\"http:\/\/www.pgroup.com\/\">The Portland Group<\/a>, which is owned by Nvidia, Inc.\u00a0 It is available on the SCC. As of the AlmaLinux 8 operating system upgrade (summer 2023), this family of compilers is installed as part of the <code>nvidia-hpc<\/code> module. Previous versions of the compilers are part of the <code>pgi<\/code> modules. The following table summarizes some relevant commands on the SCC:<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Command<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td>module avail pgi <strong>OR<\/strong> module avail nvidia-hpc<\/td>\n<td>List available versions of the\u00a0PGI compiler.<\/td>\n<\/tr>\n<tr>\n<td>module load nvidia-hpc\/2023-23.5<\/td>\n<td>Load a particular version.<\/td>\n<\/tr>\n<tr>\n<td>pgcc<\/td>\n<td>C compiler.<\/td>\n<\/tr>\n<tr>\n<td>pgc++<\/td>\n<td>C++ compiler.<\/td>\n<\/tr>\n<tr>\n<td>pgf90<\/td>\n<td>Fortran compiler.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The C\/C++ and Fortran compilers use the same optimization flags, and both compilers have manuals available:<\/p>\n<pre><code class=\"code-block\">man pgcc\r\nman pgf90\r\n<\/code><\/pre>\n<p>The older <code>pgi<\/code> modules have an online <a href=\"https:\/\/docs.nvidia.com\/hpc-sdk\/pgi-compilers\/18.10\/x86\/pgi-user-guide\/index.htm#cmdln-options-use\">reference manual<\/a> that describes their compiler flags in detail. 
The <code>nvidia-hpc<\/code> compilers also have <a href=\"https:\/\/docs.nvidia.com\/hpc-sdk\/compilers\/hpc-compilers-user-guide\/index.html#cmdln-options-use\">online manuals<\/a> for command line options.<\/p>\n<h2>General Compiler Optimization Flags<\/h2>\n<p>The basic optimization flags are summarized below.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-O0<\/nobr><\/td>\n<td>Optimization level 0. Usually for debugging.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O1<\/nobr><\/td>\n<td>Optimization level 1. Scheduling within extended basic blocks is performed. No global optimizations are performed. This is the default level if no optimization flag is specified.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O<\/nobr><\/td>\n<td>Optimization level 2. All level 1 optimizations are performed. In addition, traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O2<\/nobr><\/td>\n<td>All -O optimizations are performed. In addition, more advanced optimizations such as SIMD code generation, cache alignment and partial redundancy elimination are enabled.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O3<\/nobr><\/td>\n<td>All -O1 and -O2 optimizations are performed. In addition, this level enables more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O4<\/nobr><\/td>\n<td>All -O1, -O2, and -O3 optimizations are performed. 
In addition, hoisting of guarded invariant floating point expressions is enabled.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><code>pgi<\/code> Module Flags to Specify SIMD Instructions<\/h2>\n<p>These flags will produce executables that contain specific SIMD instructions, which may affect compatibility with compute nodes on the SCC.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-tp=nehalem-64<\/nobr><\/td>\n<td>For Intel Nehalem architecture Core processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=sandybridge-64<\/nobr><\/td>\n<td>For Intel Sandybridge and Ivybridge architecture Core processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=haswell-64<\/nobr><\/td>\n<td>For Intel\u00a0Haswell and Broadwell architecture Core processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=bulldozer-64<\/nobr><\/td>\n<td>For AMD Bulldozer processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=x64<\/nobr><\/td>\n<td>For all Intel 64-bit processors and AMD 64-bit processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=px<\/nobr><\/td>\n<td>For any x86-compatible processor (including all above).<\/td>\n<\/tr>\n<tr>\n<td><nobr>-fast<\/nobr><\/td>\n<td>Includes: -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mcache_align -Mflushz -Mpre. Chooses generally optimal flags for target platforms and selects SIMD instructions that are available on the compiling computer.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><code>nvidia-hpc<\/code> Module Flags to Specify SIMD Instructions<\/h2>\n<p>These flags will produce executables that contain specific SIMD instructions, which may affect compatibility with compute nodes on the SCC. 
The manual page for the compilers can be referenced for the specific version you&#8217;re using: <code>man pgfortran<\/code>.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-tp=bulldozer<\/nobr><\/td>\n<td>For AMD Bulldozer architecture processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=zen\/zen2\/zen3 (choose 1)<\/nobr><\/td>\n<td>For AMD Epyc and Ryzen architecture processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=sandybridge<\/nobr><\/td>\n<td>For Intel Sandybridge processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=ivybridge<\/nobr><\/td>\n<td>For Intel Ivybridge processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=skylake<\/nobr><\/td>\n<td>For Intel Skylake processors.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-tp=px<\/nobr><\/td>\n<td>For any x86-compatible processor (including all above).<\/td>\n<\/tr>\n<tr>\n<td><nobr>-fast<\/nobr><\/td>\n<td>Includes: -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mcache_align -Mflushz -Mpre. Chooses generally optimal flags for target platforms and selects SIMD instructions that are available on the compiling computer.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Default Optimization Behavior<\/h2>\n<p>The PGI compilers by default will <em>always<\/em> produce executables that are tuned for the architecture of the compiling computer.\u00a0 This means that, without the -tp=x64 or -tp=px flag, an executable compiled on the SCC login nodes will only be compatible with the Broadwell architecture.\u00a0 The CPU architecture type of all of the login nodes on the SCC is Broadwell.<\/p>\n<h2>Recommendations<\/h2>\n<p>Here are recommendations\u00a0for compiling codes on the SCC.\u00a0 Either the -tp=x64 or the -tp=px flag should be used for compute node compatibility.\u00a0 The -tp=x64 flag will generally produce faster code at the cost of longer compile times, but it has been removed in the newer compiler versions. The -tp=px flag will usually compile notably faster. 
It is recommended that these flags be used to build executables on the SCC with the addition of an extra flag to enable the 128-bit SIMD instructions available on all SCC nodes:<\/p>\n<pre><code class=\"code-block\">pgc++ -fast -tp=px -Mvect=simd:128 mycode.cpp -o myexe<\/code><\/pre>\n<p>The generated executable will run on any compute node on the SCC. An alternate set of optimization flags can be used which targets the Sandybridge CPU architecture. This is also compatible with all SCC compute nodes:<\/p>\n<pre><code class=\"code-block\">pgc++ -fast -tp=sandybridge mycode.cpp -o myexe<\/code><\/pre>\n<p>To build an optimized executable for a particular node, the easiest way on the SCC is to compile your code on the compute node that will run your job and have the compiler auto-select the best SIMD instructions for that compute node:<\/p>\n<pre><code class=\"code-block\">pgc++ -fast -tp=native mycode.cpp -o myexe<\/code><\/pre>\n<p>However, the resulting compiled code won&#8217;t execute on an older architecture, so compiling this way on a Skylake compute node will result in programs that won&#8217;t run on SCC compute nodes that lack the AVX-512 instructions. To run such an executable, request a matching CPU architecture when submitting the job:<\/p>\n<pre><code class=\"code-block\">qsub -l cpu_arch=skylake -b y .\/myexe<\/code><\/pre>\n<p>Another option is to compile the code as part of a batch job, which completely avoids any architectural issues and allows for the maximum amount of optimization. For example, a job that is submitted to run on a Buy-in node equipped with an Ivybridge architecture CPU could be compiled with options auto-selected by the compiler for that node. 
As a precaution the source is copied into $TMPDIR:<\/p>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h4 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\"><strong>Example Batch Script to Recompile on a Compute Node<\/strong><\/h4><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<pre><code class=\"code-block\">#!\/bin\/bash -l\r\n#$ -l cpu_arch=ivybridge\r\nmodule load nvidia-hpc\/2023-23.5 \r\ncp -R \/projectnb\/myproject\/mysource $TMPDIR\r\ncd $TMPDIR\/mysource\r\n\r\npgcc -fast -tp=native file1.c -o myexe \r\n\r\n.\/myexe arg1 arg2 ....<\/code><\/pre>\n<p><\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>The PGI compiler family is produced by The Portland Group which is owned by Nvidia, Inc.\u00a0 It is available on SCC. As of the AlmaLinux 8 operating system upgrade (summer 2023) this family of compilers is installed as part of the nvidia-hpc module. Previous versions of the compilers are part of the pgi modules. 
The&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":78157,"menu_order":4,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107602"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=107602"}],"version-history":[{"count":50,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107602\/revisions"}],"predecessor-version":[{"id":147889,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107602\/revisions\/147889"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/78157"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=107602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}