{"id":107605,"date":"2017-05-31T15:07:03","date_gmt":"2017-05-31T19:07:03","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=107605"},"modified":"2026-03-18T10:47:44","modified_gmt":"2026-03-18T14:47:44","slug":"gcc-compiler-flags","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/gcc-compiler-flags\/","title":{"rendered":"GNU and LLVM Compiler Flags"},"content":{"rendered":"<p>The GNU family of compilers produces highly optimized code for Intel and AMD CPUs.\u00a0 As the LLVM C and C++ compilers deliberately share the majority of their optimization flags with their GNU equivalents, the information here applies to both sets of compilers.\u00a0 As with all compilers, programs compiled with optimization should have their output double-checked for accuracy. If the numeric output is incorrect or lacks the desired accuracy, less aggressive compile options should be tried. The following table summarizes some relevant commands on the SCC for the GNU compilers:<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Command<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td>module avail gcc<\/td>\n<td>List available versions of the GNU compilers.<\/td>\n<\/tr>\n<tr>\n<td>module load gcc\/10.2.0<\/td>\n<td>Load a particular version.<\/td>\n<\/tr>\n<tr>\n<td>gcc<\/td>\n<td>GNU C\u00a0compiler.<\/td>\n<\/tr>\n<tr>\n<td>g++<\/td>\n<td>GNU C++ compiler.<\/td>\n<\/tr>\n<tr>\n<td>gfortran<\/td>\n<td>GNU Fortran 90\/95\/2003\/etc. compiler.<\/td>\n<\/tr>\n<tr>\n<td>g77<\/td>\n<td>GNU Fortran 77 compiler.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>On AlmaLinux 8 the system gcc\/g++\/gfortran compilers are version 8.5.0.<\/strong><\/p>\n<p>The LLVM compiler commands are summarized here:<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Command<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td>module avail llvm<\/td>\n<td>List available versions of the LLVM 
compilers.<\/td>\n<\/tr>\n<tr>\n<td>module load llvm\/12.0.1<\/td>\n<td>Load a particular version.<\/td>\n<\/tr>\n<tr>\n<td>clang<\/td>\n<td>LLVM C compiler.<\/td>\n<\/tr>\n<tr>\n<td>clang++<\/td>\n<td>LLVM C++ compiler.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Manuals are available for all of the compilers after their modules are loaded:<\/p>\n<pre><code class=\"code-block\">man g++\r\nman gfortran\r\nman clang<\/code><\/pre>\n<p>The optimization flags of the GNU Compiler Collection are described in an <a href=\"https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Optimize-Options.html\">online document<\/a>.<\/p>\n<h2>General Compiler Optimization Flags<\/h2>\n<p>The basic optimization flags are summarized below. Using these flags does not result in any incompatibility between CPU architectures.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-O<\/nobr><\/td>\n<td>Optimized compile.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O2<\/nobr><\/td>\n<td>More extensive optimization.\u00a0 This is the recommended flag for most codes.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O3<\/nobr><\/td>\n<td>More aggressive than -O2 with longer compile times. Recommended for codes with loops involving intensive floating-point calculations.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-ffast-math<\/nobr><\/td>\n<td>Allows for higher performance with floating-point calculations at the risk of a slight loss of precision.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-Ofast<\/nobr><\/td>\n<td>-O3 plus -ffast-math and a few other options. The GNU documentation notes that this option results in a disregard of &#8220;strict standards compliance.&#8221;<\/td>\n<\/tr>\n<tr>\n<td><nobr>-flto<\/nobr><\/td>\n<td>Link-time optimization, a step that examines function calls between files when the program is linked. This flag must be used both when compiling and when linking. 
Compile times are very long with this flag; however, depending on the application, there may be appreciable performance improvements when combined with the -O* flags.\u00a0 This flag and any optimization flags must be passed to the linker, and gcc\/g++\/gfortran should be called for linking instead of calling ld directly.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-mtune=<em>processor<\/em><\/nobr><\/td>\n<td>This flag does additional tuning for specific processor types; however, it does <strong>not<\/strong> generate extra SIMD instructions, so there are no architecture compatibility issues. The tuning involves optimizations for processor cache sizes, preferred ordering of instructions, and so on. The useful values of <em>processor<\/em> on the SCC Intel nodes are the same as the architecture flags on the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/computing-resources\/tech-summary\/\">Tech Summary<\/a> page.\u00a0 On the AMD Bulldozer nodes the value to use is <em>bdver1<\/em>, and on the AMD Epyc nodes the value is <em>znver2<\/em>.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h2>Flags to Specify SIMD Instructions<\/h2>\n<p>These flags will produce executables that contain specific SIMD instructions, which may affect compatibility with compute nodes on the SCC. For AVX-512 instructions there are a variety of flags that can be used. If you are interested in compiling with support for those, the easiest way is to specify an Intel CPU architecture that supports AVX-512 using the <code>-march=<em>arch<\/em><\/code> flag. For accepted architecture names check the manual for the compiler, e.g. <code>man gcc<\/code>.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-march=native<\/nobr><\/td>\n<td>Creates an executable that uses SIMD instructions based on the CPU that is compiling the code. Additionally, it includes the optimizations from the -mtune=native flag. 
Not recommended, as code compiled on newer architectures will not run on older architectures.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-march=<em>arch<\/em><\/nobr><\/td>\n<td>This will generate SIMD instructions for a particular architecture and apply the -mtune optimizations.\u00a0 The useful values of <em>arch<\/em> are the same as for the -mtune flag above.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-mavx<\/nobr><\/td>\n<td>Generates code with AVX instructions.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-mavx2<\/nobr><\/td>\n<td>Generates code with AVX2 instructions. Code compiled with this flag will not be able to run on CPU architectures without AVX2 instructions.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Default Optimization Behavior<\/h2>\n<p>Most open source programs that compile from source code use the -O2 or -O3 flags. This will result in fast code that can run on any compute node on the SCC. The -march=native flag, which is sometimes used by default in open source programs, can be problematic when used on the login nodes, as they have Broadwell architecture CPUs which support AVX2 instructions. Codes compiled with -march=native on a login node will only be able to execute on Broadwell architecture compute nodes on the SCC.<\/p>\n<h2>Recommendations<\/h2>\n<p>Most codes will be well-optimized with the -O2 or -O3 flags plus the -msse4.2 flag. Programs that involve intensive floating-point calculations inside of loops can additionally be compiled with the -march=<em>arch<\/em> flag.\u00a0 For maximum cross-compatibility across the SCC compute nodes and probable highest performance, a combination of flags should be used:<\/p>\n<pre><code class=\"code-block\">g++ -O3 -march=sandybridge -mtune=intel -c mycode.cpp<\/code><\/pre>\n<p>Note that selecting specific SIMD instructions with the -mavx* flags or the -march=<em>arch<\/em> flag restricts which compute nodes can run the program, so the job should be submitted with the qsub flag -l cpu_arch=<em>compatible_arch<\/em>. 
The <em>compatible_arch<\/em> value is an architecture name that matches the SIMD instructions.\u00a0 Alternatively, the qsub flag -l cpu_arch=<em>\\!compatible_arch<\/em> can be used to exclude an incompatible architecture:<\/p>\n<pre><code class=\"code-block\">g++ -O3 -ffast-math -march=broadwell mycode.cpp -o mycode\r\nqsub -l cpu_arch=broadwell -b y mycode\r\n# OR, since -march=broadwell produces AVX2 instructions,\r\n# select nodes that support AVX2.\r\nqsub -l avx2 -b y mycode<\/code><\/pre>\n<p>Another option is to compile the code as part of a batch job, which completely avoids any architectural issues and allows for the maximum amount of optimization. For example, a job that is submitted to run on a Buy-in node equipped with an Ivybridge architecture CPU could be compiled with tunings for that node. As a precaution the source is copied into $TMPDIR:<\/p>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h4 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\"><strong>Example Batch Script to Recompile on a Compute Node<\/strong><\/h4><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<pre><code class=\"code-block\">#!\/bin\/bash -l\r\n#$ -l cpu_arch=ivybridge\r\nmodule load gcc\/9.3.0<\/code><\/pre>\n<pre><code class=\"code-block\"># Copy the source to $TMPDIR to avoid interaction\r\n# with other running jobs\r\ncp -R \/projectnb\/myproject\/mysource $TMPDIR\r\ncd $TMPDIR\/mysource\r\n<\/code><\/pre>\n<pre><code class=\"code-block\">gcc -O3 -march=native -ffast-math -c file1.c \r\ngcc -O3 -march=native -ffast-math -c file2.c \r\ngcc -o myexe file1.o file2.o -lm \r\n.\/myexe arg1 arg2 ....<\/code><\/pre>\n<p><\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>The GNU family of compilers produces highly optimized code for Intel and AMD CPUs.\u00a0 As the LLVM C and C++ compilers deliberately share the majority of their optimization flags with their GNU 
equivalents the information here applies to both sets of compilers.\u00a0 As with all compilers, programs compiled with optimization should have their output double-checked&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":78157,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107605"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=107605"}],"version-history":[{"count":10,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107605\/revisions"}],"predecessor-version":[{"id":161263,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107605\/revisions\/161263"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/78157"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=107605"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}