{"id":107600,"date":"2017-05-31T15:03:43","date_gmt":"2017-05-31T19:03:43","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=107600"},"modified":"2023-07-14T13:48:02","modified_gmt":"2023-07-14T17:48:02","slug":"intel-compiler-flags","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/intel-compiler-flags\/","title":{"rendered":"Intel Compiler Flags"},"content":{"rendered":"<p>Intel produces compilers that generate highly optimized code for its CPUs. As with all compilers, programs compiled with optimization should have their output double-checked for accuracy. If the numeric output is incorrect or lacks the desired accuracy, less aggressive compile options should be tried. The following table summarizes some relevant commands on the SCC:<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Command<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td>module avail intel<\/td>\n<td>List available versions of the Intel compiler.<\/td>\n<\/tr>\n<tr>\n<td>module load intel\/2023.1<\/td>\n<td>Load a particular version.<\/td>\n<\/tr>\n<tr>\n<td>icc<\/td>\n<td>C compiler.<\/td>\n<\/tr>\n<tr>\n<td>icpc<\/td>\n<td>C++ compiler.<\/td>\n<\/tr>\n<tr>\n<td>ifort<\/td>\n<td>Fortran compiler.<\/td>\n<\/tr>\n<tr>\n<td>icx<\/td>\n<td>New generation C compiler. (intel\/2023.1 and newer only)<\/td>\n<\/tr>\n<tr>\n<td>icpx<\/td>\n<td>New generation C++ compiler. (intel\/2023.1 and newer only)<\/td>\n<\/tr>\n<tr>\n<td>ifx<\/td>\n<td>New generation Fortran compiler. (intel\/2023.1 and newer only)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Intel compiler modules after 2023.1 will only support the new generation compilers (icx, icpx, ifx) as Intel is retiring the older ones (icc, icpc, ifort). 
<\/strong><\/p>\n<p>All compilers have manuals available, for example:<\/p>\n<pre><code class=\"code-block\">man ifx\r\nman icpx<\/code><\/pre>\n<p>Intel also has a <a href=\"https:\/\/software.intel.com\/en-us\/articles\/step-by-step-optimizing-with-intel-c-compiler\">document<\/a> that makes recommendations for optimization options.<\/p>\n<h2>General Compiler Optimization Flags<\/h2>\n<p>The Intel compilers' optimization flags deliberately mimic many of those used with the GNU family of compilers.\u00a0 The basic optimization flags are summarized below.\u00a0 Using these flags does not result in any incompatibility between CPU architectures. Note that the Intel compiler is not recommended for programs that will be run on AMD processors, because the resulting executables perform poorly there.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-O<\/nobr><\/td>\n<td>Optimized compile.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O2<\/nobr><\/td>\n<td>More extensive optimization.\u00a0 Recommended by Intel for general use.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-O3<\/nobr><\/td>\n<td>More aggressive than -O2 with longer compile times.\u00a0 Recommended for codes with loops involving intensive floating-point calculations.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-Ofast<\/nobr><\/td>\n<td>-O3 plus additional aggressive optimizations, such as relaxed floating-point accuracy.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-ipo<\/nobr><\/td>\n<td>Interprocedural optimization, a step that examines function calls between files when the program is linked.\u00a0 This flag must be used both when compiling and when linking.\u00a0 Compile times are very long with this flag; however, depending on the application, there may be appreciable performance improvements when it is combined with the -O* flags.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-mtune=<em>processor<\/em><\/nobr><\/td>\n<td>This flag does additional tuning for specific processor types, but it does <strong>not<\/strong> generate extra SIMD instructions so there are no architecture 
compatibility issues.\u00a0 The tuning will involve optimizations for processor cache sizes, preferred ordering of instructions, and so on.\u00a0 The useful values for <em>processor<\/em> on the SCC are:\u00a0 broadwell, haswell, ivybridge, sandybridge, and cascadelake.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h2>Flags to Specify SIMD Instructions<\/h2>\n<p>These flags will produce executables that contain specific SIMD instructions, which may affect compatibility with compute nodes on the SCC.<\/p>\n<table class=\"research\">\n<tbody>\n<tr>\n<th>Flag<\/th>\n<th>Description<\/th>\n<\/tr>\n<tr>\n<td><nobr>-xHost<\/nobr><\/td>\n<td>Must be used with at least -O2. Creates an executable that uses SIMD instructions based on the CPU that is compiling the code. Not recommended, as compiling on a newer architecture compute node results in a program that cannot run on older architectures.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-fast<\/nobr><\/td>\n<td>A combination of -Ofast, -ipo, -static (for static linking),\u00a0 and -xHost.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-m<em>arch<\/em><\/nobr><\/td>\n<td>Must be used with at least -O2. Specifies the type of SIMD instructions to be generated.\u00a0 When combined with the <em>-ax <\/em>flag this sets the minimum SIMD instruction set.\u00a0 Also note that when the compiled software runs on an AMD processor the value specified by the <em>-m <\/em>flag is used even if the processor supports other instruction sets. The values for this flag mimic those from the GNU compilers: avx, avx2, and a large number of avx512 flags.<\/p>\n<p>There is an alternate form of this flag, <em>-x<\/em>, which uses the options given below with <em>-ax<\/em>. 
However, code compiled with <em>-x<\/em> will not execute at all on AMD processors so it is not recommended.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-ax<em>arch<\/em><\/nobr><\/td>\n<td>This must be used with at least -O2 and -m<em>arch<\/em>.\u00a0 The -m<em>arch<\/em> flag will produce specific SIMD instructions, and additional SIMD instructions can be supported by adding the -ax<em>arch<\/em> flag.\u00a0 Every function that can be compiled with SIMD instructions will have separate copies created for each instruction set. The executable will detect CPU instruction support at runtime and choose which version to run.\u00a0 The compile times can be very long as functions will be compiled multiple times over, and the resulting binary will be large. The useful values for <em>arch<\/em> on the SCC are: AVX, CORE-AVX2, and CORE-AVX512.\u00a0 Several instruction sets can be included with this flag as a comma-separated list.<\/p>\n<p>For example: \u00a0\u00a0 icpx -c -O3 -mavx -axCORE-AVX2,CORE-AVX512 mycode.cpp<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Default Optimization Behavior<\/h2>\n<p>Most open source programs that compile from source code use the -O2 or -O3 flags.\u00a0 This will result in fast code that can run on any compute node on the SCC.\u00a0 The -fast flag can be problematic (due to its inclusion of the -xHost flag) when used on the login nodes, as they have Broadwell architecture CPUs which support AVX2 instructions.\u00a0 Codes compiled there with -fast will only be able to execute on Broadwell architecture compute nodes on the SCC.<\/p>\n<h2>Recommendations<\/h2>\n<p>Most codes will be well-optimized with the -O2 or -O3 flags.\u00a0 Programs that involve intensive floating-point calculations inside of loops can additionally be compiled with the -m<em>arch<\/em> flag.\u00a0 For maximum cross-compatibility across the SCC compute nodes and probable highest performance, a combination of flags should be used:<\/p>\n<pre><code class=\"code-block\">icc -Ofast -mavx 
-axCORE-AVX2,CORE-AVX512 -c mycode.cpp<\/code><\/pre>\n<p>If benchmarking and testing of the compiled code does not show any improvement with the -m and -ax flags then they can be removed to improve compilation times.<\/p>\n<p>Note that selecting specific SIMD instructions with the -m<em>arch<\/em> flag alone will restrict compatibility with compute nodes unless the job is submitted with this qsub flag:\u00a0 -l cpu_arch=<em>compatible_arch<\/em>. The <em>compatible_arch<\/em> value is an architecture name that matches the SIMD instructions.\u00a0 In this example, a code is compiled with AVX instructions and a Haswell architecture CPU is requested with qsub:<\/p>\n<pre><code class=\"code-block\">icc -Ofast -mavx mycode.cpp -o mycode\r\nqsub -l cpu_arch=haswell -b y mycode<\/code><\/pre>\n<p>If a code is relatively small in scope it can be compiled as part of a queue job.\u00a0 For example, a job that is submitted to run on a Buy-in node equipped with an Ivybridge architecture CPU could be compiled with tunings for that node.\u00a0 As a precaution the source is copied into $TMPDIR:<\/p>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h4 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\"><strong>Example Batch Script to Recompile on a Compute Node<\/strong><\/h4><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<pre><code class=\"code-block\">#!\/bin\/bash -l\r\n#$ -l cpu_arch=ivybridge\r\nmodule load intel\/2016\r\n\r\n# Copy the source to $TMPDIR to avoid interaction\r\n# with other jobs running\r\ncp -R \/projectnb\/myproject\/mysource $TMPDIR\r\n\r\ncd $TMPDIR\/mysource\r\n\r\nicc -Ofast -mtune=ivybridge -xHost -c file1.c\r\nicc -Ofast -mtune=ivybridge -xHost -c file2.c\r\nicc -o myexe file1.o file2.o -lm\r\n\r\n.\/myexe arg1 arg2 ....<\/code><\/pre>\n<p><\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Intel produces compilers that produce highly 
optimized code for their CPUs. As with all compilers, programs compiled with optimization should have their output double-checked for accuracy. If the numeric output is incorrect or lacks the desired accuracy less-aggressive compile options should be tried. The following table summarizes some relevant commands on the SCC: Command Description&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":78157,"menu_order":3,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107600"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=107600"}],"version-history":[{"count":16,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107600\/revisions"}],"predecessor-version":[{"id":146593,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/107600\/revisions\/146593"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/78157"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=107600"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}