{"id":78160,"date":"2014-05-08T14:55:34","date_gmt":"2014-05-08T18:55:34","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=78160"},"modified":"2024-06-12T16:39:24","modified_gmt":"2024-06-12T20:39:24","slug":"multiprocessor","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/","title":{"rendered":"Multiprocessor Programming"},"content":{"rendered":"<h4>Topics<\/h4>\n<ul>\n<li> <a href=\"#CPU\">Multiprocessing with CPUs<\/a><\/li>\n<ul>\n<li> <a href=\"#OpenMP\">Compile a program parallelized with OpenMP directives<\/a><\/li>\n<li> <a href=\"#MPI\">Compile a program parallelized with MPI<\/a><\/li>\n<li> <a href=\"#32-BIT\">Compiling 32-bit executables<\/a><\/li>\n<\/ul>\n<li> <a href=\"#GPU\">Multiprocessing with GPUs<\/a><\/li>\n<\/ul>\n<h2><a id=\"CPU\" name=\"CPU\"><\/a>Multiprocessing with CPUs<\/h2>\n<p>The use of multiple CPUs to achieve parallel speedup has been practiced for decades. Mature enabling paradigms and associated software are well understood and broadly adopted. Among these the most commonly used paradigms (which the SCC supports) are <a href=\"http:\/\/www.mcs.anl.gov\/research\/projects\/mpi\/\">Message Passing Interface (MPI)<\/a> for distributed-memory systems and <a href=\"http:\/\/openmp.org\/wp\/\">OpenMP<\/a> for shared-memory, thread-based computer systems. These paradigms support common languages (C, C++, and Fortran) for which you can build executables with the SCC-provided GNU and PGI families of compilers. 
Please visit the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/\">Compilers<\/a> page for more details, such as how to optimize the performance of your code.<\/p>\n<h3><a id=\"OpenMP\" name=\"OpenMP\"><\/a>Compile a program parallelized with OpenMP directives<\/h3>\n<ul>\n<li><b>For GNU compilers<\/b>, use the <b><code>-fopenmp<\/code><\/b> compiler flag to activate the OpenMP paradigm:\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">gfortran<\/span> <span class=\"placeholder\">myprogram.f<\/span>              <em><-- OpenMP not turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">gfortran -fopenmp<\/span> <span class=\"placeholder\">myprogram.f<\/span>     <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">gfortran -fopenmp<\/span> <span class=\"placeholder\">myprogram.f90<\/span>   <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">gcc -fopenmp<\/span> <span class=\"placeholder\">myprogram.c<\/span>          <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">g++ -fopenmp<\/span> <span class=\"placeholder\">myprogram.C<\/span>          <em><-- OpenMP turned on<\/em><\/code><\/pre>\n<p><em>Default executable name is <code>a.out<\/code>. Use <code>-o my-executable<\/code> to assign a specific name. Whenever possible, use <nobr>-O3<\/nobr> for highest level of code optimization. 
See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/\">Compilers<\/a> for more options.<\/em>\n<\/li>\n<li><b>For PGI compilers<\/b>, use the <b><code>-mp<\/code><\/b> compiler flag to activate the OpenMP paradigm:\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">pgfortran<\/span> <span class=\"placeholder\">myprogram.f<\/span>             <em><-- OpenMP not turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">pgfortran -mp<\/span> <span class=\"placeholder\">myprogram.f<\/span>         <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">pgfortran -mp<\/span> <span class=\"placeholder\">myprogram.f90<\/span>       <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">pgcc -mp<\/span> <span class=\"placeholder\">myprogram.c<\/span>              <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">pgc++ -mp<\/span> <span class=\"placeholder\">myprogram.C<\/span>             <em><-- OpenMP turned on<\/em><\/code><\/pre>\n<p><em>Default executable name is <code>a.out<\/code>. Use <code>-o my-executable<\/code> to assign a specific name. Whenever possible, use <nobr>-O3<\/nobr> for highest level of code optimization. 
See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/\">Compilers<\/a> for more options.<\/em><\/p>\n<li><b>For Intel compilers<\/b>, use the <b><code>-openmp<\/code><\/b> compiler flag to activate the OpenMP paradigm:\n<pre class=\"code-block\"><code>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">module load<\/span> <span class=\"placeholder\">intel\/2016<\/span>          <em><-- Load the Intel compiler module<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">ifort<\/span> <span class=\"placeholder\">myprogram.f<\/span>               <em><-- OpenMP not turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">ifort -openmp<\/span> <span class=\"placeholder\">myprogram.f<\/span>       <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">ifort -openmp<\/span> <span class=\"placeholder\">myprogram.f90<\/span>     <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">icc -openmp<\/span> <span class=\"placeholder\">myprogram.c<\/span>         <em><-- OpenMP turned on<\/em>\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">icpc -openmp<\/span> <span class=\"placeholder\">myprogram.C<\/span>        <em><-- OpenMP turned on<\/em> <\/code><\/pre>\n<p><em>Default executable name is <code>a.out<\/code>. Use <code>-o my-executable<\/code> to assign a specific name. Whenever possible, use <nobr>-fast<\/nobr> for highest level of code optimization. See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/compilers\/\">Compilers<\/a> for more options.<\/em><\/p>\n<h4><a name=\"OpenMP\"><\/a>Running OpenMP jobs<\/h4>\n<p>For program development and debugging purposes, short OpenMP jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. 
Jobs exceeding these limits will be terminated automatically by the system. All other jobs (> 10 minutes and\/or > 4 threads) should run in batch. (<a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/\">See Running Jobs page<\/a>)<\/p>\n<ol>\n<li>Run executable <code><em>a.out<\/em><\/code> on a login node:\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">setenv<\/span> OMP_NUM_THREADS <span class=\"placeholder\">2<\/span>      &lt;-- set thread count (for <strong>tcsh<\/strong> users)\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">export<\/span> OMP_NUM_THREADS=<span class=\"placeholder\">2<\/span>      &lt;-- set thread count (for <strong>bash<\/strong> users)\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"placeholder\">.\/a.out<\/span><\/code><\/pre>\n<\/li>\n<li> Run executable <code><em>a.out<\/em><\/code> on a compute node in batch:\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">setenv<\/span> OMP_NUM_THREADS <span class=\"placeholder\">2<\/span>      &lt;-- set thread count (for <strong>tcsh<\/strong> users)\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">export<\/span> OMP_NUM_THREADS=<span class=\"placeholder\">2<\/span>      &lt;-- set thread count (for <strong>bash<\/strong> users)\r\n<span class=\"prompt\">scc1$<\/span> <span class=\"command\">qsub<\/span> -pe omp 2 -V -b y <span class=\"placeholder\">.\/a.out<\/span><\/code><\/pre>\n<\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<h3><a id=\"MPI\" name=\"MPI\"><\/a>Compile a program parallelized with MPI<\/h3>\n<p>Compiling an MPI-enabled program requires the directory paths in which the compiler can find the necessary header files (<i>e.g.,<\/i> <code>mpi.h<\/code>) and the MPI library. 
For ease of compilation, this additional information is built into the wrapper scripts <code><span class=\"command\"> mpif77, mpif90, mpicc,<\/span><\/code> and <code><span class=\"command\">mpicxx<\/span><\/code> for the respective languages they serve: Fortran 77, Fortran 90\/95\/03, C, and C++. By default, these wrappers are linked to the GNU compilers. For example, <code>mpicc<\/code> is, by default, linked to the <code>gcc<\/code> compiler while <code>mpif90<\/code> points to the <code>gfortran<\/code> compiler. To switch to the PGI compilers, specify the selection through the environment variable <code>MPI_COMPILER<\/code>. Note that when <code>MPI_COMPILER<\/code> is undefined (unset), the wrappers point to their respective GNU compilers. To compile an MPI program with the Intel compilers, use module commands to load the Intel compiler and a corresponding MPI implementation.<\/p>\n<ul>\n<li>\n<h4>To make the MPI wrappers compile with <b><code>GNU<\/code><\/b> compilers:<\/h4>\n<p>Step 1. The MPI wrappers will use GNU compilers if <code>MPI_COMPILER<\/code> is either unset or set to <em>gnu<\/em>:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">setenv<\/span> MPI_COMPILER <span class=\"placeholder\">gnu<\/span>       &lt;-- select GNU compilers (for <strong>tcsh<\/strong> users)\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">export<\/span> MPI_COMPILER=<span class=\"placeholder\">gnu<\/span>       &lt;-- select GNU compilers (for <strong>bash<\/strong> users)<\/code><\/pre>\n<p>Step 2. 
Compile with MPI wrappers:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpif77<\/span> <span class=\"placeholder\">myprogram.f<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpif90<\/span> <span class=\"placeholder\">myprogram.f90<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc<\/span> <span class=\"placeholder\">myprogram.c<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicxx<\/span> <span class=\"placeholder\">myprogram.C<\/span><\/code><\/pre>\n<\/li>\n<li>\n<h4>To make MPI wrappers compile with <b><code>PGI<\/code><\/b> compilers:<\/h4>\n<p>Step 1. Setting <code>MPI_COMPILER<\/code> to <code>pgi<\/code> makes the wrappers compile with PGI compilers:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">setenv MPI_COMPILER<\/span> <span class=\"placeholder\">pgi<\/span>    &lt;-- select PGI compilers (for <strong>tcsh<\/strong> users)\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">export MPI_COMPILER=<\/span><span class=\"placeholder\">pgi<\/span>    &lt;-- select PGI compilers (for <strong>bash<\/strong> users)<\/code><\/pre>\n<p>Step 2. Compile with MPI wrappers:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpif77<\/span> <span class=\"placeholder\">myprogram.f<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpif90<\/span> <span class=\"placeholder\">myprogram.f90<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc<\/span> <span class=\"placeholder\">myprogram.c<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicxx<\/span> <span class=\"placeholder\">myprogram.C<\/span><\/code><\/pre>\n<li>\n<h4>To compile an MPI program with <b><code>Intel<\/code><\/b> compilers:<\/h4>\n<p>Step 1. 
Use the module command to load the Intel compiler and a corresponding MPI implementation:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">module load<\/span> <span class=\"placeholder\">intel\/2016<\/span>    &lt;-- load Intel compiler\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">module load<\/span> <span class=\"placeholder\">openmpi\/1.10.1_intel2016<\/span>  &lt;-- load OpenMPI configured with Intel compiler\r\n<\/code><\/pre>\n<p>Step 2. Compile with MPI wrappers:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpifort<\/span> <span class=\"placeholder\">myprogram.f<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc<\/span> <span class=\"placeholder\">myprogram.c<\/span>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicxx<\/span> <span class=\"placeholder\">myprogram.C<\/span><\/code><\/pre>\n<p>To switch back to the previous compiler (GNU or PGI), use the module command to remove the Intel compiler and the MPI implementation that have been loaded: <\/p>\n<pre id=\"indent\" class=\"code-block\"><code>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">module rm<\/span> <span class=\"placeholder\">intel\/2016<\/span>    &lt;-- remove the Intel compiler\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">module rm<\/span> <span class=\"placeholder\">openmpi\/1.10.1_intel2016<\/span>    &lt;-- remove OpenMPI configured with Intel compiler\r\n<\/code><\/pre>\n<li>\n<h4>To check which compiler is currently in use:<\/h4>\n<p>Provide the option <b><code>-show<\/code><\/b> to the MPI wrappers:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc<\/span> <span class=\"placeholder\">-show<\/span>     &lt;-- show the real command hidden in the wrapper\r\n<span class=\"prompt\">scc1$ 
<\/span><span class=\"command\">mpif90<\/span> <span class=\"placeholder\">-show<\/span>    &lt;-- show the real command hidden in the wrapper\r\n<\/code><\/pre>\n<p>or query the path to the wrappers using the command <code>which<\/code>:<\/p>\n<pre id=\"indent\" class=\"code-block\"><code>\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">which<\/span> <span class=\"placeholder\">mpicc<\/span>     &lt;-- show the path to mpicc\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">which<\/span> <span class=\"placeholder\">mpif90<\/span>    &lt;-- show the path to mpif90\r\n<\/code><\/pre>\n<h4><a id=\"MPI\" name=\"MPI\"><\/a>Running MPI jobs<\/h4>\n<p>For program development and debugging purposes, short MPI jobs may run on the login nodes. These jobs are limited to 4 processors and 10 minutes of CPU time per processor. All other MPI jobs (> 10 minutes and\/or > 4 processes) should run in batch. (<a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/\">More on the Running Jobs page<\/a>)<\/p>\n<ol>\n<li>Run MPI executable <code><em>a.out<\/em><\/code> on a login node:\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">mpirun<\/span> -np 4 .\/a.out <\/code><\/pre>\n<\/li>\n<li> Run MPI executable <code><em>a.out<\/em><\/code> on a compute node in batch:\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">qsub<\/span> -pe mpi_4_tasks_per_node 4 -b y \"mpirun -np 4 .\/a.out\"<\/code><\/pre>\n<\/li>\n<\/ol>\n<\/li>\n<h4> Notes<\/h4>\n<ul>\n<li> If you always use the GNU family of compilers, none of the <code>MPI_COMPILER<\/code> settings described above are needed because the MPI wrappers point to the GNU compilers by default.<\/li>\n<li> On the other hand, if you always use the PGI compilers for MPI compilation, you can permanently set <code>MPI_COMPILER<\/code> to <code>pgi<\/code> in your <code>.cshrc<\/code> or 
<code>.bashrc<\/code> shell startup file.<\/li>\n<li> <span style=\"background: #FFFF99;\">If you use the PGI compiler, it is important to read the page on <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/rcs-archive\/pgi-compilers\/\">PGI compilers&#8217; impact on job performance and portability<\/a>.<\/span><\/li>\n<li> <span style=\"background: #FFFF99;\">If you want to use the Intel compiler, use module commands to load it and its corresponding MPI implementation.<\/span><\/li>\n<li>For MPI options, please consult the specific wrapper manpage. For example,\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$<\/span> <span class=\"command\">mpiman<\/span> <span class=\"placeholder\">mpicc<\/span><\/code><\/pre>\n<\/li>\n<\/ul>\n<h3><a name=\"32-BIT\"><\/a>Compiling 32-bit executables<\/h3>\n<p>By default, all compilers on the SCC produce 64-bit executables. To build 32-bit executables, add the compiler flag <b><code>-m32<\/code><\/b>:<\/p>\n<ul>\n<li><strong>Examples for building 32-bit GNU executables:<\/strong>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">gcc -m32 -fopenmp<\/span> <span class=\"placeholder\">myexample.c<\/span>       &lt;-- for OpenMP code\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc -m32<\/span> <span class=\"placeholder\">myexample.c<\/span>              &lt;-- for MPI code<\/code><\/pre>\n<\/li>\n<li><strong>Examples for building 32-bit PGI executables:<\/strong>\n<pre id=\"indent\" class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">pgcc -m32 -mp<\/span> <span class=\"placeholder\">myexample.c<\/span>     &lt;-- for OpenMP code\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc -m32<\/span> <span class=\"placeholder\">myexample.c<\/span>        &lt;-- for MPI code<\/code><\/pre>\n<\/li>\n<li><strong>Examples for building 32-bit Intel executables:<\/strong>\n<pre id=\"indent\" 
class=\"code-block\"><code><span class=\"prompt\">scc1$ <\/span><span class=\"command\">icc -m32 -openmp<\/span> <span class=\"placeholder\">myexample.c<\/span>     &lt;-- for OpenMP code\r\n<span class=\"prompt\">scc1$ <\/span><span class=\"command\">mpicc -m32<\/span> <span class=\"placeholder\">myexample.c<\/span>           &lt;-- for MPI code <\/code><\/pre>\n<\/li>\n<\/ul>\n<\/ul>\n<\/ul>\n<h2><a id=\"GPU\" name=\"GPU\"><\/a>Multiprocessing with GPUs<\/h2>\n<p>Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs. (See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/\">GPU Computing<\/a> for details.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Topics Multiprocessing with CPUs Compile a program parallelized with OpenMP directives Compile a program parallelized with MPI Compiling 32-bit executables Multiprocessing with GPUs Multiprocessing with CPUs The use of multiple CPUs to achieve parallel speedup has been practiced for decades. Mature enabling paradigms and associated software are well understood and broadly adopted. Among these the
Among these the&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":64953,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/78160"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=78160"}],"version-history":[{"count":50,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/78160\/revisions"}],"predecessor-version":[{"id":152691,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/78160\/revisions\/152691"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/64953"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=78160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}