{"id":64015,"date":"2013-03-14T17:04:00","date_gmt":"2013-03-14T21:04:00","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=64015"},"modified":"2023-07-21T13:58:16","modified_gmt":"2023-07-21T17:58:16","slug":"cuda-fortran","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/gpu-computing\/cuda-fortran\/","title":{"rendered":"Programming for GPUs using CUDA in Fortran"},"content":{"rendered":"<p>CUDA is a parallel programming model and software environment developed by NVIDIA. It provides programmers with a set of instructions that enable GPU acceleration for data-parallel computations. The computing performance of many applications can be dramatically increased by using CUDA directly or by linking to GPU-accelerated libraries.<\/p>\n<h2>Setting up your environment<\/h2>\n<p>Your environment should by default be all set to take advantage of CUDA Fortran using the Portland Group Fortran compiler <code><span class=\"command\">pgfortran<\/span><\/code>. To use <code><span class=\"command\">pgfortran<\/span><\/code>, you will need to load the <span>nvidia-hpc\/2023-23.5 module using the command:<\/span><\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">module load <span>nvidia-hpc\/2023-23.5<\/span><\/span><\/code><\/pre>\n<h2>Compiling and running a simple CUDA program interactively using Portland Group CUDA Fortran<\/h2>\n<p>Run <code><span class=\"command\">man pgfortran<\/span><\/code> for usage instructions.<\/p>\n<p>There are two CUDA Fortran free-format source file suffixes: .cuf and .CUF. 
The .CUF files require preprocessing.<\/p>\n<p>As a test, you can download the CUDA Fortran matrix multiply example <a href=\"http:\/\/www.pgroup.com\/lit\/samples\/matmul.CUF\">matmul.cuf<\/a> and transfer it to the directory where you are working on the SCC.<\/p>\n<p>Compile CUDA Fortran programs on one of our GPU-equipped nodes, <b>not on the login nodes<\/b>. To get access to a GPU-equipped node, run the command below.<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qrsh<\/span> <span class=\"command\">-l gpus=1<\/span><\/code><\/pre>\n<p>After that, to compile this CUDA Fortran program, run:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc-je2%<\/span> <span class=\"command\">pgfortran<\/span> -fast -o <span class=\"placeholder\">matmul matmul.cuf<\/span><\/code><\/pre>\n<p>To run a CUDA program interactively, type the name of the program at the command prompt:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">gpunode%<\/span> <span class=\"placeholder\">matmul<\/span><\/code><\/pre>\n<h2>Submitting a CUDA program as a batch job<\/h2>\n<p>The following line shows how to submit the <code>matmul<\/code> executable to run in batch mode on a single CPU with access to a single GPU:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qsub -l gpus=1<\/span> -b y <span class=\"placeholder\">matmul<\/span><\/code><\/pre>\n<p>where the <code><span class=\"command\">-l gpus=<\/span><span class=\"placeholder\">#<\/span><\/code> option specifies the number of GPUs to request. 
To learn about all the options available when submitting a job, please visit the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/rcs-archive\/system-usage-old\/running-jobs\/\">running jobs page<\/a>.<\/p>\n<h2>CUDA Libraries<\/h2>\n<p>Several scientific libraries that make use of CUDA are available:<\/p>\n<ul>\n<li><a href=\"https:\/\/developer.nvidia.com\/cublas\">cuBLAS<\/a> \u2013 Linear Algebra Subroutines. A GPU-accelerated version of the complete standard BLAS library.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuFFT\">cuFFT<\/a> \u2013 Fast Fourier Transform library. Provides a simple interface for computing FFTs up to 10x faster.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuRAND\">cuRAND<\/a> \u2013 Random Number Generation library. Delivers high-performance random number generation.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuSPARSE\">cuSPARSE<\/a> \u2013 Sparse Matrix library. Provides a collection of basic linear algebra subroutines used for sparse matrices.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/npp\">NPP<\/a> \u2013 Performance Primitives library. A collection of image and signal processing primitives.<\/li>\n<\/ul>\n<h2>Architecture-specific options<\/h2>\n<p>There are currently three sets of nodes that incorporate GPUs and are available to SCF users. 
All three are part of the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/computing-resources\/scc\/\">Shared Computing Cluster (SCC)<\/a>.<\/p>\n<p>For more details on nodes available on the SCC, please visit the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/computing-resources\/tech-summary\/#SCC\">Technical Summary<\/a> page.<\/p>\n<h2>Additional CUDA training resources<\/h2>\n<p><a href=\"http:\/\/www.pgroup.com\/doc\/pgicudaforug.pdf\">PGI CUDA Fortran Programming Guide and Reference<\/a><\/p>\n<p>NVIDIA provides a number of resources to learn CUDA programming at<br \/>\n<a href=\"https:\/\/developer.nvidia.com\/cuda-training\">https:\/\/developer.nvidia.com\/cuda-training<\/a>.<\/p>\n<h2>CUDA Consulting<\/h2>\n<p>SCV staff scientific programmers can help you with your CUDA code tuning. For assistance, please send email to <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>CUDA is a parallel programming model and software environment developed by NVIDIA. It provides programmers with a set of instructions that enable GPU acceleration for data-parallel computations. The computing performance of many applications can be dramatically increased by using CUDA directly or by linking to GPU-accelerated libraries. 
Setting up your environment Your environment should by&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":62821,"menu_order":3,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/64015"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=64015"}],"version-history":[{"count":21,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/64015\/revisions"}],"predecessor-version":[{"id":146761,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/64015\/revisions\/146761"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62821"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=64015"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}