{"id":62824,"date":"2013-03-04T09:53:11","date_gmt":"2013-03-04T14:53:11","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=62824"},"modified":"2022-06-03T14:01:44","modified_gmt":"2022-06-03T18:01:44","slug":"cuda-c","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/gpu-computing\/cuda-c\/","title":{"rendered":"Programming for GPUs using CUDA in C\/C++"},"content":{"rendered":"<p>CUDA is a parallel programming model and software environment developed by NVIDIA. It provides programmers with a set of instructions that enable GPU acceleration for data-parallel computations. The computing performance of many applications can be dramatically increased by using CUDA directly or by linking to GPU-accelerated libraries.<\/p>\n<h2>Setting up your environment<\/h2>\n<p>To link and run applications using CUDA you will need to make some changes to your path and environment. Load the appropriate version of cuda:<\/p>\n<pre class=\"code-block\"><code><span class=\"command\">module load<\/span> cuda\/11.3<\/code><\/pre>\n<p>The list of available versions of cuda can be obtained by executing the <code><span class=\"command\">module avail cuda<\/span><\/code> command.<\/p>\n<h2>Compiling a simple CUDA C\/C++ program<\/h2>\n<p>Consider the following simple CUDA program <a href=\"http:\/\/scv.bu.edu\/documents\/gpu_info.cu\">gpu_info.cu<\/a> that prints out information about GPUs installed on the system.<\/p>\n<p>Download the source code of <a href=\"http:\/\/scv.bu.edu\/documents\/gpu_info.cu\">gpu_info.cu<\/a> and transfer it to the directory where you are working on the SCC.<br \/>\nThen execute the following command to compile <code>gpu_info.cu<\/code>. Depending on the version of the <code>cuda<\/code> module that you are using you may need to load a newer version of the gcc compiler, such as <code>gcc\/8.3.0<\/code>. 
The CUDA <code>nvcc<\/code> compiler will print an error if a newer compiler is required:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">nvcc -o<\/span> <span class=\"placeholder\">gpu_info gpu_info.cu<\/span><\/code><\/pre>\n<h2>Running a CUDA program interactively on a GPU-enabled node<\/h2>\n<p>To execute CUDA code, you must log in to a GPU-enabled node on the SCC through an interactive batch session. To request an interactive session with access to 1 GPU:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qrsh<\/span><span class=\"command\"> -l gpus=1<\/span><\/code><\/pre>\n<p>To run a CUDA program interactively, type the name of the program at the command prompt:<\/p>\n<pre class=\"code-block\"><code><span class=\"placeholder\"><span class=\"prompt\">gpunode%<\/span><\/span> <span class=\"placeholder\">gpu_info<\/span><\/code><\/pre>\n<h2>Submitting a CUDA program as a batch job<\/h2>\n<p>The following line shows how to submit the <code>gpu_info<\/code> program to run in batch mode on a single CPU with access to a single GPU:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qsub -l gpus=1<\/span> -b y <span class=\"placeholder\">gpu_info<\/span><\/code><\/pre>\n<p>where the <code><span class=\"command\">-l gpus=<\/span><span class=\"placeholder\">#<\/span><\/code> option indicates the number of GPUs requested. 
To learn about all options that can be used when submitting a job, please visit the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/rcs-archive\/system-usage-old\/running-jobs\/\">running jobs<\/a> and <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/\">GPU computing<\/a> pages.<\/p>\n<h2>CUDA Libraries<\/h2>\n<p>Several scientific libraries that make use of CUDA are available:<\/p>\n<ul>\n<li><a href=\"https:\/\/developer.nvidia.com\/cublas\">cuBLAS<\/a> \u2013 Linear Algebra Subroutines. A GPU-accelerated version of the complete standard BLAS library.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuFFT\">cuFFT<\/a> \u2013 Fast Fourier Transform library. Provides a simple interface for computing FFTs up to 10x faster.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuRAND\">cuRAND<\/a> \u2013 Random Number Generation library. Delivers high-performance random number generation.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/cuSPARSE\">cuSPARSE<\/a> \u2013 Sparse Matrix library. Provides a collection of basic linear algebra subroutines used for sparse matrices.<\/li>\n<li><a href=\"https:\/\/developer.nvidia.com\/npp\">NPP<\/a> \u2013 NVIDIA Performance Primitives library. A collection of image and signal processing primitives.<\/li>\n<\/ul>\n<h2>Architecture-specific options<\/h2>\n<p>Architecture-specific features can be enabled using the <code>-arch sm_<span class=\"placeholder\">##<\/span><\/code> flag during compilation. The &#8220;<code>sm<\/code>&#8221; stands for &#8220;streaming multiprocessor&#8221; and the number following <b>sm_<\/b> identifies the compute capability, and therefore the features, of the target architecture. For example, for a CUDA program running on the SCC you can add the <code>-arch sm_60<\/code> flag to allow for functionality available on GPUs that have Compute Capability 6.0 (Pascal architecture). 
See the <a href=\"http:\/\/docs.nvidia.com\/cuda\/cuda-compiler-driver-nvcc\/index.html\">CUDA Toolkit documentation<\/a> for more information on this.<\/p>\n<h2>Additional CUDA training resources<\/h2>\n<p>NVIDIA provides resources for learning CUDA programming at<br \/>\n<a href=\"https:\/\/developer.nvidia.com\/cuda-training\">https:\/\/developer.nvidia.com\/cuda-training<\/a>.<\/p>\n<h2>CUDA Consulting<\/h2>\n<p>RCS staff scientific programmers can help you with your CUDA code tuning. For assistance, please send an email to <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>CUDA is a parallel programming model and software environment developed by NVIDIA. It provides programmers with a set of instructions that enable GPU acceleration for data-parallel computations. The computing performance of many applications can be dramatically increased by using CUDA directly or by linking to GPU-accelerated libraries. Setting up your environment To link and 
run&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":62821,"menu_order":1,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62824"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=62824"}],"version-history":[{"count":37,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62824\/revisions"}],"predecessor-version":[{"id":140618,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62824\/revisions\/140618"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62821"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=62824"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}