{"id":62821,"date":"2013-03-04T09:45:39","date_gmt":"2013-03-04T14:45:39","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=62821"},"modified":"2025-11-24T10:00:15","modified_gmt":"2025-11-24T15:00:15","slug":"gpu-computing","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/gpu-computing\/","title":{"rendered":"GPU Computing"},"content":{"rendered":"<p>Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs.<\/p>\n<ul>\n<li><a href=\"#GPURESOURCES\">GPU Resources<\/a><\/li>\n<li><a href=\"#RUNNINGONGPUS\">Running on the GPU Nodes<\/a><\/li>\n<li><a href=\"#GPUSOFTWARE\">Software with GPU Acceleration<\/a><\/li>\n<li><a href=\"#CPUVSGPU\">CPU vs. GPU<\/a><\/li>\n<li><a href=\"#CUDAVISIBLE\">Using Only Your Assigned GPUs &#8211; CUDA_VISIBLE_DEVICES<\/a><\/li>\n<li><a href=\"#GPUCONSULTING\">GPU Consulting<\/a><\/li>\n<\/ul>\n<h2>GPU Resources<a name=\"GPURESOURCES\" href=\"#GPURESOURCES\">&#x1F517;<\/a><\/h2>\n<p>The Shared Computing Cluster includes nodes with NVIDIA GPU cards, some of which are configured for computational workloads and some for interactive VirtualGL sessions.\u00a0 For more details on nodes available on the SCC, please visit the <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/computing-resources\/tech-summary\/#SCC\">Technical Summary<\/a> page.<\/p>\n<h2>Running on the GPU Nodes<a name=\"RUNNINGONGPUS\" href=\"#RUNNINGONGPUS\">&#x1F517;<\/a><\/h2>\n<p>Access to GPU enabled nodes is via the batch system (qsub\/qrsh). Direct login to these nodes is not permitted. 
The GPU nodes support all of the standard <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/#BATCHOPTIONS\">batch options<\/a> in addition to the following GPU-specific options. (<code><span class=\"command\">-l gpus=N<\/span><\/code> is a required option.)<\/p>\n<p><a href=\"#QGPUS\" name=\"QGPUS\"><\/a>The utility <code><span class=\"command\">qgpus<\/span><\/code> can be used to see a list of all installed GPU models. Run the command to see the latest numbers for each GPU type installed on the SCC:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc % <\/span><span class=\"command\">qgpus<\/span>\r\n<span class=\"output\">gpu_type  total  in_use  available\r\n--------  -----  ------  ---------\r\nA100          5      0      5\r\nA100-80G     24     17      7\r\nA40          68     15     48\r\nA6000        77     27     50\r\nH200         16     16      0\r\n...etc...<\/span><\/code><\/pre>\n<p>Run the command with the &#8220;<code><span class=\"command\">-v<\/span><\/code>&#8221; flag to see the GPU compute capability, GPU memory, the number of CPU cores installed on the GPU nodes, and the queue assignment. This can also be combined with the &#8220;<code><span class=\"command\">-s<\/span><\/code>&#8221; flag to limit the results to the shared queues. An additional flag, &#8220;<code><span class=\"command\">-q queuename<\/span><\/code>&#8221;, can be specified to view the GPU configuration for a particular queue.
<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc % <\/span><span class=\"command\">qgpus -v<\/span>\r\n<span class=\"output\">host      gpu_type  gpu_c  gpu_mem  cpu_   cpu_    gpu_   gpu_    gpu_   queue_list\r\n                                    total  in_use  total  in_use  avail\r\n--------  --------  -----  -------  -----  ------  -----  ------  -----  ------------------------------\r\nscc-212   A100      8.0    80G      32     16      4      3       1      a100\r\nscc-211   A40       8.6    48G      32     18      4      4       0      a40\r\nscc-a03   H200      9.0    144G     32     17      4      4       0      h200\r\nscc-a04   H200      9.0    144G     32     13      4      4       0      h200\r\n...etc...<\/span><\/code><\/pre>\n<p>The &#8220;<code><span class=\"command\">-s<\/span><\/code>&#8221; flag can be added to show the GPUs available in the shared compute node queues. When using these model names with the <code><span class=\"command\">gpu_type<\/span><\/code> qsub option (as shown below), don&#8217;t use the suffix indicating the amount of memory. For example, use A100, not A100-80G.
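<\/p>
<p>For example, to request an interactive session with one GPU of a specific model (here A100; any model name reported by <code><span class=\"command\">qgpus<\/span><\/code> can be substituted):<\/p>
<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qrsh<\/span> <span class=\"command\">-l gpus=1 -l gpu_type=A100<\/span><\/code><\/pre>
<p>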
The <code><span class=\"command\">gpu_memory<\/span><\/code> qsub option is used to request a specific amount of GPU RAM.<\/p>\n<table>\n<tbody>\n<tr>\n<th style=\"text-align: center;\">GPU Batch Option<\/th>\n<th style=\"text-align: center;\">Description<\/th>\n<\/tr>\n<tr>\n<td>-l gpus=<i>G<\/i><\/td>\n<td>G is the number of GPUs requested.<\/td>\n<\/tr>\n<tr>\n<td><nobr>-l gpu_type<\/nobr>=<i>GPU_MODEL<\/i><\/td>\n<td>Check the output of the <code>qgpus<\/code> tool for current GPU types.<\/td>\n<\/tr>\n<tr>\n<td>-l gpu_memory=<i>#<\/i>G<\/td>\n<td>#G represents the minimum amount of memory required per GPU.<\/td>\n<\/tr>\n<tr>\n<td>-l gpu_c=<i>#CC<\/i><\/td>\n<td>The GPU compute capability is NVIDIA&#8217;s term for the GPU architecture generation.\u00a0 NVIDIA maintains a <a href=\"https:\/\/developer.nvidia.com\/cuda-gpus#compute\">list of GPU models and their compute capability<\/a>.\u00a0RCS recommends specifying the compute capability rather than the gpu_type wherever the software permits it, so that the job can run on the widest range of GPUs.\u00a0 Using &#8220;gpu_type=V100&#8221; only allows a job to run on a V100 GPU, but specifying &#8220;gpu_c=7.0&#8221; allows the job to run on a V100 or newer GPU.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Below are some examples of requesting GPU resources.<\/p>\n<h4><a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/#INTERACTIVE_BATCH\">Interactive Batch<\/a><\/h4>\n<p>To request an interactive session with access to 1 GPU (any type) for the default run time of 12 hours:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qrsh<\/span> <span class=\"command\">-l gpus=1<\/span><\/code><\/pre>\n<p>To request an interactive session with access to 1 GPU with compute capability of at least 6.0 (which includes all GPUs except the K40m) and 4 CPU cores:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qrsh<\/span> <span 
class=\"command\">-l gpus=1 -l gpu_c=6.0 -pe omp 4<\/span><\/code><\/pre>\n<h4><a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/#SERIALBATCH\">Non-interactive Batch Job<\/a><\/h4>\n<p>To submit a batch job with access to 1 GPU (compute capability of at least 7.0) and 8 CPU cores:<\/p>\n<pre class=\"code-block\"><code><span class=\"prompt\">scc1%<\/span> <span class=\"command\">qsub -l gpus=1 -l gpu_c=7.0 -pe omp 8<\/span> <span class=\"placeholder\">your_batch_script<\/span><\/code><\/pre>\n<p>See an <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/batch-script-examples\/#GPU\">example of a script<\/a> to submit a batch job.<\/p>\n<h2>Software with GPU Acceleration<a name=\"GPUSOFTWARE\" href=\"#GPUSOFTWARE\">&#x1F517;<\/a><\/h2>\n<p>As GPU computing remains a fairly new paradigm, it is not supported yet by all programming languages and is particularly limited in application support. We are striving to provide you with the most up to date information but this will be an evolving process. 
We are dividing languages and packages into the categories listed below.<\/p>\n<h4>Languages and software packages we have successfully tested for GPU support:<\/h4>\n<p><a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/cuda-c\/\">CUDA C\/C++<\/a><br \/>\n<a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/cuda-fortran\/\">CUDA Fortran<\/a><br \/>\n<a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/openacc-c\/\">OpenACC C\/C++<\/a><br \/>\n<a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/openacc-fortran\/\">OpenACC Fortran<\/a><br \/>\n<a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/rcs-archive\/rcs-software-packages\/matlab\/matlab-gpus\/\">MATLAB (Parallel Computing Toolbox)<\/a><br \/>\nR (various packages)<br \/>\nJava (requires loading the jcuda module)<\/p>\n<h2>CPU vs.
GPU<a name=\"CPUVSGPU\" href=\"#CPUVSGPU\">&#x1F517;<\/a><\/h2>\n<p>Look below to see if your application seems suitable for converting to use GPUs.<\/p>\n<p>CPUs are great for task parallelism:<\/p>\n<ul>\n<li>High performance per single thread execution<\/li>\n<li>Fast caches used for data reuse<\/li>\n<li>Complex control logic<\/li>\n<\/ul>\n<p>GPUs are superb for data parallelism:<\/p>\n<ul>\n<li>High throughput on parallel calculations<\/li>\n<li>Arithmetic intensity: Lots of processor cores to perform simple math calculations<\/li>\n<li>Fast access to local and shared memory<\/li>\n<\/ul>\n<p>Ideal applications for general programming on GPUs:<\/p>\n<ul>\n<li>Large data sets with minimal dependencies between data elements<\/li>\n<li>High parallelism in computation<\/li>\n<li>High number of arithmetic operations<\/li>\n<\/ul>\n<p>Physical modeling, data analysis, computational engineering, matrix algebra are just a few examples of applications that might greatly benefit from GPU computations.<\/p>\n<h2>Using Only Your Assigned GPUs &#8211; CUDA_VISIBLE_DEVICES<a name=\"CUDAVISIBLE\" href=\"#CUDAVISIBLE\">&#x1F517;<\/a><\/h2>\n<p>Please only use the GPUs assigned to <b>you<\/b>. These are indicated by the environmental variable: <code>CUDA_VISIBLE_DEVICES<\/code><\/p>\n<p>As many of the SCC compute nodes have multiple GPUs, each job must only run on the GPUs assigned to it by the batch system to avoid interference with other jobs. To ensure that, the batch system sets <code>CUDA_VISIBLE_DEVICES<\/code> to a comma-separated list of integers representing the GPUs assigned to the job. The CUDA runtime library consults this variable when it does device allocation. Therefore, unless the app does its own device allocation, it will automatically comply with this policy.<\/p>\n<p><strong>DO NOT<\/strong> manually set this variable to access other GPUs on the same node. 
For example, many Python codes written on developers&#8217; own local computers often come with lines of code like these that should be <strong>fixed by the user<\/strong> before running on the SCC:<\/p>\n<pre class=\"code-block\"><code>import os\r\n# WRONG - do not override the GPUs assigned by the batch system\r\nos.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"<\/code><\/pre>\n<p>Instead, you can check the system-assigned GPU IDs with:<\/p>\n<pre class=\"code-block\"><code>import os\r\nprint(os.getenv(\"CUDA_VISIBLE_DEVICES\"))<\/code><\/pre>\n<p>GPU software that refers to a specific GPU should always use GPU 0, which the CUDA runtime library maps to the first GPU listed in <code>CUDA_VISIBLE_DEVICES<\/code>. Software for a two-GPU job would use GPUs 0 and 1, and so on.<\/p>\n<h2>GPU Consulting<a name=\"GPUCONSULTING\" href=\"#GPUCONSULTING\">&#x1F517;<\/a><\/h2>\n<p>RCS staff scientific programmers can help you with your questions concerning GPU programming. Please contact us at <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs.
GPU Resources Running on the GPU Nodes Software&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":75314,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62821"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=62821"}],"version-history":[{"count":50,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62821\/revisions"}],"predecessor-version":[{"id":160343,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/62821\/revisions\/160343"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/75314"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=62821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}