{"id":153787,"date":"2024-08-19T11:54:25","date_gmt":"2024-08-19T15:54:25","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=153787"},"modified":"2026-01-07T09:52:16","modified_gmt":"2026-01-07T14:52:16","slug":"process-reaper","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/process-reaper\/","title":{"rendered":"Process Reaper and Policy Enforcement"},"content":{"rendered":"<div>The Shared Computing Cluster (SCC) implements several automatic \u201cprocess reapers\u201d to enforce policy. These detect and terminate processes or batch jobs that use resources beyond the job request or that make inefficient use of resources. Actions taken by the process reapers are reported to the owner of the impacted process via email. Research Computing is available to assist researchers in optimizing their workflows and batch job specifications.<\/div>\n<h3>The Login Node Process Reaper<\/h3>\n<div>The SCC Login Nodes are the primary connection point for researchers using the SCC. 
These nodes can be used for administrative tasks and light work; long-running or CPU-intensive tasks should be <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/interactive-jobs\/\">run as a batch job<\/a>.\u00a0<b>This process reaper enforces a time limit of 15 minutes of CPU time on each process on the login node.<\/b><\/div>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h3 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\">Example Message<\/h3><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<table>\n<thead>\n<tr>\n<td><strong>To:<\/strong> username@bu.edu<br \/>\n<strong>From:<\/strong> root &lt;root@scc.bu.edu&gt;<br \/>\n<strong>Subject:<\/strong> Message from the process reaper on SCC1<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"background-color: #ffffff;\">\n<td>\n<div>\n<p>The following process, running on SCC1, has been terminated because it exceeded the limits for interactive use.\u00a0 An interactive process is killed if its total CPU time is greater than 15 minutes and greater than 25% of its lifetime.\u00a0 Processes which may exceed these limits should be submitted through the batch system.<\/p>\n<p>See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\">https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs<\/a> for more information.<\/p>\n<pre>COMMAND \u00a0 \u00a0 \u00a0STATE\u00a0 \u00a0 PID \u00a0 PPID TIME(min.) 
RATE(%) SIZE\u00a0 RSS\u00a0 \u00a0 START TIME\r\nprocessname \u00a0 \u00a0 \u00a0S \u00a0 5912 \u00a0 8049\u00a0 18 + 0 \u00a0 \u00a0 \u00a0 37 \u00a0 2883 2385\u00a0 05\/07  11:23<\/pre>\n<p>Please email <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a> for assistance.<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/div>\n<\/div>\n\n<h3>The CPU Limit Process Reaper<\/h3>\n<p><span style=\"font-weight: 400;\">Compute nodes should run only processes associated with jobs, and jobs should use only the resources requested in the job submission. You can learn about process\/slot requests on our <\/span><a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/submitting-jobs\/\"><span style=\"font-weight: 400;\">Submitting Batch Jobs<\/span><\/a><span style=\"font-weight: 400;\"> page. <\/span><b>This process reaper terminates processes that are not associated with a job (e.g., processes started by connecting via SSH directly to a compute node) and jobs that use more processors than requested<\/b><span style=\"font-weight: 400;\">.<\/span><\/p>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h3 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\">Example Message<\/h3><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<table>\n<thead>\n<tr>\n<td><strong>To:<\/strong> username@bu.edu<br \/>\n<strong>From:<\/strong> root &lt;root@scc.bu.edu&gt;<br \/>\n<strong>Subject:<\/strong> Message from the process reaper on <i>SCC_COMPUTE_NODE<\/i><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"background-color: #ffffff;\">\n<td>\n<div>\n<p>The following batch job, running on <em><span class=\"placeholder\">SCC_COMPUTE_NODE<\/span><\/em>, has been terminated because it was using 17.1 processors but was allocated only 16. 
Please resubmit the job using an appropriate PE specification.<\/p>\n<p>See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\">https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs<\/a> for more information.<\/p>\n<pre>job JOBNUMBER owner: username pe: omp16 type: \"Single node batch\" slots: 16\r\n            sge_gid: 1000902 job_pid: 5407\r\n            cputime: 97 min. rate: 1711.70% starttime: 04\/25 16:14:20\r\nCOMMAND     STATE    PID   PPID TIME(min.) RATE(%) SIZE  RSS    START TIME\r\nprocess0        R   5634   5411    10        1827   1832  114  04\/25 16:19\r\nprocess1        S   5411   5410   0 + 87     1535    725   39  04\/25 16:14<\/pre>\n<p>Please email <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a> for assistance.<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/div>\n<\/div>\n\n<h3><span style=\"font-weight: 400;\">The Idle GPU Process Reaper<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Interactive sessions and batch jobs should make effective use of specialized resources, like GPUs, when they are requested. You can learn about the use of GPUs on our <\/span><a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/gpu-computing\/\"><span style=\"font-weight: 400;\">GPU Computing<\/span><\/a><span style=\"font-weight: 400;\"> page. 
<\/span><b>On Shared resources and some Buy-in resources, this process reaper terminates a job if all of its requested GPUs remain idle for two hours.<\/b><\/p>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h3 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\">Example Message<\/h3><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<table>\n<thead>\n<tr>\n<td><span style=\"font-weight: 400;\"><strong>To:<\/strong> username@bu.edu<br \/>\n<strong>From:<\/strong> root &lt;root@scc.bu.edu&gt;<br \/>\n<strong>Subject:<\/strong> Message from the process reaper on <i>SCC_COMPUTE_NODE<\/i><\/span><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"background-color: #ffffff;\">\n<td>\n<div>\n<p>The following batch job, running on <em><span class=\"placeholder\">SCC_COMPUTE_NODE<\/span><\/em>, has been terminated because all of its requested GPUs remained idle for 2 hours. Please ensure your software makes effective use of GPU resources and resubmit the job using an appropriate GPU specification.<\/p>\n<p>See <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\">https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs<\/a> for more information.<\/p>\n<pre>COMMAND         STATE     PID    PPID      START TIME\r\npython              S 1411234 1415123  12\/04 19:03:40<\/pre>\n<p>Please email <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a> for assistance.<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/div>\n<\/div>\n\n<h3><span style=\"font-weight: 400;\">The Unassigned GPU Process Reaper<\/span><\/h3>\n<div>GPUs are accessible only through batch jobs, and batch jobs should use only the GPUs they are assigned. 
You can learn about the use of GPUs and the <span class=\"placeholder\">$CUDA_VISIBLE_DEVICES<\/span> variable on our <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/gpu-computing\/\">GPU Computing<\/a> page. <b>This process reaper enforces the GPU assignment of processes within a batch job: jobs and processes that use a GPU not assigned to the job are terminated.<\/b><\/div>\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h3 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\">Example Message for Case 1: A non-batch process accesses a GPU<\/h3><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<table>\n<thead>\n<tr>\n<td><span style=\"font-weight: 400;\"><strong>To:<\/strong> username@bu.edu<br \/>\n<strong>From:<\/strong> root &lt;root@scc.bu.edu&gt;<br \/>\n<strong>Subject:<\/strong> Message from the process reaper on <em>SCC_COMPUTE_NODE<\/em><\/span><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"background-color: #ffffff;\">\n<td>\n<div>\n<p>The following process, running on <em><span class=\"placeholder\">SCC_COMPUTE_NODE<\/span><\/em>, has been terminated because it was using gpu 0, but it was not associated with a batch job. 
Only processes which are part of a batch job are allowed to use gpus.<\/p>\n<pre>COMMAND         STATE     PID    PPID      START TIME\r\npython              S 1415191 1415019  12\/04 15:23:10<\/pre>\n<p>Please email <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a> for assistance.<\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/div>\n<\/div>\n\n<div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h3 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\">Example Message for Case 2: A batch job process accesses a GPU that is not assigned to it<\/h3><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/p>\n<table>\n<thead>\n<tr>\n<td><span style=\"font-weight: 400;\"><strong>To:<\/strong> username@bu.edu<br \/>\n<strong>From:<\/strong> root &lt;root@scc.bu.edu&gt;<br \/>\n<strong>Subject:<\/strong> Message from the process reaper on <em>SCC_COMPUTE_NODE<\/em><\/span><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"background-color: #ffffff;\">\n<td>The following process, running on <em><span class=\"placeholder\">SCC_COMPUTE_NODE<\/span><\/em>, has been terminated because it was using gpu 2, which was not assigned to its associated batch job, <em><span class=\"placeholder\">JOB_NUMBER<\/span><\/em>. 
Batch jobs are only allowed to use the gpus assigned to them via the <span class=\"command\">$CUDA_VISIBLE_DEVICES<\/span> environment variable.<\/p>\n<p>See https:\/\/www.bu.edu\/tech\/support\/research\/software-and-programming\/programming\/multiprocessor\/gpu-computing\/#CUDAVISIBLE for more information.<\/p>\n<pre>COMMAND         STATE     PID    PPID      START TIME\r\npython              S 1320228 1320137  12\/04 13:16:18<\/pre>\n<p>Please email <a href=\"mailto:help@scc.bu.edu\">help@scc.bu.edu<\/a> for assistance.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>The Shared Computing Cluster (SCC) implements several automatic \u201cprocess reapers\u201d to enforce policy. These detect and terminate processes or batch jobs that use resources beyond the job request or that make inefficient use of resources. Actions taken by the process reapers are reported to the owner of the impacted process via email. 
Research Computing is&#8230;<\/p>\n","protected":false},"author":3593,"featured_media":0,"parent":137962,"menu_order":15,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/153787"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/3593"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=153787"}],"version-history":[{"count":37,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/153787\/revisions"}],"predecessor-version":[{"id":160611,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/153787\/revisions\/160611"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/137962"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=153787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}