{"id":139743,"date":"2022-04-12T16:38:35","date_gmt":"2022-04-12T20:38:35","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?p=139743"},"modified":"2022-04-13T15:41:12","modified_gmt":"2022-04-13T19:41:12","slug":"outage-shared-computing-cluster-bu-works-test-environment","status":"publish","type":"post","link":"https:\/\/www.bu.edu\/tech\/2022\/04\/12\/outage-shared-computing-cluster-bu-works-test-environment\/","title":{"rendered":"Resolved: Shared Computing Cluster batch jobs, BU Works test environment"},"content":{"rendered":"<p><strong>Incident Discovery Time:<\/strong> 02:30pm on 04\/12\/2022<br \/>\n<strong>Time of Resolution:<\/strong> 10:30pm on 04\/12\/2022<\/p>\n<p><strong>Services Impacted:<\/strong> Server Infrastructure<\/p>\n<h4>Description of Impact<\/h4>\n<p>The MGHPCC (Holyoke Datacenter) overheated today and some servers and network switches shut down. This primarily affected batch jobs on the SCC. The other affected systems were non-production: the nas2\/nas-ru2 mirror of the CRC production server and a development\/test environment set up for the BU Works team.<\/p>\n<h4>Incident Description and Resolution<\/h4>\n<p>The SCC nodes were brought back online by 6pm. 
All remaining servers were back online by 10:30pm, except for one test environment that requires further investigation.<\/p>\n<h4>Additional Information<\/h4>\n<p>The cause of this incident is still under investigation.<\/p>\n<p>If you continue to have issues, please contact the <a href=\"mailto:ithelp@bu.edu\">IT Help Center<\/a>.<\/p>\n<h4 class=\"update\">Previous Update<\/h4>\n<p><!-- more --> <strong>Incident Discovery Time:<\/strong> 02:30pm on 04\/12\/2022<\/p>\n<p><strong>Services Impacted:<\/strong> Server Infrastructure<\/p>\n<h4>Description of Impact<\/h4>\n<p>The Massachusetts Green High Performance Computing Center (MGHPCC) overheated briefly today and a few servers shut down.<\/p>\n<h4>Current Status<\/h4>\n<p>SCC compute nodes are back online and all SCC users have been notified. Our data center operations team is en route to Holyoke to investigate the remaining servers.<\/p>\n<p><strong>Next Update:<\/strong> 07:30pm<\/p>\n<h4 class=\"update\">Previous Update<\/h4>\n<p> <strong>Incident Discovery Time:<\/strong> 02:30pm on 04\/12\/2022<\/p>\n<p><strong>Services Impacted:<\/strong> Server Infrastructure<\/p>\n<h4>Description of Impact<\/h4>\n<p>The Massachusetts Green High Performance Computing Center (MGHPCC) overheated briefly today and a few servers shut down.<\/p>\n<h4>Current Status<\/h4>\n<p>IS&#038;T teams have fixed the cooling issue and are waiting for temperatures to drop enough to bring servers back online.<\/p>\n<h4>Additional Information<\/h4>\n<p>Batch computing jobs on the Shared Computing Cluster and test environments for the BU Works Basic team were affected. 
IS&#038;T teams continue to analyze the scope and impact.<\/p>\n<p><strong>Next Update:<\/strong> 07:30pm<\/p>\n<h4 class=\"update\">Previous Update<\/h4>\n<p> <strong>Incident Discovery Time:<\/strong> 02:30pm on 04\/12\/2022<\/p>\n<p><strong>Services Impacted:<\/strong> Shared Computing Cluster batch jobs, BU Works test environment<\/p>\n<h4>Description of Impact<\/h4>\n<p>The Massachusetts Green High Performance Computing Center (<a href=\"https:\/\/www.bu.edu\/tech\/services\/infrastructure\/data-center\/mghpcc-data-center\/\" rel=\"noopener noreferrer\" target=\"_blank\">MGHPCC<\/a>) has had an air conditioning issue. The room overheated to the point where some servers shut down. SCC login nodes and file systems were not affected, but some batch jobs and BU Works services may be unavailable.<\/p>\n<h4>Current Status<\/h4>\n<p>IS&#038;T teams are investigating the impact of this outage and working to bring servers back online.<\/p>\n<p><strong>Next Update:<\/strong> 05:30pm<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The MGHPCC (Holyoke Datacenter) overheated today and some servers and network switches shut down. 
&#8230;<\/p>\n","protected":false},"author":1545,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[795,1867],"tags":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/posts\/139743"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1545"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=139743"}],"version-history":[{"count":12,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/posts\/139743\/revisions"}],"predecessor-version":[{"id":139768,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/posts\/139743\/revisions\/139768"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=139743"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/categories?post=139743"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/tags?post=139743"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}