Current SCC System Status
The SCV Systems group will update this status when there are changes.
Power has been restored to the MGHPCC and the SCC is now fully operational. Some electrical issues remain, but it is believed that these can be resolved without further service interruptions. (December 11, 11:30PM)
- Replacement power equipment has been installed and is undergoing final testing. The MGHPCC will attempt to restore full power this evening. SCV staff remain on-site to bring the computer systems up as soon as possible once power is restored. (December 11, 6:30PM)
- The SCC login nodes and filesystems are now accessible using generator power at the MGHPCC. The compute nodes will continue to be unavailable until full power is restored. While the MGHPCC is expecting replacement parts for the main power feed tomorrow morning, we do not currently have an estimate on when full power will be restored. We will continue to post further updates on the SCC status page. (December 10, 9:00 PM)
- Due to equipment failures in the main power path, the MGHPCC was not able to return to operation on the target schedule. Equipment vendors and electric company personnel are on-site assessing the problem. SCV staff are also standing by on-site to return the computing systems to normal operation as soon as possible once power is restored. (December 10, 9:50 AM)
- The job queue is being drained in preparation for the power outage at 10:00 PM on 12/08 described below. Jobs that would not complete prior to the shutdown are being held until the system returns on 12/10. (December 6, 10:00 AM)
- In order to address an exigent issue, there will be a full power outage at the MGHPCC on December 9th. This outage impacts the Shared Computing Cluster and ATLAS, as well as the on-campus Katana and LinGA clusters that use data stored on systems in Holyoke.
We anticipate the systems will be down from 10:00 PM on December 8 until 9:00 AM on December 10. More details are available here. (November 20, 4:00 PM)
- VNC is now available on the SCC. Using this software can greatly speed up the performance of GUI/graphical applications running on the SCC. (October 9, 4:00 PM)
- Note that if one login node is responding slowly, you may get better responsiveness by logging in to another. The login nodes are scc1.bu.edu, scc2.bu.edu, geo.bu.edu (for Earth & Environment department users), and scc4.bu.edu (for BU Medical Campus users). (August 19, 10:30 a.m.)
- Glenn Bresnahan, director of SCV, sends out to all SCF users an update on the SCC Performance Issues. (August 14, 12:30 p.m.)
- Performance back to normal.
As of 7:30 p.m. the SCC’s performance is back to normal. We are still trying to identify the underlying sources of these problems. (August 7, 7:35 p.m.)
- Recurring performance problem.
As of approximately 5:30 p.m. we have again been experiencing intermittent performance degradation. The Systems group is working to restore normal performance as soon as possible, and continues to investigate the underlying causes of these performance problems in an effort to prevent them from returning. Many apologies for the interruptions to your productivity. (August 7, 6:15 p.m.)
- File servers hung
Today at approximately 1:00 p.m. two file servers hung and took down the filesystem. The system was restored at 1:30 p.m. This incident was not related to the previous performance degradation issues. (August 7, 1:30 p.m.)
- Update: We believe we have resolved the problem below as of around 4:30 pm on August 6. It was unrelated to the issue on the 2nd and 5th. (August 6, 5:30 p.m.)
- Recurring performance problem.
We are aware that the SCC is having intermittent performance problems again. We are working on it and are trying to fix it as soon as possible. (August 6, 3:00 p.m.)
- On Friday, August 2nd at approximately 3:30pm, and again on Monday, August 5th at approximately 12:30pm, the Shared Computing Cluster (SCC) experienced system-wide degradations in performance lasting multiple hours. The SCV systems group has been working to identify the cause of these degradations. At this time we believe that we have identified the issue and are continuing to work to fully rectify the problem. Users may continue to experience periods of degraded performance on the cluster until we have fully resolved the issue.
We apologize for any inconvenience these issues may have caused you during the past few days and appreciate your continued patience as we work to resolve them. (August 6, 11:00 a.m.)
- Problem with performance on SCC.
We are aware that there is a problem with performance on the SCC cluster. We are working on it now and will resolve it as soon as we can. Apologies in advance for the inconvenience. (August 5, 12:45 p.m.)
- Performance on the SCC cluster is back to normal. We are still investigating the cause of the problems recently experienced. (August 2, 5:05 p.m.)
- Problem with performance on SCC.
We are aware that there is a problem with performance on the SCC cluster. We are working on it now and will resolve it as soon as we can. Apologies in advance for the inconvenience. (August 2, 4:35 p.m.)
- Charging for usage in Service Units (SUs) begins on July 1, 2013. The compute nodes are charged at an SU factor of 2.6 SUs per CPU hour of usage. Also, note that usage is calculated differently on the SCC than on the Katana Cluster: the SCC charges by wall-clock time, as on the Blue Gene, rather than by actual usage as on the Katana Cluster. Thus if you request 12 processors and your code runs for 10 hours, you will be charged for the full 120 hours (multiplied by the SU factor for the node(s) you are running on) even if your actual computation only used, say, 30 CPU hours. This change will also apply to the nodes that used to be part of the Katana Cluster and are moving out to become part of the SCC. (July 1, 2013)
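The wall-clock accounting above can be sketched as a short calculation; this is only an illustration of the stated formula, not an official SCC accounting tool, and the function name is ours.

```python
# Illustrative sketch of SCC SU accounting as described above.
# Charge is based on wall-clock time for all requested processors,
# not on the CPU time the computation actually consumed.
def su_charge(processors, wall_clock_hours, su_factor=2.6):
    """Return the SU charge: processors * wall-clock hours * SU factor."""
    return processors * wall_clock_hours * su_factor

# The example from the announcement: 12 processors held for 10 hours of
# wall-clock time are billed as 120 CPU hours, then scaled by the 2.6
# SU factor, regardless of how many CPU hours were actually used.
print(su_charge(12, 10))  # 12 * 10 * 2.6 = 312 SUs
```

Note that the actual CPU usage (30 CPU hours in the announcement's example) never enters the calculation.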
- During the week of July 8-12, all of geo.bu.edu and the katana-d*, katana-e*, katana-j*, and katana-k* nodes will move out of the Katana Cluster to become part of the SCC. This includes all of the Buy-In Program nodes. All of these nodes will also be renamed during the transition. Details on this are in this note sent out on July 2.
The schedule is:
- July 3: 6:00am-6:30am, Katana outage to physically relocate
- July 7: 7:00am, disable batch queues on machines that are moving
- July 8: 7:00am, power off all machines that are moving; 8:00am-6:00pm, systems de-installed and moved to Holyoke; 1:00pm, "SCC3" becomes an alias for the system name "GEO"
- July 9: 8:00am-6:00pm, reinstallation and cabling of machines in Holyoke
- July 10: 12:00pm, target for GEO nodes in production
- July 11: 12:00pm, target for 2012 Buy-in nodes in production
- July 12: 12:00pm, target for all systems in production
(June 25, 2013)
- During the week of June 24, 2013, the BUDGE nodes are being moved out of the Katana Cluster to become part of the SCC. They will be operational again on Friday, June 28 with the new names scc-ha1..scc-he2 and scc-ja1..scc-je2. These nodes each have 8 NVIDIA Tesla M2070 GPU Cards with 6 GB of Memory. (June 24, 2013)
- A bug in the automounter on the SCC systems has been identified that prevents the /net/HOSTNAME/ automount space from working properly for certain servers. There are two known problem servers at this time:
nfs-archive
As a workaround, until a proper bug fix becomes available, we have created a new automount space to handle the problem cases. If you experience a problem accessing /net/HOSTNAME/ for some HOSTNAME, look in /auto/. If HOSTNAME appears there, try that path; otherwise report the problem to email@example.com.
The /auto space is maintained manually, so only the known problem servers can be accessed through that path. All other servers should be accessed through the usual /net path.
This problem does not affect the Katana Cluster. (June 17, 2013)
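The lookup described above can be sketched as a small shell helper; the function name is ours, and "nfs-archive" is the one problem server named in the announcement.

```shell
# Sketch of the workaround: prefer the manually maintained /auto space
# when it covers a server, and fall back to the usual /net automount
# path otherwise. path_for_server is a hypothetical helper, not an
# SCC-provided command.
path_for_server() {
  server="$1"
  if [ -d "/auto/$server" ]; then
    # Server is one of the known problem cases handled in /auto.
    echo "/auto/$server"
  else
    # Normal case: use the standard automounter path.
    echo "/net/$server"
  fi
}

path_for_server nfs-archive
```

On a host where /auto does not list the server, the helper simply falls back to the standard /net path, matching the instruction to use /net for all other servers.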
- The SCC officially went into production use on June 10, 2013. However, some transitional work remains: not all software packages are yet installed, and disk space is still in a transitional state for some projects. (June 10, 2013)
- You may or may not have noticed that most of the files on the old Project Disk on Katana have been moved to the new Project Disk on the SCC. You can continue to use your files from either system using the same paths that you always have. If you have not been accessing your files, we quietly moved them over the past week. If you have been accessing your files, we are contacting you individually to find a time that is convenient for you to take a break from accessing them while we move your files for you.
Projects that did not have directories in /project and /projectnb on the old system now have them on the new system, with 50 GB quotas on each partition.
A note for active Blue Gene users: since the compute nodes are on a private network, we will not move your files at this time; we will be contacting you over the next few weeks to discuss the details and options. (June 7, 2013)
- We will be hosting a seminar on June 11 from 12-2pm to go over issues related to the migration to the SCC. Please do register; a light lunch will be served. The slides from these talks are posted here. (June 3, 2013)
- MATLAB versions R2012b and R2013a are both available. R2012b is launched by /usr/local/bin/matlab at the moment, but you can access R2013a by running /usr/local/apps/matlab-2013a/bin/matlab. (May 30, 2013)
- In preparation for the new SCC Project Disk Space file systems going live in mid-June, we are making some changes – the first of which you may notice tomorrow, May 29, in the web forms and reports – the primary unit for reporting disk space will be Gigabytes, not Megabytes. In addition, when the SCC goes into production in mid-June, all projects on the SCC will have directories and quotas on both backed-up and not-backed-up Project Disk partitions. For projects that already have directories and quotas on Katana, these will be transferred to the SCC. Directories and quotas will be created for projects that did not already have them on Katana. The default minimum will be 50 GB on both partitions. For projects that need more quota, there is no charge for requests up to a total of 1 TB (200 GB backed up and 800 GB not backed up). Researchers who need more than that should look into the Buy-in options.
- MATLAB version R2013a is now installed on the SCC. (May 20, 2013)
- FFTW, Mathematica, Accelrys CHARMm, Gaussian, Grace, OpenGL/GLUT, and Nedit have all been installed on the SCC. (May 17, 2013)
- Production use of the SCC will begin in mid-June 2013.
- Added a table of available software packages on the SCC. This will be regularly updated during the friendly user period. (May 2, 2013)
- Made SCC web site live for everyone to access. (May 2, 2013)
- Friendly User access to the SCC begins. (April 26, 2013)
- Initial elements of the Shared Computing Cluster (SCC) are installed at the Massachusetts Green High Performance Computing Center (MGHPCC). BU is the first institution to install HPC resources at the MGHPCC. (January 22, 2013)