The Shared Computing Cluster (SCC) provides several different kinds of storage, each with its own characteristics. Most researchers will, at some point, use all of the different forms. By default, each new account is set up with a home directory and new projects are set up with a nominal amount of Project Disk Space. Scratch space, available on all the nodes, is frequently used as temporary space for running jobs and often provides the highest performance for this purpose. Lastly, Archive space is provided as an IS&T service for long term storage of data.
- Individual home directories (long-term, small size)
- Scratch (short-term, large size, highest performance)
- Backed-up project space (moderate to long-term, moderate size, high performance, shared, files that are not replaceable or reproducible, disaster recovery)
- Non-backed-up project space (moderate-term, any size, shared, high performance)
- /restricted project space (dbGaP compliant, primarily for BUMC researchers)
- Archival (long term, any size, infrequent access)
Snapshots are available, enabling a researcher to conveniently retrieve accidentally lost files.
On the SCC, each user has a 10 GB home directory which is backed up nightly and protected by Snapshots. Additional quota is not available for home directories. To check the home directory quota, use the quota -v command.
Project Disk Space
Researchers will use Project Disk Space most often in their work. Allocations are made to projects and, as such, each project member can read and write the files in their project directories. Project Disk allocations can be in any of three forms: Free, Buy-in, or Storage-as-a-Service. Functionally, purchased and rented Project Disk Space augments the free allocation and is indistinguishable from it.
All Project Disk Space is protected by both hardware RAID (protecting against disk failures) and daily Snapshots (protecting against accidental deletion of files). For files that need to be protected in the event of a disaster, backed-up space is available. Please review the guidelines for requesting backed up versus not-backed-up Project Disk Space.
By default, each new project is set up with 50 GB on a backed-up partition and another 50 GB on a not-backed-up partition. Additional Project Disk Space may be requested by a project’s Lead Project Investigator (LPI) via the RCS Project Management web site; more information is available on the RCS Project Management page. There is no charge for requests up to the Free Baseline Quota of 1 TB in total.
Researchers requiring more than 1 TB can purchase additional, not-backed-up storage at a substantially subsidized rate through the Buy-in Program or rent storage through the Storage-as-a-Service offering. The current cost for storage is less than $140/TB/5 years under the Buy-In program and $91/TB/year under the Storage-as-a-Service program.
A portion of the Project Disk Space on the SCC is dbGaP compliant for data that requires it (primarily BUMC genomics projects). It can be accessed from all SCC compute nodes but only from the scc4.bu.edu login node. Other than the dbGaP compliant partitions, the Project Disk partitions are accessible from all SCC compute and login nodes.
STASH Storage Space
Second Tier Ancillary Storage Heap (STASH) storage space is a solution for those who need to maintain a second off-site copy of their data and are willing to take responsibility for maintaining that copy. STASH is offered at the same cost as not-backed-up storage. STASH storage is not intended to be used as a replacement for active/working primary storage, nor to replace backed-up storage or the Data Archiving service. The standard STASH is mounted at /stash on scc1, scc2, geo, and scc4. The restricted STASH is mounted at /restricted/stash on scc4 only. STASH is not accessible by the SCC compute nodes.
Just as with Project Disk Space, STASH storage is protected by hardware RAID against disk failures. Daily Snapshots (protecting against accidental deletion of files) are retained for 30 days.
STASH storage is available under the same service models and costs as primary, not-backed-up storage (currently less than $140/TB/5 years under the Buy-In program and $91/TB/year under the Storage-as-a-Service program). STASH allocations are by request only (no default allocations). Free, Buy-in, and Storage-as-a-Service quotas can be exchanged for STASH quota.
You can request STASH Storage via the RCS Project Management web site; more information is available on the RCS Project Management page. If you have questions about this new service, please send email to: email@example.com.
Each of the nodes on the SCC has a /scratch directory intended for high performance access while running jobs. The space is for temporary use and is not backed up. Each node has its own local /scratch partition, making access efficient and fast; you should make sure that your code uses the /scratch local to where your code is running. Please see the programming examples for using /scratch. Scratch space is shared by all users; please be considerate and move or delete files that you no longer need for computation. Files will be removed automatically after 30 days.
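The pattern described above (work in the node-local /scratch, then move results out and clean up) might be sketched as follows. This is only an illustration under stated assumptions: the function name run_in_scratch, the SCRATCH_ROOT override, and the per-user temporary directory layout are hypothetical and are not SCC conventions.

```shell
#!/bin/bash
# Sketch: run a command in node-local scratch, then copy results back.
# Assumptions (not SCC policy): SCRATCH_ROOT defaults to /scratch and each
# run gets its own temporary directory named after the user.

run_in_scratch() {
    # $1 = directory where results should end up (e.g. a project directory)
    # remaining arguments = the command to run inside the scratch directory
    local dest="$1"; shift
    local root="${SCRATCH_ROOT:-/scratch}"
    local workdir rc

    workdir="$(mktemp -d "$root/${USER:-user}.XXXXXX")" || return 1

    # Run the command with the scratch directory as its working directory.
    ( cd "$workdir" && "$@" )
    rc=$?

    # Copy everything back; remove the scratch copy only if the copy worked.
    if mkdir -p "$dest" && cp -r "$workdir"/. "$dest"/; then
        rm -rf "$workdir"
    else
        echo "copy failed; results left in $workdir" >&2
        return 1
    fi
    return "$rc"
}
```

A call such as run_in_scratch "$HOME/results" ./my_program (both names are placeholders) would run the program on the local scratch disk of whichever node executes it and leave nothing behind in /scratch once the copy succeeds.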
Both the home directories and the Project Disk on the SCC have a feature called snapshots implemented. The snapshots are copies of files and are stored within the file system, making them useful and convenient for retrieving files that are accidentally deleted, but not useful in the event of catastrophic failure. You have access to the snapshots of your files. Each directory contains a hidden subdirectory called .snapshots. It is not visible in an ls -a listing, but you can cd into it:
scc1% cd .snapshots
scc1% ls
130514/ 130515/ 130516/ 130517/
The directory names use the form YYMMDD to represent the day that the snapshot was created. They are taken at 12:01am every day. The snapshots are read-only, but you can copy from them into the main file system. The snapshots in no way count against your allocation; you can ignore how much file space they take up.
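Copying a file back out of a snapshot could look like the sketch below. The .snapshots directory and the YYMMDD naming come from the description above; the function name restore_from_snapshot and the ".restored-" suffix are illustrative choices, not part of the SCC tooling.

```shell
#!/bin/bash
# Sketch: copy one file (or directory) out of a read-only snapshot.

restore_from_snapshot() {
    # $1 = snapshot name in YYMMDD form, e.g. 130514
    # $2 = name of the file or directory within the current directory
    local snap="$1" name="$2"
    local src=".snapshots/$snap/$name"

    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi

    # Restore under a new name so a current file is never overwritten.
    cp -rp "$src" "./$name.restored-$snap"
}
```

Run from the directory that held the lost file, restore_from_snapshot 130514 notes.txt (a placeholder file name) would create notes.txt.restored-130514 next to the current files, which can then be inspected and renamed.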
Data Loss Protection Policies
| File System | Hardware Protection | Snapshot Frequency | Snapshot Retention | Disaster Recovery Protection |
|---|---|---|---|---|
| Home Directories | RAID1 | Daily | 180 days | Most recent 30 days of snapshots replicated off site |
| Backed Up Project Space (both restricted and non-restricted) | RAID6 | Daily | 180 days | Most recent 30 days of snapshots replicated off site |
| Non-Backed Up Project Space (both restricted and non-restricted) | RAID6 | Daily | 30 days | |
| Scratch Space | | No Snapshots | | |
The Project Disk and Scratch file systems are particularly efficient when working with large files. The best performance is obtained by reading or writing large chunks of data at one time. A minimum suggested size is 128 KB. In C one should use the Linux read and write system calls with large buffers, avoiding the fread/fwrite family of routines which do an additional layer of buffering. In Fortran one should read and write large arrays to get the best performance. In all cases, using unformatted reads and writes gives the best performance, by as much as a factor of 50.
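As a shell-level illustration of the large-block advice, the sketch below copies a file with dd using a fixed block size. It is a stand-in for the C read/write approach described above, not a replacement for it; the function name copy_large_blocks and the 1 MiB block size are assumptions chosen for the example (any size at or above the suggested 128 KB minimum follows the same idea).

```shell
#!/bin/bash
# Sketch: copy a file using large reads and writes.
# The 1 MiB block size is an illustrative choice, not a tuned value.

copy_large_blocks() {
    # $1 = source file, $2 = destination file
    local src="$1" dst="$2"
    local block_bytes=$((1024 * 1024))

    # dd transfers the data in blocks of bs bytes; its transfer
    # statistics are written to stderr and discarded here.
    dd if="$src" of="$dst" bs="$block_bytes" 2>/dev/null
}
```

For example, copy_large_blocks input.dat /scratch/input.dat (placeholder paths) would stage a file onto local scratch in 1 MiB blocks.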
IS&T Data Archiving Service
Information Services & Technology offers a Data Archiving service for secure, long-term storage of large quantities of infrequently accessed data. PIs may request a free allocation of up to 1 TB, with more available at a subsidized charge. Once Archive space has been established for your project, you can access it from the SCC at the following locations:
/archive/replicated -> /auto/nfs-archive/ifs/archive
/archive/not-replicated -> /auto/nfs-archive/ifs/noreplica
/restricted/archive/replicated -> /auto/nfs-archive/ifs/archive
/restricted/archive/not-replicated -> /auto/nfs-archive/ifs/noreplica
If you are storing a significant amount of data in your high-performance Project Disk Space that you do not need regular access to but wish to keep for future access, please consider requesting Archive space and moving your files there.
The old tape archive mass storage facility has been phased out. The data that was on it has been transferred to online storage on the SCC and will be permanently removed on December 30, 2014.
Data Protection Standards
The University has created standards for data protection which are spelled out in detail in the Data Protection Standards documentation.
BU Library and IS&T staff have assembled a collection of materials and guidelines for Data Management.