The Shared Computing Cluster (SCC) provides several kinds of storage, each with its own characteristics, and most researchers will use all of them at some point. By default, each new account is set up with a home directory, and each new project is set up with a nominal amount of Project Disk Space. Scratch space, available on all nodes, is frequently used as temporary space for running jobs and often provides the highest performance for this purpose. Finally, Archive space is provided as an IS&T service for long-term storage of data.
- Individual home directories (long-term, small size)
- Scratch (short-term, large size, highest performance)
- Backed-up project space (moderate- to long-term, moderate size, high performance, shared; intended for files that are not replaceable or reproducible; includes disaster recovery)
- Non-backed-up project space (moderate-term, any size, shared, high performance)
- /restricted project space (dbGaP-compliant; primarily for BUMC researchers)
- Archival (long-term, any size, infrequent access)
Snapshots are available on home directories, Project Disk Space, and STASH storage, enabling researchers to conveniently retrieve accidentally deleted files.
Home Directories
On the SCC, each user has a 10 GB home directory which is backed up nightly and protected by Snapshots. Additional quota is not available for home directories. To check your home directory quota, use the `quota -s` command.
Project Disk Space
Project Disk Space is the storage researchers will use most often in their work. Allocations are made to projects, so every project member can read and write the files in the project's directories. Project Disk allocations can take any of three forms: Free, Buy-in, or Storage-as-a-Service. Functionally, purchased and rented Project Disk Space simply adds to a project's free allocation and is indistinguishable from it.
All Project Disk Space is protected by both hardware RAID (against disk failures) and daily Snapshots (against accidental deletion of files). For files that must be protected in the event of a disaster, backed-up space is available. Please review the guidelines for requesting backed-up versus not-backed-up Project Disk Space.
By default, each new project is set up with 50 GB on a backed-up partition and another 50 GB on a not-backed-up partition. A project's Lead Project Investigator (LPI) may request additional Project Disk Space via the RCS Project Management web site; more information is available on the RCS Project Management page. There is no charge for requests up to the Free Baseline Quota maximum, which is a total of 1 TB.
Researchers requiring more than 1 TB can purchase additional, not-backed-up storage at a substantially subsidized rate through the Buy-in Program or rent storage through the Storage-as-a-Service offering. The current cost for storage is no more than $100/TB per 5 years under the Buy-in Program and $91/TB per year under the Storage-as-a-Service program.
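The two pricing models differ mainly in how their cost scales with time. The sketch below compares them for a given total allocation and duration, using only the rates quoted above. It makes several assumptions. The first 1 TB is treated as covered by the Free Baseline Quota. Allocations are treated as continuous TB amounts. Buy-in storage is assumed to be repurchased for each 5-year term, which is an assumption and not stated policy. Because the Buy-in rate is quoted as "no more than" $100, its result is an upper bound.

```python
import math

# Rates quoted above. The Buy-in rate is an upper bound ("no more than").
FREE_BASELINE_TB = 1.0        # Free Baseline Quota maximum (total)
BUY_IN_USD_PER_TB = 100.0     # per TB for one 5-year term
BUY_IN_TERM_YEARS = 5
SAAS_USD_PER_TB_YEAR = 91.0   # per TB per year


def paid_tb(total_tb: float) -> float:
    """TB beyond the free baseline that must be bought or rented."""
    return max(0.0, total_tb - FREE_BASELINE_TB)


def buy_in_cost(total_tb: float, years: int) -> float:
    """Upper-bound Buy-in cost, ASSUMING storage is repurchased every 5-year term."""
    terms = math.ceil(years / BUY_IN_TERM_YEARS)
    return paid_tb(total_tb) * BUY_IN_USD_PER_TB * terms


def saas_cost(total_tb: float, years: int) -> float:
    """Storage-as-a-Service cost: paid TB rented for the given number of years."""
    return paid_tb(total_tb) * SAAS_USD_PER_TB_YEAR * years


if __name__ == "__main__":
    for total in (0.5, 5.0, 20.0):
        print(f"{total:5.1f} TB for 5 years: Buy-in <= ${buy_in_cost(total, 5):,.0f}, "
              f"Storage-as-a-Service ${saas_cost(total, 5):,.0f}")
```

For example, keeping 5 TB for 5 years means 4 paid TB. That costs at most $400 under the Buy-in Program, compared with $1,820 under Storage-as-a-Service.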
A portion of the Project Disk Space on the SCC is dbGaP-compliant for data that requires it (primarily BUMC genomics projects). It can be accessed from all SCC compute nodes but only from the scc4.bu.edu login node. All other Project Disk partitions are accessible from all SCC compute and login nodes.
STASH Storage Space
Second Tier Ancillary Storage Heap (STASH) storage space is intended for those who need to maintain a second off-site copy of their data and are willing to take responsibility for maintaining that copy. STASH is offered at the same cost as not-backed-up storage. It is not intended to replace active (working) primary storage, backed-up storage, or the Data Archiving service. The standard STASH is mounted at `/stash` on scc1, scc2, geo, and scc4. The restricted STASH is mounted at `/restricted/stash` on scc4 only. STASH is not accessible from the SCC compute nodes.
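Keeping the STASH copy current is the researcher's responsibility. A simple one-way sync, run on one of the login nodes where `/stash` is mounted, can help with this. The following is a minimal sketch and not an RCS-provided tool. The function name is hypothetical, as is any destination layout under `/stash`. It treats a file as changed when its size or its whole-second modification time differs, and it never deletes files from the STASH copy.

```python
import shutil
from pathlib import Path


def mirror_to_stash(src_dir: str, stash_dir: str) -> list:
    """One-way copy of new or changed files from src_dir into stash_dir.

    A file counts as changed when its size or its modification time
    (compared in whole seconds) differs from the copy in stash_dir.
    Files deleted from src_dir are NOT removed from stash_dir.
    Returns the relative paths of the files that were copied.
    """
    src, dst = Path(src_dir), Path(stash_dir)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue  # directories are created on demand below
        rel = path.relative_to(src)
        target = dst / rel
        st = path.stat()
        if target.is_file():
            tst = target.stat()
            if tst.st_size == st.st_size and int(tst.st_mtime) == int(st.st_mtime):
                continue  # unchanged since the last sync
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Because `shutil.copy2` carries over the modification time, running the function a second time with no changes in the source copies nothing.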
Just as with Project Disk Space, STASH storage is protected by hardware RAID against disk failures. Daily Snapshots (protecting against accidental deletion of files) are retained for 30 days.
STASH storage is available under the same service models and costs as primary, not-backed-up storage (currently less than $100/TB per 5 years under the Buy-in Program and $91/TB per year under the Storage-as-a-Service program). STASH allocations are by request only; there is no default allocation. Free, Buy-in, and Storage-as-a-Service quotas can be exchanged for STASH quota.
You can request STASH storage via the RCS Project Management web site; more information is available on the RCS Project Management page. If you have questions about this new service, please email firstname.lastname@example.org.
Scratch Space
Each node on the SCC has a `/scratch` directory intended for high-performance access while running jobs. The space is for temporary use and is not backed up. Each node has its own local `/scratch` partition, which makes access efficient and fast. Make sure that your code uses the `/scratch` local to the node where it is running. Please see the programming examples for using `/scratch`. Scratch space is shared by all users, so please be considerate and move or delete files that you no longer need for computation. Files are removed automatically after 30 days.
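One way to follow this advice from Python is to do a job's temporary I/O in a private directory on the node-local `/scratch` and remove that directory when the job ends. This is a minimal sketch under assumptions. The directory layout `<user>/job_<id>_<pid>` is hypothetical. The job ID is read from the `JOB_ID` environment variable, which is assumed to be set by the batch scheduler; the name `interactive` is used when it is absent.

```python
import contextlib
import getpass
import os
import shutil
from pathlib import Path


@contextlib.contextmanager
def job_scratch_dir(scratch_root: str = "/scratch"):
    """Yield a private working directory on node-local scratch, then delete it.

    The path is <scratch_root>/<user>/job_<JOB_ID>_<pid>. JOB_ID is ASSUMED to be
    set by the batch scheduler; "interactive" is used when it is missing.
    The process ID keeps concurrent runs on the same node from colliding.
    """
    job_id = os.environ.get("JOB_ID", "interactive")
    workdir = Path(scratch_root) / getpass.getuser() / f"job_{job_id}_{os.getpid()}"
    workdir.mkdir(parents=True, exist_ok=True)
    try:
        yield workdir
    finally:
        # Clean up even if the job raised an error; scratch is shared by all users.
        shutil.rmtree(workdir, ignore_errors=True)
```

Inside a job, wrap the I/O-heavy part of the work in `with job_scratch_dir() as work:`. Copy any results you want to keep to your project directory before the block exits, because the directory is deleted afterwards.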
Snapshots
SCC home directories, Project Disk Space, and STASH storage all implement a feature called snapshots. Snapshots are copies of files stored within the file system itself. This makes them convenient for retrieving accidentally deleted files but useless in the event of a catastrophic failure of that file system. You have access to the snapshots of your own files. Each directory contains a hidden subdirectory called `.snapshots`. It does not appear even in an `ls -a` listing, but you can `cd` into it:
    scc1 ~% cd .snapshots
    scc1 ~% ls
    160514/  160515/  160516/  160517/
The directory names use the form YYMMDD to record the day each snapshot was created; for example, `160514` is the snapshot from May 14, 2016. Snapshots are taken at 12:01 a.m. every day. Regardless of the permissions that appear to be set on the snapshots of your files, you cannot overwrite or remove them. You can, however, copy them back to your directory in the main file system. Snapshots do not count against your quota, so you can ignore the space they take up.
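Restoring a file amounts to finding the newest `YYMMDD` snapshot that contains it and copying it back. The sketch below does this for a file in a given directory. The function names are hypothetical, and two-digit years are assumed to fall in 2000 to 2099. By default, the copy is written next to the current file as `<name>.restored`, so nothing is overwritten.

```python
import re
import shutil
from datetime import date
from pathlib import Path
from typing import Optional

# Snapshot directories are named YYMMDD, e.g. "160514" for May 14, 2016.
SNAPSHOT_NAME = re.compile(r"(\d{2})(\d{2})(\d{2})")


def snapshot_date(name: str) -> Optional[date]:
    """Return the date encoded in a YYMMDD snapshot name, or None if it is not one."""
    m = SNAPSHOT_NAME.fullmatch(name)
    if m is None:
        return None
    yy, mm, dd = (int(g) for g in m.groups())
    try:
        return date(2000 + yy, mm, dd)  # ASSUMES two-digit years mean 20YY
    except ValueError:  # e.g. month 13
        return None


def restore_latest(directory: str, filename: str, dest: Optional[str] = None) -> Path:
    """Copy the newest snapshot of <directory>/<filename> back into the live file system.

    Searches <directory>/.snapshots/<YYMMDD>/<filename> from newest to oldest.
    Snapshots are read-only, so the file is copied, never moved. Unless dest is
    given, the copy is written as <filename>.restored to avoid overwriting anything.
    """
    snap_root = Path(directory) / ".snapshots"
    dated = []
    for snap in snap_root.iterdir():
        when = snapshot_date(snap.name)
        if when is not None and snap.is_dir():
            dated.append((when, snap))
    for _, snap in sorted(dated, key=lambda pair: pair[0], reverse=True):
        candidate = snap / filename
        if candidate.is_file():
            target = Path(dest) if dest else Path(directory) / f"{filename}.restored"
            shutil.copy2(candidate, target)
            return target
    raise FileNotFoundError(f"{filename} is not in any snapshot under {snap_root}")
```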
Data Loss Protection Policies
| File System | Hardware Protection | Snapshot Frequency | Snapshot Retention | Disaster Recovery Protection |
|---|---|---|---|---|
| Home Directories | RAID6 | Daily | 30 days | Most recent 180 days of snapshots backed up off site |
| Backed-Up Project Space (restricted and non-restricted) | RAID6 | Daily | 30 days | Most recent 180 days of snapshots backed up off site |
| Non-Backed-Up Project Space (restricted and non-restricted) | RAID6 | Daily | 30 days | None |
| Scratch Space | | No snapshots | No snapshots | None |
The Project Disk and Scratch file systems are particularly efficient when working with large files. The best performance is obtained by reading or writing large chunks of data at once; the suggested minimum chunk size is 128 KB. In C, use the Linux `read` and `write` system calls with large buffers, and avoid the `fread`/`fwrite` family of routines, which add another layer of buffering. In Fortran, read and write large arrays to get the best performance. In all cases, unformatted (binary) reads and writes give the best performance, by as much as a factor of 50.
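The same advice applies in Python. Its `os.read` and `os.write` are thin wrappers around the `read` and `write` system calls, and the `array` module can write numbers as raw binary (unformatted) data. This is a sketch under assumptions. The 1 MiB chunk size is an arbitrary choice above the suggested 128 KB minimum, and the helper names are hypothetical. The binary file uses the machine's native byte order, so it is only portable between machines with the same layout.

```python
import array
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per system call, well above the 128 KB suggested minimum


def copy_in_large_chunks(src: str, dst: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst using os.read/os.write (thin wrappers over read(2)/write(2)).

    No extra user-space buffering layer is involved. Returns the bytes copied.
    """
    total = 0
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                buf = os.read(src_fd, chunk_size)
                if not buf:  # an empty read means end of file
                    break
                view = memoryview(buf)
                while view:  # write() may accept fewer bytes than requested
                    written = os.write(dst_fd, view)
                    view = view[written:]
                total += len(buf)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
    return total


def save_unformatted(path: str, values) -> None:
    """Write floats as raw 8-byte binary values rather than formatted text."""
    data = array.array("d", values)
    with open(path, "wb") as f:
        data.tofile(f)


def load_unformatted(path: str) -> array.array:
    """Read back a file written by save_unformatted (same machine byte order)."""
    data = array.array("d")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data
```

Writing numbers as text would format and parse every value. The binary form stores each double in a fixed 8 bytes and moves the whole array in one call.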
IS&T Network File Storage
Information Services & Technology offers an updated Network File Storage service for secure storage of data. Note that files stored in this remote file system can only be copied to and from the SCC; computation cannot be performed on them. Also note that although the Network File Storage service supports Restricted Use data, SCC does not.
LPIs may request a free allocation of up to 1 TB per project (limit 5), with more available for a charge. Once your Network File Storage space has been established for your project, you can access it from the SCC:
If you keep a significant amount of data in your high-performance SCC Project Disk Space that you do not need to access regularly but want to retain for future use, consider requesting an allocation on the Network File Storage service and moving those files there.
Data Protection Standards
The University has created standards for data protection which are spelled out in detail in the Data Protection Standards documentation.
BU Library and IS&T staff have assembled a collection of materials and guidelines for Data Management.