Where Should you Store your Files?

Users on the SCC are automatically given several locations in which to store their files. Our overall file storage system is described here. Most users will store files primarily in three areas, all of which are accessible from every login node and every compute node. The one exception is the /restricted/ partitions, which are accessible only from the scc4.bu.edu login node and the compute nodes. The three areas are:

  • Home Directory – This directory is entirely controlled by you, and by default nobody else can see or otherwise access your files. Home directories have a quota of 10 GB, which will generally not be increased. You will naturally store files directly related to your account here, such as dotfiles. It is also commonly used for personal files, such as email or personal images. Although you can do work in your home directory if it fits within the 10 GB limit, we recommend using Project Disk Space instead in case you end up needing more space than you anticipate. Home directories are protected by Snapshots and are also backed up off site.
  • Backed Up Project Disk Space – Projects are by default granted 50 GB of space under /project/project_name/ (or /restricted/project/project_name/ for most BUMC projects). The project leader(s) can request an increase up to a maximum of 200 GB, but it cannot go beyond that. This data is protected by Snapshots and is also backed up off site. Depending on the workflow of the project, a reasonable approach is to keep code and files you edit by hand in /project/ and files downloaded or generated by code or applications in /projectnb/.
  • Not Backed Up Project Disk Space – Projects are by default granted 50 GB of space under /projectnb/project_name/ (or /restricted/projectnb/project_name/ for most BUMC projects). This can be increased free of charge up to a total free allocation of 1000 GB; beyond that, additional Not Backed Up space can be purchased through either Buy-In or Storage-as-a-Service. Despite its name, this space is protected by both hardware RAID (against disk failures) and daily Snapshots (against accidental deletion of files). Use this space for any large quantities of data you have. We have guidelines for what data should be stored in each partition.
  • You can see which projects you belong to by running the command groups, and you can see where each of them has disk space and how much is available by running pquota. Note that a few special groups, such as gaussian, have no disk space associated with them; pquota will tell you where you have disk space you can use. For projects with space in /restricted/project/ or /restricted/projectnb/, the directories may be listed as /rproject/project_name/ and /rprojectnb/project_name/, but you must type the full /restricted/... path to access them.
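If you want to check directly which of the locations above exist for each of your groups, a small loop over the output of id -Gn (which prints the same group names as groups) can test each path. This is only a sketch: it assumes the /project, /projectnb, /restricted/project and /restricted/projectnb layout described above, list_project_dirs is a hypothetical helper name, and pquota remains the authoritative source for quotas and usage.

```shell
# Hypothetical helper (not an SCC command): print each storage directory that
# exists for one of your groups, assuming the directory layout described above.
# The optional argument is a path prefix, so the loop can be tried on a test tree.
list_project_dirs() {
    prefix=${1:-}
    for g in $(id -Gn); do
        for area in project projectnb restricted/project restricted/projectnb; do
            if [ -d "$prefix/$area/$g" ]; then
                echo "$prefix/$area/$g"
            fi
        done
    done
}

list_project_dirs
```

Groups with no disk space (such as gaussian) simply produce no output.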

Checking How Much Space you are Using

Project Disk Space and STASH

Use the command pquota to see your quota and usage:

scc1% pquota -u animate
                                      quota      usage     usage
project space                          (GB)       (GB)   (files)
-----------------------------------  ------  ---------  --------
/project/animate                         50       0.00         1

/projectnb/animate                       50       3.45      4328
    15407                                         0.09        80
    73043                                         0.25        61
    82363                                         0.11       243
    dcornell                                      0.29       104
    laura                                         1.02      2114
    rcrnl                                         1.68      1723
    root                                          0.00         3

The -u option adds a breakdown of usage by each user on the project to the default project totals. For each partition the selected project group has access to, the output gives the quota (in GB), usage (in GB), and number of files. Numbers in place of login names, as in the example above, refer to files owned by users who had accounts on the system long ago.

A project’s Lead Project Investigator (LPI) or IT/Admin Contact can ask us to delete any files in the project’s Project Disk Space or STASH areas, or to make those files accessible to them. Such requests should be sent to help@scc.bu.edu.

If your project needs more space, the project LPI or IT/Admin Contact can request it, but there is a charge for requests beyond 1000 GB.

Home Directory

To see the quota (10 GB for almost all users) and usage of your home directory, run quota -s:

scc1% quota -s
Home Directory Usage and Quota:
Name           GB    quota    limit in_doubt    grace |    files    quota    limit in_doubt    grace
adftest2  0.00212     10.0     11.0      0.0     none |      287        0        0        0     none

The important columns are GB (your usage, in GB), quota (your quota, in GB), limit (your hard limit, in GB), and files (the number of files you have). You can exceed your quota, but not your hard limit, for a period of 7 days. You should NEVER reach your hard limit: once you do, you will be unable to write any files, which usually prevents you from logging in.

You can see which directories, files, and dotfiles take up the most space in your home directory by running du -s .[^.]* * | sort -n. Sizes are reported in kilobytes, and the largest items are listed last:

scc1% du -s .[^.]* * | sort -n
32      helloWorld.o9436314
224     newdir
1760    .matlab
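If you prefer human-readable sizes, the GNU versions of du and sort found on typical Linux systems can produce them directly: du -h prints sizes with K/M/G suffixes, and sort -h orders such suffixed values correctly. The sketch below first builds a scratch directory with hypothetical files of known sizes so that the ordering is visible; in your own home directory only the last line is needed.

```shell
# Scratch directory with hypothetical files of known sizes.
cd "$(mktemp -d)" || exit 1
head -c 2000000 /dev/urandom > big.dat     # about 2 MB
head -c 10000 /dev/urandom > small.dat     # about 10 KB
mkdir .cache
head -c 500000 /dev/urandom > .cache/blob  # about 500 KB, inside a dot-directory

# The command itself: -h prints sizes as K/M/G and sort -h orders them.
# .[^.]* matches dotfiles but not "." or ".."; "--" ends option parsing.
du -sh -- .[^.]* * | sort -h
```

As with the original command, the largest item (here big.dat) is listed last.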

Transferring Files To and From the SCC

Please consult the appropriate instructions based on the operating system of the machine you are using to connect to the SCC.

Another option for file transfer is Globus Online, which allows transfers between your desktop/laptop and the SCC and also gives you access to data stored on a variety of national research clusters.

Working with Files/Directories under Linux

If you are not familiar with the commands for working with files and directories under Linux, please consult our Getting Started section, in particular the pages on commands and filesystem navigation.

Controlling Access to your Files

You can control who, if anyone, has read, write, and/or execute permission on your files using the commands chmod and umask. Access to each of your files and directories can be limited to yourself, extended to your collaborators on a given research project, and/or extended to all users of the system. By default, only you can modify the files and directories you create, but others can read and, if applicable, execute them if they have access to the directory in which they are stored. Also by default, others have no access to your home directory, while your group members do have access to the Project Disk Space belonging to the project group.
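A minimal sketch of the two commands, using hypothetical file and directory names in a scratch directory: chmod changes the permissions of files that already exist, while umask sets which permission bits are withheld from files you create afterwards in the current shell.

```shell
# Scratch directory; notes.txt, shared/ and private.txt are hypothetical names.
cd "$(mktemp -d)" || exit 1

umask 022                # new files get rw-r--r--, new directories rwxr-xr-x
touch notes.txt
mkdir shared

chmod go-rwx notes.txt   # only you (the owner) can read or write notes.txt
chmod g+rwx shared       # members of the directory's group can list, enter and create files in it
chmod o-rwx shared       # everyone else gets no access to shared/

umask 077                # from now on, new files in this shell are private to you
touch private.txt

ls -ld notes.txt shared private.txt
```

The final listing should show notes.txt and private.txt as -rw------- and shared as drwxrwx---. A umask setting lasts only for the current shell session.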

Recovering Lost Files

Every night, starting at 12:01 am, copies of your files are made using Snapshots. This feature lets you recover files you mistakenly delete or overwrite. Snapshots are implemented for Home Directories, Project Disk Space, and STASH space. Follow the example here to recover your files. Note that there is generally no way to recover a file you created recently: a file must have existed long enough to be captured by the overnight snapshot.

Tar and Compressed Files

There are four main archiving (combining multiple files into one archive file) and compression (reducing the size of a file) tools on Linux systems, each with an associated tool to reverse the process. It is common to archive a set of files and then compress the archive, producing a file named, for example, myarchive.tar.gz.

Archive/Compression Tool   Unarchive/Uncompression Tool   Filename Extension   Purpose
------------------------   ----------------------------   ------------------   --------------------------------------------------------
gzip                       gunzip                         .gz                  Compress a large single file
compress                   uncompress                     .Z                   Compress a large single file
zip                        unzip                          .zip                 Creating a single compressed file from a group of files
tar -c                     tar -x                         .tar                 Creating a single file from a group of files

Usage of the first two tools (gzip and compress) and their counterparts is straightforward:

scc1% ls
myfile
scc1% compress myfile # or: gzip myfile (which produces myfile.gz)
scc1% ls
myfile.Z
scc1% uncompress myfile.Z # or: gunzip myfile.gz
scc1% ls
myfile
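By default, gzip and gunzip replace the original file. If you want to keep the original, or just look inside a compressed file without expanding it on disk, the -c option (write to standard output) and zcat are useful. A minimal sketch, with hypothetical file names and generated example data:

```shell
cd "$(mktemp -d)" || exit 1
seq 1 1000 > myfile                 # hypothetical example data
gzip -c myfile > myfile.gz          # compress into a new file; myfile is left in place
zcat myfile.gz | head -n 3          # view the start of the compressed file without expanding it on disk
gunzip -c myfile.gz > myfile.copy   # decompress into a differently named file
cmp myfile myfile.copy && echo "identical"
```

The same -c option works with compress and uncompress, though on Linux gzip is the more common choice.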

On Linux systems, tar is used much more commonly for archiving sets of files than zip. However, you may very well come across ZIP archives, often generated on Windows, which you will need to unzip:

scc1% unzip example.zip
Archive:  example.zip
   creating: Packet1/
   creating: Packet2/
scc1% ls
Packet1/     Packet2/     example.zip

Using Tar
Tar has many options. Shown below is an example of generating and then expanding a simple archive. Note that if you have a compressed tar file such as myarchive.tar.gz you will generally first want to uncompress it using the appropriate tool above and then untar it.

scc1% ls mydir
file1  file2
scc1% tar -cvf mydir.tar mydir/ # Create the archive mydir.tar from the directory mydir and all of its contents.
mydir/
mydir/file1
mydir/file2
scc1% rm -r mydir # Remove the original directory for now.
scc1% ls
mydir.tar
scc1% tar -tvf mydir.tar # Look at what is in the archive file.
drwx------ aarondf/scv       0 2014-04-30 13:26 mydir/
-rw------- aarondf/scv       8 2014-04-30 13:25 mydir/file1
-rw------- aarondf/scv      10 2014-04-30 13:26 mydir/file2
scc1% tar -xvf mydir.tar # Expand the archive file in the current directory.
mydir/
mydir/file1
mydir/file2
scc1% ls mydir
file1  file2
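GNU tar, the version found on Linux systems, can also compress and uncompress in the same step: the -z option passes the archive through gzip, so a .tar.gz file does not have to be gunzipped first. A minimal sketch, reusing the hypothetical directory name mydir in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1
mkdir mydir && echo "hello" > mydir/file1 && echo "world" > mydir/file2
tar -czvf mydir.tar.gz mydir/          # -z: gzip the archive as it is written
tar -tzvf mydir.tar.gz                 # list the contents of the compressed archive
mkdir restore
tar -xzvf mydir.tar.gz -C restore      # -C: extract into restore/ instead of the current directory
diff -r mydir restore/mydir && echo "contents match"
```

Extracting into a separate directory with -C is a convenient way to check an archive without overwriting the originals.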