- Where Should You Store Your Files?
- Checking How Much Space You Are Using
- Transferring Files To and From the SCC
- Working with Files/Directories under Linux
- Controlling Access to Your Files
- Recovering Lost Files
- Tar and Compressed Files
Users on the SCC are automatically granted several locations to store their files. Our overall file storage system is described here. Most users will primarily store files in three areas, all of which are generally accessible from all of the login and compute nodes. The exception is that the /restricted/ partitions are only accessible from the scc4.bu.edu login node and from the compute nodes:
- Home Directory – This directory is entirely controlled by you, and by default nobody else can see or otherwise access your files. Home directories have a quota of 10 GB, and this will generally not be increased. You will naturally store files directly related to your account here, such as dotfiles. It is also commonly used to store personal files, such as email or personal images. Although you can do work in your home directory if it fits within the 10 GB limit, we recommend using Project Disk Space in case you end up needing more space than you anticipate. Home directories are protected by Snapshots and are also backed up off site.
- Backed Up Project Disk Space – Projects are by default granted 50 GB of space in their Backed Up project directory (/restricted/project/project_name/ for most BUMC projects). This can be increased to a maximum of 200 GB at the request of the project leader(s), but it cannot go beyond that. This data is protected by Snapshots and is also backed up off site. Depending on the workflow of the project, a reasonable approach is to keep code and files you hand-edit in /project/ and files downloaded or generated by code or applications in the Not Backed Up Project Disk Space.
- Not Backed Up Project Disk Space – Projects are by default granted 50 GB of space in their Not Backed Up project directory (/restricted/projectnb/project_name/ for most BUMC projects). This can be increased at no cost up to a total free allocation of 1000 GB; beyond that, additional Not Backed Up space can be purchased through either Buy-In or Storage-as-a-Service. Despite its name, this space is protected by both hardware RAID (against disk failures) and daily Snapshots (against accidental deletion of files). Use this space for any large quantities of data you have. We have guidelines for what data should be stored in each partition.
- You can see which projects you belong to by running the command groups, and you can see how much space each of them has available, and where, by running pquota. Note that a few special groups, such as gaussian, do not have any disk space associated with them; pquota will tell you where you have disk space you can use. However, for those with space in the /restricted/ partitions, the directories are listed as /rproject/project_name/, but you must type the full /restricted/... path to access them.
Use the command pquota to see your quota and usage:
scc1% pquota -u animate
                                      quota      usage     usage
project space                          (GB)       (GB)   (files)
-----------------------------------  ------  ---------  --------
/project/animate                         50       0.00         1
/projectnb/animate                       50       3.45      4328
     15407                                        0.09        80
     73043                                        0.25        61
     82363                                        0.11       243
     dcornell                                     0.29       104
     laura                                        1.02      2114
     rcrnl                                        1.68      1723
     root                                         0.00         3
The -u option asks for a breakdown of usage by the users on the project, in addition to the default project totals. Information on quota (in GB), usage (in GB), and number of files is given for each partition the selected project group has access to. Any numbers that appear instead of login names in the list, as in the example above, refer to files owned by users who had accounts on the system long ago.
A project’s Lead Project Investigator (LPI) or IT/Admin Contact can request that we delete, or make accessible to them, any files in the project’s Project Disk Space or STASH areas. This request should be sent to email@example.com.
To show your home directory quota (10 GB for almost all users) and usage, run:
scc1% quota -s
Home Directory Usage and Quota:
Name           GB   quota  limit  in_doubt  grace |  files  quota  limit  in_doubt  grace
adftest2  0.00212    10.0   11.0       0.0   none |    287      0      0         0   none
The important columns are GB (your usage), quota, and limit (your hard limit), all in GB, and files (the number of files you have). You can exceed your quota, but not your hard limit, for a period of up to 7 days. You should NEVER let your usage reach the hard limit: you will then be unable to write any files, which usually causes problems when you try to log in.
You can see which directories, files, and dotfiles are taking up the most space in your home directory by running du -s .[^.]* * | sort -n there; the largest items will be listed last:
scc1% du -s .[^.]* * | sort -n
...
32      helloWorld.o9436314
224     newdir
1760    .matlab
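The sizes reported by du -s above are in 1024-byte blocks, GNU du's default. A sketch of a more readable variant, assuming GNU coreutils (for du -h and sort -h), wraps the command in a small shell function. The name biggest is hypothetical and not an SCC-provided command; the pattern .[!.]* is the POSIX spelling of the .[^.]* pattern used above.

```shell
#!/bin/bash
# biggest: hypothetical helper (not an SCC command) that lists the N largest
# files/directories in the current directory, largest last.
# Assumes GNU du and sort, which both support -h (human-readable sizes).
biggest() {
    local n="${1:-10}"   # number of items to show (default 10)
    # .[!.]* matches dotfiles but skips "." and ".."; * matches everything else.
    # Errors (an unmatched pattern, an unreadable directory) are discarded.
    du -sh -- .[!.]* * 2>/dev/null | sort -h | tail -n "$n"
}
```

Once the function is defined (for example, in your ~/.bashrc), running biggest 5 in your home directory should print the five largest items with human-readable sizes, largest last.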
Please consult the appropriate instructions based on the operating system of the machine you are using to connect to the SCC.
Another option for file transfer is Globus Online, which allows transfers between your desktop/laptop and the SCC and also provides access to data stored on a variety of national research clusters.
If you are not familiar with the commands for working with files and directories under Linux, please consult our Getting Started section, in particular the pages on commands and filesystem navigation.
You can control who, if anyone, has read, write, and/or execute permission on your files using the commands chmod and umask. You can limit or allow access to each of your files and directories for yourself, your collaborators on a given research project, and/or all users of the system. By default, only you can modify the files and directories you create, but others can read and, if applicable, execute them if they have access to the directory in which they are stored. Also by default, others have no access to your home directory, but your group members do have access to the Project Disk Space belonging to the project group.
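A minimal sketch of chmod and umask, run in a throwaway directory created with mktemp so nothing else is affected; all file names are hypothetical. Here umask 022 gives the default behavior described above (others can read but not modify new files), while umask 077 makes new files private. A umask set this way lasts only for the current shell session unless you add it to a startup file such as ~/.bashrc. Granting write access with chmod g+w only helps your collaborators if the file belongs to your project group, which chgrp can change.

```shell
#!/bin/bash
# Sketch: controlling access with umask and chmod.
# Runs in a throwaway directory; all file names here are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1

umask 022                      # new files get rw-r--r--, new directories rwxr-xr-x
touch shared.txt
chmod g+w shared.txt           # group members may now modify the file
chmod o-r shared.txt           # other users may no longer read it
stat -c '%A %n' shared.txt     # -rw-rw---- shared.txt

mkdir shared_dir
chmod 750 shared_dir           # owner: rwx, group: r-x, others: no access

umask 077                      # from here on, new files are private to you
touch private.txt
stat -c '%A %n' private.txt    # -rw------- private.txt
```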
Every night, starting at 12:01 am, Snapshots are taken of your files. This feature lets you recover files you mistakenly delete or overwrite. Snapshots are implemented for Home Directories, Project Disk Space, and STASH space. Follow the example here to recover your files. Note that there is generally no way to recover a file you just created; the file(s) must have had a chance to be snapshotted overnight.
Linux systems have four main archiving (combining multiple files into one archive file) and compression (reducing the size of a file) tools, each with a counterpart that reverses the process. It is common to both archive a set of files and then compress the archive; a compressed tar archive, for example, typically ends in .tar.gz.
| Archive/Compression Tool | Unarchive/Uncompression Tool | Archived/Compressed Filename Extension | Purpose |
|---|---|---|---|
| gzip | gunzip | .gz | Compress a large single file |
| compress | uncompress | .Z | Compress a large single file |
| zip | unzip | .zip | Creating a single compressed file from a group of files |
| tar -c | tar -x | .tar | Creating a single file from a group of files |
Usage of the first two tools (gzip and compress) and their counterparts is straightforward:
scc1% ls
myfile
scc1% compress myfile        # or replace with gzip
scc1% ls
myfile.Z
scc1% uncompress myfile.Z    # or replace with gunzip of a .gz file
scc1% ls
myfile
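Both gzip and compress replace the original file, as the session above shows. If you want to keep the original, a sketch using gzip's -c option (write to standard output) follows, along with zcat to read a .gz file without decompressing it to disk. It runs in a throwaway directory; myfile and copy are hypothetical names.

```shell
#!/bin/bash
# Sketch: compressing with gzip while keeping the original file.
# Runs in a throwaway directory; myfile and copy are hypothetical names.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'line one\nline two\n' > myfile

gzip -c myfile > myfile.gz      # -c writes compressed data to stdout; myfile is kept
ls                              # both myfile and myfile.gz are now present
zcat myfile.gz                  # prints the two lines without writing a file
gunzip -c myfile.gz > copy      # decompress under a new name; myfile.gz is kept
cmp myfile copy && echo "copy matches the original"
```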
On Linux systems, tar is used much more commonly than zip for archiving sets of files. However, you may well come across ZIP archives, often generated on Windows, which you will need to extract with unzip:
scc1% unzip example.zip
Archive:  example.zip
   creating: Packet1/
   creating: Packet2/
scc1% ls
Packet1/  Packet2/  example.zip
Tar has many options. Shown below is an example of generating and then expanding a simple archive. Note that if you have a compressed tar file such as myarchive.tar.gz, you will generally first want to uncompress it using the appropriate tool above and then untar it.
scc1% ls mydir
file1  file2
scc1% tar -cvf mydir.tar mydir/    # Generate the archive mydir.tar from the directory mydir and all of its contents.
mydir/
mydir/file1
mydir/file2
scc1% rm -r mydir                  # Remove the original directory for now
scc1% ls
mydir.tar
scc1% tar -tvf mydir.tar           # Look at what is in the archive file.
drwx------ aarondf/scv         0 2014-04-30 13:26 mydir/
-rw------- aarondf/scv         8 2014-04-30 13:25 mydir/file1
-rw------- aarondf/scv        10 2014-04-30 13:26 mydir/file2
scc1% tar -xvf mydir.tar           # Expand the archive file in my current directory.
mydir/
mydir/file1
mydir/file2
scc1% ls *
mydir:
file1  file2
mydir.tar
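With GNU tar, the version found on most Linux systems, the separate uncompress step mentioned above can usually be skipped: the z option runs the archive through gzip while creating (-c), listing (-t), or extracting (-x) it. A sketch in a throwaway directory, reusing the mydir, file1, and file2 names from the session above:

```shell
#!/bin/bash
# Sketch: a gzip-compressed tar archive in one step with GNU tar's z option.
# Runs in a throwaway directory; mydir, file1, and file2 mirror the example above.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir mydir
echo "first"  > mydir/file1
echo "second" > mydir/file2

tar -czvf mydir.tar.gz mydir/       # c = create, z = gzip, v = verbose, f = archive file name
tar -tzvf mydir.tar.gz              # list the contents without extracting
mkdir restore
tar -xzvf mydir.tar.gz -C restore   # extract into restore/ instead of the current directory
```

The -C restore option tells tar to extract into the restore directory rather than the current one, so the original mydir is not overwritten.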