Storage of Ongoing Data
We use the term storage for a system that holds data. There are numerous storage technologies, such as disk drives, tapes, and solid-state devices. Generally, faster access means higher cost, so storage is arranged in a hierarchy, from a limited amount of fast solid-state memory, through a larger amount of hard disk storage, to slower archival tape-based systems.
Factors to consider:
- Capacity – amount of data that can potentially be stored
- Speed – access bandwidth
- Convenience – ability to easily access data, possibly from multiple locations
- Sharing – allowing multiple people/entities to access data
- Reliability and redundancy – the percentage of time the data is available, and how resistant to loss it is
- Backups – relatively short-term (one month to one year) services that mitigate losses caused by environmental problems such as power outages, or by accidental deletion or overwriting of data
- Archiving – long-term storage solutions for data preservation, protection and dissemination
- Dissemination – ability to provide online access to others
- Access control – ability to limit who can see data
- Security – protection against loss or theft
- “Working” storage: Usually supplied in conjunction with specialized computational facilities.
- Backups: Regular, automated backups of research data can mean the difference between success and failure of a project. See the IS&T Services page for BU offerings.
- Archiving: See the IS&T Services page for BU offerings.
- Consulting: Help with decisions about storage solutions that will ensure safe and efficient management through the life of the project. Recommendations of best practices for managing storage in the BU environment.
Storage systems provide large, fast, secure, and reliable disk access for high-performance computation. A number of different storage solutions are available to University faculty, their students, and their collaborators engaged in research computing on the Scientific Computing Facilities. These storage solutions are available for research and for educational use in courses related to computational science. See the IS&T Services page.
Keeping reliable backups is an integral part of data management. Your personal computer, external hard drives, and departmental or university servers are examples of tools used for backing up data. CDs and DVDs are not recommended because they fail frequently.
Recommendation: make 3 copies (e.g. original + external/local + external/remote). Keep them geographically distributed; how many copies are local and how many are remote depends on how quickly you need to recover.
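As an illustration of the three-copy recommendation, here is a minimal Python sketch that copies a project folder to two further locations. The function name `make_backup_copies` and the example destination paths are hypothetical, and the sketch assumes the external drive and the remote share are already mounted as ordinary directories. It is not a replacement for a managed backup service.

```python
import shutil
from pathlib import Path


def make_backup_copies(original: Path, destinations: list[Path]) -> list[Path]:
    """Copy the folder `original` into each destination and return the new copies.

    Assumption: every destination is a mounted, writable directory
    (for example an external drive and a remote share).
    """
    copies = []
    for destination in destinations:
        target = destination / original.name
        # dirs_exist_ok=True refreshes an existing copy in place. Files that
        # were deleted from the original are NOT removed from the copy.
        shutil.copytree(original, target, dirs_exist_ok=True)
        copies.append(target)
    return copies


# Hypothetical usage: original + external/local + external/remote.
# make_backup_copies(
#     Path("project_data"),
#     [Path("/mnt/external_drive"), Path("/mnt/remote_share")],
# )
```

The original plus the two returned folders give the three copies. Which destinations count as local or remote is decided by where those directories physically live, not by the script.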
Data Backup Options
- Hard drive (examples: via Vista backup, Mac Time Machine, UNIX rsync)
- Laptop/notebook computer backups
- Backup and restore service
- Tape backup system
Test your backup system – to confirm that it is working properly, retrieve your data files and check that you can read them. Do this when you first set up the system and on a regular schedule thereafter.
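One way to automate part of such a test is to compare checksums of the original files against the retrieved copies. The sketch below is a minimal example under stated assumptions: the helper names `sha256_of` and `verify_backup` are hypothetical, and it assumes the backup has already been restored to an ordinary directory. Matching checksums show that the bytes survived; they do not prove that the files still open correctly in your analysis software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original: Path, backup: Path) -> list[str]:
    """List files from `original` that are missing from `backup` or differ."""
    problems = []
    for source_file in sorted(original.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(original)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(copy) != sha256_of(source_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every original file has a byte-identical counterpart in the restored copy. Files that exist only in the backup are not reported by this sketch.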
Archival storage is long-term, extremely safe storage for data protection and preservation. The researcher will almost surely archive final data, hard-to-reproduce intermediate data, notes, and publications; in short, everything needed to reconstruct the full workflow of the final results.
It is also good practice to archive the full state of the workflow periodically, so that you can reconstruct how the workflow evolved and “revert” to an earlier stage. For instance, if you explore an analytic technique that turns out to be a blind alley, you might want to return to the point at which you took that detour. This is sometimes referred to as saving a “checkpoint,” and is a simple, manual kind of version control.
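A manual checkpoint of this kind can be sketched in a few lines of Python. This is an illustrative example only: the function names `save_checkpoint` and `revert_to_checkpoint` are hypothetical, and the sketch assumes the checkpoint folder lives outside the working directory and that the workflow fits comfortably on disk several times over.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def save_checkpoint(workdir: Path, checkpoint_root: Path, label: str) -> Path:
    """Copy the whole working directory into a new, timestamped checkpoint folder.

    Assumption: `checkpoint_root` is NOT inside `workdir`.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = checkpoint_root / f"{stamp}-{label}"
    # copytree raises FileExistsError if the target already exists, so an
    # earlier checkpoint cannot be overwritten by accident.
    shutil.copytree(workdir, target)
    return target


def revert_to_checkpoint(checkpoint: Path, workdir: Path) -> None:
    """Replace the working directory with the contents of a saved checkpoint."""
    if workdir.exists():
        # Everything currently in the working directory is discarded.
        shutil.rmtree(workdir)
    shutil.copytree(checkpoint, workdir)
```

Because reverting discards the current working directory, it is safer to save a fresh checkpoint of the blind-alley state first if there is any chance you will want to look at it again.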
Your data will be most easily read by you, and by others in the future, if it is stored unencrypted. However, if you do need to encrypt your data because of its sensitivity:
- Keep passwords and keys on paper (2 copies), and in a PGP (Pretty Good Privacy) encrypted digital file;
- Don’t rely on third-party encryption alone.
It is also best to store data in an uncompressed format. If you do need to conserve space, we recommend limiting compression to your third backup copy.
Security needs to be considered for all copies of your data, including your working data set, backup copies and archived copies.