BU Libraries Digital Preservation Policy
(draft December 2011)
Introduction
Boston University Libraries are committed to providing long-term access to the digital content they collect and curate. The Libraries’ long history of preserving traditional content informs their decisions about digital preservation. The Libraries recognize that collaborative approaches to preserving digital content are essential. Support for collaborative efforts such as Portico (http://www.portico.org/digital-preservation/) and LOCKSS (http://lockss.stanford.edu/lockss/Home) is an important part of the Libraries’ overall digital preservation strategy. The Libraries work with Boston University’s Information Services and Technology to provide secure storage and backup of locally hosted digital content. Participation in Portico and LOCKSS provides a mechanism for keeping multiple copies of the digital content in distributed repositories.
Digital content presents many challenges for preservation, not the least being rapidly changing file formats and obsolescent hardware and software. The Libraries adhere to best practices for digital preservation to ensure data accessibility, fixity, and usability. BU Digital Common, the University’s institutional repository, functions as the primary repository and focus for the Libraries’ preservation efforts. For each additional locally hosted collection of digital content, the Libraries define appropriate levels of preservation support.
Preservation Support Levels
The Libraries commit to maintaining the fixity (bitstream integrity) of all digital objects submitted to BU Digital Common. Digital objects are assigned a persistent identifier to provide persistent access. Secure storage and backup services are provided for all digital content. The Libraries will take reasonable steps to ensure the usability of the digital objects placed in BU Digital Common. Preservation steps include format migration, emulation, and normalization (actions that convert an unsupported format to a supported format). The preservation steps performed by the Libraries are determined by the file format of the digital objects. More extensive action will be taken to preserve the usability of objects in file formats that are open, fully disclosed, well documented, widely adopted, and most accessible for migration, emulation, or normalization actions. Fewer actions will be taken to preserve the usability of file formats that are proprietary and/or undocumented, and those that are considered working formats (e.g., Photoshop .psd) and/or are not widely adopted.
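Fixity is usually verified by recording a checksum when an object is deposited and recomputing it later. The following is a minimal sketch of that idea, not a description of how BU Digital Common implements it. The choice of SHA-256 and the function names `sha256_of` and `fixity_ok` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Return True if the file's current digest matches the recorded one."""
    return sha256_of(path) == recorded_digest.lower()
```

A depositor could store the digest returned by `sha256_of` alongside the file and call `fixity_ok` after any transfer or restore to confirm that the bitstream is unchanged.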
The support levels indicated below are:
- Bit-Level: the Libraries commit to maintaining the files at the level of ones and zeroes, but not to maintaining format integrity. This level of support is generally assigned to proprietary formats.
- Format: the Libraries commit to maintaining both bitstream and format integrity. This level of support is generally assigned to formats (proprietary or not) for which open standards are available. For instance, if PDF as a format changes substantially, the Libraries will endeavor to convert all PDF bitstreams to the latest version. One exception is HTML, which changes so frequently that the Libraries do not have the resources to ensure its format integrity.
More information about file formats known and/or supported by DSpace (the software running Digital Common) is available here.
File Support and Preservation Best Practices
The following table details BU Libraries’ preservation support levels for commonly used file formats. Best practices for file formats are included.
TEXT AND MICROSOFT OFFICE FILE FORMATS
| Format | File | Support Level | Best Practices |
|---|---|---|---|
| HTML | .html, .htm | Bit-Level | Must include all other referenced files, including CSS files and any includes. |
| Microsoft Word | .doc | Bit-Level | Though acceptable for deposit, the best practice is to convert to PDF prior to deposit. |
| Microsoft PowerPoint | .ppt | Bit-Level | Disable all macros and other effects. Conversion to PDF is also an excellent option. |
| Microsoft Excel | .xls | Bit-Level | Disable all macros. You may also wish to export the dataset to a tab-delimited text file (.txt) prior to deposit. |
| PDF | .pdf | Format | |
| Postscript | .ps | Format | |
| Rich Text | .rtf | Bit-Level | For increased support, you may wish to consider conversion to PDF. |
| Plain Text | .txt | Format | It is recommended that .txt files be saved using the UTF-8 (Unicode) character encoding. |
| SGML | .sgm, .sgml | Bit-Level | Requires that the depositor also include the DTD along with the SGML file. |
| XML | .xml | Bit-Level | To ensure the best available support, include the DTD along with a well-formed XML file that is valid according to the included DTD. |
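
Two of the text-format recommendations above can be checked before deposit. The sketch below is illustrative only: it assumes UTF-8 as the target Unicode encoding, and the helper names `is_utf8` and `is_well_formed_xml` are hypothetical. It tests well-formedness with Python's standard library, which does not validate a document against its DTD, so validity would need a separate validating parser.

```python
import xml.etree.ElementTree as ET


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```

For example, `is_well_formed_xml(b"<a><b></a>")` reports a problem because the `<b>` element is never closed.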
IMAGE FILE FORMATS
| Format | File | Support Level | Best Practices |
|---|---|---|---|
| BMP | .bmp | Bit-Level | |
| GIF | .gif | Format | |
| JPEG | .jpg | Format | When possible, save using no compression. |
| JPEG2000 | .jp2 | Bit-Level | This file format holds much potential once tools and support become more widely available. Best practice is to save master files using no compression. |
| PNG | .png | Format | |
| Photo CD | .pcd | Bit-Level | |
| Photoshop | .psd | Bit-Level | |
| TIFF | .tif, .tiff | Format | Considered the best format for storing your master images. Best practice is to save these files with no compression. |
AUDIO FILE FORMATS
| Format | File | Support Level | Best Practices |
|---|---|---|---|
| MPEG audio | .mp3 | Bit-Level | |
| Real Audio | .ra, .rm, .ram | Bit-Level | Proprietary format. It may be appropriate to convert to another format prior to deposit. |
| Wave | .wav | Format | Recommended format for capturing digital audio. This format can store all the data in an uncompressed format and its wide use suggests long-term community support. |
| Windows Media Audio | .wma | Bit-Level | |
VIDEO FILE FORMATS
| Format | File | Support Level | Best Practices |
|---|---|---|---|
| AVI | .avi | Bit-Level | |
| MPEG-1 | .mp1 | Bit-Level | |
| MPEG-2 | .mp2 | Bit-Level | |
| MPEG-4 | .mp4 | Bit-Level | |
| Quicktime | .mov | Bit-Level | |
| Windows Media Video | .wmv | Bit-Level | |
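
The tables above amount to a lookup from file extension to support level. The sketch below transcribes a handful of rows to show how a depositor might check a file name before deposit. It is a partial, unofficial transcription; the names `SUPPORT_LEVELS` and `support_level` and the `"Unlisted"` fallback are assumptions made for illustration.

```python
from pathlib import PurePath

# Partial transcription of the tables above; not an official registry.
SUPPORT_LEVELS = {
    ".txt": "Format",
    ".ps": "Format",
    ".gif": "Format",
    ".jpg": "Format",
    ".png": "Format",
    ".tif": "Format",
    ".tiff": "Format",
    ".wav": "Format",
    ".doc": "Bit-Level",
    ".psd": "Bit-Level",
    ".mp3": "Bit-Level",
    ".mov": "Bit-Level",
}


def support_level(filename: str) -> str:
    """Return the support level for a file name, or "Unlisted" if unknown."""
    # Extensions are compared case-insensitively, so ".TIF" matches ".tif".
    suffix = PurePath(filename).suffix.lower()
    return SUPPORT_LEVELS.get(suffix, "Unlisted")
```

A file whose extension is not in the mapping returns the hypothetical `"Unlisted"` value, which a depositor could treat as a prompt to contact the Libraries before depositing.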
Additional Information
Planning for how digital content will be preserved during the initial design phase of a digital project normally saves time. Before beginning any imaging or audio project, you may wish to review the following documents. Each addresses current best practices in digital imaging or digital audio creation. Boston University librarians are eager to consult with you about your preservation planning. For assistance, please contact dcommon-help@bu.edu.
Digital Imaging
http://www.cdlib.org/inside/diglib/guidelines/bpgimages/cdl_gdi_v2.pdf
Collaborative Digitization Program
http://www.bcr.org/dps/cdp/best/index.html
Library of Congress
http://www.digitalpreservation.gov/formats/content/still_preferences.shtml
National Archives and Records Administration (NARA)
http://www.archives.gov/preservation/technical/guidelines.pdf
Digital Audio
Collaborative Digitization Program
http://www.bcr.org/dps/cdp/best/index.html
http://deepblue.lib.umich.edu/bitstream/2027.42/40248/1/Audio-Best_Practice.pdf



