What is a Data Repository?
According to the Registry of Research Data Repositories (re3data.org), a data repository is a subtype of a sustainable information infrastructure that provides long-term storage of and access to research data that is the basis for a scholarly publication. Research data means information objects generated by scholarly projects, for example through experiments, measurements, surveys, or interviews.
In other words, a data repository provides long-term storage for the data that supports scholarly publications. Data repositories are institutional efforts to provide sustainable preservation for the data created by researchers. They help ensure that research data remains accessible beyond the life of a grant, a research project, or an individual's career.
Things to Consider
Many data repositories exist today, and some will fit your needs better than others. Here are some tips on selecting a data repository for your research:
Is the repository a reputable source? Check whether it is endorsed by a funding agency, scholarly journal, professional society, or library, or whether it is listed in the Registry of Research Data Repositories. Publishing your data, like publishing an article, is best done with a reputable partner backed by an institution or by your research community.
Depositing your data in an unsustainable repository defeats the purpose of depositing it. This is why it is important to make sure your repository has the support of an institution, community, or funder. You'll want to ensure the repository you select will provide access to your data for well over 5 years. Many repositories also have preservation and contingency plans in case their funding ends. Don't be afraid to ask about these plans.
One of the primary reasons to deposit your data in a repository is to obtain a unique identifier that others can use to cite your data. An identifier increases the visibility of your data within the scholarly literature and makes it easier for researchers to find it later. Ensure your data repository offers a DOI (digital object identifier), a handle, or another unique identifier.
Another way to think about visibility is to ask whether researchers in your field use a particular repository. Some disciplines have an agreed-upon repository that everyone uses and knows about. Make sure you put your data where the appropriate researchers are likely to find it (and, ideally, use it).
The usability of a data repository is also important in ensuring that others can access your data. Unfortunately, not all repositories have the funding to build polished web interfaces with simple, intuitive interactions. However, if your peers cannot find and download your data, sharing it will be less effective. A usable data repository should make it easy to upload, download, and cite data sets.
Some data repositories offer useful features such as integrations with the Open Science Framework, GitHub, or commercial storage services. While these features are not essential to providing long-term access to your data, they can help you share your data more often and more effectively. An author dashboard (a place to view statistics, such as downloads, for your data sets) or easy-to-understand licensing, such as Creative Commons, can also make your life a little easier. Finally, review the upload and storage limits: some repositories offer a limited amount of free storage before charging a fee. Be sure to look over each data repository's features and compare them with those of similar services.
Most data repositories can handle most file formats; however, this doesn't guarantee that they can work with your data. Check the repository's documentation to ensure it can store the data you've generated. Also see whether the repository can generate previews or offer other ways for users to interact with your data. While these features are not essential from a preservation perspective, they help users understand and access your data.
For additional help selecting a data repository, you can email us or review the following sites and materials:
- Registry of Research Data Repositories
- FORCE11 (The Future of Research Communications and e-Scholarship) Data Publications
- NIH Data Sharing Repositories
- Data Seal of Approval
- DataCite
OpenBU is the institutional repository for all creative and scholarly research outputs of Boston University. BU Research Data is an archive collection within OpenBU for digital research data generated by the university’s faculty, researchers, students, alumni, and staff. OpenBU provides long-term digital preservation and open access to data. All data in the collection are curated to increase potential for access and are assigned permanent links (Handles).
For help with using OpenBU, please contact us!
There are a number of data repositories available to scholars beyond BU, but there are a few things to keep in mind when choosing where to deposit your data.
A “general” data repository is subject-independent and holds data from many fields. General data repositories are often well-known solutions with large user communities. They are good places to store your data because they tend to have robust features (like simple GitHub integration) and strong institutional backing, and they are indexed by major search engines like Google and Bing. The downside is that because general repositories hold so much content from every field, users might have more difficulty finding your work.
A few examples of general data repositories are:
- Dryad: Dryad is a non-profit membership organization. Members like Oxford University Press and the American Association for the Advancement of Science help finance Dryad, whose mission is “to provide the infrastructure for, and promote the re-use of, data underlying the scholarly literature.” In 2015, Dryad worked with over 80 journals and published 3,927 data packages.
- Figshare: Figshare is part of the Digital Science portfolio of services. Although only some of Figshare's services are free to users, its business model also includes working with institutions, publishers, and researchers to fund its data repository. Since 2012, Figshare has recorded 800,000 user uploads, 7.5 million downloads, and over 26 million page views.
- Harvard Dataverse: Harvard's Dataverse is both a platform for institutions and a data repository. Backed and developed by Harvard's IQSS, Libraries, and Information Technology, Dataverse has 22 installations with over 48,000 datasets and 2 million downloads.
- Zenodo: Funded by CERN, OpenAIRE, and Horizon 2020, Zenodo accepts up to 50 GB per dataset and integrates nicely with GitHub. While Zenodo doesn't seem to publish its download numbers as other services do, it is partnered with CERN, which stores more than 100 PB (petabytes) of data.
Many subject-specific data repositories exist today. Unlike general data repositories, discipline-based repositories can be very specialized and well known within a particular field. This can be both a good thing and a bad thing. On the upside, if your field has a specific repository, your data will likely be seen by the right people, increasing its chances of reuse and further influence. The downside is that researchers outside that discipline might not know where to look for your data. Generally speaking, if a subject-specific data repository exists for your research, it is a good idea to use it.
Directories are the best way to find and keep up with the many repositories in existence. A few we recommend:
- Re3data.org: The Registry of Research Data Repositories is a service provided by DataCite (a global non-profit that provides DOIs, or Digital Object Identifiers). With over 1,500 data repositories listed, re3data.org is likely to include a repository in your discipline.
- OpenDOAR: OpenDOAR (Directory of Open Access Repositories) is a curated, authoritative list of academic open access repositories. OpenDOAR staff not only visit each listed repository but also review it for quality (a big task considering there are 2,600 listings). OpenDOAR includes repositories of datasets, articles, books, and software.
- Simmons College hosts the Open Access Directory’s list of Data Repositories. The Open Access Directory is maintained by the Open Access community and an editorial board. It includes repositories ranging from archaeology to physics.