How to Cite Data
An article, paper, or presentation that refers to or draws information from a data set should cite that data set, just as it would cite other sources such as books and articles. A citation gives appropriate credit to the data set's creator(s) and allows interested readers to find the data set, so they can confirm that the data is being represented correctly or use it in their own work.
There is no universal standard for formatting a data set citation, but this guide suggests two approaches that can be used to create a reasonable citation:
- Consult the style manual for the citation style you are using, and adapt it as needed to fit data sets.
- Consult the supporting material for the repository or archive in which the data set resides, and see if it suggests a format for citations.
As with all questions of citations and formatting, the ultimate authority is the style guide for the publication to which the article will be submitted (or the instructor, in the case of class papers).
Starting with a Citation Style
There are many different styles for formatting citations, such as APA and the Chicago Manual of Style. In addition, most publications have their own style, either unique to themselves or based on an existing style.
A few of these styles, such as APA 6th edition, specify how to cite data sets; if you are using such a style, you should follow its suggestions.
However, most citation style manuals do not currently cover citing data sets. In such cases, you can adapt the style's general format to the needs of data sets.
Typically, the information needed about a data set is:
- Author(s)
- Title
- Year of Publication
- Publisher
- Edition or version
- Access information
This is quite similar to the information needed for a book, so the styles’ book format can generally be adapted.
For example, take the data set in ICPSR for the Chicago Longitudinal Study 1986-1989. It has the following information:
- Author(s): Arthur Reynolds (the principal investigator can be used as the “author” of a data set)
- Title: Chicago Longitudinal Study, 1986-1989
- Year of Publication: 2009 (the data set was added to ICPSR on 2009-08-07)
- Publisher: Ann Arbor, MI: Inter-university Consortium for Political and Social Research (publisher information traditionally includes the location of the publisher)
- Edition or version: None (there has only been one published version of this data set)
- Access information: doi:10.3886/ICPSR25921 (ICPSR provides Digital Object Identifiers (DOIs) for its data sets, which can be used for access. For data sets without a DOI, the URL can be used instead.)
With this information, a citation can then be created in the desired style. For the Chicago Author-Date style, books are cited in this manner:
Pollan, Michael. 2006. The Omnivore’s Dilemma: A Natural History of Four Meals. New York: Penguin.
Following this model and adding access information (not needed for print books), the data set could be cited as:
Reynolds, Arthur. 2009. Chicago Longitudinal Study, 1986-1989. Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR25921
Citations for other styles can be created in a similar manner.
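Because assembling the citation is a matter of placing the collected fields in a fixed order with fixed punctuation, the step can be sketched in code. The Python example below is a minimal, hypothetical helper (the `DataSetRecord` class and `chicago_author_date` function are illustrative names, not part of any citation tool) that follows the book-style Chicago Author-Date pattern used above. The "Version N." placement for versioned data sets and the DOI-then-URL fallback are assumptions made for this sketch, so check the target style guide before relying on its output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSetRecord:
    """Hypothetical container for the citation fields listed in this guide."""
    authors: str                    # e.g. the principal investigator, "Last, First"
    title: str
    year: str                       # year of publication
    publisher: str                  # "Location: Publisher"
    version: Optional[str] = None   # edition or version, if any
    doi: Optional[str] = None       # e.g. "10.3886/ICPSR25921"
    url: Optional[str] = None       # used only when no DOI exists


def _sentence(text: str) -> str:
    """End an element with a period unless it already ends in punctuation."""
    return text if text.endswith((".", "?", "!")) else text + "."


def chicago_author_date(record: DataSetRecord) -> str:
    """Assemble a citation modeled on the Chicago Author-Date book format."""
    parts = [_sentence(record.authors), _sentence(record.year), _sentence(record.title)]
    if record.version:
        # Assumed placement: version statement after the title.
        parts.append(_sentence(f"Version {record.version}"))
    parts.append(_sentence(record.publisher))
    # Access information: prefer the DOI, fall back to the URL.
    if record.doi:
        parts.append(f"doi:{record.doi}")
    elif record.url:
        parts.append(record.url)
    return " ".join(parts)


if __name__ == "__main__":
    cls = DataSetRecord(
        authors="Reynolds, Arthur",
        title="Chicago Longitudinal Study, 1986-1989",
        year="2009",
        publisher="Ann Arbor, MI: Inter-university Consortium for Political and Social Research",
        doi="10.3886/ICPSR25921",
    )
    print(chicago_author_date(cls))
```

Run as a script, it prints the Reynolds citation given above. Adapting the sketch to another style would mean changing only the order and punctuation of the parts.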
Starting with the Repository or Archive
If the data set you want to cite comes from a standard repository or archive, the repository may provide guidance on citing its material. Here are examples from some standard repositories:
- Dryad (repository of data in the basic and applied biosciences): How to cite data from Dryad
- ICPSR (data archive of social science research): Includes a recommended citation with all its data sets.
- Roper Center Public Opinion Archives: How to cite Roper Center data
- NOAA Paleoclimatology data: Data Citation
- Michigan State University Libraries: How to Cite Data. Includes a useful bibliography for further reading on the topic.