What Are Metadata?
Metadata are documentation for your data set, expressed using a formal syntax. They record all the information needed for your data’s future use and are attached to the data set itself, usually as separate files. Here is a list of what metadata might contain:
- a brief or detailed description of the data itself;
- names, labels and descriptions for variables, records and their values;
- explanation of codes and classification schemes used;
- codes of, and reasons for, missing values;
- derived data created after collection, with code, algorithm or command file used to create them;
- weighting and grossing variables created and how they should be used;
- data listing with descriptions for cases, individuals or items studied, for example for logging qualitative interviews;
- descriptions of the applications (commercial or open-source) used to run analyses, and the versions of those applications;
- descriptions of file formats used to store the data;
- documentation of experimental protocols;
- documentation of the code written for statistical and other analyses.
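As a rough illustration of how a few of the items above (a description, variable labels, missing-value codes, software versions) could be recorded in a machine-readable file stored alongside the data, here is a minimal Python sketch. The data set, field names, and the `build_metadata` and `write_sidecar` helpers are hypothetical and do not follow any particular metadata standard; a real project would normally map these fields onto a standard from its own field.

```python
import json
import platform


def build_metadata(description, variables, missing_codes, software):
    """Assemble a metadata record as a plain dictionary (illustrative layout only)."""
    return {
        "description": description,
        "variables": variables,
        "missing_value_codes": missing_codes,
        "software": software,
    }


def write_sidecar(record, path):
    """Write the record as a JSON file meant to sit next to the data file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, ensure_ascii=False)


# Hypothetical survey data set, invented purely for this example.
metadata = build_metadata(
    description="Household survey responses (illustrative example)",
    variables=[
        {"name": "age", "label": "Age of respondent", "unit": "years"},
        {"name": "income", "label": "Annual household income", "unit": "USD"},
    ],
    missing_codes={"-9": "refused to answer", "-8": "not applicable"},
    software=[{"name": "Python", "version": platform.python_version()}],
)

# Show the record; call write_sidecar(metadata, "survey.metadata.json") to save it.
print(json.dumps(metadata, indent=2))
```

Keeping the record in a separate file means the data file itself stays untouched, while the two can still be stored and shared together.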
What Kind of Metadata Should I Use?
That depends on your field. Here are some standards and controlled vocabularies (standard terminology for specific fields). If you don’t see your field represented, let’s talk — we may be able to find a standard for you, or help in some other way.
- Astronomy Visualization Metadata Standard
- Content Standard for Digital Geospatial Metadata
- Darwin Core
- Dublin Core
- Ecological Metadata Language
- Data Documentation Initiative (DDI)
- Swingle Plant Anatomy Collection data dictionary
- Altova Schema library
- Seeing Standards: A Visualization of the Metadata Universe (humanities: information on over a hundred cultural heritage metadata standards)
- Text Encoding Initiative
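To give a sense of what a record in one of these standards can look like, the sketch below builds a small Dublin Core description in XML using Python's standard library. The element names (`title`, `creator`, `date`, `format`, `rights`) and the namespace URI come from the Dublin Core element set; the `metadata` wrapper element, the `dublin_core_record` helper, and all field values are assumptions made for illustration, and other standards in the list use different structures.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per entry in `fields`."""
    # The wrapper element name is an arbitrary choice for this sketch.
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values, invented purely for this example.
record = dublin_core_record({
    "title": "Household survey responses (illustrative example)",
    "creator": "Example Research Group",
    "date": "2020-01-01",
    "format": "text/csv",
    "rights": "Access restricted; contact the data owner",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```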
What Else Should I Document?
This depends on your project, but here are some ideas to get you started. We will be happy to help you come up with a final list of the documentation needed for your data set.
- context of data collection (project history, aims, objectives, hypotheses);
- data collection methods (data collection protocol, sampling design, instruments, hardware and software used, data scale and resolution, temporal coverage and geographic coverage);
- structure and organization of data files;
- data sources used;
- data validation and quality assurance procedures carried out;
- transformations of data from the raw data through analysis;
- information on confidentiality and on conditions of access and use.