Metadata

What are Metadata?

Metadata are documentation for your data set, expressed using a formal syntax. They record all the information needed for your data’s future use and are attached to the data set itself, usually as separate files. Here is a list of what metadata might contain:

  • a brief or detailed description of the data itself;
  • names, labels and descriptions for variables, records and their values;
  • explanation of codes and classification schemes used;
  • codes of, and reasons for, missing values;
  • derived data created after collection, with code, algorithm or command file used to create them;
  • weighting and grossing variables created and how they should be used;
  • data listing with descriptions for cases, individuals or items studied, for example for logging qualitative interviews;
  • descriptions of the applications (commercial or open-source) that were used to run analyses, and the versions of those applications;
  • descriptions of file formats used to store the data;
  • documentation of experimental protocols;
  • documentation of the code written for statistical and other analyses.
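As a rough illustration of a few of the items above, the sketch below records a description, variable labels, value codes, missing-value codes, file format and software version, then writes them to a separate file stored next to the data. Everything in it is an assumption made for the example: the survey, the variable names, the codes, the JSON layout and the ".meta.json" file name are hypothetical, and a metadata standard from your own field should take precedence over an ad hoc layout like this one.

```python
import json
import platform
import tempfile
from pathlib import Path


def build_metadata() -> dict:
    """Return a metadata record for a hypothetical survey data file."""
    return {
        # Brief description of the data itself.
        "description": "Hypothetical household survey, one row per respondent.",
        # Names, labels, value codes and missing-value codes per variable.
        "variables": {
            "age": {
                "label": "Age in years",
                "missing_codes": {"-9": "refused"},
            },
            "region": {
                "label": "Region of residence",
                "codes": {"1": "North", "2": "South"},
                "missing_codes": {"-8": "not asked"},
            },
        },
        # File format used to store the data.
        "file_format": "CSV, UTF-8, comma-delimited",
        # Software and version used; here only the running Python version.
        "software": {"python": platform.python_version()},
    }


def write_sidecar(data_path: Path, metadata: dict) -> Path:
    """Write the metadata next to the data file as <file name>.meta.json."""
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Use a temporary directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_sidecar(Path(tmp) / "survey.csv", build_metadata())
        print(path.read_text(encoding="utf-8"))
```

Keeping the record in its own file, named after the data file, means the two can travel together without altering the data itself.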

What Kind of Metadata Should I Use?

That depends on your field. Here are some standards and controlled vocabularies (standard terminology for specific fields). If you don’t see your field represented, let’s talk — we may be able to find a standard for you, or help in some other way.

What Else Should I Document?

This really depends on your project, but here are some ideas to get you started. We will be happy to help you come up with a final list of documentation needed for your dataset.

  • context of data collection (project history, aims, objectives, hypotheses);
  • data collection methods (data collection protocol, sampling design, instruments, hardware and software used, data scale and resolution, temporal coverage and geographic coverage);
  • structure and organization of data files;
  • data sources used;
  • data validation and quality assurance procedures carried out;
  • transformations of data from the raw data through analysis;
  • information on confidentiality and on conditions of access and use.
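One possible way to document transformations from the raw data onward is a simple step log, sketched below. Each entry stores a description of the step, a SHA-256 checksum of the resulting data and a timestamp, so a later reader can check which version of the data a step produced. The function names, the log layout, the sample data and the -999 missing-value code are all hypothetical, chosen only for illustration.

```python
import hashlib
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def log_step(log: list, description: str, output: bytes) -> None:
    """Append one transformation step, with a checksum of its output."""
    log.append(
        {
            "step": len(log) + 1,
            "description": description,
            "sha256": checksum(output),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }
    )


if __name__ == "__main__":
    # Hypothetical raw data in which -999 marks a missing temperature.
    raw = b"id,temp_f\n1,68\n2,-999\n"
    log = []
    log_step(log, "raw file as collected", raw)

    # Example transformation: blank out the missing-value code.
    cleaned = raw.replace(b"-999", b"")
    log_step(log, "replaced missing-value code -999 with an empty field", cleaned)

    for entry in log:
        print(entry)
```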