What Is “Research Data”?
Data are distinct pieces of information, usually formatted in a special way. Strictly speaking, data is the plural of datum, a single piece of information. In practice, however, people use data as both the singular and plural form of the word. In database management systems, data files are the files that store the database information.
Research data is data that is collected, observed, or created for analysis in order to produce original research results. The word “data” is used throughout this site to refer to research data.
Research data can be generated for different purposes and through different processes, and can be divided into different categories. Each category may require a different type of data management plan.
- Observational: data captured in real time, usually irreplaceable. For example, sensor data, survey data, sample data, neurological images.
- Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
- Simulation: data generated from test models, where the model and metadata are more important than the output data. For example, climate models, economic models.
- Derived or compiled: data that is reproducible but expensive to recreate. For example, text and data mining, compiled databases, 3D models.
- Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.
Research data may include all of the following:
- Text or Word documents, spreadsheets
- Laboratory notebooks, field notebooks, diaries
- Questionnaires, transcripts, codebooks
- Audiotapes, videotapes
- Photographs, films
- Test responses
- Slides, artifacts, specimens, samples
- Collections of digital objects acquired and generated during the process of research
- Data files
- Database contents including video, audio, text, images
- Models, algorithms, scripts
- Contents of an application such as input, output, log files for analysis software, simulation software, schemas
- Methodologies and workflows
- Standard operating procedures and protocols
The following research records may also be important to manage during and beyond the life of a project:
- Correspondence including electronic mail and paper-based correspondence
- Project files
- Grant applications
- Ethics applications
- Technical reports
- Research reports
- Master lists
- Signed consent forms
Data vs. Information
Data are plain facts. When data are processed, organized, structured or presented in a given context so as to make them useful, they are called information.
It is not enough to have data (such as statistics on the economy). On their own, data are of limited use. When they are processed and interpreted to determine their meaning, they become useful and can be called information. Data is the computer’s language. Information is our translation of this language.
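The distinction can be illustrated with a minimal Python sketch. The figures below are invented for illustration and are not real economic statistics, and the names raw_rates and summarize are hypothetical. The list holds the data; the summary derived from it is closer to what this page calls information.

```python
from statistics import mean

# Data: plain facts with no interpretation attached.
# Hypothetical monthly unemployment rates in percent, invented for this example.
raw_rates = [5.2, 5.1, 5.3, 5.6, 5.9, 6.1]


def summarize(rates):
    """Process raw numbers into a small summary a reader can interpret."""
    change = rates[-1] - rates[0]
    if change > 0:
        trend = "rising"
    elif change < 0:
        trend = "falling"
    else:
        trend = "flat"
    return {
        "average": round(mean(rates), 2),
        "change": round(change, 2),
        "trend": trend,
    }


# Information: the same data, organized and placed in context.
information = summarize(raw_rates)
print(information)
```

The numbers themselves do not change. Only the processing and the context added around them turn the list into something a reader can use.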
Data vs. Metadata
Metadata is structured data about data, of any sort in any media, that imposes order on a disordered information universe. In database management systems, metadata takes the form of index files and data dictionaries that store administrative information.
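As a rough sketch of the difference, the Python example below keeps a few records (the data) alongside a small data dictionary that describes them (the metadata). The field names, values, and the conforms helper are all hypothetical, chosen only to show the idea. They do not represent any particular database system.

```python
# Data: the observations themselves (hypothetical sensor readings).
readings = [
    {"site": "A1", "temp_c": 21.4},
    {"site": "A2", "temp_c": 19.8},
]

# Metadata: structured data that describes the data above.
data_dictionary = {
    "site": {"type": str, "description": "Sampling site identifier"},
    "temp_c": {
        "type": float,
        "description": "Air temperature",
        "unit": "degrees Celsius",
    },
}


def conforms(record, dictionary):
    """Return True if a record has exactly the documented fields and types."""
    return set(record) == set(dictionary) and all(
        isinstance(record[name], spec["type"])
        for name, spec in dictionary.items()
    )


# The metadata lets us check the data without knowing anything else about it.
all_valid = all(conforms(r, data_dictionary) for r in readings)
print(all_valid)
```

Keeping the description separate from the values is what allows someone outside the project to interpret, validate, and reuse the records later.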