Data Documentation and Metadata

Why Document?

For your data to be used properly by you, your colleagues, and other researchers in the future, it must be documented. Data documentation (also known as metadata) describes the content, formats, and internal relationships of your data in detail, and it enables other researchers to find, use, and properly cite your data.

It is critical to start documenting your data at the very beginning of your research project, before data collection begins. Doing so makes documentation easier and reduces the likelihood that you will forget aspects of your data later in the project.

What to Document

Research Project Documentation

  • Context of data collection
  • Data collection methods
  • Structure and organization of data files
  • Data sources used (see Citing Data)
  • Data validation and quality assurance
  • Transformations applied to the data, from raw data through analysis
  • Confidentiality, access, and use conditions
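Two items above, the structure of the data files and data validation, can be partly captured with a file inventory. The sketch below is one possible approach, not a requirement of any standard: it walks a data folder and records each file's relative path, size, and SHA-256 checksum, so that a later user can confirm the files are complete and unchanged. The function names `build_manifest` and `write_manifest` and the tab-separated layout are hypothetical choices.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir):
    """Return one row per file: relative path, size in bytes, SHA-256 checksum."""
    root = Path(data_dir)
    rows = []
    # Sorted so the manifest is stable from one run to the next.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": digest,
            })
    return rows


def write_manifest(rows, out_path):
    """Write the manifest as tab-separated text so it can sit next to the data."""
    lines = ["path\tbytes\tsha256"]
    lines += [f"{r['path']}\t{r['bytes']}\t{r['sha256']}" for r in rows]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Re-running `build_manifest` later and comparing checksums shows whether any file has changed since it was documented.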

Data Documentation

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • File formats and software used (including versions)
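Variable names, descriptions, and codes are often kept in a data dictionary (codebook). The sketch below assumes a simple CSV codebook with hypothetical columns (`variable`, `description`, `type`, `units`, `codes`) and hypothetical survey variables; it writes the dictionary and checks a CSV data file for columns that have no entry yet.

```python
import csv

# Hypothetical variables for a survey dataset; replace with your own.
DATA_DICTIONARY = [
    {"variable": "resp_id", "description": "Unique respondent identifier",
     "type": "integer", "units": "", "codes": ""},
    {"variable": "age", "description": "Age at time of interview",
     "type": "integer", "units": "years", "codes": "-9 = refused"},
    {"variable": "region", "description": "Region of residence",
     "type": "categorical", "units": "", "codes": "1 = North; 2 = South; 9 = missing"},
]


def write_data_dictionary(entries, out_path):
    """Save the codebook as CSV so it can be deposited alongside the data."""
    fields = ["variable", "description", "type", "units", "codes"]
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(entries)


def undocumented_columns(data_path, entries):
    """Return header columns in a CSV data file that have no dictionary entry."""
    documented = {e["variable"] for e in entries}
    with open(data_path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh))
    return [col for col in header if col not in documented]
```

Running `undocumented_columns` before depositing is a quick way to catch variables that were added during analysis but never described.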

How to Document Data

Researchers can choose among various metadata standards, many of them tailored to a particular discipline or file format. The Digital Curation Centre (DCC) maintains a directory of discipline-specific metadata standards.
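To illustrate what a metadata standard looks like in practice, the sketch below builds a minimal XML record using Dublin Core, a general-purpose (not discipline-specific) standard. The 15 element names and the namespace URI come from the Dublin Core Metadata Element Set 1.1; the `metadata` wrapper element and the sample values are hypothetical, and a repository may expect a different schema or wrapper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields):
    """Build an XML record from (element, value) pairs; elements may repeat."""
    # "metadata" is a hypothetical wrapper, not part of Dublin Core itself.
    root = ET.Element("metadata")
    for element, value in fields:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record([
    ("title", "Example Survey Data"),
    ("creator", "Smith, J."),
    ("format", "text/csv"),
]))
```

Pairs rather than a dictionary are used so that repeatable elements, such as several creators, can appear more than once.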

Data centers and subject-specific repositories may require specific metadata before you can deposit your data. Check the requirements of any repository you plan to use before you outline the metadata plan for your data. If you are unsure which metadata fields your repository requires, contact us.

Whatever your discipline, document the general aspects of your project and data listed under What to Document. At minimum, store this documentation in a "readme.txt" file, or an equivalent, alongside the data itself.
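A readme is easier to keep complete if it starts from a fixed outline. The sketch below generates a plain-text "readme.txt" whose section headings are drawn from the What to Document lists; the outline, the `TODO` placeholder, and the function names are hypothetical conventions rather than a formal standard, so adapt them to your repository's requirements.

```python
from pathlib import Path

# Suggested outline based on the What to Document lists; not a formal standard.
README_SECTIONS = [
    "Title",
    "Creator(s) and contact",
    "Context of data collection",
    "Data collection methods",
    "File structure and organization",
    "Data sources",
    "Quality assurance and validation",
    "Processing and transformations",
    "Variables, codes, and units",
    "Software and versions",
    "Access and use conditions",
    "Recommended citation",
]


def render_readme(answers):
    """Return readme text; any section without an answer is marked TODO."""
    lines = []
    for section in README_SECTIONS:
        lines.append(section.upper())
        lines.append(answers.get(section, "TODO"))
        lines.append("")
    return "\n".join(lines)


def write_readme(data_dir, answers):
    """Write readme.txt into the data folder and return its path."""
    path = Path(data_dir) / "readme.txt"
    path.write_text(render_readme(answers), encoding="utf-8")
    return path
```

Generating the file at the start of a project and filling in sections as work proceeds leaves visible `TODO` markers for anything not yet documented.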

Citing Data

Cite the data you use in your presentations and papers, and give your own data a permanent location so that others can use and cite it. Many publications now request a permanent URL for the underlying data before they will publish a paper. A data citation should include the following elements:

  • Author(s)
  • Title
  • Year of publication: the date when the dataset was published or released (rather than the collection or coverage date)
  • Publisher: the data center/repository
  • Any applicable identifier, such as a DOI (including edition or version)
  • Availability and access: URL or other location information for the data
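The citation elements above can be assembled mechanically. The sketch below joins them in an Author (Year). Title (Version). Publisher. Identifier order, which loosely resembles common data-citation styles; the function name, the punctuation, and the sample values (including the placeholder identifier) are hypothetical, so follow the exact style your publisher or repository recommends. Note that `year` should be the publication or release year, not the collection date.

```python
def format_data_citation(authors, year, title, publisher,
                         version=None, identifier=None, url=None):
    """Assemble a data citation string from the listed elements."""
    # Authors are joined with semicolons because names often contain commas.
    head = f"{'; '.join(authors)} ({year}). {title}"
    if version:
        head += f" (Version {version})"
    parts = [head + ".", publisher + "."]
    # Identifier and URL both tell readers where to find the data.
    locators = [x for x in (identifier, url) if x]
    if locators:
        parts.append(". ".join(locators))
    return " ".join(parts)


print(format_data_citation(
    ["Smith, J.", "Lee, K."], 2021, "Example Survey Data",
    "Example Data Repository", version="2.0",
    identifier="doi:10.0000/example",
))
```

With these inputs the function returns `Smith, J.; Lee, K. (2021). Example Survey Data (Version 2.0). Example Data Repository. doi:10.0000/example`.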

This material is adapted from MIT Libraries, California Digital Library/UC3, and University of Oregon Libraries, used under a Creative Commons Attribution-Share Alike license. Metadata standards information is adapted with permission from the University of Wisconsin-Madison.