Getting Started

What data should I manage?

We define research data as any information or artifact that serves as evidence for a research discovery or result. Data can be qualitative (e.g., interview transcripts, images, videos, audio recordings) or quantitative (e.g., tabular data, structured databases). Our focus here is on digital data, both qualitative and quantitative.

As part of research data management, you should manage any data and code that are created or used in a research project, along with their documentation. This might include:

  • Quantitative and qualitative data
  • Primary (raw) and secondary (cleaned or analyzed) data
  • Notes
  • Laboratory or research notebooks
  • Codebooks
  • Code or software used to run data analyses
  • Data workflows or pipelines
  • Metadata (documentation describing the data)

Overall, you should know the location of all data produced by or used in a research project. The data should be annotated well enough that others can understand and reproduce your work and, possibly, reuse your data in future studies.
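One way to keep track of where project files live is to generate a simple inventory of them. The sketch below is only an illustration, not a prescribed tool: the function names `sha256_of` and `build_manifest` and the idea of recording a checksum per file are assumptions made for this example. It walks a project folder and records each file's relative path, size, and SHA-256 checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(project_dir):
    """List every file under project_dir with its relative path, size, and checksum.

    `project_dir` is a hypothetical project folder chosen for this example.
    """
    root = Path(project_dir)
    manifest = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return manifest
```

Saving such an inventory alongside the project makes it easier to confirm later that no file has gone missing or changed unexpectedly.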

What should I document about my research data?

Depending on the type of data, you may need to document some or all of the items below. Aim to record enough information for others to use your data and to understand or replicate your work.

Research project documentation:

  • Rationale and context for data collection
  • Data collection methods
  • Structure and organization of data files
  • Data sources used
  • Strategies for data validation and quality assurance
  • Analytical steps and pipelines (if any) used to process data
  • Information on data confidentiality, access and use conditions

Dataset documentation:

  • Variable names and descriptions (for quantitative data)
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data (including code)
  • File formats and any software used (including version)
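Dataset documentation of this kind is often kept as a codebook (data dictionary). The following is a minimal sketch under stated assumptions: the variable names, the codes, and the helper functions `undocumented_columns` and `decode` are all hypothetical and exist only to show how variable descriptions and code schemes might be recorded and checked against a data file.

```python
import csv
import io

# Hypothetical codebook for an illustrative survey dataset.
CODEBOOK = {
    "participant_id": {
        "description": "Anonymized participant identifier",
        "type": "string",
    },
    "age": {
        "description": "Age at time of interview",
        "type": "integer",
        "units": "years",
    },
    "employment": {
        "description": "Self-reported employment status",
        "type": "integer",
        "codes": {1: "employed", 2: "unemployed", 3: "student", -9: "missing"},
    },
}


def undocumented_columns(csv_text, codebook):
    """Return column names in the CSV header that the codebook does not describe."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in codebook]


def decode(variable, value, codebook):
    """Translate a coded value into its label using the codebook."""
    return codebook[variable]["codes"][value]
```

For example, calling `undocumented_columns` on a file whose header includes a column missing from the codebook returns that column's name, which flags a gap in the documentation before the data are shared.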