Analyze

The way you manage your data during analysis depends entirely on the type of data you’re using and what you’re doing with it. There are, however, several strategies you can adopt to avoid disaster, save time, and improve your ability to make sense of your work later on.

Keep your data secure. 

Save your raw data. It is vitally important to maintain a copy of your data in its rawest, least processed form. This allows you to start over if something goes wrong, or to re-analyze the same dataset testing different variables or protocols.

  • Consider saving snapshots of your data at a number of different stages (e.g., raw, cleaned up, subsetted).
  • Distinguish between these datasets in the file names and/or documentation.
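
One way to apply the two points above is sketched below, assuming your data lives in ordinary files. The function names (save_snapshot, sha256_of), the stage labels, and the ".sha256" sidecar file are illustrative choices, not an established convention. The sketch copies a file into a snapshot folder with the stage in its name, marks the copy read-only, and records a checksum so the snapshot can be verified later.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_snapshot(source: Path, snapshot_dir: Path, stage: str) -> Path:
    """Copy `source` into `snapshot_dir`, adding the stage to the file name.

    `stage` is a free-text label such as "raw" or "cleaned" (hypothetical
    labels; use whatever your project documentation defines).
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    target = snapshot_dir / f"{source.stem}_{stage}{source.suffix}"
    if target.exists():
        # Refuse to overwrite an existing snapshot.
        raise FileExistsError(f"Snapshot already exists: {target}")
    shutil.copy2(source, target)
    # Mark the copy read-only to make accidental edits less likely.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # Store the checksum next to the snapshot for later verification.
    checksum_file = target.with_name(target.name + ".sha256")
    checksum_file.write_text(f"{sha256_of(target)}  {target.name}\n")
    return target
```

For example, save_snapshot(Path("survey.csv"), Path("snapshots"), "raw") would produce snapshots/survey_raw.csv alongside snapshots/survey_raw.csv.sha256.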

Control your versions. You can keep track of file versions with consistently applied naming conventions, such as a date or version number in each file name. For projects that involve code or software development, with frequent edits or multiple contributors, consider a dedicated version control system. Git is a popular choice, but your research community or lab may have a preferred environment.
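
A naming convention is easier to apply consistently when the next file name is computed rather than typed by hand. The sketch below assumes a hypothetical pattern of the form name_v01.ext; the function name next_version_name and the two-digit numbering are illustrative assumptions, and your lab may prefer dates or another scheme.

```python
import re
from pathlib import Path


def next_version_name(directory: Path, stem: str, suffix: str) -> str:
    """Return the next unused file name of the form '<stem>_vNN<suffix>'.

    Scans `directory` for existing versions and adds one to the highest
    number found. Starts at v01 when no versions exist yet.
    """
    pattern = re.compile(rf"{re.escape(stem)}_v(\d+){re.escape(suffix)}")
    highest = 0
    for entry in directory.iterdir():
        match = pattern.fullmatch(entry.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{stem}_v{highest + 1:02d}{suffix}"
```

This only helps with file naming. It does not record who changed what or why, which is where a version control system such as Git becomes useful.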

Back things up. Proper storage and backup strategies are key to preventing catastrophic data loss due to things like hardware failure, natural disaster, computer viruses, or theft. Maintaining working copies of your data requires thoughtful consideration of hardware, redundant storage locations, and a disaster plan.

  • LOCKSS (“lots of copies keep stuff safe”) is a helpful motto to remember. The more copies of your data, the better, as long as they’re not all in the same place.
  • Use the 3-2-1 backup rule as a rule of thumb: 3 copies, on 2 different types of storage media, 1 off-site.
  • Test your system frequently to make sure it’s working.
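
One simple way to test a backup is to compare checksums between the working copy and the backup copy. The sketch below is a minimal illustration, assuming both copies are reachable as folders on the same machine; the function names verify_backup and sha256_of are hypothetical. It does not check the 3-2-1 rule itself, only that one backup folder matches the original.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> List[str]:
    """List files that are missing from, or differ in, the backup.

    Paths are reported relative to `original_dir`. An empty list means
    every original file has an identical copy in `backup_dir`.
    """
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(original):
            problems.append(relative.as_posix())
    return problems
```

Running a check like this on a schedule, and restoring a file from backup now and then, gives more assurance than assuming the backup job succeeded.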

Document your steps. 

Whether for your future self or other researchers, it is crucial that you describe the process of your analysis. This can mean taking good notes, saving log files, or capturing your every step in an electronic lab book. Be sure to keep a copy of this documentation together with any data or code you produce so that you can follow your trail later on.

  • Scan paper notebooks, especially if they contain sketches or annotations that may not be captured by transcription.
  • Include any pre-processing or data-cleaning steps to ensure reproducibility. 
  • Electronic Lab Notebooks (ELNs) can help you automate much of this record-keeping.
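
If you are not using an ELN, even a plain-text log kept next to the data can capture pre-processing and cleaning steps. The sketch below is one possible approach; the function name log_step, the tab-separated layout, and the example log file name are assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_step(log_path: Path, description: str) -> str:
    """Append one timestamped line describing an analysis step.

    The timestamp is recorded in UTC so entries from different machines
    sort consistently. Returns the line that was written.
    """
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = f"{timestamp}\t{description}\n"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Open in append mode so earlier entries are never overwritten.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

For example, calling log_step(Path("analysis_log.txt"), "Removed rows with missing values") after each cleaning step builds up a record that can travel with the dataset.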

Tools and Resources