There’s no one way to organize your data, but a consistent and descriptive file structure can save you time and money later. Use a system that makes sense to you so that keeping things in order becomes a habit instead of a chore.
- Save time and money. Well-organized, easy-to-find files make your research much more efficient. This is especially important if you are sharing active data with collaborators.
- Increase your impact. In the long run, well-managed data are more discoverable, accessible, and reusable (including by your future self!). This will help increase the visibility and impact of your work.
- Demonstrate integrity. Discoverable, accessible, and reusable data are fundamental to ensuring reproducible or replicable research.
- It’s required. Most funding agencies now require a data management plan for grant proposals. Publishers increasingly insist on data sharing as a condition of publishing your work. Organized, reusable data help demonstrate compliance.
Making sure data are accessible in the future is a challenge, but choosing file formats carefully helps avoid obsolescence. Use formats that are:
- Non-proprietary, open, documented standards (e.g., .tif, .txt, .csv, .pdf)
- Encoded with standard characters (e.g., ASCII, UTF-8)
- Used commonly in your research community
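As a minimal sketch of the encoding advice above, assuming Python 3 and only its standard library, the snippet below writes a small table to a plain-text CSV file with the encoding set explicitly to UTF-8, then checks whether a file's bytes decode as valid UTF-8. The function names `write_utf8_csv` and `is_utf8` and the sample site data are hypothetical, for illustration only.

```python
import csv
import tempfile
from pathlib import Path


def write_utf8_csv(path, header, rows):
    """Write a header row and data rows to a CSV file encoded as UTF-8."""
    # newline="" lets the csv module manage line endings, as the Python docs advise.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def is_utf8(path):
    """Return True if every byte in the file decodes as valid UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Demo in a throwaway folder with hypothetical sample data;
    # "São Paulo" checks that non-ASCII text survives the round trip.
    out = Path(tempfile.mkdtemp()) / "site_temps.csv"
    write_utf8_csv(out, ["site", "date", "temp_c"],
                   [["Austin", "2024-03-01", 18.5], ["São Paulo", "2024-03-01", 27.0]])
    print(out.read_text(encoding="utf-8"))
    print(is_utf8(out))
```

Setting the encoding explicitly, rather than relying on the operating system's default, means the same file opens correctly on any machine.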
Adopt a naming convention and use it throughout a project (or throughout your career). Consider including a README.txt file that explains your naming convention and any codes or abbreviations you use. File names should:
- Describe the contents of the file, but not be overly long. Avoid generic names (like draft.doc or final2.xls) that are hard to decipher and easy to overwrite.
- Include dates. Don't rely on system dates, which can change when files are copied or edited. Use YYYYMMDD or YYYY-MM-DD (ISO 8601) so that names sort in chronological order.
- Reserve the 3-letter file extension for the application-specific file type (e.g., .jpg, .mov, .tif).
- Not contain special characters such as / \ : * ? " < > [ ] & $. These characters have special meanings in software and operating systems and can cause errors.
- Not contain spaces. These are problematic for some operating systems. Use underscores (file_name), dashes (file-name), or camel case (FileName) instead.
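One way to apply these rules consistently is a small helper that assembles each name for you. This is a hypothetical sketch in Python 3 (standard library only): `make_file_name` and its field order (project, ISO 8601 date, description, zero-padded version) are illustrative choices, not a standard. It replaces spaces and the special characters listed above with dashes, keeps underscores as field separators, and also replaces periods so that the only period is the one before the extension (another illustrative choice).

```python
import re
from datetime import date

# Runs of spaces, underscores, periods, and the special characters listed above.
# Underscores are reserved as field separators and the period for the extension.
UNSAFE = re.compile(r'[/\\:*?"<>\[\]&$.\s_]+')


def clean(part):
    """Replace each run of unsafe characters with one dash and trim stray dashes."""
    return UNSAFE.sub("-", part.strip()).strip("-")


def make_file_name(project, description, on, version, ext):
    """Build '<project>_<YYYY-MM-DD>_<description>_v<NN>.<ext>'."""
    fields = [clean(project), on.isoformat(), clean(description), f"v{version:02d}"]
    return "_".join(fields) + "." + ext.lstrip(".")


if __name__ == "__main__":
    print(make_file_name("soil survey", "site A: temps?", date(2024, 3, 1), 2, ".csv"))
    # prints soil-survey_2024-03-01_site-A-temps_v02.csv
```

Because the date comes before the description, files from the same project list in chronological order in any file browser that sorts by name.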
Tools and Resources
- Don’t name your files one at a time! Use a free batch-renaming tool.
- OpenRefine is a powerful, free, open-source tool for working with messy data.
- The Texas Advanced Computing Center (TACC) offers powerful advanced computing resources and consultation for managing complex data collections at the terabyte and petabyte scale.
- Information Technology Services (ITS) offers virtual machine hosting for storing and managing data collections, as well as common-good services such as email, encryption, data security, and network access.
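If no batch-renaming tool is at hand, a short script can do the same job. The sketch below assumes Python 3 with the standard `pathlib` and `re` modules; `batch_rename` and `plan_renames` are hypothetical names. It replaces spaces and the special characters from the naming rules with underscores, keeps each file's extension, refuses to overwrite existing files, and by default only prints what it would do (a dry run) so you can review the changes first.

```python
import re
import tempfile
from pathlib import Path

# Runs of spaces and the special characters from the naming rules; "/" is omitted
# because it cannot appear inside a single file name.
UNSAFE = re.compile(r'[\\:*?"<>\[\]&$\s]+')


def plan_renames(folder):
    """Return (old_path, new_path) pairs for files whose names need cleaning."""
    plans = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Clean only the stem so the extension (e.g. ".csv") is preserved.
        new_stem = UNSAFE.sub("_", path.stem).strip("_")
        if not new_stem:
            continue  # nothing usable left; rename this one by hand
        target = path.with_name(new_stem + path.suffix)
        if target != path:
            plans.append((path, target))
    return plans


def batch_rename(folder, dry_run=True):
    """Print planned renames; apply them only when dry_run is False."""
    plans = plan_renames(folder)
    targets = [new for _, new in plans]
    # Stop if two files would collide or an existing file would be overwritten.
    if len(set(targets)) != len(targets):
        raise ValueError("two files would receive the same new name")
    for _, new in plans:
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new}")
    for old, new in plans:
        print(f"{old.name} -> {new.name}")
        if not dry_run:
            old.rename(new)
    return plans


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp())
    (demo / "site A data.csv").write_text("")
    batch_rename(demo)  # dry run: prints the plan but renames nothing
```

Try it on a copy of a folder first, then call `batch_rename(folder, dry_run=False)` once the printed plan looks right.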