Organization and Documentation

Data documentation and organization support how findable, accessible, interoperable, and reusable your data are.

Organize and Document your Data

Raw data isn’t easy to understand and reuse. Raw numbers in a spreadsheet can be hard to interpret, and abbreviated variable names can leave others guessing what a column actually holds. (What did you mean when you named this column “cat2”? Did it mean cat, category, or something else?)

Documentation is an important best practice to make data easier to understand and reuse.

Some key questions:

  • If you need to use data you collected over a year ago, how easy would they be to find and use?
  • Would you know what all the variables and file names mean?
  • Would you have information about when, where, and how the data were collected?
  • Have you ever gone to analyze data or publish a paper only to find that some critical piece of information was not recorded, or you don’t remember where you wrote it down?

Good documentation and organization help research reliability, validity, reproducibility, and integrity.

Research Project Management

When doing your research, there are a lot of tools that can make your life easier.

Collaboration
Google Docs and Microsoft Office let your team work on documents together in real time, avoiding multiple versions and copies sent by email.
Reference Management
Zotero, Mendeley, and EndNote support collaboration through shared citation libraries. Zotero is a free open-source resource that lets you take your library with you if you change institutions. McMaster also provides EndNote at no cost to everyone on campus through an institutional license.
Notetaking Software
Obsidian, Evernote, OneNote, Notion, or an Electronic Lab Notebook allow you to create organized, linked notes that you can use to document your research practices.
Open Science Framework (OSF)
The Open Science Framework (OSF) is a free platform for research collaboration that lets you manage files, data, code, and protocols in one central location. If you want to learn more, check out our workshop for project organization using the OSF.

File Organization

Something very basic that we sometimes slack on is keeping files organized. The key to organizing files is to make it a habit.

Keeping your files organized makes it easier to know where files are and where new files should go.

File organization schemes include:
  • By project
  • By researcher
  • By experiment type
  • By date (often year)
  • By some combination of the above
  • e.g., a two-level structure of year -> project
Figure: a file system sorted by year, project, then experiment type

The example above shows files organized by year (2021), followed by project (T-rex bone microstructure), and then experiment type (Histology).
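The year, project, experiment type layout described above can also be created with a short script, which helps keep the structure identical across projects. The following is a minimal sketch, not a prescribed tool: the function name `make_project_tree` and the folder spelling `T-rex_bone_microstructure` are assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def make_project_tree(root: Path, year: int, project: str,
                      experiment_types: list[str]) -> list[Path]:
    """Create root/year/project/experiment_type folders and return them."""
    created = []
    for experiment in experiment_types:
        folder = root / str(year) / project / experiment
        # parents=True builds the year and project levels if they are missing;
        # exist_ok=True makes the script safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for a real research drive.
    with tempfile.TemporaryDirectory() as tmp:
        folders = make_project_tree(
            Path(tmp), 2021, "T-rex_bone_microstructure", ["Histology"]
        )
        for folder in folders:
            print(folder.relative_to(tmp))
```

Running the script again on an existing tree changes nothing, so it can double as a check that every project follows the same layout.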

File Naming

Once you’ve come up with a solid folder structure, you can start giving your files good names.

Organizing your research files makes your data easier to understand, share, and archive - both now and in the future. A good file organization system should be descriptive, standardized, and implemented consistently.

A file name might include a date, a reference to the project, a short description, and the initials of the researcher who created the file. Version numbers and locations are sometimes included as well.

The goal of file naming is to make your files easy to search for, and easy to understand.

e.g. 2022_10_01_LakeMercury_TestData1_TM_v3.csv

The file name above includes:

Date
2022_10_01 (collection date)
Project Name
LakeMercury
Short Description
TestData1
Name
TM (Tracy MacDern)
Version Number
v3

Without even having seen or worked on this file, you can easily understand what it is.
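A naming convention like this one is regular enough to build and check with code. The sketch below assumes the five-part pattern from the example (date, project, description, initials, version) and assumes that no individual part contains an underscore. The names `FileNameParts`, `build_file_name`, and `parse_file_name` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileNameParts:
    collected: date      # collection date
    project: str         # project name, e.g. "LakeMercury"
    description: str     # short description, e.g. "TestData1"
    initials: str        # researcher initials, e.g. "TM"
    version: int         # version number, written as v<number>
    extension: str       # file extension without the dot


def build_file_name(parts: FileNameParts) -> str:
    """Assemble a file name in year_month_day_project_description_initials_vN form."""
    stamp = parts.collected.strftime("%Y_%m_%d")
    return (f"{stamp}_{parts.project}_{parts.description}_"
            f"{parts.initials}_v{parts.version}.{parts.extension}")


def parse_file_name(name: str) -> FileNameParts:
    """Split a file name back into its parts, raising ValueError if it does not fit."""
    stem, _, extension = name.rpartition(".")
    fields = stem.split("_")
    if len(fields) != 7 or not fields[6].startswith("v"):
        raise ValueError(f"File name does not follow the convention: {name}")
    year, month, day, project, description, initials, version = fields
    return FileNameParts(
        collected=date(int(year), int(month), int(day)),
        project=project,
        description=description,
        initials=initials,
        version=int(version[1:]),
        extension=extension,
    )


if __name__ == "__main__":
    parts = parse_file_name("2022_10_01_LakeMercury_TestData1_TM_v3.csv")
    print(parts)
    print(build_file_name(parts))
```

A parser like this can be run over a whole folder to flag files that drifted from the convention.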

Tip

If you write dates at the start of your files with the year, month, and day (in that order), the file system on your computer will automatically organize them by date when you’re sorting your files alphabetically.
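The effect of the tip above is easy to check. In this small sketch, only the first file name comes from the example in this guide; the other dates and the shortened day-first names are made up for the comparison.

```python
# Year-first names: alphabetical order matches chronological order.
year_first = [
    "2022_10_01_LakeMercury_TestData1_TM_v3.csv",
    "2021_12_15_LakeMercury_TestData1_TM_v1.csv",  # hypothetical
    "2022_02_20_LakeMercury_TestData1_TM_v2.csv",  # hypothetical
]

# The same three dates written day-first: alphabetical order no longer
# matches chronological order, because the day is compared first.
day_first = [
    "01_10_2022_LakeMercury.csv",
    "15_12_2021_LakeMercury.csv",
    "20_02_2022_LakeMercury.csv",
]

if __name__ == "__main__":
    print("Year first:")
    for name in sorted(year_first):
        print("  ", name)
    print("Day first:")
    for name in sorted(day_first):
        print("  ", name)
```

The year-first list sorts from December 2021 to October 2022, while the day-first list puts the October 2022 file first. Zero-padding matters too: written without padding, 2022_2_20 would sort after 2022_10_01.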

Once you’ve established a file organization and naming system, describe it in a README file for your research project and make sure everyone on your research team knows the system.

Try this quick quiz - how do you make a descriptive file and folder system?

Documentation Files

Once you have file names and a folder structure in place, you can start to think about documentation files.

Documentation files come in a number of different formats.

README
A simple text document (.txt) that describes the folder hierarchy and file organization, the contents of important files, and any other important project information.
Data Dictionaries
A document for tabular data that describes variable names, labels, units, and constraints.
Codebooks
Like data dictionaries but for survey or statistical data - includes the survey layout and structure, and codes for questions and answers.
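As a rough illustration of a data dictionary, the sketch below stores one entry per column (name, label, units, constraints), writes the entries out as CSV, and reports any columns in a data file that have no entry. The column names `site_id` and `mercury_ug_l`, their constraints, and both helper functions are invented for this example. Only `cat2` echoes the unclear column name mentioned earlier in this guide.

```python
import csv
import io

FIELDS = ["name", "label", "units", "constraints"]

# Hypothetical entries for a small tabular data set.
DATA_DICTIONARY = [
    {"name": "site_id", "label": "Sampling site identifier",
     "units": "", "constraints": "required; text"},
    {"name": "mercury_ug_l", "label": "Mercury concentration",
     "units": "ug/L", "constraints": "number >= 0"},
]


def data_dictionary_csv(entries: list[dict[str, str]]) -> str:
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_header: list[str],
                         entries: list[dict[str, str]]) -> list[str]:
    """List columns that appear in the data but not in the dictionary."""
    documented = {entry["name"] for entry in entries}
    return [column for column in data_header if column not in documented]


if __name__ == "__main__":
    print(data_dictionary_csv(DATA_DICTIONARY))
    # "cat2" has no entry, so it is reported as undocumented.
    print(undocumented_columns(["site_id", "mercury_ug_l", "cat2"], DATA_DICTIONARY))
```

Keeping the dictionary in a plain CSV file means it can be opened in any spreadsheet program and stored next to the data it describes.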

Example README file

A portion of an example README documentation file is shown in the image below.

It starts off with some of the important project information, such as the project's name, the dates the project took place, a short description of the project, funder information, and finally the researcher's contact information.

It then describes the file organization system, followed by any naming conventions that the files follow.

Keeping an active README for your project is a key best practice. You can also expand on this with other details such as licenses, related publications, data and file overviews, and methodological information if you share your data.
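One way to make a README easy to start is to generate a skeleton with the sections described above and fill it in as the project develops. This is only a sketch: the section labels, the `TODO` placeholder, and the function `render_readme` are assumptions, and the sample values reuse the file-naming example from this guide rather than describing a real project.

```python
# Section labels, following the order of the example README described above.
README_SECTIONS = [
    "Project name",
    "Project dates",
    "Description",
    "Funder",
    "Contact",
    "File organization",
    "File naming convention",
]


def render_readme(details: dict[str, str]) -> str:
    """Return README text with one line per section; missing sections show TODO."""
    lines = [f"{section}: {details.get(section, 'TODO')}"
             for section in README_SECTIONS]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_readme({
        "Project name": "LakeMercury",
        "Contact": "Tracy MacDern",
        "File organization": "year/project/experiment type",
        "File naming convention":
            "YYYY_MM_DD_Project_Description_Initials_vN.ext",
    })
    print(text)
    # To save it alongside the data (this writes a file, so it is left
    # commented out here):
    # from pathlib import Path
    # Path("README.txt").write_text(text, encoding="utf-8")
```

Sections left as TODO make gaps visible, which is a reminder to fill them in before the details are forgotten.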

Figure: Kristin Briney's README file outlining project details, organization, and filenames. Image credit: Data Ab Initio

Build a documentation scheme you will actually use!

The most important aspect of documentation is doing it. If you choose a system that’s too complicated, you’re less likely to follow it.

Whatever file naming and organization scheme you choose, make sure it’s descriptive, use it consistently, and document it (in a README.txt file).

Take advantage of the software that is out there, including note-taking software, reference management software, and collaboration software.

Try this quick quiz - what should go in a README file?

Key Points / Summary

  • Good documentation and organization support reliability, validity, reproducibility, and integrity.
  • There are lots of tools that make collaboration, reference management, and notetaking easier.
  • Build a documentation scheme you will actually use and make file organization a habit.
  • README files, data dictionaries, and codebooks help you and others understand your data.

Additional Resources

Research Project Management Software