Part 1: Explore the data

In this part, we’ll investigate the data to consider its usefulness and how it may need to be transformed before we can do more in-depth analyses.

Follow along with the video and the instructions below.

Open your data in a spreadsheet

If you haven’t already downloaded and opened a properly encoded version of the dataset, follow the instructions on the preparation page before proceeding.

Get to know the data

Take some time to understand the data structure. What do the columns mean? What data types are there (categories, dates, numbers, etc.)?

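  • Is the data in wide or long format? (In wide format, values for the same variable are spread across several columns, such as one column per year. In long format, each row holds a single observation.)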
  • Explore some of the values in each column. Activate the column filters and inspect the unique values. Try sorting some of the numerical columns from highest to lowest (and vice versa).
  • Undo your changes each time if you want to go back to the data’s original order.
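
If you prefer to check the same things outside the spreadsheet, the steps above can be roughly sketched in Python using only the standard library. This is an illustration, not part of the lesson's workflow: the sample data, the column names (country, year, value) and the helper functions are all invented here as assumptions, so swap in your own file and columns.

```python
import csv
import io

# Hypothetical sample standing in for the real dataset; the column names
# and values below are invented for illustration only.
SAMPLE_CSV = """country,year,value
Kenya,2019,12.5
Brazil,2019,7.0
Kenya,2020,13.1
Brazil,2020,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def unique_values(rows, column):
    """Return the sorted distinct values in a column (like a column filter)."""
    return sorted({row[column] for row in rows})

def sort_by_number(rows, column, descending=True):
    """Return a new list sorted by a numeric column; blank cells are skipped.

    The original list is left untouched, so the data's original order is kept.
    """
    numeric = [row for row in rows if row[column].strip() != ""]
    return sorted(numeric, key=lambda row: float(row[column]), reverse=descending)

rows = load_rows(SAMPLE_CSV)
print("Columns:", list(rows[0].keys()))
print("Unique countries:", unique_values(rows, "country"))
for row in sort_by_number(rows, "value"):
    print(row["country"], row["year"], row["value"])
```

Because the sorting helper returns a new list, there is nothing to undo afterwards. Note that it silently drops rows with a blank value, which is worth remembering when you count rows.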

Exercises

Now that you have some familiarity with the raw data, answer a few questions:

  1. Before doing any analysis, what do you think the data could tell you? What can’t it tell you?
  2. Imagine the dataset is a human source. What questions would you ask of it during an interview?

Once you’ve completed the exercises, continue to part 2 to reshape and visualize the data.