TDM 20200: Project 6 — 2024

Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this project, we are going to dive into plotting using plotly, a very popular open source graphing library that can interact graphs online.

Context: Create good visualizations about housing data using plotly. In the next project, we will continue to learn about and become comfortable using plotly.

Scope: Python, visualizations, plotly

Learning Objectives
  • Demonstrate the ability to create basic graphs with default settings.

  • Demonstrate the ability to modify axes labels and titles.

  • Demonstrate the ability to customize a plot (color, shape/linetype).

Dataset(s)

The following questions will use the following dataset:

  • /anvil/projects/tdm/data/zillow/Metro_time_series.csv

Readings and Resources

  • Make sure to read about, and use the template found here, and the important information about projects submissions here.

  • You may refer to plot introduction

  • Read about the plotly express functions on this page.

Questions

Dr Ward created 4 videos to help with this project.

Question 1 (2 points)

  1. Read in the data and called it myDF.

  2. Update the column Date to the datetime data type.

  3. For each Date, sum the values in the column InventoryRaw_AllHomes.

  4. Now plot a line chart that shows the overall housing inventory (i.e., the results from question 1c), for the date in the years 2010 through 2015.

  • You might choose to use to_datetime() to convert the data type in part b.

Some students turned in the previous project without executing their code in their Jupyter Lab notebook! Please remember that you need to run all of your cells before submitting your Jupyter Lab notebook.

Question 2 (2 points)

For this question, United_States is one of the regions, but we want you to ignore that region in this question.

  1. Using the column InventoryRaw_AllHomes, group the data according to the RegionName, and find the mean of this column for each RegionName.

  2. Similar question: Using the column AgeOfInventory, group the data according to the RegionName, and find the mean of this column for each RegionName.

  3. Now make a bar chart from your results in part 2a, to visualize the top 5 regions with the highest inventory of homes (on average, in those regions).

  4. Similarly, use the results from 2b, to make a bar chart to visualize the top 5 regions with the oldest inventory of homes (on average, in those regions).

  • groupby() and mean() are useful to get the average values

  • sort_values() is useful to sort data in order

Question 3 (2 points)

  1. Convert your work from question 2d into a box plot, so that you can visualize the top 5 regions with the oldest inventory of homes (on average, in those regions).

Question 4 (2 points)

  1. Extract the data from the columns Date and ZHVI_SingleFamilyResidence, for the RegionName 28580.

  2. Use this data to demonstrate how the Zillow Home Value Index for Single Family Homes has fluctuated in the RegionName 28580 during the available time period in the data set.

  • Be certain to document your work, i.e., how did you make the plot, and what did you learn from the plot?

Question 5 (2 points)

  1. Now make a data visualization of your own, using the data set. Be sure to explain why you decided to choose the data that you analyzed from this data set, and be sure to explain your reasoning. Also be sure to explain the type of visualization method that you chose. Your work should be well documented throughout this question.

Just (gently) repeating the earlier warning: Some students turned in the previous project without executing their code in their Jupyter Lab notebook! Please remember that you need to run all of your cells before submitting your Jupyter Lab notebook.

Project 06 Assignment Checklist

  • Jupyter Lab notebook with your code, comments and output for the assignment

    • firstname-lastname-project06.ipynb

  • Python file with code and comments for the assignment

    • firstname-lastname-project06.py

  • Submit files through Gradescope

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.