TDM 20200: Project 6 — 2024
Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this project, we are going to dive into plotting using plotly
, a very popular open source graphing library that can interact graphs online.
Context: Create good visualizations about housing data using plotly
. In the next project, we will continue to learn about and become comfortable using plotly
.
Scope: Python, visualizations, plotly
Dataset(s)
The following questions will use the following dataset:
-
/anvil/projects/tdm/data/zillow/Metro_time_series.csv
Readings and Resources
|
Questions
Dr Ward created 4 videos to help with this project. |
Question 1 (2 points)
-
Read in the data and called it
myDF
. -
Update the column
Date
to thedatetime
data type. -
For each
Date
, sum the values in the columnInventoryRaw_AllHomes
. -
Now plot a line chart that shows the overall housing inventory (i.e., the results from question 1c), for the date in the years 2010 through 2015.
|
Some students turned in the previous project without executing their code in their Jupyter Lab notebook! Please remember that you need to run all of your cells before submitting your Jupyter Lab notebook. |
Question 2 (2 points)
For this question, United_States
is one of the regions, but we want you to ignore that region in this question.
-
Using the column
InventoryRaw_AllHomes
, group the data according to theRegionName
, and find themean
of this column for eachRegionName
. -
Similar question: Using the column
AgeOfInventory
, group the data according to theRegionName
, and find themean
of this column for eachRegionName
. -
Now make a bar chart from your results in part 2a, to visualize the top 5 regions with the highest inventory of homes (on average, in those regions).
-
Similarly, use the results from 2b, to make a bar chart to visualize the top 5 regions with the oldest inventory of homes (on average, in those regions).
|
Question 3 (2 points)
-
Convert your work from question 2d into a box plot, so that you can visualize the top 5 regions with the oldest inventory of homes (on average, in those regions).
|
Question 4 (2 points)
-
Extract the data from the columns
Date
andZHVI_SingleFamilyResidence
, for theRegionName
28580. -
Use this data to demonstrate how the Zillow Home Value Index for Single Family Homes has fluctuated in the
RegionName
28580 during the available time period in the data set.
|
Question 5 (2 points)
-
Now make a data visualization of your own, using the data set. Be sure to explain why you decided to choose the data that you analyzed from this data set, and be sure to explain your reasoning. Also be sure to explain the type of visualization method that you chose. Your work should be well documented throughout this question.
Just (gently) repeating the earlier warning: Some students turned in the previous project without executing their code in their Jupyter Lab notebook! Please remember that you need to run all of your cells before submitting your Jupyter Lab notebook. |
Project 06 Assignment Checklist
-
Jupyter Lab notebook with your code, comments and output for the assignment
-
firstname-lastname-project06.ipynb
-
-
Python file with code and comments for the assignment
-
firstname-lastname-project06.py
-
-
Submit files through Gradescope
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project. |