Fall 2022 Datathon Project: Transit and Housing in California
By Gain Boonvanich, Anita Ding, Yixin Huang, and Yehchan Yoo
Introduction to Problem and Why It is Important
Background and Domain Knowledge
Data and Methodologies
First, let's import some packages.
Gathering Housing and Population Data
We first downloaded the following data about housing units per capita in each California county from the U.S. Census Bureau.
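As a minimal sketch of what "housing units per capita" means here, the snippet below divides a county's housing unit count by its population. The function name `housing_units_per_capita`, the county labels, and all numbers are hypothetical placeholders for illustration, not values from the Census data.

```python
def housing_units_per_capita(housing_units: int, population: int) -> float:
    """Return housing units divided by population for one county."""
    if population <= 0:
        raise ValueError("population must be positive")
    return housing_units / population


# Placeholder figures for illustration only; these are NOT real Census values.
example_counties = {
    "County A": {"housing_units": 400_000, "population": 1_000_000},
    "County B": {"housing_units": 150_000, "population": 500_000},
}

# Compute the ratio for every county in the placeholder table.
per_capita = {
    name: housing_units_per_capita(row["housing_units"], row["population"])
    for name, row in example_counties.items()
}

for name, value in per_capita.items():
    print(f"{name}: {value:.3f} housing units per person")
```

With real data, the same ratio would be computed per county and per year before comparing it against any transit measure.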
Looking at Transit Passengers
For this section, we downloaded data containing the total number of transit employees by county. (Link) We then examined the relationship between the number of transit employees and housing units per capita. We used data from 2017, the most recent year for which the transit employee data was available.
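One common way to quantify a relationship like this is the Pearson correlation coefficient. The sketch below is an assumption about how such a check could look, not necessarily the analysis we ran; `pearson_r` is a hypothetical helper and the two lists hold placeholder numbers rather than the 2017 data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT the real 2017 figures.
transit_employees = [100, 200, 300, 400]
units_per_capita = [0.30, 0.32, 0.35, 0.36]

r = pearson_r(transit_employees, units_per_capita)
print(f"Pearson r = {r:.3f}")
```

A value near 1 or -1 would indicate a strong linear relationship, while a value near 0 would suggest little linear association. With only a handful of counties, any such coefficient should be read cautiously.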
Graphing Transit Financial Data/Fitting Linear Regression Model
We used data from the following link for the housing units per capita figures in this section. As with the previous data set, we looked at six prominent Californian counties: Los Angeles County, San Diego County, Orange County, Riverside County, San Bernardino County, and Santa Clara County.
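To illustrate the model-fitting step, here is a minimal sketch of ordinary least squares for a single predictor, written with the standard library only. The helpers `fit_line` and `predict` are hypothetical names, the choice of transit funding as the predictor is an assumption based on this section's title, and the numbers are placeholders rather than the data for the six counties.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted y for a given x under the fitted line."""
    return intercept + slope * x


# Placeholder values for illustration only; NOT real county data.
transit_funding = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # arbitrary units
units_per_capita = [0.31, 0.33, 0.34, 0.36, 0.37, 0.39]

slope, intercept = fit_line(transit_funding, units_per_capita)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"prediction at x = 7.0: {predict(slope, intercept, 7.0):.4f}")
```

With only six observations, a fitted line like this is best treated as a descriptive summary; predictions far outside the observed range of the predictor deserve extra caution.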
Conclusion
Impact and Future Avenues
Our work so far suggests that the current allocation of transit funding is fairly reasonable. Based on how much transit money the California state government allocates, our model could predict housing units per capita, an outcome that private businesses (real estate companies) ultimately control. Given additional data, we could apply our models to counties beyond the six we analyzed. We also believe linear regression is a reasonable choice of model: it is simple, yet powerful.