Methodology
Below are all the imports used in this analysis. See requirements.txt for the packages that need to be installed.
Setup
Big Query
A collection of our Big Query requests found in the
Clean Comparison Dataframes
Analysis
New York Survey Data
Figure 1.1
Figure 1.2
Figure 1.3
The largest group of participants does not bike. It is possible that these participants do not use public transportation under any circumstances. Individuals classified as WEIRD (western, educated, industrialized, rich, and democratic) might be less inclined to use the accessible programs within their neighborhood. Interestingly, the second-largest group stated that the program does not reach their neighborhood. The next step in this analysis connects the "Not in my neighborhood" survey responses to station areas in the NY bike-share data via the stations table.
Figure 1.4
While the initial finding that the bike share program was not in respondents' neighborhoods was promising, these participants often have a few stations nearby. These stations might be inaccessible because of heavy usage or a lack of available bikes. Because the stations table in the NY Citi Bike Big Query dataset does not include the number of bikes used per day at each station, our analysis with the current zip codes cannot go any further.
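The zip-code comparison described above can be sketched roughly as follows. This is an illustration only, not the notebook's code: it assumes each station has already been assigned a zip code (the function name and both inputs are hypothetical), and it counts how many stations fall in each zip code reported by respondents who answered "Not in my neighborhood".

```python
from collections import Counter


def stations_per_respondent_zip(respondent_zips, station_zips):
    """Count stations in each zip code reported by survey respondents.

    respondent_zips: zip codes of respondents who answered
        "Not in my neighborhood" (hypothetical input).
    station_zips: one zip code per station, assumed to be derived
        beforehand from the stations table (hypothetical input).
    """
    station_counts = Counter(station_zips)
    # A count of 0 means no station was found in that respondent's zip code.
    return {z: station_counts.get(z, 0) for z in set(respondent_zips)}


counts = stations_per_respondent_zip(
    ["10001", "11201", "10001", "10471"],
    ["10001", "10001", "11201"],
)
```

A nonzero count for a respondent's zip code corresponds to the situation noted above, where stations exist locally even though the respondent reported none.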
San Francisco Survey Data
Only 19 of the 804 survey respondents indicated that they used a bike as a mode of transportation.
Only 1 respondent rode a bike on more than one of the days the survey asked about.
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Figure 2.5
There aren't any large differences between the purposes of bike trips and of trips overall, except that no bike trips were for traveling to school. The demographics of the survey respondents may not be representative of the overall population of San Francisco. Most respondents have an income between $100k and $200k, so it makes sense that more trips would be made by other types of transportation.
Big Query Comparison Visualizations
Figure 3.1
Figure 3.2
There are significantly more subscribers than customers in both cities.
Figure 3.3
In both cities, customers take marginally longer rides than subscribers. Subscriber ride time remains fairly consistent regardless of the month, but customer ride time is more variable.
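The comparison of monthly ride time by rider type can be sketched as below. This is a minimal illustration rather than the notebook's implementation: the tuple layout, the function names, and the sample values are assumptions, and the spread is measured as the population standard deviation of the monthly averages.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_duration_by_month(trips):
    """Average ride duration for each (user_type, month) pair.

    trips: iterable of (month, user_type, duration_minutes) tuples
        (hypothetical layout).
    """
    groups = defaultdict(list)
    for month, user_type, minutes in trips:
        groups[(user_type, month)].append(minutes)
    return {key: mean(values) for key, values in groups.items()}


def monthly_spread(monthly_means):
    """Standard deviation of the monthly averages for each user type."""
    by_type = defaultdict(list)
    for (user_type, _month), average in monthly_means.items():
        by_type[user_type].append(average)
    return {user_type: pstdev(values) for user_type, values in by_type.items()}


# Made-up sample values, for illustration only.
sample_trips = [
    (1, "Subscriber", 10), (1, "Subscriber", 12), (2, "Subscriber", 11),
    (1, "Customer", 20), (2, "Customer", 40),
]
monthly = mean_duration_by_month(sample_trips)
spread = monthly_spread(monthly)
```

A larger spread for one rider type indicates that its average ride time changes more from month to month.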
Figure 3.4
San Francisco bike share use seems to ramp up in the fall, while New York is more consistent, with only a small increase in the fall. Both cities see a pretty dramatic drop in use in December.
Figure 3.5
Most trips for both New York and San Francisco are taken by riders between ages 20 and 40. In both datasets, there are some outliers that don't make sense, like riders over 100 years old.
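One way to handle the implausible ages is to drop them before counting trips per age group. The sketch below is an assumption about how that could be done, not what the notebook does: the function name, the 16 to 100 cutoffs, and the ten-year buckets are all illustrative choices.

```python
from collections import Counter


def age_bucket_counts(birth_years, trip_year, min_age=16, max_age=100, bucket=10):
    """Count trips per age bucket, dropping missing or implausible ages.

    The cutoffs and bucket width are illustrative assumptions.
    Returns (counts keyed by the bucket's lower bound, number of dropped rows).
    """
    counts = Counter()
    dropped = 0
    for birth_year in birth_years:
        if birth_year is None:
            dropped += 1
            continue
        age = trip_year - birth_year
        if age < min_age or age > max_age:
            dropped += 1
            continue
        counts[(age // bucket) * bucket] += 1
    return dict(counts), dropped


# Made-up birth years; 1900 gives an age over 100 and is dropped.
buckets, n_dropped = age_bucket_counts([1990, 1985, 1900, None, 1975], 2018)
```

Reporting the number of dropped rows alongside the counts makes it clear how much data the filter removes.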
San Francisco Geospatial Visualizations
Figure 4.1.1
Figure 4.1.2
There are longer rides towards the outskirts of the city. In general, the central areas see rides mostly under 30 minutes.
Figure 4.1.3
Many of the popular routes seem to be near water or in the city center. There also seem to be 'nodes' near popular landmarks where riders either pick up or drop off bikes.
New York Geospatial Visualizations
Figure 4.2.1
Figure 4.2.2
As with San Francisco, New York sees mostly shorter rides (under 1 hour). The range of average ride length is much larger for New York. The longer rides are a bit further outside of the city.
Figure 4.2.3
Many of the most popular routes are again near water, as with San Francisco. New York also sees many rides through Central Park.
Significance Testing
We conducted several significance tests to assess the differences in riders between cities. The results of the significance tests tell us:
(1) The average ride duration of female riders in SF is less than that of female riders in NY.
(2) The average ride duration of male riders in SF is less than that of male riders in NY.
(3) The average ride distance of female riders in SF is less than that of female riders in NY.
(4) The average ride distance of male riders in SF is less than that of male riders in NY.
(5) The proportion of female riders in SF is less than the proportion of female riders in NY.
(6) There are no significant differences in the proportions of any generation between cities.
All significance testing results can be found in the notebook: https://github.com/kendalldyke14/SanFranciscoBikeShare/blob/main/Sig_tests_bike_share.ipynb. Due to memory issues in the DeepNote environment, many of the tests involving the NYC data could not be run in this DeepNote notebook and were run locally.
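The linked notebook holds the actual tests. As a rough illustration of the kind of one-sided comparison described above, the sketch below uses only the standard library. It is an assumption about the approach, not a copy of the notebook: the function names are hypothetical, the proportion test uses a pooled standard error, and the mean comparison uses a Welch-style statistic with a normal approximation in place of the t-distribution, which is only reasonable for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """One-sided z-test of H1: proportion A < proportion B."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Lower-tail probability: small values support A < B.
    return z, NormalDist().cdf(z)


def welch_one_sided(sample_a, sample_b):
    """One-sided test of H1: mean A < mean B with unequal variances.

    Uses a normal approximation for the p-value (large-sample assumption).
    """
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    stat = (mean(sample_a) - mean(sample_b)) / se
    return stat, NormalDist().cdf(stat)


# Made-up counts and durations, for illustration only.
z, p_prop = two_proportion_z_test(200, 1000, 300, 1000)
stat, p_mean = welch_one_sided([10, 11, 12, 9, 10], [20, 21, 19, 22, 20])
```

With real data, a small p-value from either function would support the corresponding "SF is less than NY" statement; for small samples a t-distribution should be used instead of the normal approximation.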
Discussion
Our exploratory data analysis incorporated additional transportation data to understand rider sentiment. While the survey initially seemed promising, we found that only 9 participants in the San Francisco survey and 259 participants in the New York survey were actively using the public bike share programs.
Looking forward, this team plans to expand this analysis to national and international programs. Though some additional cities can be incorporated from Big Query, many more bike share programs don't use this free program. They instead have a smaller API that is supported by Bike-share research (bikeshare-research.org). It is called pyBikes, and it provides top-level daily ride numbers for large cities. Though some of our analysis focuses on the difference between subscriber and customer behavior, we would still be able to compare these programs using some of our time series analysis techniques.