Scraping the r/soccer subreddit to understand the post history of 'Big 6' PL clubs in 2022
Downloading the required data from r/soccer
Importing the essential libraries
Using the Reddit app credentials to create a connection
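A minimal sketch of this step, assuming the PRAW library is the client in use (the post does not say which one it relies on) and that the app credentials are stored in environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT`, and both helper functions, are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Collect the Reddit app credentials from environment variables.

    The variable names are illustrative; use whatever your setup defines.
    """
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing Reddit credentials: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


def connect(env=os.environ):
    """Create a read-only PRAW client (assumes `pip install praw`)."""
    import praw  # imported lazily so load_credentials works without PRAW

    return praw.Reddit(**load_credentials(env))
```

Keeping the credentials out of the notebook itself means the file can be shared without leaking the client secret.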
Running a keyword search for the Big 6 PL clubs in r/soccer
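The search step could look roughly like the sketch below, which assumes a PRAW `Subreddit` object (for example `reddit.subreddit("soccer")`) and uses its `search` method. The keyword list, the helper names, the `limit` value and the choice of `time_filter="year"` are assumptions; the filter only approximates "posts from 2022" depending on when the script is run.

```python
BIG_SIX = ["Arsenal", "Chelsea", "Liverpool", "Manchester City",
           "Manchester United", "Tottenham"]


def search_club(subreddit, keyword, limit=1000):
    """Run one keyword search and return the matching posts as plain dicts.

    `subreddit` is expected to behave like a PRAW Subreddit.
    """
    rows = []
    for post in subreddit.search(keyword, sort="new", time_filter="year", limit=limit):
        rows.append({
            "id": post.id,
            "title": post.title,
            "score": post.score,                # net upvotes
            "num_comments": post.num_comments,
            "created_utc": post.created_utc,    # Unix epoch seconds
            "keyword": keyword,
        })
    return rows


def search_all(subreddit, keywords=BIG_SIX, limit=1000):
    """Search every club keyword and concatenate the results."""
    rows = []
    for keyword in keywords:
        rows.extend(search_club(subreddit, keyword, limit=limit))
    return rows
```

Storing the search keyword alongside each row makes it easy to check later which query produced a given post.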
Creating a filepath and folder structure to store the data
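One possible way to set up the folders, sketched with the standard-library `pathlib` module. The layout `<base>/<subreddit>/<year>/` and the file name `big6_posts.csv` are assumptions made for illustration, not the structure used in the original notebook.

```python
from pathlib import Path


def build_output_path(base_dir, subreddit="soccer", year=2022, filename="big6_posts.csv"):
    """Create <base_dir>/<subreddit>/<year>/ if needed and return the CSV path."""
    folder = Path(base_dir) / subreddit / str(year)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename
```

For example, `build_output_path("data")` would return `data/soccer/2022/big6_posts.csv`, creating the folders on the first call.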
Creating a pandas dataframe and loading the subreddit data into a csv file
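A short sketch of this step, assuming the scraped posts are held as a list of dicts with an `id` key. Dropping duplicate ids is an added assumption: a post that mentions two clubs would otherwise appear once per keyword search.

```python
import pandas as pd


def save_posts(rows, csv_path):
    """Turn scraped rows into a DataFrame, drop duplicate posts, and write a CSV."""
    df = pd.DataFrame(rows)
    # The same post can match several club keywords; keep one row per post id.
    df = df.drop_duplicates(subset="id").reset_index(drop=True)
    df.to_csv(csv_path, index=False)  # index=False avoids an extra unnamed column
    return df
```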
Final Code
Data Deep-dive using Pandas and Matplotlib
Prelim Checks
Creating a flag for each record based on the PL club
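The flagging step might be sketched as follows, assuming the flag is derived from the post title with a case-insensitive pattern match. The patterns in `CLUB_PATTERNS` and the `is_<club>` column names are hypothetical; a real list would need more nicknames and abbreviations.

```python
import pandas as pd

# Hypothetical keyword patterns; nicknames and abbreviations would need adding.
CLUB_PATTERNS = {
    "arsenal": r"arsenal",
    "chelsea": r"chelsea",
    "liverpool": r"liverpool",
    "man_city": r"manchester city|man city",
    "man_utd": r"manchester united|man utd|man united",
    "tottenham": r"tottenham|spurs",
}


def add_club_flags(df, text_col="title"):
    """Add one 0/1 column per club, set when the title mentions that club."""
    out = df.copy()
    titles = out[text_col].fillna("")  # missing titles match nothing
    for club, pattern in CLUB_PATTERNS.items():
        out[f"is_{club}"] = titles.str.contains(pattern, case=False, regex=True).astype(int)
    return out
```

Using one column per club, rather than a single label, lets a post such as a transfer between two Big 6 clubs count for both.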
Creating a date column
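A sketch of the date conversion, assuming the creation time is stored as Unix epoch seconds in a column named `created_utc` (the attribute name Reddit uses for submissions). The output column names `created_at`, `date` and `month` are illustrative.

```python
import pandas as pd


def add_date_columns(df, ts_col="created_utc"):
    """Convert Unix-epoch seconds into a UTC timestamp, a date, and a month label."""
    out = df.copy()
    out["created_at"] = pd.to_datetime(out[ts_col], unit="s", utc=True)
    out["date"] = out["created_at"].dt.date
    out["month"] = out["created_at"].dt.strftime("%Y-%m")  # e.g. "2022-08"
    return out
```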
Plotting the data using Matplotlib
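A monthly post-count chart could be produced along these lines. This is a sketch that assumes a `month` column holding labels such as `2022-08`; the function names, figure size and labels are illustrative choices rather than the original notebook's settings.

```python
import pandas as pd


def monthly_post_counts(df, month_col="month"):
    """Count posts per month, sorted chronologically."""
    return df.groupby(month_col).size().sort_index()


def plot_monthly_posts(counts, out_path=None):
    """Draw a bar chart of posts per month; save it when out_path is given."""
    import matplotlib.pyplot as plt  # lazy import: counting works without matplotlib

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel("Month")
    ax.set_ylabel("Number of posts")
    ax.set_title("r/soccer posts mentioning a Big 6 club, by month (2022)")
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    if out_path is not None:
        fig.savefig(out_path)
    return fig
```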
Understandably, the highest number of posts comes from Aug'22, the final month of the summer transfer window, with further peaks in May'22 and Jan'22. The May'22 peak can be attributed to the end of the PL season, while the Jan'22 peak can be explained by the winter transfer window.
Chelsea clearly had the most Reddit posts in r/soccer over the period considered, while Tottenham had the fewest. Manchester United, Liverpool and Arsenal have similar post counts, and Manchester City is in the bottom two. One thing worth examining is how many of these posts came from the summer transfer window and how that affected the overall post count, since Chelsea and Manchester United both had a very busy window compared to the other clubs (by busy, I mean that these two clubs were linked with relatively more players than the other four).
As you can see, for Chelsea and Manchester United, ~50% of the posts came during the summer transfer window (10 June - 31 August), which increased their overall post count and was a major factor in them being in the top 2. The chart below illustrates this point better.
As mentioned above, Manchester United and Chelsea have ~50% of their 2022 posts coming during the summer transfer window, while Liverpool are at the bottom with only 27%. Although friendlies and the first few matches of the new season also fall within this period, the number of posts related to them should be much the same for all the clubs, so the excess can be attributed to each club's transfer activity.
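The share quoted here is the number of a club's posts inside the window divided by that club's total posts, times 100. A sketch of the calculation, assuming a timezone-aware `created_at` column and 0/1 flag columns per club (both names are hypothetical), with the window taken as 10 June to 31 August 2022 inclusive:

```python
import pandas as pd

WINDOW_START = pd.Timestamp("2022-06-10", tz="UTC")
WINDOW_END = pd.Timestamp("2022-09-01", tz="UTC")  # exclusive, so 31 August is included


def transfer_window_share(df, flag_cols, ts_col="created_at"):
    """Return, per club flag, the percentage of posts created inside the window."""
    in_window = (df[ts_col] >= WINDOW_START) & (df[ts_col] < WINDOW_END)
    # The mean of a boolean Series is the fraction of True values.
    shares = {col: in_window[df[col] == 1].mean() * 100 for col in flag_cols}
    return pd.Series(shares).round(1).sort_values(ascending=False)
```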
Since the distributions of both upvotes and comments are positively skewed, I show the median number of upvotes and comments per post for each of the Big 6 PL clubs in 2022 to better understand the level of interaction across clubs.
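A sketch of the median calculation, assuming upvotes are stored in a `score` column, comments in `num_comments`, and club membership in 0/1 flag columns (all column names are assumptions). The median is used because a handful of viral posts would pull the mean far above what a typical post receives.

```python
import pandas as pd


def median_engagement(df, flag_cols, upvote_col="score", comment_col="num_comments"):
    """Median upvotes and comments per post for each club flag column."""
    rows = {}
    for col in flag_cols:
        club_posts = df[df[col] == 1]  # posts flagged for this club
        rows[col] = {
            "median_upvotes": club_posts[upvote_col].median(),
            "median_comments": club_posts[comment_col].median(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```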
Despite being at the bottom in terms of total number of posts, Tottenham has the highest median number of upvotes and comments per post in 2022, which signals a decent level of interaction on most posts involving the club in r/soccer.