Scraping the EPL Stats Website (Deepnote Edition)
A dynamic website scraper built in Deepnote and designed to run anywhere (e.g. locally). Duplicate it to run in your own Deepnote project.
The official EPL Stats website is a great source of quality football data for the English Premier League. The Player Goals table fetches its data from the Pulse Live Football API (footballapi.pulselive.com). Open your browser's developer tools > Network (refresh the page) > XHR to see the goals? request. However, if you try to request that data yourself, e.g. by copy-pasting the query to the Pulse Live API into the browser, your request will be denied! 🙅‍♂️
So, we need to scrape the HTML. However, if we simply requested the page's HTML (via an HTTP GET request), we'd only get the initial page load, which defaults to the all-time goal scorers list (requests example below). We'd also have a problem scraping all the goal scorers, because the table is paginated. So we know the data in the table is loaded dynamically, and a simple HTTP request for the page is not enough.
All this means it's time to break out the headless browser, automate with watir, parse tables with pandas, sit back...and enjoy 👻🤖🐼💅
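To make the pagination problem concrete, here is a rough sketch of the parsing half of that plan: take the HTML of each table page (as a headless browser would hand it over after clicking through the pages), pull out the rows, and stitch them into one list. It sticks to the standard library's html.parser so it runs anywhere; in the notebook itself, pandas.read_html covers the same job. The sample HTML, the column names, and the helpers parse_table_rows and combine_pages are hypothetical placeholders, not the real EPL page markup or the code used later in this project.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside of a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html):
    """Return every table row found in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def combine_pages(pages):
    """Merge paginated table HTML: keep one header row, append all body rows."""
    header, body = None, []
    for html in pages:
        rows = parse_table_rows(html)
        if not rows:
            continue
        if header is None:
            header = rows[0]
        body.extend(rows[1:])  # skip the header repeated on each page
    return header, body


# Hypothetical stand-ins for the HTML a browser would return for each page.
PAGE_1 = """<table>
<thead><tr><th>Rank</th><th>Player</th><th>Goals</th></tr></thead>
<tbody>
<tr><td>1</td><td>Player A</td><td>30</td></tr>
<tr><td>2</td><td>Player B</td><td>25</td></tr>
</tbody></table>"""

PAGE_2 = """<table>
<thead><tr><th>Rank</th><th>Player</th><th>Goals</th></tr></thead>
<tbody>
<tr><td>3</td><td>Player C</td><td>20</td></tr>
<tr><td>4</td><td>Player D</td><td>15</td></tr>
</tbody></table>"""

header, scorers = combine_pages([PAGE_1, PAGE_2])
print(header)
print(len(scorers), "rows scraped across", 2, "pages")
```

A plain HTTP request would only ever give us something like PAGE_1. The browser automation exists to produce the remaining pages, and the parsing step stays the same no matter how many there are.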