match data features: ['match_type', 'date', 'home_team', 'away_team', 'result', 'capacity']
stadium data features: ['stadium_name', 'href', 'city', 'capacity']
Features
X1: Capacity of the stadium
X2: Home attendance percentage
X3: Current ranking of home teams
X4: Current ranking of away teams
X5: Ranking of home teams in the last season
X6: Ranking of away teams in the last season
X7: Result of the 1st match in the last 5 Premier League matches of home teams
X8: Result of the 2nd match in the last 5 Premier League matches of home teams
X9: Result of the 3rd match in the last 5 Premier League matches of home teams
X10: Result of the 4th match in the last 5 Premier League matches of home teams
X11: Result of the 5th match in the last 5 Premier League matches of home teams
X12: Result of the 1st match in the last 5 Premier League matches of away teams
X13: Result of the 2nd match in the last 5 Premier League matches of away teams
X14: Result of the 3rd match in the last 5 Premier League matches of away teams
X15: Result of the 4th match in the last 5 Premier League matches of away teams
X16: Result of the 5th match in the last 5 Premier League matches of away teams
X17: Goals for (GF) of home teams in the last 5 Premier League matches of current season
X18: Goals for (GF) of away teams in the last 5 Premier League matches of current season
X19: Goals against (GA) of home teams in the last 5 Premier League matches of current season
X20: Goals against (GA) of away teams in the last 5 Premier League matches of current season
X21: Results of the last 4 matches home teams played against away teams
X22: Number of matches home teams have played
Output y: Unbeaten rate
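The recent-form features (X7 to X16) and the target y can be derived from the match data in one chronological pass. The following is a minimal sketch, not the original pipeline: it assumes `result` is encoded as 'H' (home win), 'D' (draw) or 'A' (away win), that "unbeaten" refers to the home team, and the function name `last_five_features` is hypothetical.

```python
from collections import defaultdict, deque

def last_five_features(matches, window=5):
    """Attach each team's previous `window` results to every match.

    `matches` must already be sorted by date. The 'H'/'D'/'A' result
    encoding is an assumption; the real dataset may use another one.
    """
    # One bounded queue per team; old results fall off automatically.
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for m in matches:
        home, away, result = m["home_team"], m["away_team"], m["result"]
        rows.append({
            "home_team": home,
            "away_team": away,
            # Copy the form *before* this match so the label cannot leak in.
            "home_last5": list(history[home]),
            "away_last5": list(history[away]),
            # Assumed target: 1 if the home team won or drew.
            "unbeaten": int(result in ("H", "D")),
        })
        # Record the outcome from each team's own point of view.
        history[home].append({"H": "W", "D": "D", "A": "L"}[result])
        history[away].append({"H": "L", "D": "D", "A": "W"}[result])
    return rows
```

Teams with fewer than five earlier matches get a shorter list here; how the original data handles those early-season rows is not stated.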
Total number of Premier League games: 11037
Model                                      Abbrev     F1 score (Variance)
----------------------------------------------------------------------
K Nearest Neighbors Classifier             KNN        0.834 (0.004)
Bagging Classifier[KNN]                    KNNbag     0.836 (0.003)
Stacking Classifier[MLP, [LR, ET, KNN]]    Stacking   0.835 (0.006)
Gradient Boosting Classifier               Boosting   0.837 (0.004)
Extra Trees Classifier                     ET         0.837 (0.004)
Final: Voting Classifier                   Voting     0.838 (0.003)
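Each row of the table pairs an F1 score with a variance, presumably taken over cross-validation folds. The sketch below shows one way such a pair could be computed; it assumes binary labels (1 = unbeaten) and uses the population variance, neither of which is confirmed by the output above. The helper names are hypothetical.

```python
from statistics import mean, pvariance

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) of a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def summarize(fold_scores):
    """Return (mean, variance) of the per-fold F1 scores."""
    return mean(fold_scores), pvariance(fold_scores)
```

Calling `summarize` on the list of per-fold scores for one model would give the two numbers printed in a table row, under the assumptions stated above.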
Student’s t-test p-value between KNN (best model in first stage) and Voting model: 0.00027
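The output does not say which variant of the t-test was used. As an illustration only, the sketch below runs a paired, two-sided Student's t-test on per-fold scores of two models, using just the standard library. The pairing, the function names, and the numerical integration of the t density are all assumptions made for this example.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def paired_t_test(scores_a, scores_b, steps=2000):
    """Return (t statistic, two-sided p-value) for paired samples.

    `steps` must be even (Simpson's rule). Raises ZeroDivisionError
    if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    df = n - 1
    # Integrate the density from 0 to |t| with Simpson's rule.
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Two-sided p-value: probability mass outside [-|t|, |t|].
    return t, max(1.0 - 2.0 * area, 0.0)
```

With the per-fold F1 scores of KNN and the Voting model as inputs, a small p-value would indicate that the difference in mean score is unlikely to be due to fold-to-fold noise alone.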
Inputs: team_name, Assumed_Home_attendance_percentage (80 / 100)
Covid period
Mean(Home Attendance Percentage): 0, Mean(real unbeaten rates): 0.6026, Mean(predicted unbeaten rates): 0.6245
Assuming no Covid
Mean(Assumed Home Attendance Percentage): 80, Mean(predicted unbeaten rates): 0.7151
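The comparison above is a what-if analysis: the same Covid-period matches are scored again with the home attendance percentage (X2) overwritten by an assumed value. A minimal sketch of that idea follows. It assumes each row is a list of features in the order X1, X2, ..., and that `predict` is any callable returning one 0/1 prediction per row. `toy_predict` is a made-up stand-in, not the Voting model.

```python
from statistics import mean

ATTENDANCE_INDEX = 1  # X2 sits at 0-based position 1 (assumed feature order)

def mean_predicted_rate(predict, rows, attendance=None):
    """Mean predicted unbeaten rate, optionally with X2 overridden."""
    if attendance is not None:
        # Build new rows so the caller's data is left untouched.
        rows = [
            row[:ATTENDANCE_INDEX] + [attendance] + row[ATTENDANCE_INDEX + 1:]
            for row in rows
        ]
    return mean(predict(rows))

def toy_predict(rows):
    """Hypothetical stand-in model: predicts 'unbeaten' at 50%+ attendance."""
    return [1 if row[ATTENDANCE_INDEX] >= 50 else 0 for row in rows]

# Two made-up Covid-period rows: [capacity, attendance %, home ranking].
covid_rows = [[60000, 0, 3], [40000, 0, 10]]
observed = mean_predicted_rate(toy_predict, covid_rows)
what_if = mean_predicted_rate(toy_predict, covid_rows, attendance=80)
```

Only X2 changes between the two calls, so any difference between `observed` and `what_if` is attributable to the attendance feature as the model has learned it, not necessarily to a causal effect of crowds.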