# Kickball Team Generator (with Machine Learning)

A couple of months ago, a few of my college friends got together and played a kickball-esque game. At the time, we arbitrarily divided ourselves into two teams based purely on intuition. By the end of the day, however, my team ended up losing with 6 runs to the opposing team's 19 (despite them having even FEWER players). Naturally, I began to wonder how my team could've lost by such a big margin. Could it have been luck? Or perhaps there was some hidden metric, other than physical build, that determined how good a player is.

With that in mind, I got to work designing a system to evenly split players into two baseball/kickball teams based on their skills across 5 categories (knowledge of the rules, running ability, throwing ability, catching ability, and last but most importantly, batting/kicking ability).

## Getting Started

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

I first conducted a survey with my closest friends to gather testing data. In that survey I asked them to rate their skills in each of our 5 categories. A major flaw in this system that I anticipated was response bias, so I specified that survey participants should rate themselves in comparison to those they played baseball/kickball with most recently. It's not a perfect method, but it should reduce variation in responses by anchoring everyone to a standard that most are already familiar with.

We'll start by reading in a dataset from this survey.

player_data = pd.read_csv('EvenNumPlayers22.csv')

A quick data clean (renaming columns and mapping string values to numerical ones). I'm also deciding to drop the column from the survey relating to overall skill level. I realized we can simply sum the values in all the other columns to achieve an overall score, which would give us higher granularity for that feature (would be out of 25 as opposed to out of 10).

player_data = player_data.drop(columns=['Timestamp', 'I confirm that I have played kickball before.', 'How would you rate your overall kickball skill level?'])
player_data = player_data.rename(columns={' [Rate your level of understanding for the rules of kickball]': 'Knowledge',
                                          ' [Rate your running skills in kickball]': 'Running',
                                          ' [Rate your throwing skills in kickball]': 'Throwing',
                                          ' [Rate your catching skills in kickball]': 'Catching',
                                          ' [Rate your kicking skills in kickball]': 'Kicking'})
player_data = player_data.replace(['Poor', 'Below Average', 'Average', 'Above Average', 'Excellent'], [1, 2, 3, 4, 5])

def add_overall_score(data):
    data['Overall'] = data['Knowledge'] + data['Running'] + data['Throwing'] + data['Catching'] + data['Kicking']
    return data
add_overall_score(player_data).head()

## Team Formation with Single Feature

Let's start with a simple algorithm. We'll sort the players in order of their overall skill level, then put every other player on a separate team. The average skill level of both teams should be similar.

def simple_sort_algorithm(data):
    data = data.sort_values('Overall', ascending=False).reset_index(drop=True)
    simple_team_1 = data[data.index % 2 == 0]
    simple_team_2 = data[data.index % 2 != 0]
    return [simple_team_1, simple_team_2]
simple_sort_algorithm(player_data)

We'll write a function in order to compare how fair our matchmaking algorithm is.

def generate_comparison_table(team_1, team_2):
    teams = ['Team 1', 'Team 2']
    knowledge_total = [np.sum(team_1['Knowledge']), np.sum(team_2['Knowledge'])]
    running_total = [np.sum(team_1['Running']), np.sum(team_2['Running'])]
    throwing_total = [np.sum(team_1['Throwing']), np.sum(team_2['Throwing'])]
    catching_total = [np.sum(team_1['Catching']), np.sum(team_2['Catching'])]
    kicking_total = [np.sum(team_1['Kicking']), np.sum(team_2['Kicking'])]
    overall_total = [np.sum(team_1['Overall']), np.sum(team_2['Overall'])]
    overall_avg = [np.mean(team_1['Overall']), np.mean(team_2['Overall'])]
    metric_totals = pd.DataFrame(data={'team': teams, 'knowledge_total': knowledge_total, 'running_total': running_total,
                                       'throwing_total': throwing_total, 'catching_total': catching_total, 'kicking_total': kicking_total,
                                       'overall_total': overall_total, 'overall_avg': overall_avg})
    metric_totals = metric_totals.set_index('team')
    # Row of raw differences (Team 1 minus Team 2)
    difference = metric_totals.loc['Team 1'] - metric_totals.loc['Team 2']
    difference.name = 'Difference'
    # DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat
    metric_totals = pd.concat([metric_totals, difference.to_frame().T])
    # Row of differences as a whole-number percentage of Team 1's totals
    percent_difference = (metric_totals.loc['Difference'] * 100 / metric_totals.loc['Team 1']).astype(int)
    percent_difference.name = '% Difference'
    metric_totals = pd.concat([metric_totals, percent_difference.to_frame().T])
    return metric_totals
simple_results = simple_sort_algorithm(player_data)
generate_comparison_table(simple_results[0], simple_results[1])

So far, so good. As a matter of fact, you could totally stop here and use this system to create kickball teams! However, one potential problem is that as long as there is an even number of players, the total (and therefore average) skill level of team 1 (the team with all the odd-ranked players) will ALWAYS be greater than or equal to that of team 2. This is because every player on team 1 is ranked directly above a counterpart on team 2: their highest ranked player is the highest ranked player overall, and their lowest ranked player is only the second lowest ranked overall.
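To make that bias concrete, here is a minimal sketch using plain Python lists. The scores are made up for illustration (they are not from the survey), and the slicing simply mimics the every-other-player split on a sorted list.

```python
# Hypothetical overall scores for an even number of players (not survey data)
scores = [23, 21, 20, 18, 15, 14, 12, 9]

ranked = sorted(scores, reverse=True)
team_1 = ranked[0::2]  # ranks 1, 3, 5, ...
team_2 = ranked[1::2]  # ranks 2, 4, 6, ...

# Each team 1 player is paired with the player ranked directly below them
pairwise_gaps = [a - b for a, b in zip(team_1, team_2)]

print(team_1, sum(team_1))  # [23, 20, 15, 12] 70
print(team_2, sum(team_2))  # [21, 18, 14, 9] 62
print(pairwise_gaps)        # [2, 2, 1, 3]
```

Since no pairwise gap can be negative, the gaps can only add up in team 1's favor.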

Next, let's try a greedy algorithm. We will go through the players in our dataset from highest to lowest overall skill and decide at each step which team is best for them. We'll keep track of the cumulative skill of both teams and assign each player to the team with the smaller cumulative skill. If the cumulative skill is equal between the two teams, then we'll randomly assign the player to a team. We will repeat this process until every player has been assigned a team.

def greedy_algorithm(data):
    data = data.sort_values('Overall', ascending=False).reset_index(drop=True)
    # Collect row positions per team (DataFrame.append was removed in pandas 2.0)
    team_1_rows = []
    team_2_rows = []
    team_1_total_score = 0
    team_2_total_score = 0
    for i in range(len(data.index)):
        if team_1_total_score > team_2_total_score:
            pick_team_1 = False
        elif team_1_total_score < team_2_total_score:
            pick_team_1 = True
        else:
            # Totals are tied, so flip a coin
            pick_team_1 = np.random.randint(2) == 0
        if pick_team_1:
            team_1_rows.append(i)
            team_1_total_score += data.iloc[i]['Overall']
        else:
            team_2_rows.append(i)
            team_2_total_score += data.iloc[i]['Overall']
    return [data.iloc[team_1_rows], data.iloc[team_2_rows]]
greedy_algorithm(player_data)

greedy_results = greedy_algorithm(player_data)
generate_comparison_table(greedy_results[0], greedy_results[1])

Already, we can see that the difference between our overall skill is lower than before, which is definitely a step in the right direction.
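For a step-by-step feel of the greedy idea, here is a stripped-down sketch on a plain list of made-up scores. One assumption to flag: ties go to team 1 here so that the output is reproducible, whereas the notebook version breaks ties randomly.

```python
def greedy_split(scores):
    """Assign each score (highest first) to the team with the lower running total."""
    team_1, team_2 = [], []
    for score in sorted(scores, reverse=True):
        # Assumption: ties go to team 1 to keep this example deterministic
        if sum(team_1) <= sum(team_2):
            team_1.append(score)
        else:
            team_2.append(score)
    return team_1, team_2

# Hypothetical overall scores (not survey data)
scores = [23, 21, 20, 18, 15, 14, 12, 9]
greedy_team_1, greedy_team_2 = greedy_split(scores)

print(greedy_team_1, sum(greedy_team_1))  # [23, 18, 15, 9] 65
print(greedy_team_2, sum(greedy_team_2))  # [21, 20, 14, 12] 67
```

On these numbers the gap is 2 points, compared with 8 if the same list is split by alternating ranks. Note that nothing in this approach forces the two teams to end up the same size; it only happens to work out that way here.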

Another point of unfairness is that despite the average skill levels of both teams being similar, one team may be lacking in a certain category while the other team is more well-rounded, giving the latter an unfair advantage. For instance, every player on team 1 may be exceptionally skilled in all categories EXCEPT kicking, while the players on team 2 are moderately skilled in ALL categories. It's possible that our per-category totals come out similar anyway, but that would only be due to chance, so we need a way to systematically ensure it. We can address this issue by considering each category and dividing players into teams based on more than just their overall rank. In other words, players who are lacking in a certain category should be grouped with other players who can make up for it. To accomplish this, we'll use machine learning!
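Here is a small illustration of that scenario. The ratings are invented for the example; the point is only that two teams can tie on overall score while being far apart in a single category.

```python
CATEGORIES = ['Knowledge', 'Running', 'Throwing', 'Catching', 'Kicking']

# Hypothetical ratings on the 1-5 scale (not survey data)
team_1 = [
    {'Knowledge': 5, 'Running': 5, 'Throwing': 5, 'Catching': 4, 'Kicking': 1},
    {'Knowledge': 5, 'Running': 4, 'Throwing': 5, 'Catching': 5, 'Kicking': 1},
]
team_2 = [
    {'Knowledge': 4, 'Running': 4, 'Throwing': 4, 'Catching': 4, 'Kicking': 4},
    {'Knowledge': 4, 'Running': 4, 'Throwing': 4, 'Catching': 4, 'Kicking': 4},
]

def category_totals(team):
    """Sum each skill category over all players on a team."""
    return {c: sum(player[c] for player in team) for c in CATEGORIES}

totals_1 = category_totals(team_1)
totals_2 = category_totals(team_2)

overall_gap = sum(totals_1.values()) - sum(totals_2.values())
category_gaps = {c: totals_1[c] - totals_2[c] for c in CATEGORIES}

print(overall_gap)    # 0
print(category_gaps)  # {'Knowledge': 2, 'Running': 1, 'Throwing': 2, 'Catching': 1, 'Kicking': -6}
```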

## Team Formation with Multiple Features

Now let's incorporate more features. We'll look at each skill category when determining our teams to ensure that they are similar across all areas.

Pairplot is a useful function included in the Seaborn library. As you can see I'm quickly able to create scatterplots between all of my features with just a single line of code. We're going to offset each of our datapoints by a random value so that we can better visualize the density of our data. For clustering, I prefer this method of visualization over heat maps or KDE plots, since you can see the individual points.

sns.pairplot(player_data.drop(columns=['First Name', 'Overall']) + np.random.normal(0, 0.1, size=(len(player_data), 5)))

from sklearn.cluster import KMeans

It's easy for humans to cluster points on a 2D graph, but when 5 dimensions are involved it gets a little bit harder. This is where machine learning comes in. Using a K-Means clustering algorithm, our computer is able to decide for itself what different types of players there are in our dataset. We can see this in action below!

def generate_clusters(data, num_clusters):
    # Work on a copy so a 'Player Type' column from an earlier call
    # is never fed back into K-Means as if it were a skill rating
    data = data.copy()
    features = data[['Knowledge', 'Running', 'Throwing', 'Catching', 'Kicking']]
    kmeans = KMeans(n_clusters=num_clusters)
    data['Player Type'] = kmeans.fit_predict(features)
    return data.sort_values('Overall', ascending=False).reset_index(drop=True)
def display_clusters(data):
    # Show one table per cluster, largest cluster first
    for i in data['Player Type'].value_counts().index.tolist():
        display(data[data['Player Type'] == i].sort_values('Overall', ascending=False).reset_index(drop=True))
display_clusters(generate_clusters(player_data, 5))

For the sake of demonstration, we told our algorithm to identify 5 types of players and it delivered. Based on these clusters, it seems like we have the following groups:

1. A group that is above average in most fields, and exceptional in running — our all-around players

2. A group that is average in most fields, but above average in batting/kicking — our kickers

3. A group that is overall average, but has a solid understanding of the game — our strategists

4. A group that is at or above average in all areas but kicking — our designated outfielders

5. A group that may be new to baseball/kickball — our learners

Of course, we will fine-tune the number of clusters we tell our algorithm to make in order to improve the skill balance between the two teams. This could mean increasing the number of clusters, decreasing it, or even setting it relative to the total number of players.
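For intuition about what the clustering step is doing, here is a very small pure-Python sketch of the basic K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). This is not scikit-learn's implementation: the `kmeans` helper, the choice to seed centroids with the first `k` points, and the ratings are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=10):
    """Tiny K-Means sketch for illustration; not scikit-learn's implementation."""
    # Assumption: seed centroids with the first k points so the run is deterministic
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical (Knowledge, Running, Throwing, Catching, Kicking) ratings
players = [
    (5, 5, 4, 4, 5),
    (1, 2, 1, 2, 1),
    (4, 5, 5, 4, 4),
    (2, 1, 2, 1, 2),
    (5, 4, 4, 5, 5),
    (1, 1, 2, 2, 1),
]
labels = kmeans(players, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

With two obvious groups (strong all-rounders and newcomers), the labels separate them cleanly. Real survey data is messier, which is why the number of clusters needs tuning.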

Now that we have our clusters (different types of players), we will assign the players in each cluster to either team 1 or team 2 using the greedy algorithm. The difference now is that with our clusters, we can make sure each team receives a similar number of players from each player type, which should reduce the variation in skill levels between teams.

def generate_clustered_teams(data):
    # Roughly one cluster per pair of players
    num_clusters = int(len(data.index) / 2)
    data = generate_clusters(data, num_clusters)
    # Collect row positions per team (DataFrame.append was removed in pandas 2.0)
    kmeans_team_1_rows = []
    kmeans_team_2_rows = []
    # Number of players of each type already on each team
    kmeans_team_1_types = {}
    kmeans_team_2_types = {}
    kmeans_team_1_total_score = 0
    kmeans_team_2_total_score = 0
    for i in range(len(data.index)):
        player_type = data.iloc[i]['Player Type']
        team_1_count = kmeans_team_1_types.get(player_type, 0)
        team_2_count = kmeans_team_2_types.get(player_type, 0)
        if team_1_count > team_2_count:
            pick_team_1 = False
        elif team_1_count < team_2_count:
            pick_team_1 = True
        elif kmeans_team_1_total_score > kmeans_team_2_total_score:
            pick_team_1 = False
        elif kmeans_team_1_total_score < kmeans_team_2_total_score:
            pick_team_1 = True
        else:
            # Type counts and totals are both tied, so flip a coin
            pick_team_1 = np.random.randint(2) == 1
        if pick_team_1:
            kmeans_team_1_rows.append(i)
            kmeans_team_1_types[player_type] = team_1_count + 1
            kmeans_team_1_total_score += data.iloc[i]['Overall']
        else:
            kmeans_team_2_rows.append(i)
            kmeans_team_2_types[player_type] = team_2_count + 1
            kmeans_team_2_total_score += data.iloc[i]['Overall']
    kmeans_team_1 = data.iloc[kmeans_team_1_rows].sort_values('Overall', ascending=False).reset_index(drop=True)
    kmeans_team_2 = data.iloc[kmeans_team_2_rows].sort_values('Overall', ascending=False).reset_index(drop=True)
    return [kmeans_team_1, kmeans_team_2]
generate_clustered_teams(player_data)

kmeans_results = generate_clustered_teams(player_data)
generate_comparison_table(kmeans_results[0], kmeans_results[1])

These results are much better than the results from the simple sort we did at the beginning of the notebook. We can see a decrease in the difference across multiple categories, so assuming people reported their skills fairly, our teams should be pretty darn even (although not perfect)!
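The type-balancing rule can be sketched without pandas as well. In this toy version the player types are given by hand rather than produced by K-Means, the names and scores are invented, and ties go to team 1 instead of being random, so treat it as an illustration of the rule rather than the notebook's exact behavior.

```python
def split_by_type(players):
    """players: list of (name, player_type, overall) tuples."""
    teams = {1: [], 2: []}
    totals = {1: 0, 2: 0}
    for name, player_type, overall in sorted(players, key=lambda p: p[2], reverse=True):
        # How many players of this type does each team already have?
        counts = {t: sum(1 for _, pt, _ in teams[t] if pt == player_type) for t in (1, 2)}
        if counts[1] != counts[2]:
            # First priority: even out this player type
            pick = 1 if counts[1] < counts[2] else 2
        else:
            # Otherwise fall back to the lower total (assumption: ties go to team 1)
            pick = 1 if totals[1] <= totals[2] else 2
        teams[pick].append((name, player_type, overall))
        totals[pick] += overall
    return teams, totals

# Hypothetical players with hand-assigned types (not survey data)
players = [
    ('A', 'kicker', 22), ('B', 'kicker', 20),
    ('C', 'runner', 19), ('D', 'runner', 17),
    ('E', 'learner', 10), ('F', 'learner', 8),
]
teams, totals = split_by_type(players)

print([name for name, _, _ in teams[1]], totals[1])  # ['A', 'D', 'E'] 49
print([name for name, _, _ in teams[2]], totals[2])  # ['B', 'C', 'F'] 47
```

Each team ends up with one kicker, one runner, and one learner, and the overall totals stay within 2 points of each other.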

# Simple sort
simple_results = simple_sort_algorithm(player_data)
generate_comparison_table(simple_results[0], simple_results[1])

# Greedy algorithm (run once, so both teams come from the same random tie-breaks)
greedy_results = greedy_algorithm(player_data)
generate_comparison_table(greedy_results[0], greedy_results[1])

In some cases, our greedy algorithm appears to perform better than our clustering algorithm, so we will write a function to use the greedy algorithm in those cases.

def generate_teams(data):
    greedy_results = greedy_algorithm(data)
    kmeans_results = generate_clustered_teams(data)
    greedy_diff = np.absolute(np.sum(greedy_results[0]['Overall']) - np.sum(greedy_results[1]['Overall']))
    kmeans_diff = np.absolute(np.sum(kmeans_results[0]['Overall']) - np.sum(kmeans_results[1]['Overall']))
    if greedy_diff > kmeans_diff:
        print('k')
        return kmeans_results
    elif greedy_diff < kmeans_diff:
        print('g')
        return greedy_results
    elif greedy_diff == kmeans_diff:
        greedy_score = np.sum(np.absolute(generate_comparison_table(greedy_results[0], greedy_results[1]).drop(columns=['overall_total', 'overall_avg']).loc['Difference']))
        kmeans_score = np.sum(np.absolute(generate_comparison_table(kmeans_results[0], kmeans_results[1]).drop(columns=['overall_total', 'overall_avg']).loc['Difference']))
        if greedy_score < kmeans_score:
            print('g')
            return greedy_results
        else:
            print('k')
            return kmeans_results
generate_teams(player_data)

To evaluate the full effectiveness of our new algorithm, let's re-sample smaller groups from our original dataset and see how teams are formed using different combinations of our existing data rather than the whole set. We'll also make note of the differences in skill levels across our five categories.

def display_resampled_matches(data, num_resamples, sample_size):
    # One 'Difference' row per resample (DataFrame.append was removed in pandas 2.0)
    differences = []
    for s in range(num_resamples):
        sampled_data = data.sample(n=sample_size, replace=False).reset_index(drop=True)
        sampled_results = generate_teams(sampled_data)
        sampled_scorecard = generate_comparison_table(sampled_results[0], sampled_results[1])
        differences.append(sampled_scorecard.loc['Difference'])
    results = pd.DataFrame(differences).reset_index(drop=True)
    results = results.rename(columns={'knowledge_total': 'knowledge_diff', 'running_total': 'running_diff', 'throwing_total': 'throwing_diff', 'catching_total': 'catching_diff',
                                      'kicking_total': 'kicking_diff', 'overall_total': 'overall_diff', 'overall_avg': 'overall_avg_diff'})
    return results
display_resampled_matches(player_data, 10, 20)
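The resampling idea itself is independent of pandas. As a rough sketch, the snippet below draws random subsets from a made-up pool of overall scores, splits each subset greedily, and records the gap between the two teams. The helper names, the pool, and the fixed seed are all assumptions for illustration; only the overall score is tracked here, not the five individual categories.

```python
import random

def greedy_gap(scores):
    """Absolute difference in team totals after a greedy split (ties go to team 1)."""
    total_1 = total_2 = 0
    for score in sorted(scores, reverse=True):
        if total_1 <= total_2:
            total_1 += score
        else:
            total_2 += score
    return abs(total_1 - total_2)

def resampled_gaps(scores, num_resamples, sample_size, seed=0):
    """Draw subsets without replacement and measure the gap for each one."""
    rng = random.Random(seed)  # fixed seed only so reruns give the same subsets
    return [greedy_gap(rng.sample(scores, sample_size)) for _ in range(num_resamples)]

# Hypothetical pool of overall scores (not survey data)
pool = [23, 21, 20, 18, 17, 15, 14, 12, 11, 9, 8, 6]
gaps = resampled_gaps(pool, num_resamples=10, sample_size=8)

print(gaps)
print(sum(gaps) / len(gaps))  # average gap across resamples
```

Looking at the spread of gaps across many subsets gives a better sense of how the method behaves than a single split of the full group.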