Event Data
In the 2022 World Cup match between Portugal and Korea, Hwang Hee-chan scored a goal. At that moment, Son Heung-min sprinted across the field at incredible speed, covering 70 meters in 11 seconds, carried the ball all the way to the opponent's goal area, and made the pass that led to Hwang Hee-chan's shot. Event data simply records that Son Heung-min started from this point, dribbled to that point, then passed to Hwang Hee-chan, who took the shot. In other words, if he ran from point A to point B, event data records only the moment he started the dribble and the moment he ended it: just those two points. Tracking data, in contrast, includes all the points between A and B as the player moves, and likewise, for a pass, all the positions between its starting and ending points. Tracking data therefore contains much more information, which means we can do much more with it than with event data. In this example, the tracking data covers everything from the moment Son Heung-min started running to the moment the goal was scored. In the past, digitizing even event data took a great deal of manual effort; entering every intermediate point between A and B into the computer by hand was simply infeasible, so tracking data didn't really exist. Now, thanks to AI, all of those intermediate points can be identified automatically, allowing us to collect a massive amount of data. We're going to show you what we can actually do with this data.
Off-the-Ball
Event data focuses on the player with the ball. For example, if there's a pass from A to B, you can know the position of the ball, but the players who aren't involved in the play are scattered elsewhere, and event data usually contains no information about them. So while you can track the location of the player with the ball, you can't see where the others were during the play: visually, you can't tell where the other players stood during the pass from A to B. Tracking data contains all of that. Before tracking data even existed, this off-the-ball information was so important that companies like StatsBomb tried to work around its absence by generating extended event-style data. Entering all the data into the computer was extremely labor-intensive work, so StatsBomb would at least collect the positions of other players when a shot was taken. This was called a Shot Freeze Frame: the shooter's position and the positions of nearby players, both teammates and opponents, recorded at the moment of the shot and provided in event data format. With tracking data, you don't need those kinds of workarounds. Thanks to AI, we can now capture every player's position, from minute 0 to minute 90, without any manual input.
With tracking data, it is now possible to plot the positions of all players on the field and transform this information into a 2D animation. For example, yellow dots may represent one team, red dots the other, and the ball's position is shown as well. When this data is compiled into a video file and played, viewers can clearly observe the movement of each player and the trajectory of the ball. Such detailed visualizations were not possible using only event data, which lacks continuous player tracking. However, tracking data allows for a more complete and dynamic representation of match activity. Consider a scenario where a corner kick is taken, followed by a cross and a goal. After the goal, you might observe all the yellow dots moving toward one side of the pitch — a clear indication of a celebration. Once the celebration concludes, the players return to their positions, all of which can be visualized frame by frame in the animation. Several companies provide football data. For instance, StatsBomb offers a large collection of event data, including coverage of the World Cup and Bundesliga. However, their tracking data is currently limited. They do provide features such as shot freeze frames, but not full continuous tracking data.
To analyze tracking data, this section uses sample data provided by Metrica Sports, which will be visualized as a 2D animation. Accessing Metrica’s tracking data can be efficiently handled using the Python package Kloppy, which is designed to simplify the loading and processing of football data from various providers.
Once installed, Kloppy can be used to load the Metrica dataset. Although tracking data can be accessed through other means, Kloppy significantly streamlines the workflow and is therefore recommended for both clarity and efficiency. In addition to Kloppy, it is also necessary to import matplotlib, which will be used to generate the visualizations. Notably, Kloppy supports not only Metrica Sports data but also a range of other football data formats from different providers, making it a versatile tool for football analytics.
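As a minimal setup sketch (assuming installation via pip; any package manager you prefer works just as well):

```python
# Install the required packages once from the command line:
#   pip install kloppy matplotlib

from kloppy import metrica       # loader for Metrica Sports data
import matplotlib.pyplot as plt  # used below for the visualizations
```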
To load open tracking data available online, we begin by selecting a sample dataset provided by Metrica Sports, which offers three examples. For this tutorial, we will use the first dataset, and thus set match_id = 1. When the code is executed, the loading process may take some time, as it involves downloading the complete tracking data for Match 1 from the internet. Since repeated downloads are inefficient, it is recommended to save the dataset locally after the initial download. This allows for faster access in subsequent runs without requiring an internet connection. After the data has been loaded, it is possible to examine the number of data points contained in the dataset. In this case, there are approximately 145,000 rows of tracking data. Additionally, the dataset includes time stamps, which indicate the starting and ending times of the tracking data. For example, in this dataset, the final time stamp is approximately 2949 seconds.
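A sketch of the loading step, assuming kloppy's metrica.load_open_data entry point; the pickle-based caching at the end is just one simple way to avoid repeated downloads:

```python
from kloppy import metrica
import pickle

# Download the open tracking data for Metrica sample match 1.
# This fetches the full dataset from the internet, so it may take a while.
dataset = metrica.load_open_data(match_id=1)

print(len(dataset.frames))           # number of frames, roughly 145,000
print(dataset.frames[0].timestamp)   # first time stamp
print(dataset.frames[-1].timestamp)  # last time stamp, around 2949 seconds

# Cache the dataset locally so later runs don't need an internet connection.
with open("metrica_match1.pkl", "wb") as f:
    pickle.dump(dataset, f)
```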
Next, we have the metadata, which contains descriptive information about the data. For example, the tracking data includes not only the X and Y coordinates of the players, but also metadata such as:
- Which players were on each team
- What kind of match it was
- Other contextual details related to the game

This metadata helps us better understand and interpret the tracking data.
You can also select a specific team and display, for example, the player IDs of the home team.
However, the players' names are not included because this is public data, and for privacy reasons, the data has been anonymized. In most cases, players are simply labeled with numbers like 1, 2, 3, 4, 5, and so on.
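A short sketch of inspecting the metadata, assuming kloppy lists the home team first in metadata.teams:

```python
# Inspect the metadata attached to the tracking data.
metadata = dataset.metadata
home_team, away_team = metadata.teams

print(home_team.name, "vs", away_team.name)
print(metadata.frame_rate, "frames per second")

# List the player IDs for the home team. Because the open data is
# anonymized, the IDs are plain numbers rather than real names.
for player in home_team.players:
    print(player.player_id, "jersey", player.jersey_no)
```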
The data is organized by frames, and you can think of each frame as a single snapshot of the match at one moment in time. So when we say there are 145,000 frames, that means we have 145,000 snapshots of player positions captured throughout the match. By combining these frames one by one, we can create a 2D animation video of the entire game. In this case, we've looked at just 10 sample frames to get an idea of what each frame contains. Each frame includes detailed information, and by printing them out, we can see exactly what kind of data is stored inside.
By printing out 10 sample frames, we can examine what kind of data is stored inside each frame. Among those 10 frames, we issued a command to print the ball's position in each one. As a result, we can see the ball's coordinates for each frame: the ball started here in the first frame, then moved there in the second, and so on. How is this different from StatsBomb data? The key difference is the coordinate system. In this dataset (from Metrica Sports), the field coordinates are normalized, meaning both the horizontal (X-axis) and vertical (Y-axis) values range from 0 to 1. In contrast, StatsBomb uses actual pitch dimensions, for example 120 by 80 units. This means that if you're going to plot or visualize the pitch, you need to know which provider the data came from (in this case, Metrica Sports) so that the positions are rendered correctly on the field.
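For example, the ball positions of the first 10 frames can be printed like this (a sketch using kloppy's frame objects; remember the coordinates are normalized to the 0-1 range):

```python
# Print the ball's position in each of the first 10 frames.
for frame in dataset.frames[:10]:
    if frame.ball_coordinates is not None:
        print(f"frame {frame.frame_id}: ball at "
              f"({frame.ball_coordinates.x:.3f}, {frame.ball_coordinates.y:.3f})")
```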
The top portion displays the ball's position, while the section below shows the positions of the players. We printed the player IDs and coordinates, and since the team was specified as "home", only the home team players were included, not those of the opponent. For demonstration purposes, only the first 10 positions were printed to provide a brief overview. For example, we can observe:
- Player 11's first recorded position,
- followed by their second position,
- and then their third position, and so on.

By examining each frame sequentially, we can trace how Player 11 moved across the pitch over time, from the starting point throughout the match. This movement can be visualized on screen and even converted into a video animation to better understand player dynamics.
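A sketch of the corresponding player printout, assuming frame.players_coordinates maps each player to a point and filtering on the home team:

```python
home_team = dataset.metadata.teams[0]  # home team listed first

# Print home-team player positions for the first 10 frames.
for frame in dataset.frames[:10]:
    for player, point in frame.players_coordinates.items():
        if player.team == home_team:
            print(f"frame {frame.frame_id}: player {player.player_id} "
                  f"at ({point.x:.3f}, {point.y:.3f})")
```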
You just need to draw the pitch and then plot each player’s position on top of it, one by one.
So among the 145,000 frames, we extracted the first frame and displayed it on the screen. We used a loop to render it, and as you can see here, this is the very first frame. It doesn't just contain the position of one player; it includes the positions of all players at that moment. In this first frame, we have the locations of all 11 home team players and 11 away team players, as well as the ball position. By visualizing this first frame on the pitch, we can display all player positions:
- Home team players are shown in yellow
- Away team players are shown in red

When we rendered it, we were able to see all 22 players on the pitch at once.
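A minimal sketch of this step with plain matplotlib; the draw_pitch helper below is a deliberately simple stand-in (a green rectangle with a halfway line), not a full pitch drawing:

```python
import matplotlib.pyplot as plt

def draw_pitch(ax):
    """Draw a very simple pitch in normalized 0-1 coordinates."""
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_facecolor("green")
    ax.plot([0.5, 0.5], [0, 1], color="white", linewidth=1)  # halfway line
    ax.set_xticks([])
    ax.set_yticks([])

home_team, away_team = dataset.metadata.teams
frame = dataset.frames[0]  # the very first of the ~145,000 frames

fig, ax = plt.subplots(figsize=(8, 5))
draw_pitch(ax)

# Plot all 22 players: home in yellow, away in red.
for player, point in frame.players_coordinates.items():
    color = "yellow" if player.team == home_team else "red"
    ax.scatter(point.x, point.y, color=color, s=60)

# Plot the ball in white.
if frame.ball_coordinates is not None:
    ax.scatter(frame.ball_coordinates.x, frame.ball_coordinates.y,
               color="white", s=20)

plt.show()
```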
We ran a loop from the 1st to the 50th frame and displayed each one on the screen. By rendering all player positions from those 50 frames onto the pitch, we can visually observe how the players move. At first, you see where a player starts in frame 1, then moves through frame 2, frame 3, frame 4, and so on, until frame 50. This allows us to see how a player moves over time, step by step.
We also added the ball position to the visualization. By including just the ball's location, we can now see how the ball moves across the pitch. Since we’ve plotted all player positions for 50 frames on a single 2D image, we can get a rough sense of the overall movement pattern. However, this is not an animation — it’s just a static visualization. If we were to plot the entire 90-minute match this way, it would become too messy and hard to interpret. That’s why we decided to turn it into a video animation instead.
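Extending the same sketch to overlay the first 50 frames, including the ball, on a single static image:

```python
fig, ax = plt.subplots(figsize=(8, 5))
draw_pitch(ax)  # the helper defined above

# Overlay frames 1-50 on one static plot.
for frame in dataset.frames[:50]:
    for player, point in frame.players_coordinates.items():
        color = "yellow" if player.team == home_team else "red"
        ax.scatter(point.x, point.y, color=color, s=10, alpha=0.4)
    if frame.ball_coordinates is not None:
        ax.scatter(frame.ball_coordinates.x, frame.ball_coordinates.y,
                   color="white", s=8, alpha=0.6)

plt.show()
```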
To create a video animation, we begin by saving each individual frame as a PNG image. This involves generating the first frame, saving it as an image, then repeating the process for the second frame, the third frame, and so on, ultimately saving all 145,000 frames as separate image files. Once all frames are saved to the local system, we use video processing software to compile them into a single video file. In this case, we use a tool called FFmpeg, which merges all the individual PNG images into a continuous animation. To automate this process, we wrote a script that performs the following tasks:
1. Draws one frame onto the canvas
2. Saves it as a PNG file
3. Clears the canvas
4. Repeats the process for all 145,000 frames

The code below demonstrates how this sequence was implemented.
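A sketch of that loop, reusing the draw_pitch helper and team variables from above; it is limited to 500 frames here so it runs quickly, while the full run would simply iterate over all frames:

```python
import os

os.makedirs("frames", exist_ok=True)
fig, ax = plt.subplots(figsize=(8, 5))

for i, frame in enumerate(dataset.frames[:500]):
    ax.clear()        # 3. clear the canvas before redrawing
    draw_pitch(ax)

    # 1. draw one frame onto the canvas
    for player, point in frame.players_coordinates.items():
        color = "yellow" if player.team == home_team else "red"
        ax.scatter(point.x, point.y, color=color, s=60)
    if frame.ball_coordinates is not None:
        ax.scatter(frame.ball_coordinates.x, frame.ball_coordinates.y,
                   color="white", s=20)

    # 2. save it as a PNG file (4. the loop repeats for every frame)
    fig.savefig(f"frames/frame_{i:06d}.png", dpi=80)

plt.close(fig)
```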
Finally, FFmpeg itself has to be installed (for example: conda install conda-forge::ffmpeg).
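Once FFmpeg is available, the saved PNGs can be merged into a video, for example by invoking it from Python as sketched below; the frame rate and file-name pattern are assumptions that must match your own output (Metrica's sample data is sampled at 25 frames per second, which dataset.metadata.frame_rate will confirm):

```python
import subprocess

# Merge the saved PNGs into one MP4.
# -framerate assumes Metrica's 25 fps sampling; the -i pattern must
# match the file names used when the frames were saved.
subprocess.run([
    "ffmpeg",
    "-framerate", "25",
    "-i", "frames/frame_%06d.png",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "match.mp4",
], check=True)
```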