Sending Our Love Down Taffs Well
To get started, press "Run" on the top nav bar!
Season One ⚽
Football Manager 2024 is here! It's a new game cycle and this year I'm trying something as old as time, starting from the bottom and aiming for the top. I've taken over JD Cymru South League side Taffs Well, a quaint little village not far from central Cardiff that is also home to a very underwhelming football club. Predicted to be relegated at the start of the season, year one with this Merthyr boy at the helm went surprisingly...um...well. We finished 8th out of 16, comfortably clear of relegation, with a perfectly balanced record of 10 wins, 10 draws, and 10 defeats. After absorbing a lot of "Ange Ball" in the last few weeks I opted for a Much Higher Defensive line with a Trigger Press set to More Often while out of possession before using a Vertical Tiki-Taka approach with the ball, intending to control games. For a half-star, Semi-Professional side it was a risky approach but one that paid off. We had the highest Average Possession in the league with 61% and the highest amount of Final Third Passes. On the other side of the ball, we were top of the Passes Completed Against and Final Third Passes Against categories, allowing the fewest, while also placing 5th in Shots Against. Overall, the approach was successful given our goal for the season and our stature. So, how do we build on a successful first season?
After adhering to Lollujo's Non-League to Legend rule set, a thought occurred concerning transfers. "I'm data boy, living in a data world". In other words, scouts cost money, scouts suck at this level and I know how to sift through a lot of data to find potential signings. Although retaining one scout, and accepting that trialists will still be a part of the plan at this level, I decided to use some tools to find players. That's where this blog/notebook comes in. I needed to find a GK, CD, DL, DR, DM and, most importantly, a creative MC. The two main things I've used below are Z-Scores, to rank players, and SQL, to query my database of players. I'm not going to dive into Z-Scores in this post but will link you to a brilliant explainer from Ben Griffis on why they're handy for identifying transfer targets in football.
Storing My DM's in a DB 💾
For the sake of this blog I'm *technically* using a CSV and not a database at the minute but I am using the same methods I would/will be using when I have more data after next season
My intention with this save is to use historical data to make better decisions. The historical part is important as Football Manager suffers from short-termism. There's no way in the game to store more advanced data past one season so, at the end of every season, I'm going to dump a ton of information into a big ole database. The examples below actually use a CSV but as you can see there are 443 rows in there already. When we start logging 2, 3, and 4 seasons worth of data, it's going to get bulky.
(Make sure you have a poke around the data frame below! You'll be able to see the statistics I'm keeping track of)
Run to view results
To extract the data from our database we can use Structured Query Language (SQL) to specify what we want and where we are getting it from. Below, there are three main components that make up our statement. SELECT, FROM and WHERE. In addition to this, there are three operators of interest, LIKE, OR and AND. SQL is a pretty logical language but let me expand on what I've highlighted.
SELECT - Used to select what data you want from your database. If you're an Excel user, we're grabbing the column headers/titles. You'll often see "SELECT * " in SQL statements. Here the * means "all". I know what data I want to select so I type out the specific column names. Again, using Excel terminology, this makes a new "table" with only the selected fields.
FROM - Pretty much what it says on the tin. We type out the table name in our database that we want to extract our data from. The way that relational databases are structured, there will be multiple tables within a database storing unique data. I won't dive too deeply into this but the database we will be creating will be a relational database, i.e. tables within the database have a relationship. Think of what we're using at the minute as "a table from the database" where the table would be "the 2023/24 season".
WHERE - This is the first one that people get tripped up on. WHERE can easily be confused with FROM when starting out so let me explain the difference. It is used as a clause to filter your data but setting a condition. So it's not "Where is my data?". Instead, it is "My data is FROM here, WHERE a column has this!". In the example below I'm saying "SELECT this data FROM my database but WHERE it says "position", I want just defensive midfielders OR midfielders". Remember, WHERE = condition!
Operators
LIKE - Following on from WHERE, LIKE is an operator used to identify a pattern in your condition. So for us, we're saying "In the Position column, find something that is like 'DM%'. Not exactly DM, but something that contains DM". The % is what we call a wildcard. This means "Hey, something could come after this pattern or it could be nothing". This is super handy as players can play multiple positions so we're opening our filter a little wider. With this in mind, remember that positions are written with the more defensive position first in the shorthand. For example, if a player is a midfielder and an attacking midfielder, the shorthand will be something like "M/AM (C)" so we could write "M%" ensuring to capture everyone who's a midfielder.
OR - This one's straightforward! You put it in between a condition to have one condition OR another condition. For us, we're saying "Look for defensive midfielders OR midfielders".
AND - Very similar to OR, but with AND, it's saying we can have one condition or many rather than this condition OR this condition. We've specified we want defensive midfielders OR midfielders from Position, now we're saying "AND make sure they have 500 minutes or more". That leads me to this bonus note. The >= is a comparison operator and means "greater than or equal to".
Final note! ; signifies the end of your SQL statement. DO NOT FORGET THE ; ! NEVER FORGET THE ; ! (from someone who used to forget it all the time and wonder why my script didn't work 🤦🏻)
After all that, we've filtered our original 443 to 73.
Searching for a creative MC using Z-Scores 📈
We've done a lot of filtering and retrieval of data. Great! The problem is, is 4.24 progressive passes per 90 good!? One way we can answer that is by standardising the data we have and finding out what's good, what's (mean) average and what's bad. If you've read the z-score explainer or a are stats nerd, you can skip this but for everyone else, in the simplest way I can explain it, z-scores find the mean in the data and then tell us if the scores are above or below average. 0 is average, 2 is good, 3 is very good, +3 is Lionel Messi. Note, FM kicks up some insane outliers at times so you might see some 4.00's or -4.00's.
I want to find out who was performing above average in our league last year in the hope of signing them. I want to find a creative midfielder who logged good numbers, per 90, for open-play key passes, chances created, progressive passes, xA, possession won and pressures complete but I want to rank only midfielders when doing this. That, in a nutshell, is what the below code does. I thought about doing an in-depth explainer for it but there are two things. One, although most SQL works across different databases or environments, there a slight differences. Deepnote (the platform I'm using here) uses DuckDB as standard so my syntax might look different to someone using MySQL Workbench, PostgresQL, Snowflake, etc. I had to learn how to do a couple of things differently in DuckDB as I'd previously only used MySQL and PostgresQL so it's always worth considering that if you want to use any code or learn some SQL of your own. A good starting point is W3Schools as it has environments on the site you can type the code directly into. Two, after you've got the basics under your belt, I find the best way to learn is just use, remove, break and edit code. Play around with it! It's there to be used and I think you'd get more out of it than if I explained every line of code. With that said, I will just note that ROUND just rounds my z-scores up to 2 decimal points. It's an aesthetic thing as it looks neater than 0.23567344.
Run to view results
In Closing...
The main takeaway is that we've narrowed down 443 players to just 23. We've ranked them against their positional peers. We've given everyone a standardised rating that we can easily read to know who's good or bad. Hopefully, I've inspired you to go out and use some scouting filters in your own saves and/or look into SQL and relational databases. I've included the other statements I wrote for the other positions below so feel free to have a play around with those as well. I managed to make two signings from using this method (I cannot stress how bad we are as a club and how little anyone wants to play for us) so next time we'll have a look at how our analytical darlings did in season two. My aim with this "blog" is to share some methods using what I've learned during my computer science degree and hopefully encourage you to try out some new things.
I appreciate you reading and happy FM'ing!
Additional SQL Statements
GK Search
Run to view results
DC Search
Run to view results
DL/DR Search
Run to view results
DM Search
Run to view results