# Exploratory Data Analysis - Rodrygo Goes

## Import Libraries

### ML Libraries

### Loading DataFrame

## Position Relevancy

In each of these roles, Rodrygo's statistical output varies with the demands of the position. Analyzing those outputs shows how he performs, how he influences games, and how he adapts to different tactical roles, which gives a more complete, multi-dimensional view of his value to the team. Data-driven insights of this kind are increasingly common in modern football, where they help teams refine their strategies and get more from their players.

## Average Minutes Played

## Interceptions

## Kurtosis

Kurtosis measures the "tailedness" of the probability distribution of a real-valued random variable, that is, how heavy the tails of the distribution are. Heavy tails mean that extreme values occur more often than they would under a normal distribution; light tails mean they occur less often. The kurtosis values obtained for each feature can be read along these lines.

In summary, Rodrygo Goes appears to be a consistent performer across most metrics; the exceptions are losses in his own half and recoveries in the opponent's half, where his numbers show more variation.
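To make the idea of tailedness concrete, here is a minimal sketch that computes excess kurtosis by hand with NumPy. The three arrays are synthetic stand-ins, not Rodrygo's data, and the function name `excess_kurtosis` is introduced only for this illustration. Library implementations may apply a bias correction, so their values can differ slightly from this population formula.

```python
import numpy as np

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (about 0 for normal data)."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m4 = np.mean(centered ** 4)  # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)

# Hypothetical feature columns (synthetic, for illustration only).
light_tailed = rng.uniform(-1, 1, size=5000)    # lighter tails than normal
normal_like = rng.normal(size=5000)             # reference case
heavy_tailed = rng.standard_t(df=5, size=5000)  # heavier tails than normal

for name, column in [("light", light_tailed),
                     ("normal", normal_like),
                     ("heavy", heavy_tailed)]:
    print(f"{name:>6}: excess kurtosis = {excess_kurtosis(column):.2f}")
```

A negative value points to lighter tails than a normal distribution, a value near zero to normal-like tails, and a positive value to more frequent extreme observations.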

## Features Normalization

## Sampling Statistics, Distribution & Standard Error

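The sample mean is the average value of a feature. The standard deviation measures how much the feature varies around that mean, and the standard error (the standard deviation divided by the square root of the sample size) estimates how far the sample mean is likely to deviate from the true population mean. The features here are normalized, so each mean is read relative to zero rather than in raw counts.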
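The three quantities can be sketched in a few lines of NumPy. The data below is synthetic, and the sample size of 1000 and the scale of 0.1 are assumptions chosen for illustration, not values confirmed by the analysis; `sampling_stats` is a hypothetical helper name.

```python
import numpy as np

def sampling_stats(values):
    """Return (sample mean, sample standard deviation, standard error of the mean)."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)          # ddof=1 gives the sample standard deviation
    sem = std / np.sqrt(x.size)  # standard error = std / sqrt(n)
    return mean, std, sem

rng = np.random.default_rng(42)

# Hypothetical normalized feature (synthetic, for illustration only).
feature = rng.normal(loc=0.0, scale=0.1, size=1000)

mean, std, sem = sampling_stats(feature)
print(f"mean={mean:.4f}  std={std:.4f}  standard error={sem:.4f}")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the number of observations only halves it.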

Total Actions Successful: Rodrygo's mean number of total successful actions is close to zero (-0.0030) on the normalized scale, so it sits essentially at the baseline. The standard deviation (0.1007) indicates little variation in his successful actions, and the standard error (0.0032) suggests that the estimate of the mean is precise. In football terms, his output in total successful actions is consistent, which points to a reliable performance level.

Dribbles Successful: Rodrygo's mean for successful dribbles is slightly negative (-0.0008), again essentially at the baseline. The standard deviation (0.1031) suggests little variability, and the standard error (0.0033) shows that the estimate of the mean is precise. In football terms, his dribbling is consistent, indicating that he maintains a steady level in this area.

Passes Accurate: With a slightly negative mean (-0.0016) and a standard deviation of 0.098, Rodrygo's passing accuracy sits at the baseline and shows little variability. The standard error (0.0031) confirms that the estimate of his mean passing accuracy is precise. In football terms, he is a consistent passer who rarely strays from his average level.

Duels Won: The mean (-0.0043) is slightly negative but still close to the baseline. The standard deviation (0.1017) suggests little variability, and the standard error (0.0032) confirms that the estimate of the mean is precise. In football terms, his consistency in winning duels indicates that he is competitive and effective in one-on-one situations.

Interceptions: Rodrygo's mean for interceptions is close to zero (-0.0004). The standard deviation (0.1001) shows little variation in his interception numbers, and the standard error (0.0032) indicates that the estimate of the mean is precise. In football terms, his consistency in interceptions indicates a player who can be relied upon to interrupt the opponent's play.

Losses in Own Half: Rodrygo's mean for losses in his own half is slightly negative (-0.0021). The standard deviation (0.1003) suggests little variability, and the standard error (0.0032) suggests that the estimate of the mean is precise. In football terms, he is consistent in limiting losses in his own half, which points to strong ball retention.

Recoveries in Opponent's Half: The mean for recoveries in the opponent's half is slightly positive (0.0014), marginally above the baseline. The standard deviation (0.1002) suggests little variability, and the standard error (0.0032) confirms that the estimate of the mean is precise. In football terms, his steady numbers for recovering the ball in the opponent's half suggest that he is proactive and quick to regain possession.

In summary, Rodrygo Goes is highly consistent across all the evaluated features. Every mean sits close to the normalized baseline with little variation, indicating a player who can be relied upon for steady performance across matches.

## Central Limit Theorem

In the following snippet, we first merge the normalized datasets into a single DataFrame. Then, for each feature in the dataset, we generate 1000 samples of size 500, calculate the mean of each sample, and store the means in a list. The sample means are then plotted as a histogram to show their distribution for each feature. This is in line with the Central Limit Theorem, which states that the distribution of sample means approaches a normal distribution as the sample size increases, whatever the shape of the underlying distribution, provided its variance is finite.
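The resampling procedure described above can be sketched as follows. This is a minimal NumPy version under stated assumptions: the skewed `population` array is a synthetic stand-in for one normalized feature, sampling is done with replacement (the original sampling scheme is not shown), and `np.histogram` stands in for the plotted histogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, deliberately skewed feature (synthetic, for illustration only).
population = rng.exponential(scale=1.0, size=10_000)

n_samples, sample_size = 1000, 500  # sizes taken from the text above

def sample_means(values, n_samples, sample_size, rng):
    """Draw n_samples samples (with replacement) and return the mean of each."""
    draws = rng.choice(values, size=(n_samples, sample_size), replace=True)
    return draws.mean(axis=1)

means = sample_means(population, n_samples, sample_size, rng)

# The sample means should centre on the population mean, with a spread
# close to population_std / sqrt(sample_size).
print("mean of sample means:", means.mean(), "population mean:", population.mean())
print("std of sample means: ", means.std(ddof=1),
      "expected:", population.std(ddof=1) / np.sqrt(sample_size))

# Histogram counts; a plotting library could draw these bins instead.
counts, bin_edges = np.histogram(means, bins=30)
```

Even though the underlying feature is strongly skewed, the histogram of the sample means is close to symmetric and bell-shaped.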

## Features Correlation

## A/B Test & ANOVA

### Based on Total Actions

A/B testing and ANOVA (Analysis of Variance) are powerful statistical tools for comparing means across different groups or conditions. The T-statistic provides a measure of the difference between the means in units of standard error. The P-value, on the other hand, gives the probability that you would observe such a difference (or a more extreme one) just by chance if the null hypothesis were true.

Lastly, the ANOVA test is a generalization of the T-test to more than two groups. Here, the high F-statistic (79.17) and the extremely small P-value (4.08e-29) suggest that at least one group mean differs significantly from the others. In football terms, this indicates that Rodrygo's performance differs significantly across conditions or roles.
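As a rough illustration of the two tests described above, the sketch below runs a Welch t-test and a one-way ANOVA with SciPy. The three role names, the group sizes, and all values are hypothetical and are not taken from the analysis; `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` are the only library calls relied on.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical total actions per match in three roles (synthetic data).
right_wing = rng.normal(loc=30, scale=5, size=40)
left_wing = rng.normal(loc=30, scale=5, size=40)
centre_forward = rng.normal(loc=38, scale=5, size=40)

# A/B comparison of two groups with Welch's t-test (no equal-variance assumption).
t_stat, t_p = stats.ttest_ind(right_wing, centre_forward, equal_var=False)

# The same t-statistic by hand: difference in means in units of standard error.
standard_error = np.sqrt(right_wing.var(ddof=1) / right_wing.size
                         + centre_forward.var(ddof=1) / centre_forward.size)
t_manual = (right_wing.mean() - centre_forward.mean()) / standard_error

# One-way ANOVA across all three groups.
f_stat, f_p = stats.f_oneway(right_wing, left_wing, centre_forward)

print(f"t-test: t={t_stat:.2f} (manual {t_manual:.2f}), p={t_p:.3g}")
print(f"ANOVA:  F={f_stat:.2f}, p={f_p:.3g}")
```

A significant ANOVA result only says that at least one group differs; pairwise tests with a multiple-comparison correction would be needed to say which one.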

### Based on Duels Won

Lastly, the ANOVA result, with a high F-statistic (46.82) and an extremely small P-value (1.81e-18), suggests that at least one group differs significantly from the others. This indicates that Rodrygo's duels won vary substantially across roles or conditions. In football terms, this is consistent with Rodrygo adapting to different game strategies and altering his contributions according to his role.

### Based on Pass Accuracy

## Linear Regression

### Ordinary Least Squares Model (Total Action)

The R-squared and adjusted R-squared values are both close to 1, indicating a very good fit of the model. The Prob (F-statistic) is very small, indicating that at least one of the predictors is significantly related to the dependent variable. However, keep in mind that correlation does not imply causation and that this is only a model; real-life observations can still vary.
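For reference, the two fit statistics mentioned above can be computed by hand from an ordinary least squares fit. The sketch below uses NumPy only; the predictors, coefficients, and noise level are hypothetical and chosen purely for illustration, so the printed numbers are unrelated to the model summarized here.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical predictors and target (synthetic, for illustration only).
n_obs, n_features = 60, 3
X = rng.normal(size=(n_obs, n_features))
true_coefs = np.array([1.0, 0.5, -0.3])
y = X @ true_coefs + rng.normal(scale=0.2, size=n_obs)

# Ordinary least squares with an intercept column.
X_design = np.column_stack([np.ones(n_obs), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

residuals = y - X_design @ beta
ss_res = np.sum(residuals ** 2)      # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation

r_squared = 1 - ss_res / ss_tot
# Adjusted R-squared penalizes each additional predictor.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_features - 1)

print(f"R-squared={r_squared:.3f}  adjusted R-squared={adj_r_squared:.3f}")
```

Adjusted R-squared never exceeds R-squared, and a large gap between the two suggests that some predictors add little beyond noise.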

### Based on Duels Won

The R-squared and adjusted R-squared values are quite high (0.921 and 0.899, respectively), indicating that the model explains a large portion of the variation in duels won by Rodrygo. The Prob (F-statistic) is very small, which provides evidence that at least one of the predictors is significantly related to the dependent variable. However, always remember that correlation does not imply causation, and real-world observations can still vary.

### Based on Pass Accuracy

## Scikit-learn Models

### Based on Total Actions

Dribbles Successful (-0.28): The negative coefficient indicates that successful dribbles are inversely associated with performance once the other features are held fixed, a surprising result that may require further scrutiny. In football terms, this may signify that Rodrygo's playing style doesn't emphasize dribbling, or it may reflect a tactical context where dribbling is not a key part of his role.

Duels Won (0.96): The strong positive coefficient for duels_won shows a robust positive association with performance, emphasizing the importance of winning individual battles on the field. In football terms, this reflects Rodrygo's ability to dominate one-on-one situations, a quality essential for his defensive and offensive contributions.

Losses Own Half (0.55): The positive coefficient for losses_own_half is puzzling from a statistical perspective, as it suggests that losses in his own half are positively associated with performance. In football terms, this might indicate a playing style where Rodrygo is encouraged to take risks, on the understanding that occasional losses are part of an aggressive strategy.

Interceptions (-0.15): The negative coefficient for interceptions indicates an inverse association with performance, an outcome that may need more context to interpret. In football terms, this could imply that Rodrygo's role is not primarily focused on intercepting the ball, or it could point to a tactical setup where interceptions are not a key part of his responsibilities.

Recoveries Opp Half (-0.20): The negative coefficient for recoveries_opp_half signals a mild inverse relationship with performance, which may warrant further investigation. In football terms, this may reflect that Rodrygo's role does not strongly involve pressing and recovering the ball in the opponent's half, or it may reveal team strategies that de-emphasize such recoveries.

Passes Accurate (1.09): The substantial positive coefficient for passes_accurate represents a strong association with performance, emphasizing the centrality of accurate passing. In football terms, this aligns with Rodrygo's known proficiency in precise ball distribution, which forms a crucial part of his overall contribution.

Overall, with a mean R-squared of 0.98 and a coefficient of determination of 1.00, the model fits Rodrygo Goes's performance very closely, with a mean squared error of 0.96. The analysis highlights essential aspects of Rodrygo's game, such as his dueling and passing accuracy, while raising questions about dribbling, interceptions, and recoveries. The negative coefficients for some features may reflect particular tactical decisions or point to areas for further exploration and refinement.
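The sketch below shows one way coefficients, a cross-validated R-squared, and a mean squared error of the kind discussed above could be obtained with scikit-learn. It is an assumption-laden illustration: the data is synthetic, the coefficients used to generate it are hypothetical and are not the reported values, the feature names `dribbles_successful` and `interceptions` are guesses at the column names, and the original train/test split and fold count are not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Assumed column names, mirroring the features discussed above.
feature_names = ["dribbles_successful", "duels_won", "losses_own_half",
                 "interceptions", "recoveries_opp_half", "passes_accurate"]

rng = np.random.default_rng(3)

# Synthetic features and target; hypothetical_coefs are NOT the reported values.
X = rng.normal(size=(200, len(feature_names)))
hypothetical_coefs = np.array([0.5, 1.0, -0.4, 0.3, -0.2, 0.8])
y = X @ hypothetical_coefs + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Mean R-squared across 5 cross-validation folds.
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

# Coefficient of determination and mean squared error on the held-out split.
y_pred = model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_mse = mean_squared_error(y_test, y_pred)

for name, coef in zip(feature_names, model.coef_):
    print(f"{name:>20}: {coef:+.2f}")
print(f"mean CV R-squared={cv_r2.mean():.2f}  "
      f"test R-squared={test_r2:.2f}  test MSE={test_mse:.2f}")
```

Each coefficient is the expected change in the target for a one-unit change in that feature with the others held fixed, so the sign of a coefficient can differ from the sign of the simple correlation when features are correlated with each other.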

### Based on Duels Won

Overall, with a mean R-squared of 0.56 and a coefficient of determination of 0.92, the model provides a reasonably good fit to Rodrygo Goes's dueling performance, with a mean squared error of 0.75.

This analysis paints a multifaceted picture of Rodrygo's playing style in relation to duels won. It highlights key areas like dribbling, interceptions, and recoveries, while also presenting some surprising results such as the negative association with passes_accurate. The result offers useful perspectives on Rodrygo's attributes and performance and may lead to further in-depth study of specific aspects of his game.

### Based on Pass Accuracy

Overall, with a mean R-squared of 0.97 and a coefficient of determination of 1.00, the model shows a very strong fit to Rodrygo Goes's pass accuracy, with a mean squared error of 0.78.

This analysis reveals how Rodrygo's passing relates to the other features, emphasizing the strong association with successful actions, dribbling, and recoveries, while raising questions such as the inverse relationship with duels won. It provides a broad view of Rodrygo's passing efficiency and of his playing style, and serves as a resource for both statistical and football analysis.

## XGBoost & Cross-Validation

### Based on Total Actions

### Based on Pass Accuracy

### Based on Duels Won

## Conformal Prediction

### Based on Total Actions

Prediction interval coverage (77.89%): the predicted intervals contain 77.89% of the observed values of Rodrygo's Total Actions, a fairly high level of agreement. Whether the model is well calibrated depends on how this figure compares with the nominal confidence level chosen for the intervals, and a closer look at the underlying features and further tuning may still improve it. Understanding what makes up the total actions, including the different types of passes, dribbles, and interactions with the ball, could guide that refinement. In football terms, the coverage reflects his active involvement in matches. He is known as an energetic and skillful forward, so it may highlight his consistent participation in offensive plays and his defensive contributions. The relatively high coverage may point to a stable role in the team's play and a capacity to influence various phases of the game. It may also indicate where focused training could further improve his consistency.

Average prediction interval width (7.96): the average width of 7.96 is the size of the range in which the model expects the true values to lie. Depending on the scale and context of Total Actions, this may indicate a balanced level of prediction precision. Comparing the width with the actual variability and distribution of Total Actions is needed to judge whether it is well calibrated or whether the interval should be tightened or widened for the modeling objectives. In football terms, the width can be read as an expression of the variability in his playing style. For a dynamic and versatile player, it may reflect the range of his contributions across matches and situations, and understanding that breadth can show where he adapts well and where specific focus might bring greater consistency and impact.

In summary, the outcomes of the Conformal Prediction Model related to Total Actions for Rodrygo Goes provide a nuanced view that intertwines statistical modeling with football insights. The relatively high coverage and specific prediction interval width offer a comprehensive perspective of his game, reflecting both the challenges and opportunities in modeling such a multifaceted feature and the vibrant nature of Rodrygo's on-field contributions. These insights can foster further exploration, model enhancement, and a deeper understanding of this promising player's style and strengths.
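The two metrics discussed above, coverage and average interval width, can be illustrated with a minimal split conformal sketch in NumPy. This is one common variant and not necessarily the method or library used in the analysis. The data is synthetic, the single predictor and the straight-line model are stand-ins, and `alpha = 0.2` (nominal 80% intervals) is an assumed setting.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: one hypothetical predictor and a noisy linear target.
n = 600
x = rng.uniform(0, 10, size=n)
y = 2.0 * x + rng.normal(scale=2.0, size=n)

# Disjoint train / calibration / test index sets.
train, calib, test = np.split(rng.permutation(n), [300, 450])

# Any point predictor works; a straight-line fit keeps the sketch short.
slope, intercept = np.polyfit(x[train], y[train], deg=1)

def predict(values):
    return slope * values + intercept

alpha = 0.2  # assumed miscoverage level, i.e. nominal 80% intervals

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.sort(np.abs(y[calib] - predict(x[calib])))
n_cal = scores.size

# Finite-sample corrected quantile: the ceil((n_cal + 1) * (1 - alpha))-th smallest score.
k = min(int(np.ceil((n_cal + 1) * (1 - alpha))), n_cal)
q_hat = scores[k - 1]

lower = predict(x[test]) - q_hat
upper = predict(x[test]) + q_hat

coverage = np.mean((y[test] >= lower) & (y[test] <= upper))
avg_width = np.mean(upper - lower)

print(f"coverage={coverage:.2%}  average width={avg_width:.2f}")
```

Coverage is best judged against the nominal level `1 - alpha`: a value well below it suggests miscalibration, while the average width shows how much precision is given up to reach that coverage.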

### Based on Duels Won

Prediction interval coverage (77.78%): a coverage of 77.78% for Duels Won means that the model's predicted intervals contain nearly 78% of the actual observations, which signals a relatively strong fit between predictions and observed data. The model appears to capture Rodrygo's duel-winning ability quite well, although the figure should be compared with the nominal confidence level of the intervals, and the model may still benefit from refinement. Understanding the context and types of duels, along with the specific match situations, could help fine-tune it. In football terms, the coverage highlights his effectiveness in one-on-one situations. He is known for his agility, speed, and technical ability, so it may illustrate his capacity to win duels both offensively and defensively. It could also reflect his adaptability to different opponents and playing conditions, showing a consistent level of performance that can be vital for his team's success.

Average prediction interval width (3.42): the average width of 3.42 is the size of the range within which the model expects the true values for Duels Won to lie. Given the scale and nature of Duels Won, this may indicate a well-calibrated level of prediction uncertainty. Analyzing the context, scale, and distribution of Duels Won is needed to interpret the width accurately and to judge whether it balances precision and reliability. In football terms, the width may represent the variability in this aspect of his game. For a young and evolving player, it might reflect a learning curve in reading and reacting to different duel situations, and it can indicate his growth potential and where targeted training might improve his one-on-one proficiency.

In conclusion, the outcomes related to Duels Won for Rodrygo Goes provide valuable insights into both statistical modeling and his on-field performance. The high coverage and specific prediction interval width present a well-rounded view of his duel-winning ability, capturing both the statistical subtleties and the real-world football dynamics. These findings can guide further exploration, model optimization, and targeted development for this talented player's playing style and strengths.

### Based on Pass Accuracy

Prediction interval coverage (76.32%): a coverage of 76.32% signifies substantial agreement between the model's predicted intervals and the actual observations of Rodrygo's Pass Accuracy. This indicates that the model captures this aspect of his playing style reasonably well. While this is an encouraging sign, the figure should be compared with the nominal confidence level of the intervals, and there may still be room for refinement. Features such as the type of pass, the match context, and the playing position could further improve the model's accuracy. In football terms, the coverage underlines his competence in distributing the ball accurately. For an attacking player known for his flair and creativity, it may reflect his ability to make effective and precise passes in various situations. This level of pass accuracy is essential for building and sustaining attacking momentum and can be a critical part of his contribution to his team's offensive play.

Average prediction interval width (3.69): the average width of 3.69 is the size of the range within which the model expects the true Pass Accuracy values to fall. Depending on the underlying scale and distribution of Pass Accuracy, this may suggest a moderate level of uncertainty in the model's predictions. Evaluating the width against the actual range and variability of Pass Accuracy is needed to judge whether it is well calibrated or whether further adjustments are required to reach the desired precision. In football terms, the width may represent the natural variability in his passing game, including fluctuations due to match conditions, opponent tactics, or his role in different attacking schemes. While the width reflects how hard such a multifaceted skill is to predict, it may also point to specific areas where Rodrygo can improve his passing consistency, particularly in critical attacking phases.

In summary, the outcomes related to Pass Accuracy for Rodrygo Goes offer a compelling blend of statistical modeling and football insights. The substantial coverage and specific prediction interval width provide a well-rounded view of his passing abilities, reflecting both the modeling challenges and the real-world football intricacies. These insights can support further model refinement, as well as shed light on an essential dimension of Rodrygo's game, with potential implications for his ongoing development and performance optimization.

## Radar Maps

### PSG 2nd Leg Rodrygo's Performance (2022)

### Chelsea 2nd Leg Rodrygo's Performance (2022)

### Man City 2nd Leg Rodrygo's Performance (2022)

### Liverpool 1st Leg Rodrygo's Performance (2023)

### Man City 2nd Leg Rodrygo's Performance (2023)
