🧢 Moneyball meets Groq: Baseball analytics app with LangChain
🥇 Classic statistics
Mean of runs scored
The average number of runs scored per team-season across the entire dataset (1962–2012).
Pick a team to explore its stats:👇🏼

Average runs scored
The average number of runs a team scored per season; it reflects overall offensive strength.
Average wins
How many games a team typically wins in a season; the core measure of team success.
Mean on-base percentage (OBP)
Measures how often players reach base, a key stat for scoring and win potential.
Slugging percentage (SLG)
Captures a team’s power-hitting ability by giving extra-base hits more weight (total bases per at-bat).
Batting average (BA) + standard deviation
Traditional hitting stat: how often a player gets a hit per at-bat. The standard deviation measures how much batting performance varies across teams or seasons.
Opponent on-base percentage (OOBP)
Shows how often opponents get on base, reflecting pitching and defensive strength. It also indicates how consistently the team keeps opponents off base.
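The per-team metrics above are plain averages and spreads over a team's seasons. A minimal pandas sketch, assuming one row per team-season with columns named `Team`, `RS`, `W`, `OBP`, `SLG`, `BA` and `OOBP` (these names and the helper `team_summary` are assumptions, not confirmed app code):

```python
import pandas as pd


def team_summary(df: pd.DataFrame, team: str) -> pd.Series:
    """Averages and spreads for one team across all of its seasons.

    Assumed input: one row per team-season with columns Team, RS, W, OBP, SLG, BA, OOBP.
    """
    rows = df[df["Team"] == team]
    return pd.Series(
        {
            "avg_RS": rows["RS"].mean(),       # offensive output per season
            "avg_W": rows["W"].mean(),         # typical season win total
            "mean_OBP": rows["OBP"].mean(),
            "mean_SLG": rows["SLG"].mean(),
            "mean_BA": rows["BA"].mean(),
            "std_BA": rows["BA"].std(),        # sample std (ddof=1), the pandas default
            "mean_OOBP": rows["OOBP"].mean(),  # pandas skips NaN values by default
            "std_OOBP": rows["OOBP"].std(),
        }
    )
```

The dataset-wide figure in the first metric is then simply `df["RS"].mean()` over all rows.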
OBP vs wins
OBP (on-base percentage) measures how often a player reaches base per plate appearance. Wins represent total team victories in a season. This scatter plot shows the relationship between how often teams reach base and how many games they win. Each dot is a team-season.
Teams with higher OBP generally win more games, supporting the Moneyball philosophy that getting on base drives team success.
SLG vs wins
SLG (slugging percentage) reflects a team's power hitting by weighting extra-base hits (doubles, triples, home runs) more heavily. This plot compares each team’s slugging ability to their seasonal win total.
While there's variability, teams with higher SLG tend to have more wins, suggesting that power hitting contributes meaningfully to overall success.
OPS vs win percentage
OPS (on-base plus slugging) combines a team’s ability to reach base and hit for power. This graph compares team OPS with win percentage (wins divided by games played) to show whether strong offenses translate into winning.
The trend shows that teams with higher OPS tend to have higher win percentages, supporting OPS as a strong single-number predictor of winning.
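Both axes of this chart are simple derived columns. A minimal sketch, assuming columns `OBP`, `SLG`, `W` and `G` (games played); the helper name `ops_win_pct` is hypothetical:

```python
import pandas as pd


def ops_win_pct(df: pd.DataFrame) -> tuple[pd.DataFrame, float]:
    """Add OPS and win-percentage columns and return their Pearson correlation.

    Assumed input columns: OBP, SLG, W (wins) and G (games played).
    """
    out = df.copy()
    out["OPS"] = out["OBP"] + out["SLG"]  # on-base plus slugging
    out["WinPct"] = out["W"] / out["G"]   # wins divided by games played
    return out, out["OPS"].corr(out["WinPct"])
```

A correlation close to +1 corresponds to the tight upward trend described above.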
Run differential vs wins
Run differential (RS - RA) is the number of runs a team scores minus the number of runs it allows. This chart compares run differential to total wins. Run differential is a classic sabermetric predictor of team quality.
There's a clear linear relationship: teams with positive run differentials win more games. Points far from the trend line reveal overperformers or underperformers.
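One way to flag those outliers is to fit a straight line of wins on run differential and rank teams by their residuals. A minimal sketch, assuming columns `RS`, `RA` and `W`; the function name and the `top_n` parameter are hypothetical:

```python
import numpy as np
import pandas as pd


def run_differential_fit(df: pd.DataFrame, top_n: int = 5):
    """Fit wins ~ run differential and flag the largest over/under-performers.

    Assumed input columns: RS (runs scored), RA (runs allowed), W (wins).
    """
    out = df.copy()
    out["RD"] = out["RS"] - out["RA"]
    # np.polyfit with deg=1 returns [slope, intercept] (highest power first).
    slope, intercept = np.polyfit(out["RD"], out["W"], deg=1)
    out["W_fit"] = intercept + slope * out["RD"]
    out["resid"] = out["W"] - out["W_fit"]  # > 0: more wins than the line predicts
    over = out.nlargest(top_n, "resid")
    under = out.nsmallest(top_n, "resid")
    return slope, intercept, over, under
```

Positive residuals mark teams that won more than their run differential predicts; negative residuals mark the opposite.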
Correlation (RS, RA, W)
This heatmap shows the correlation coefficients between runs scored (RS), runs allowed (RA), and wins (W). Correlation values range from –1 to +1. The closer to 1 or –1, the stronger the relationship.
Wins are positively correlated with runs scored (+0.51) and negatively with runs allowed (–0.53), which confirms that scoring more and allowing fewer runs are both crucial to winning.
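The heatmap values come from a pairwise Pearson correlation matrix. A minimal sketch, assuming columns `RS`, `RA` and `W` (the function name is hypothetical):

```python
import pandas as pd


def run_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of runs scored, runs allowed and wins.

    Assumed input columns: RS, RA, W. Every cell lies in [-1, 1].
    """
    return df[["RS", "RA", "W"]].corr(method="pearson").round(2)
```

The resulting matrix can be drawn with, for example, seaborn's `heatmap(corr, annot=True, vmin=-1, vmax=1)`; the app may render it differently.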
Correlation (OBP, SLG, BA)
This chart shows how OBP, SLG, and BA (batting average) correlate with one another. BA is the traditional hitting metric, OBP also counts walks, and SLG emphasizes power. Their interrelationships show which metrics offer unique value.
OBP and BA are highly correlated (+0.85), but OBP also credits walks and hit-by-pitches, so it carries more information. OBP and SLG also correlate strongly (+0.79), supporting the use of OPS as a combined offensive metric.
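The standard definitions make the difference concrete: BA and SLG divide by at-bats only, while OBP adds walks and hit-by-pitches to both numerator and denominator. A minimal sketch of the textbook formulas (the function name and argument names are illustrative):

```python
def batting_rates(ab, h, doubles, triples, hr, bb, hbp=0, sf=0):
    """Standard rate stats from counting stats.

    BA  = H / AB
    OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    SLG = total bases / AB, with TB = 1B + 2*2B + 3*3B + 4*HR
    """
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    ba = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)  # credits walks and hit-by-pitches
    slg = total_bases / ab                       # weights extra-base hits
    return {"BA": ba, "OBP": obp, "SLG": slg, "OPS": obp + slg}
```

Two hitters with identical hits and at-bats share the same BA, but the one who walks more has the higher OBP, which is why OBP carries information BA misses.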
Linear regression
This R² value shows how well a team's on-base percentage predicts its number of wins. We fitted a linear regression of wins on OBP: OBP alone explains 23.2% of the variation in wins (R² ≈ 0.232). That is meaningful, but it leaves most of the variation to other factors such as pitching and defense.
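A single-predictor regression and its R² can be computed directly with NumPy. This is a minimal sketch; the app may instead use scikit-learn's `LinearRegression`, whose `.score(X, y)` returns the same R² for this model. The helper name is hypothetical:

```python
import numpy as np


def simple_regression(x, y):
    """Ordinary least squares y ~ x, returning (slope, intercept, R²)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = intercept + slope * x
    ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in y
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Called as `simple_regression(df["OBP"], df["W"])` (column names assumed), an R² of about 0.232 would match the figure above.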
🧩 Analyst modes
Use the dropdown below to choose one of five analytical modes, each inspired by a different baseball philosophy, from classic sabermetrics to modern AI. Each mode adjusts the data visualization and computation logic in real time.
Analyst mode overview
Moneyball (Beane): Evaluates how getting on base (OBP) and hitting for power (SLG) impact total wins, based on the core ideas from Moneyball.
Pythagorean (Bill James): Estimates a team's expected wins using run differential. Compares it to actual wins to reveal over- and under-performers.
Sabermetrics classic: Shows correlations between key offensive stats: OBP, SLG, and BA. Useful to understand relationships between traditional and advanced stats.
AI analyst (LLM): Uses a large language model (Meta's LLaMA 3, served through Groq) to answer custom questions about team performance, outliers, or trends in plain language.
Story mode (Goldsberry-style): Creates visual narratives over time: how teams improve or decline in metrics like OPS and win %.
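The Pythagorean mode rests on Bill James's formula, expected win share = RS^k / (RS^k + RA^k). A minimal sketch, assuming columns `RS`, `RA`, `W` and `G`; k = 2 is James's original exponent, and values around 1.83 are a common refinement. The function name is hypothetical:

```python
import pandas as pd


def pythagorean_wins(df: pd.DataFrame, exponent: float = 2.0) -> pd.DataFrame:
    """Expected wins: W_exp = G * RS^k / (RS^k + RA^k).

    Assumed input columns: RS, RA, W, G.
    """
    out = df.copy()
    rs_k = out["RS"] ** exponent
    ra_k = out["RA"] ** exponent
    out["W_pyth"] = out["G"] * rs_k / (rs_k + ra_k)
    # Positive gap: the team won more games than its run totals suggest.
    out["W_minus_pyth"] = out["W"] - out["W_pyth"]
    return out
```

For example, a team scoring 800 runs and allowing 600 over 162 games projects to 162 × 0.64 = 103.68 wins.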
🤖 AI model used:
Choose a mode from the dropdown to activate a different type of baseball analysis:
⚾️ Player-level sabermetrics
🧾 Description
This section dives into sabermetrics by analyzing individual player performance, not just team-level stats. The data is sourced from the Lahman Baseball Database, one of the most comprehensive open baseball datasets.
Top 10 hitters by OPS
Displays the players with the highest on-base plus slugging (OPS) among those who meet a minimum number of at-bats.
🧠 OPS = OBP + SLG, a popular composite metric combining getting on base and hitting for power.
Use this slider to set the minimum number of at-bats (AB) required to appear in the Top 10 OPS ranking. At-bats count a player's official trips to the plate, which exclude walks, hit-by-pitches, and sacrifices. The filter removes players with too few at-bats, whose OPS values can be misleading because of small sample sizes.
How to read: Higher OPS means more offensive value. The bar chart shows the best all-around hitters, while the table lets you inspect exact numbers per season.
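A ranking like this can be built from the Lahman `Batting` table, which has one row per player, season and stint. A minimal sketch, assuming the Lahman column names (`playerID`, `yearID`, `AB`, `H`, `2B`, `3B`, `HR`, `BB`, `HBP`, `SF`); the function name and the default `min_ab=400` are hypothetical, and stints are summed so traded players are counted once per season:

```python
import pandas as pd

COUNT_COLS = ["AB", "H", "2B", "3B", "HR", "BB", "HBP", "SF"]


def top_ops(batting: pd.DataFrame, min_ab: int = 400, n: int = 10) -> pd.DataFrame:
    """Top-n player-seasons by OPS among those with at least min_ab at-bats."""
    b = batting.copy()
    b[COUNT_COLS] = b[COUNT_COLS].fillna(0)  # HBP/SF are missing in some older seasons
    season = b.groupby(["playerID", "yearID"], as_index=False)[COUNT_COLS].sum()
    season = season[season["AB"] >= min_ab].copy()  # the slider threshold
    singles = season["H"] - season["2B"] - season["3B"] - season["HR"]
    total_bases = singles + 2 * season["2B"] + 3 * season["3B"] + 4 * season["HR"]
    obp_denom = season["AB"] + season["BB"] + season["HBP"] + season["SF"]
    season["OBP"] = (season["H"] + season["BB"] + season["HBP"]) / obp_denom
    season["SLG"] = total_bases / season["AB"]
    season["OPS"] = season["OBP"] + season["SLG"]
    return season.nlargest(n, "OPS")
```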
OPS by year
This box plot displays the distribution of OPS (on-base plus slugging) for all players (with >100 at-bats) across seasons from 1956 to 2012. Tracking OPS by year helps identify historical shifts in offensive performance, such as during the “Steroid Era” or modern power surges.
How to read it: OPS values cluster around 0.7–0.8, with noticeable spikes and a wider spread in some years, especially after 1990.
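Each box in the plot summarizes one season's quartiles. A minimal sketch of the numbers behind it, assuming one row per player-season with columns `yearID`, `AB` and `OPS` (the function name is hypothetical):

```python
import pandas as pd


def ops_by_year(season: pd.DataFrame, min_ab: int = 100) -> pd.DataFrame:
    """Per-season OPS quartiles, the values a box plot draws.

    Assumed input: one row per player-season with columns yearID, AB, OPS.
    """
    qualified = season[season["AB"] > min_ab]  # mirrors the >100 at-bat filter
    stats = qualified.groupby("yearID")["OPS"].describe()[["count", "25%", "50%", "75%"]].copy()
    stats["IQR"] = stats["75%"] - stats["25%"]  # box height: spread of the middle 50%
    return stats
```

A rising `IQR` in a given year corresponds to the broader variance noted above. With matplotlib installed, `qualified.boxplot(column="OPS", by="yearID")` draws a comparable chart.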
🧾 Player regression & AI analysis
This regression connects individual performance (OPS) to team outcomes, revealing how much offense alone predicts success.
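R² score: How well player OPS explains team wins. A low score means OPS alone doesn't capture every factor behind winning.
Regression slope: The change in predicted wins per one-unit increase in OPS, which shows the size of the effect. Because a full point of OPS is far larger than real differences between players, dividing the slope by 100 gives the effect of a 0.010 OPS gain.
Intercept: The baseline number of wins the model predicts if OPS were zero (a model constant with no real-world meaning).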
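The page does not state how player OPS is matched to team wins. One possible approach, shown here as a sketch under that assumption, averages player OPS per team-season (weighted by at-bats) and regresses the team's wins on it. It assumes a player table with `teamID`, `yearID`, `AB`, `OPS` and a team table with `teamID`, `yearID`, `W`, as in Lahman's `Teams` table; the function name is hypothetical:

```python
import numpy as np
import pandas as pd


def team_ops_vs_wins(season: pd.DataFrame, teams: pd.DataFrame):
    """Regress team wins on AB-weighted player OPS; return (slope, intercept, R²).

    Assumed inputs: season has teamID, yearID, AB, OPS (one row per player-team-season);
    teams has teamID, yearID, W.
    """
    s = season[season["AB"] > 0].copy()
    s["w_ops"] = s["OPS"] * s["AB"]
    agg = s.groupby(["teamID", "yearID"], as_index=False)[["w_ops", "AB"]].sum()
    agg["team_OPS"] = agg["w_ops"] / agg["AB"]  # AB-weighted mean OPS (an assumption)
    merged = agg.merge(teams[["teamID", "yearID", "W"]], on=["teamID", "yearID"])
    x = merged["team_OPS"].to_numpy(dtype=float)
    y = merged["W"].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2  # for one-predictor OLS, R² equals Pearson r squared
    return slope, intercept, r2
```

A slope of, say, 200 would mean roughly 2 extra wins per 0.010 of team OPS.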
🤖 AI assistant – explain players
AI player query
How to use: Enter a player-level question → click “Run analysis” → get an instant AI-generated answer.
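Behind a box like this, the question is typically combined with the relevant statistics into a prompt and sent to the model. A minimal sketch, assuming the `langchain-groq` package and a `GROQ_API_KEY` environment variable; the prompt wording, the helper names and the model ID `llama3-8b-8192` are assumptions (Groq's available model IDs change over time), not the app's confirmed code:

```python
import pandas as pd


def build_player_prompt(question: str, stats: pd.DataFrame) -> str:
    """Ground the model's answer in a plain-text table of player stats."""
    table = stats.to_string(index=False)
    return (
        "You are a baseball analyst. Answer using only the statistics below.\n\n"
        f"{table}\n\nQuestion: {question}"
    )


def ask_groq(prompt: str, model: str = "llama3-8b-8192") -> str:
    # Imported lazily so the prompt builder works without LangChain installed.
    from langchain_groq import ChatGroq  # reads GROQ_API_KEY from the environment

    llm = ChatGroq(model=model, temperature=0)  # temperature 0 for repeatable answers
    return llm.invoke(prompt).content
```

Keeping prompt construction separate from the API call makes the prompt easy to inspect and test without network access.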