Multi-library data visualization
Comparing Matplotlib, Seaborn, Plotly, and Altair
In this notebook, you'll find a variety of data visualizations created with four libraries: Matplotlib, Seaborn, Plotly, and Altair. Each graph is recreated in every library that supports it, highlighting their distinct styles and capabilities and making it easy to compare different approaches to data visualization.
Scatter diagrams
A scatter plot visualizes the relationship between two numerical variables using dots. Each dot represents one observation, positioned by its values on the horizontal and vertical axes. Scatter plots help reveal trends, positive or negative correlations, nonlinear relationships, clusters, and outliers.
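A minimal Matplotlib sketch of a scatter plot, using synthetic data in which `y` is assumed to depend linearly on `x` plus noise. It also computes Pearson's correlation coefficient with `np.corrcoef`, which condenses the linear trend the dots display into a single number.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): y grows linearly with x, plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# Pearson's r summarizes the linear relationship visible in the plot
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots(figsize=(5, 4))
points = ax.scatter(x, y, s=15, alpha=0.6)  # one dot per observation
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Scatter plot (Pearson r = {r:.2f})")
```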
Counts plot
A counts plot reduces the overlap seen in strip plots by scaling each dot according to how many observations share its position, so frequent values appear as larger dots. It helps reveal distribution patterns and is often used alongside jitter plots, violin plots, and boxplots for comparing groups.
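One way to build a counts plot in Matplotlib, sketched on synthetic integer data (an assumption chosen so that many points coincide): `np.unique(..., axis=0, return_counts=True)` collapses duplicate `(x, y)` positions, and each marker's area is made proportional to its count. The scale factor of 10 is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic discrete data (assumption): integer values, so many observations coincide
rng = np.random.default_rng(1)
x = rng.integers(0, 5, size=300)
y = x + rng.integers(0, 3, size=300)

# Each unique (x, y) position and the number of observations at it
pairs, counts = np.unique(np.column_stack([x, y]), axis=0, return_counts=True)

fig, ax = plt.subplots(figsize=(5, 4))
# Marker area is proportional to the count at each position
dots = ax.scatter(pairs[:, 0], pairs[:, 1], s=counts * 10, alpha=0.6)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Counts plot")
```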
Marginal boxplot
A marginal boxplot combines a scatter plot with boxplots along its margins to summarize the distribution of each variable. Unlike marginal histograms, boxplots highlight key statistics such as the median and the 25th and 75th percentiles, making it easier to spot skewness, variability, and potential outliers in both the X and Y variables. The result shows each variable's distribution while preserving the relationship between the two.
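A Matplotlib sketch of a marginal boxplot using a `GridSpec` layout: a scatter plot in the main panel, a horizontal boxplot of `x` above it, and a vertical boxplot of `y` to its right. The data are synthetic, with `x` deliberately right-skewed so the marginal box shows the asymmetry. Newer Matplotlib releases add an `orientation` parameter to `boxplot` that supersedes `vert`; `vert=False` is used here because older versions accept it too.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): a right-skewed x and a y that depends on it
rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=250)
y = 0.5 * x + rng.normal(size=250)

fig = plt.figure(figsize=(6, 6))
grid = fig.add_gridspec(4, 4, hspace=0.05, wspace=0.05)
ax_main = fig.add_subplot(grid[1:, :-1])                  # scatter plot
ax_top = fig.add_subplot(grid[0, :-1], sharex=ax_main)    # boxplot of x
ax_right = fig.add_subplot(grid[1:, -1], sharey=ax_main)  # boxplot of y

ax_main.scatter(x, y, s=10, alpha=0.5)
ax_main.set_xlabel("x")
ax_main.set_ylabel("y")

box_x = ax_top.boxplot(x, vert=False, widths=0.6)  # horizontal box along the x-axis
box_y = ax_right.boxplot(y, widths=0.6)            # vertical box along the y-axis

# Hide redundant ticks and labels on the marginal axes
ax_top.tick_params(labelbottom=False, left=False, labelleft=False)
ax_right.tick_params(labelleft=False, bottom=False, labelbottom=False)
```

In Plotly Express, passing `marginal_x="box"` and `marginal_y="box"` to `px.scatter` produces a similar layout in one call.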
Correlation heatmap
A correlation heatmap represents the relationships between numerical variables as a color-coded matrix: each cell's color encodes the correlation coefficient (commonly Pearson's r, which ranges from -1 to 1) for one pair of variables. This makes the strength and direction of many pairwise relationships easy to scan, even in datasets with many variables. Heatmaps help identify key variables, detect multicollinearity, and guide further analysis.
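A Matplotlib and pandas sketch of a correlation heatmap on synthetic data, where `b` and `d` are constructed to correlate positively and negatively with `a`, and `c` is independent noise. `DataFrame.corr()` computes Pearson coefficients by default, and fixing `vmin=-1, vmax=1` keeps the color scale symmetric so that color intensity reflects strength.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (assumption): known relationships to check the heatmap against
rng = np.random.default_rng(3)
n = 200
a = rng.normal(size=n)
df = pd.DataFrame({
    "a": a,
    "b": 0.8 * a + rng.normal(scale=0.6, size=n),  # strongly positive with a
    "c": rng.normal(size=n),                       # unrelated noise
    "d": -0.5 * a + rng.normal(size=n),            # moderately negative with a
})

corr = df.corr()  # Pearson correlation matrix by default

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)

# Write each coefficient into its cell
for i in range(len(corr)):
    for j in range(len(corr)):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax, label="Pearson r")
```

Seaborn's `sns.heatmap(corr, annot=True)` wraps the plotting steps in a single call.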
Pairplot
A pair plot visualizes relationships between numerical variables by plotting each one against every other in a grid. The diagonal usually shows univariate distributions such as histograms or density plots. Widely used in exploratory data analysis, pair plots help reveal correlations, clusters, and outliers. Most libraries let you customize the plot type in each panel and color points by a categorical variable.
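A pair plot sketch using pandas' `scatter_matrix`, which draws with Matplotlib. The three synthetic columns are assumptions, with `x2` built to depend on `x1`. Histograms are used on the diagonal (`diagonal="hist"`); `diagonal="kde"` is also accepted but requires SciPy.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data (assumption): x2 depends on x1, x3 is independent
rng = np.random.default_rng(4)
n = 150
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": 1.5 * x1 + rng.normal(scale=0.7, size=n),
    "x3": rng.uniform(0, 1, size=n),
})

# n-by-n grid of Axes: scatter plots off the diagonal, histograms on it
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist", alpha=0.5)
```

Seaborn's `sns.pairplot(df)` is the more common one-liner and adds a `hue` parameter for color coding by category.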
Area chart
An area chart is similar to a line chart but fills the space beneath the line with color or shading to emphasize trends over time. The X-axis represents time or an ordered variable, while the Y-axis shows corresponding values. By highlighting changes in magnitude, area charts effectively illustrate trends, cumulative data, and comparisons across categories.
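A stacked area chart sketched with Matplotlib's `stackplot`, using hypothetical monthly values for two categories. Stacking makes the top edge trace the combined total of all categories at each month, which is how area charts convey cumulative data; a single unstacked area could be drawn with `ax.fill_between(months, values)` instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly values for two categories (illustrative only)
months = np.arange(1, 13)
category_a = 10 + 2 * months
category_b = 5 + months
total = category_a + category_b  # height of the top edge of the stack

fig, ax = plt.subplots(figsize=(7, 4))
areas = ax.stackplot(months, category_a, category_b,
                     labels=["Category A", "Category B"], alpha=0.7)
ax.plot(months, total, color="black", linewidth=1)  # outline the combined total
ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.legend(loc="upper left")
```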
Time series line chart with annotations
A time series line chart with annotations visualizes data points over time, with key events or insights highlighted using labels, markers, or shaded regions. The X-axis represents time, while the Y-axis shows the measured values. Annotations help provide context, making trends, anomalies, or significant changes easier to interpret. This type of chart is widely used in financial analysis, stock market tracking, and performance monitoring.
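A Matplotlib sketch of an annotated time series on a synthetic random walk (the dates, values, and shaded event window are all hypothetical). `ax.annotate` attaches a label and arrow to the maximum, and `ax.axvspan` shades a date range to mark a period of interest.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series (assumption): a random walk around 50
rng = np.random.default_rng(5)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
series = pd.Series(50 + np.cumsum(rng.normal(size=120)), index=dates)

peak_date = series.idxmax()
peak_value = series.max()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(series.index, series.to_numpy(), linewidth=1.5)

# Label and arrow pointing at the highest value
note = ax.annotate(
    f"Max: {peak_value:.1f}",
    xy=(peak_date, peak_value),
    xytext=(15, 15),
    textcoords="offset points",
    arrowprops=dict(arrowstyle="->"),
)
# Shaded region marking a hypothetical event window
ax.axvspan(pd.Timestamp("2024-02-01"), pd.Timestamp("2024-02-15"),
           color="gray", alpha=0.2)
ax.set_ylabel("Value")
```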
Ordered bar chart
An ordered bar chart is a variation of a bar chart where bars are sorted in ascending or descending order to enhance readability and comparison. It helps highlight rankings, trends, and differences across discrete categories more effectively than an unordered bar chart. Ordered bar charts are commonly used in business analytics, survey results, and performance comparisons.
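The only extra step an ordered bar chart needs is sorting before plotting. This sketch sorts a pandas Series of hypothetical category scores in descending order and labels each bar with `ax.bar_label` (available in Matplotlib 3.4 and later).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores per category (illustrative values, not real data)
scores = pd.Series({"Team C": 42, "Team A": 87, "Team D": 15, "Team B": 63})

# Sorting is what distinguishes an ordered bar chart from a plain one
ordered = scores.sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(ordered.index, ordered.to_numpy())
ax.bar_label(bars)  # print each value on top of its bar
ax.set_ylabel("Score")
ax.set_title("Ordered bar chart")
```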
Density plot & histograms
A density plot visualizes the distribution of a numerical variable using kernel density estimation (KDE), which produces a smooth curve instead of the discrete bins of a histogram. Its shape does not depend on bin edges, though it does depend on the chosen bandwidth. It helps identify patterns, skewness, and outliers while making it easy to compare datasets. The total area under the curve equals 1, so the area over any interval estimates the proportion of values that fall in it.
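To make the KDE idea concrete, this sketch implements a one-dimensional Gaussian KDE in NumPy, with the bandwidth set by Scott's rule of thumb (σ·n^(-1/5)), and overlays it on a normalized histogram of a synthetic bimodal sample. The helper name `gaussian_kde_1d` is hypothetical; `area` approximates the integral of the estimated curve, which should be close to 1.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic bimodal sample (assumption)
rng = np.random.default_rng(7)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 0.8, 200)])


def gaussian_kde_1d(sample, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate of `sample` at each grid point."""
    n = sample.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension: sigma * n**(-1/5)
        bandwidth = sample.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / bandwidth  # shape (len(grid), n)
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)  # standard normal kernel
    return kernels.sum(axis=1) / (n * bandwidth)


grid = np.linspace(data.min() - 3, data.max() + 3, 1000)
density = gaussian_kde_1d(data, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann-sum approximation of the integral

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(data, bins=30, density=True, alpha=0.4, label="Histogram")
ax.plot(grid, density, label="Gaussian KDE")
ax.legend()
```

In practice, `scipy.stats.gaussian_kde` or Seaborn's `kdeplot` provide tested implementations with the same underlying idea.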
Dot & box plot
A dot-box plot enhances a standard box plot by overlaying individual data points, revealing data density and patterns. It retains key statistics such as the median and quartiles while showing every observation.
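A Matplotlib sketch of a dot-box plot on three synthetic groups: a box plot per group, with the raw observations scattered over it and jittered horizontally to reduce overlap. Outlier markers are turned off (`showfliers=False`) because every point is already drawn, and the jitter width of ±0.15 is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic groups (assumption)
rng = np.random.default_rng(8)
groups = {
    "A": rng.normal(10, 2, 40),
    "B": rng.normal(13, 3, 40),
    "C": rng.normal(9, 1.5, 40),
}

fig, ax = plt.subplots(figsize=(6, 4))
# One box per group; fliers hidden because all raw points are overlaid below
boxes = ax.boxplot(list(groups.values()), showfliers=False, widths=0.5)

# Overlay the individual observations with horizontal jitter
for position, values in enumerate(groups.values(), start=1):
    jitter = rng.uniform(-0.15, 0.15, size=values.size)
    ax.scatter(position + jitter, values, s=12, alpha=0.6, zorder=3)

ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups))
ax.set_ylabel("Value")
```

In Seaborn, drawing `sns.boxplot` and `sns.stripplot` on the same axes gives a similar result.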
Time series peaks & troughs
Peaks and troughs are the high and low points of a time series, and they help identify trends, cycles, and volatility. Peaks are local maxima (points higher than their neighbors), while troughs are local minima (points lower than their neighbors). Detecting them is crucial in financial analysis, economics, and performance monitoring for understanding fluctuations, seasonality, and turning points in trends. Smoothing techniques such as moving averages, along with other statistical methods, help separate meaningful peaks and troughs from noise.
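A simple peak-and-trough detector, sketched on a synthetic noisy sine wave: smooth the series with a centered 7-point moving average, then flag points that are strictly higher (peaks) or lower (troughs) than both neighbors. The window length is an assumption and controls how many small, noise-driven extrema survive; the helper `local_extrema` is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic series (assumption): a sine wave with period 50 plus noise
rng = np.random.default_rng(11)
t = np.arange(200)
raw = np.sin(2 * np.pi * t / 50) + rng.normal(scale=0.3, size=t.size)

# Centered 7-point moving average; the first and last 3 values become NaN
smooth = pd.Series(raw).rolling(window=7, center=True).mean().to_numpy()


def local_extrema(values):
    """Return indices of strict local maxima and minima.

    Comparisons involving NaN are False, so the NaN edges are never selected.
    """
    left, mid, right = values[:-2], values[1:-1], values[2:]
    peaks = np.flatnonzero((mid > left) & (mid > right)) + 1
    troughs = np.flatnonzero((mid < left) & (mid < right)) + 1
    return peaks, troughs


peaks, troughs = local_extrema(smooth)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(t, raw, alpha=0.3, label="Raw")
ax.plot(t, smooth, label="7-point moving average")
ax.scatter(t[peaks], smooth[peaks], marker="^", color="green", label="Peaks")
ax.scatter(t[troughs], smooth[troughs], marker="v", color="red", label="Troughs")
ax.legend()
```

For real data, `scipy.signal.find_peaks` with its `prominence` or `distance` parameters is a more robust option; troughs can be found by applying it to the negated series.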
Andrews curves
Andrews curves visualize high-dimensional data by transforming each observation into a continuous function, helping to identify patterns, clusters, and outliers. The transformation preserves means, distances (up to a constant factor), and variances, which makes it useful for exploratory data analysis. In pandas, Andrews curves can be plotted with the pandas.plotting.andrews_curves() function, which colors each curve by a class column so that observations with similar characteristics form visible bundles.
Altair has no built-in support for Andrews curves, so this section includes plots only in Matplotlib, Seaborn, and Plotly.
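Andrews' transformation maps each observation x = (x1, x2, x3, ...) to f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ... for t in [-π, π]. The sketch below implements that formula directly in NumPy (the helper `andrews_curve` is hypothetical) and, for comparison, calls pandas' `andrews_curves` on the same synthetic two-class data, using five features per observation as an arbitrary choice. Because the sine and cosine terms are orthogonal on [-π, π], the integral of (f_x(t) - f_y(t))² over that interval equals π times the squared Euclidean distance between x and y, which is the sense in which distances are preserved.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves


def andrews_curve(row, t):
    """f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..."""
    row = np.asarray(row, dtype=float)
    result = np.full(t.shape, row[0] / np.sqrt(2))
    for k, value in enumerate(row[1:], start=1):
        harmonic = (k + 1) // 2                  # 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if k % 2 == 1 else np.cos  # odd k -> sin, even k -> cos
        result += value * trig(harmonic * t)
    return result


# Synthetic data (assumption): two classes of 5-feature observations with different means
rng = np.random.default_rng(12)
features = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
labels = np.repeat(["class 0", "class 1"], 20)
df = pd.DataFrame(features, columns=[f"f{i}" for i in range(1, 6)])
df["label"] = labels

t = np.linspace(-np.pi, np.pi, 200)
fig, (ax_manual, ax_pandas) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Manual version: one curve per observation, colored by class
colors = {"class 0": "tab:blue", "class 1": "tab:orange"}
for row, label in zip(features, labels):
    ax_manual.plot(t, andrews_curve(row, t), color=colors[label], alpha=0.5)
ax_manual.set_title("Manual Andrews curves")

# pandas' built-in function on the same data, for comparison
andrews_curves(df, "label", ax=ax_pandas, alpha=0.5)
ax_pandas.set_title("pandas.plotting.andrews_curves")
```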
Comparison of Matplotlib, Seaborn, Plotly, and Altair
Matplotlib
Strengths:
Weaknesses:
Best for:
Seaborn
Strengths:
Weaknesses:
Best for:
Plotly
Strengths:
Weaknesses:
Best for:
Altair
Strengths:
Weaknesses:
Best for: