An Altair love affair
The age-old adage "a picture is worth a thousand words" is an important truism for data scientists like myself. My high-school physics teacher taught me that the best way to understand something is to draw it. So over the years, I have continuously invested time in working with open-source interactive graphics libraries.
When I first started working in R, I was enamoured with Hadley Wickham's ggplot2. I found that following the logic of Leland Wilkinson's Grammar of Graphics helps you reason about how best to present your results, instead of carrying extra cognitive load and wrestling with obtuse syntax to create your chart. Switching to Python, I sadly found that the PyData stack has no equivalent of ggplot2. Sure, Bokeh and Plotly are excellent libraries, and I have used them on various projects. But it never felt quite the same ...
Last year, while preparing my lectures on data visualization at the Jheronimus Academy of Data Science during the second lockdown, I bumped into Altair and started a new love affair. It turns out I am not the only one: for the same reasons as Fernando Irarrázaval, I now use Altair for most of my visualizations in Python. I will show, rather than tell, you why by demonstrating how you can implement different concepts of effective data visualization in Altair.
Leland's seven classes
| GoG class | Description | Comments and examples |
|---|---|---|
| 1. Varset | A set of one or more variables | A more generic definition than just a table or dataframe |
| 2. Algebra | Produce combinations of variables | Join, concatenate, group by |
| 3. Scales | Scale variables | Transformations such as taking the log or normalizing |
| 4. Statistics | Compute statistical summaries | Generates a new varset |
| 5. Geometry | Control the type of plot | Point, line, area, path, bar, polygon, edge, etc. |
| 6. Coordinates | The coordinate system and faceting | Usually Cartesian, but also polar or geographic coordinates |
| 7. Aesthetics | The actual mapping of variables to perceivable graphic properties | Visual variables include position, size, shape, orientation, brightness, color and granularity; for interactive graphics also blur, sound and motion |
Vega-Lite's Grammar of Interactions
| Selection component | Description | Comments and examples |
|---|---|---|
| type | How the selection is represented: the minimal set of backing points that identifies all selected points | point, list, interval |
| predicate | Logic that determines which points are selected | Inside or outside the dragged area, within a range, etc. |
| domain or range | Invert screen positions to data values | Click on a mark to select a single point, drag to select the points in an area, etc. |
| event | The actual input event | Mouseover, selection by dragging |
| init | Initialize the selection with specific points | Used for automatically determining scale extents |
| transforms | Manipulate the selection | E.g. moving a rectangular selection |
| resolve | Re-evaluate visual encodings as selections change | Change color (highlighting), use the selection as input for other encodings (cross-filtering), redefine scales, etc. |