2022-12-10

Grammar of graphics explained as if to a 5-year-old child.

Learn the difference between the imperative and the grammar of graphics approaches to plotting. The imperative approach is simple and easy to understand, while the grammar of graphics approach is more advanced and usually needs less code to change a plot. Examples with matplotlib and plotly express included.

A "grammar of graphics" approach to plotting is like a set of instructions for drawing pictures. It is often contrasted with the "imperative" approach, so before explaining the "grammar of graphics", let's first look at the imperative approach.

Imperative approach to plotting

The "imperative" approach to plotting is like giving a set of instructions to draw a picture, step by step.

For example, imagine you are playing with a toy that can draw pictures, and you tell it:

"First, draw a big blue circle in the middle of the page, then draw a smaller red square to the left of the circle, and finally write the words 'Hello, world!' next to the square."

That would be like using the "imperative" approach to making a plot.

In the imperative approach, the instructions for creating a plot are often given in a single block of code. You tell the computer to create a figure, then you tell it to add some data to it, and then you tell it what the properties of the plot should be, like the color and size of each element of the plot.

It's simple, straightforward and easy to understand. But if you want to make changes to the plot, you have to give a new set of instructions, which can be more time-consuming and complex than with the "grammar of graphics" approach.

Matplotlib is a plotting library for the Python programming language that follows the "imperative" approach to plotting rather than the "grammar of graphics" approach. With matplotlib, you typically start by creating a figure and an axes object, and then use various functions to add data and specify the properties of the plot. For example:

In [1]:
import matplotlib.pyplot as plt

# Create a figure and an axes
fig, ax = plt.subplots(figsize=(3,3))

# Add data and specify properties
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], "-o")
ax.set_title("A simple plot")

# Adding x, y axis labels 
ax.set_xlabel("Temperature")
ax.set_ylabel("Volume")

# Adding text annotation
ax.annotate("Annotation Text", (3, 10), textcoords="offset points",
            xytext=(0, 10), ha='center')

# Show the plot
plt.show()
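
To make the "changes need new instructions" point concrete, here is a minimal sketch of coloring points by category in matplotlib. The data values and the names `groups` and `colors` are made up for illustration; the point is only that each group needs its own explicit drawing call.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data: (temperature, volume) values per group
groups = {
    "A": ([1, 2, 3], [1, 4, 9]),
    "B": ([1, 2, 3], [2, 3, 5]),
}
colors = {"A": "tab:blue", "B": "tab:red"}

fig, ax = plt.subplots(figsize=(3, 3))

# Every group is drawn with its own explicit call
for name, (xs, ys) in groups.items():
    ax.scatter(xs, ys, color=colors[name], label=name)

ax.set_xlabel("Temperature")
ax.set_ylabel("Volume")
ax.legend(title="Group")

# plt.show()
```

Adding a third group, or switching from color to marker size, means editing the loop and the lookup tables by hand.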

Grammar of Graphics 101

A "grammar of graphics" is like a set of instructions for drawing pictures that describes what the picture should contain, rather than every step of drawing it. It's like telling your friend:

"draw a big sun in the top corner, and a tree next to it, with a bird sitting on a branch."

With the grammar approach, a set of instructions for a computer to make a graph might look like this:

In [2]:
import plotly.express as px

data = px.data.gapminder()
fig = px.scatter(
    data_frame=data,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=60,
)
# fig.show()
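
Behind a call like this, the library has to turn column names into visual properties. The toy function below is only a sketch of that idea, not how plotly works internally; `map_aesthetics`, the palette, and the three sample rows are all hypothetical.

```python
from itertools import cycle


def map_aesthetics(rows, x, y, color):
    """Resolve column names to per-point drawing properties (toy sketch)."""
    palette = cycle(["blue", "red", "green"])
    color_for = {}  # category value -> color, assigned in order of first appearance
    points = []
    for row in rows:
        category = row[color]
        if category not in color_for:
            color_for[category] = next(palette)
        points.append({"x": row[x], "y": row[y], "color": color_for[category]})
    return points


# Made-up rows that reuse the gapminder column names from the example above
rows = [
    {"country": "A", "continent": "Asia", "gdpPercap": 1000, "lifeExp": 60},
    {"country": "B", "continent": "Europe", "gdpPercap": 30000, "lifeExp": 80},
    {"country": "C", "continent": "Asia", "gdpPercap": 5000, "lifeExp": 70},
]

points = map_aesthetics(rows, x="gdpPercap", y="lifeExp", color="continent")
```

Coloring by a different column means changing one argument, while the drawing logic stays untouched.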

Faceting

This example shows a more advanced aspect of "grammar of graphics," called faceting. Faceting is like making a bunch of smaller pictures and putting them all together in a grid. Each of the smaller pictures shows the same kind of graph, but with a different subset of the data. Here's an example:

In [4]:
import plotly.express as px

data = px.data.tips()
fig_2 = px.scatter(
    data_frame=data,
    x="total_bill",
    y="tip",
    color="sex",
    facet_col="day",
    trendline="ols",  # needs the statsmodels package to be installed
)
#fig_2.show()

This code makes a scatter plot with total bill on the x-axis and tip on the y-axis. It colors the dots by the sex of the customer and draws a separate scatter plot for each day in the dataset. It also adds trendlines, lines that show the overall pattern in the data.

To break this down: color="sex" tells the library to color the dots by sex, facet_col="day" tells it to make a separate picture for each day, and trendline="ols" tells it to draw a line, fitted by ordinary least squares, that shows the overall pattern in the data.

It will create a row of scatter plots, one for each day in the dataset (Thursday to Sunday), showing how the total bill and tip varied by day. On each scatter plot, the dots will be colored differently based on the sex of the customer who paid the bill. Each scatter plot will also have one trend line per color, showing the overall pattern in that group of dots.
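
Conceptually, faceting plus a trendline boils down to "split the rows by a column, then fit a line in each group". Here is a small pure-Python sketch of that idea. The functions `facet` and `ols_line` and the sample bills are made up for illustration, and plotly's own trendlines rely on the statsmodels package rather than code like this.

```python
from collections import defaultdict


def facet(rows, key):
    """Split rows into one list per distinct value of `key`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key]].append(row)
    return dict(panels)


def ols_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up rows with the same column names as the tips dataset
rows = [
    {"day": "Sat", "total_bill": 10.0, "tip": 2.0},
    {"day": "Sat", "total_bill": 20.0, "tip": 3.0},
    {"day": "Sat", "total_bill": 30.0, "tip": 4.0},
    {"day": "Sun", "total_bill": 10.0, "tip": 1.0},
    {"day": "Sun", "total_bill": 20.0, "tip": 3.0},
]

# One (slope, intercept) pair per panel
fits = {}
for day, panel in facet(rows, "day").items():
    xs = [r["total_bill"] for r in panel]
    ys = [r["tip"] for r in panel]
    fits[day] = ols_line(xs, ys)
```

With the grammar approach, all of this bookkeeping is replaced by the two arguments facet_col and trendline.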

Other, more advanced aspects of "grammar of graphics"

Here are a few other advanced aspects of the "grammar of graphics" that are worth mentioning:

Statistical Transformations: A statistical transformation applies a statistical operation to your data before it is mapped to the aesthetics of the plot. For example, you might bin values to draw a histogram, or apply a log transformation to better visualize a skewed distribution. Libraries such as ggplot2 in R or plotly in Python support such transformations, so you don't have to precompute them yourself.

Coordinate Systems: Different types of plots may require different types of coordinate systems. For example, a scatter plot with geographic data might use a Mercator projection, while a bar chart might use a Cartesian coordinate system. The grammar of graphics allows you to specify the appropriate coordinate system for your plot.

Customizing geometries: Some libraries provide the ability to specify custom geometric objects, which can be used to create new types of plots not supported by the built-in geometries. This is useful for specialized plots for specific use cases.

Annotation: Some libraries provide the ability to add annotations to the plot, for example labels, text, shapes or arrows. These can give more context to the plot, highlight specific regions, or indicate important values or trends in the data.

Interactivity: Many libraries, such as plotly or bokeh, allow adding interactive features to your plots, such as hover information, zooming, or linking to other plots. This can be especially useful when working with large or complex datasets that benefit from interactive exploration.

Grammar of graphics is a powerful approach for creating plots and visualizing data that offers a lot of flexibility and customization. With a good library, you can create a wide range of plots, from simple to complex, and make them more informative and engaging.

Layered grammar of graphics

This approach builds on the grammar of graphics: it also provides a set of instructions for creating plots, but it organizes these instructions into layers. Each layer represents a different aspect of the plot, such as the data, the scales, the geometric objects, and the statistics. The idea behind this approach is to make complex plots easier to reason about and easier to update and change.

An example of a library that implements the layered grammar of graphics is ggplot2.
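
ggplot2 itself is an R library, so the following is only a toy Python sketch of the layering idea, not ggplot2's API. The `Plot` and `Layer` classes are hypothetical; they just show how a plot can be built up by adding independent layers with `+`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Layer:
    geom: str               # what to draw, e.g. "point" or "line"
    stat: str = "identity"  # transformation applied to the data first


@dataclass(frozen=True)
class Plot:
    data: str                                    # name of the dataset (placeholder)
    mapping: dict = field(default_factory=dict)  # aesthetic -> column name
    layers: tuple = ()

    def __add__(self, layer):
        # Adding a layer returns a new plot; the original is left unchanged
        return Plot(self.data, self.mapping, self.layers + (layer,))


base = Plot(data="tips", mapping={"x": "total_bill", "y": "tip"})
plot = base + Layer(geom="point") + Layer(geom="line", stat="smooth")
```

Because `base` is never modified, the same starting plot can be reused with different layers on top.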

Plotting libraries following "grammar of graphics" approach

  • ggplot2 is a popular data visualization library for the R programming language that uses the layered grammar of graphics. It is known for its clear and elegant syntax, and it has a wide range of functionality for creating complex and informative plots.

  • plotly is a data visualization library for Python that is built on top of the plotly.js JavaScript library. It uses the grammar of graphics approach, and it also provides a wide range of interactive features, such as hover information and zooming, making it easy to explore and understand complex datasets.

  • seaborn is a data visualization library for Python that is built on top of matplotlib. It uses the grammar of graphics approach and it is mostly used for statistical data visualization. It provides a high-level interface for creating informative and attractive plots.

  • Altair is a data visualization library for Python that is built on top of Vega-Lite, which in turn compiles to Vega. It offers a simple API for building and displaying declarative visualizations. Its grammar of graphics is simple and readable, and it's a good choice for creating simple, interactive visualizations.

  • Vega is a declarative format for creating, saving, and sharing visualization designs. With Vega, visualizations are described in JSON and rendered as interactive views using either HTML5 Canvas or SVG.
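
To give a feel for the "visualization described in JSON" idea, here is a small specification written as a Python dictionary. It uses Vega-Lite, the higher-level sibling of Vega that Altair targets, because its specs are shorter. The three data values are made up, and rendering the spec would need a Vega-Lite viewer, which is not shown here.

```python
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    # Made-up inline data, reusing the tips column names
    "data": {
        "values": [
            {"total_bill": 10.0, "tip": 2.0, "day": "Sat"},
            {"total_bill": 20.0, "tip": 3.0, "day": "Sat"},
            {"total_bill": 15.0, "tip": 1.5, "day": "Sun"},
        ]
    },
    "mark": "point",
    # Each encoding channel maps a data field to a visual property
    "encoding": {
        "x": {"field": "total_bill", "type": "quantitative"},
        "y": {"field": "tip", "type": "quantitative"},
        "color": {"field": "day", "type": "nominal"},
    },
}

# The whole plot is plain data, so it can be saved or shared as JSON text
spec_json = json.dumps(spec, indent=2)
```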

Summary

A "grammar of graphics" is a set of instructions for creating pictures using code. In my examples, I gave instructions for creating a scatter plot and for faceting, which means creating multiple smaller plots in a grid, each with a different subset of the data. The first example plots GDP per capita against life expectancy, using dot size to show population and color to show continent. The second example creates a scatter plot for each day in the dataset, with dots colored by customer sex and trendlines showing the overall pattern.
