Tufte in Python – Part One

Edward Tufte recommends several principles for representing data in his book “The Visual Display of Quantitative Information”. Here are some that I find especially useful:

  1. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented (p. 77).
  2. Write out explanations of the data on the graphic itself. Label important events in the data (p. 77).
  3. Show data variation, not design variation (p. 77).
  4. Maximize the data-ink ratio. Erase non-data ink and redundant data-ink (p. 105).
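Principle 4 maps fairly directly onto matplotlib, where much of the non-data ink on a default plot is the box of spines and the tick marks. Here is a minimal sketch of a reusable helper that erases both while keeping the tick labels, assuming matplotlib 3.x; `strip_non_data_ink` is a hypothetical name, not a matplotlib function:

```python
import matplotlib.pyplot as plt


def strip_non_data_ink(ax):
    """Hide the frame and tick marks of `ax`, keeping the tick labels.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    for spine in ax.spines.values():
        spine.set_visible(False)  # erase the box around the plotting area
    ax.tick_params(length=0)      # zero-length tick marks; the labels remain
    return ax


fig, ax = plt.subplots(figsize=(4, 2))
ax.plot([1, 2, 3], [2, 4, 3], color="black")
strip_non_data_ink(ax)
```

The plots later in this post do the same thing inline, with a few extra tweaks per figure.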

Inspired by Lukasz Piwek’s implementation of Tufte’s design principles in R, I decided to attempt the same in Python. In this post, I will show you how to create Tufte-style line plots, such as this:

Python has an array of visualization packages (here’s a good overview). Until now, my go-to package has been plotnine, which provides a ggplot2 interface in Python. It’s elegant and has allowed me to avoid learning matplotlib, Python’s infamously annoying plotting package. However, for all the flak it gets, matplotlib is the most customizable and powerful visualization package Python offers, and it serves as the base for most other Python visualization tools (including plotnine). So, here’s a guide to implementing Tufte’s principles in Python using matplotlib.

A very short introduction to Matplotlib

A confusing aspect of matplotlib is that it has two APIs. This is possibly one of its biggest flaws: it makes troubleshooting difficult, because answers on Stack Overflow frequently switch between the two.

  1. MATLAB-style API: Matplotlib was originally written to mimic MATLAB, and the pyplot (plt) interface provides a collection of MATLAB-like commands.
  2. Object-oriented API: This API is more flexible, and it is the one to use if you want finer control and customization.

I recommend using the object-oriented interface. That way, you use the same syntax whether your plot is simple or complex.
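To make the difference concrete, here is the same small line plot written both ways. This is a sketch assuming matplotlib 3.x, with made-up data:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]

# MATLAB-style (pyplot) API: each command acts on an implicit "current" figure and axes
plt.figure(figsize=(4, 2))
plt.plot(x, y, color="black")
plt.ylim(0, 30)
plt.title("pyplot style")

# Object-oriented API: create the Figure and Axes explicitly, then call their methods
fig, ax = plt.subplots(figsize=(4, 2))
ax.plot(x, y, color="black")
ax.set_ylim(0, 30)
ax.set_title("object-oriented style")
```

Many pyplot functions have an Axes method counterpart with a `set_` prefix (`plt.ylim` becomes `ax.set_ylim`, `plt.title` becomes `ax.set_title`), which is one reason mixed answers online are confusing.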

Here are the highlights of my matplotlib reading. It took me about 30 minutes to get familiar with the basic syntax.

  1. Lifecycle of a plot: A good matplotlib primer using the object-oriented interface. Pay special attention to the definition of Figure and Axes under ‘A note on the Object-Oriented API vs. Pyplot’. I recommend keeping this figure open on the side as a reference. After reading this, you should feel comfortable with ~80% of the visualizations in this post.
  2. Get familiar with Artists: Everything in your plot is basically an Artist. Knowing this is useful when you want to customize the default appearance of individual objects.
  3. Get familiar with GridSpec: This will be useful for the scatter-histogram plots, which we will create in Part 2 of this post.
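A quick way to see the “everything is an Artist” idea, and to get a first look at GridSpec, is to inspect a small figure directly. This is a minimal sketch assuming matplotlib 3.x; the 2×2 layout below is only an illustration of a typical main-panel-plus-margins grid, not the Part 2 plot:

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(5, 4))
# 2x2 grid: wide left column, narrow right column; short top row, tall bottom row
gs = GridSpec(2, 2, figure=fig, width_ratios=[4, 1], height_ratios=[1, 4])

# a large main panel plus two thin side panels that share its axes
ax_main = fig.add_subplot(gs[1, 0])
ax_top = fig.add_subplot(gs[0, 0], sharex=ax_main)
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_main)

line, = ax_main.plot([1, 2, 3], [3, 1, 2], color="black")

# The Figure, each Axes, and everything drawn inside them are Artist instances
for obj in (fig, ax_main, line, ax_main.xaxis, ax_main.spines["left"]):
    print(type(obj).__name__, isinstance(obj, Artist))

# An Axes' children are the individual Artists you can restyle one by one
for child in ax_main.get_children()[:5]:
    print(type(child).__name__)
```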

Minimal line plot

The plot we will replicate is found in The Visual Display of Quantitative Information, p. 68.

First, let’s modify the default matplotlib font. The font in Tufte’s plot is an oldstyle serif font, one where the numerals don’t line up at the top and the bottom. After some Googling, I settled on ‘Sabon Roman OsF’ and downloaded and installed the .ttf version of the font.

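To make the font available to matplotlib, I deleted the fontList file from matplotlib’s font cache (find the cache directory by running print(matplotlib.get_cachedir()); in recent matplotlib versions the file is named fontlist-v<version>.json), so that matplotlib rebuilds the cache and picks up the new font. Next, modify the rcParams to use our new font as the default serif font (you may have to restart your kernel for this to take effect):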

import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'serif'
# the name must match the family name stored in the installed font file
plt.rcParams['font.serif'] = 'Sabon RomanOsF'
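Deleting the cache works, but since matplotlib 3.2 a font file can also be registered at runtime with `font_manager.fontManager.addfont`, which avoids touching the cache. Below is a sketch under that version assumption; `register_serif_font` is a hypothetical helper, and because the path to your Sabon .ttf depends on where you installed it, the example uses matplotlib’s bundled DejaVu Serif as a stand-in so it stays runnable:

```python
import os

import matplotlib
import matplotlib.pyplot as plt
from matplotlib import font_manager


def register_serif_font(path):
    """Register a .ttf/.otf file with matplotlib and make it the default serif font.

    Hypothetical helper; requires matplotlib >= 3.2 (FontManager.addfont).
    Returns the family name stored inside the font file, which is the name
    rcParams expects.
    """
    font_manager.fontManager.addfont(path)
    family = font_manager.FontProperties(fname=path).get_name()
    plt.rcParams["font.family"] = "serif"
    # put the new family first; keep the existing serif list as fallbacks
    plt.rcParams["font.serif"] = [family] + plt.rcParams["font.serif"]
    return family


# Stand-in path so the sketch runs anywhere; replace it with your Sabon file.
demo_path = os.path.join(matplotlib.get_data_path(), "fonts", "ttf", "DejaVuSerif.ttf")
print(register_serif_font(demo_path))
```

Reading the family name from the file sidesteps guessing whether the font calls itself ‘Sabon Roman OsF’ or ‘Sabon RomanOsF’.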

Next, using the original plot as reference, I created some data:

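# years 1967 to 1977
x = list(range(1967, 1978))
# per capita budget expenditures in constant dollars, approximated from the original plot
y = [310, 330, 370, 385, 385, 393, 387, 380, 390, 400, 380]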

Now let’s initialize and modify the figure and axes:

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from matplotlib.patches import ArrowStyle
fig, ax = plt.subplots(figsize=(7, 3))
# remove spines
for spine in ax.spines.keys():
    ax.spines[spine].set_visible(False)
# set axis limits
ax.set_ylim(290, 420)
ax.set_xlim(1966.5, 1979)
# define axis ticks
ax.yaxis.set_ticks(list(range(300, 420, 20)))
# increase space between axis labels and ticks (pad is in points; 10 is an illustrative value)
ax.tick_params(axis='both', pad=10)
# create formatter to convert y_ticks to dollars
def format_ticks(x, pos):
    # ":.0f" drops any decimal part, so 300.0 becomes "$300"
    return f'${x:.0f}'
formatter = FuncFormatter(format_ticks)
# modify y ticks to dollars
ax.yaxis.set_major_formatter(formatter)
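For a simple prefix like this, a custom function is not strictly needed: matplotlib’s `StrMethodFormatter` formats each tick with a `str.format` template whose value field is named `x`. A sketch of the template version next to the `FuncFormatter` version, checked by calling both formatters directly:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

# Template equivalent of format_ticks: "$" followed by the tick value with no decimals
dollar_formatter = StrMethodFormatter("${x:.0f}")

# The same result through FuncFormatter, which passes (value, position) to the function
func_formatter = FuncFormatter(lambda x, pos: f"${x:.0f}")

print(dollar_formatter(300.0), func_formatter(300.0))

# Applying the template formatter to the y axis of a fresh figure
fig, ax = plt.subplots(figsize=(7, 3))
ax.set_ylim(290, 420)
ax.yaxis.set_ticks(list(range(300, 420, 20)))
ax.yaxis.set_major_formatter(dollar_formatter)
```

`FuncFormatter` remains the better choice once the label logic needs conditionals, for example formatting only some ticks.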

At this point, here’s how our plot looks:

Time to add some data!

# plot data
# draw white points over the line and small black points over the white ones, so the
# line appears broken around each point; "zorder" sets drawing order (higher is on top)
ax.scatter(x, y, s=64, color='white', zorder=2)
ax.scatter(x, y, s=8, color='black', zorder=3)
# add connecting line
ax.plot(x, y, color='black', zorder=1, linewidth=0.7)
# add horizontal dashed lines. See linestyles here: https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/linestyles.html
ax.plot([1970, 1977], [380, 380], linestyle=(0, (5, 10)), linewidth=0.3, color='black')
ax.plot([1970, 1977], [400, 400], linestyle=(0, (5, 10)), linewidth=0.3, color='black')
# add text
ax.text(1967, 415, "Per capita\nbudget expenditures,\nin constant dollars")
ax.text(1978, 390, "5%")
# add bar: an annotation with empty text draws only the arrow, here a bar with end caps
bar = ArrowStyle.BarAB(widthA=0.2, widthB=0.2)
ax.annotate('', xy=(1977.5, 378), xytext=(1977.5, 402),
            arrowprops={'arrowstyle': bar, 'lw': 0.7})
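The “5%” label sits next to a bar spanning the two dashed lines at 380 and 400. Since 20/400 = 5%, the bar presumably serves as a scale reference showing what a 5% change looks like at this level. If you would rather compute the bracket than hard-code it, a sketch like the following should work; `add_percent_bracket` is a hypothetical helper, and anchoring the percentage on the upper line is my assumption:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import ArrowStyle


def add_percent_bracket(ax, x, top, pct=0.05, label_offset=0.5):
    """Draw a vertical bar from top - pct * top up to `top` and label it with pct.

    Hypothetical helper for illustration. Returns the lower end of the bar.
    """
    bottom = top - pct * top
    bar = ArrowStyle.BarAB(widthA=0.2, widthB=0.2)
    # empty annotation text: only the bar with its end caps is drawn
    ax.annotate("", xy=(x, bottom), xytext=(x, top),
                arrowprops={"arrowstyle": bar, "lw": 0.7})
    # label centred on the bar, e.g. "5%" for pct=0.05
    ax.text(x + label_offset, (top + bottom) / 2, f"{pct:.0%}", va="center")
    return bottom


fig, ax = plt.subplots(figsize=(7, 3))
ax.set_ylim(290, 420)
ax.set_xlim(1966.5, 1979)
low = add_percent_bracket(ax, 1977.5, 400)
print(low)
```

The original code extends the bar slightly past the lines (378 to 402), presumably so the end caps clear the dashes; add a small padding to `bottom` and `top` if you want that look.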

Here’s the final plot:

Now that we know our way around Tufte-style line plots, we can replicate a more complicated example from The Visual Display of Quantitative Information, p. 75. The dark line segment between 1955 and 1956 marks the Connecticut police’s crackdown on drivers exceeding the speed limit. Data from other states is provided for comparison.

import matplotlib.pyplot as plt

# create data
x = list(range(1951, 1960))
data = []
data.append({'label': 'New York', 'x': x, 'y': [
    13.8, 13.6, 14.3, 13, 13.6, 13.5, 13.4, 13, 12.6]})
data.append({'label': 'Massachusetts', 'x': x, 'y': [
    10.1, 10, 11.5, 10.8, 11.7, 11.50, 10.1, 11.8, 10.7]})
data.append({'label': 'Connecticut', 'x': x, 'y': [
    12.7, 10.9, 12.5, 11, 14.3, 12.4, 11.8, 10.2, 9.7]})
data.append({'label': 'Rhode Island', 'x': x, 'y': [
    8, 8.2, 8.2, 8, 10.4, 8.2, 8.4, 8.5, 9]})
# initialize figure
fig, ax = plt.subplots(figsize=(6, 6))
# remove the top and right spines
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# limit the remaining spines to the range of the data
ax.spines['left'].set_bounds(8, 16)
ax.spines['bottom'].set_bounds(1951, 1959)
# set axis limits
ax.set_ylim(7, 17)
ax.set_xlim(1950, 1960)
# set axis ticks
ax.yaxis.set_ticks(list(range(8, 18, 2)))
# make ticks inward facing
ax.tick_params(direction='in')
# plot data
for data_dict in data:
    x = data_dict['x']
    y = data_dict['y']
    label = data_dict['label']
    # white halo under each black point, as in the first plot
    ax.scatter(x, y, s=8, color='black', zorder=3)
    ax.scatter(x, y, s=64, color='white', zorder=2)
    if label == 'Connecticut':
        # highlight the 1955-1956 segment with a thick solid line
        ax.plot([1955, 1956], [y[x.index(1955)], y[x.index(1956)]],
                color='black', linewidth=2, zorder=1, linestyle='solid')
    # thin dashed line for each state's full series
    ax.plot(x, y, color='black', zorder=1, linewidth=0.7, linestyle=(0, (5, 10)))
    ax.text(x[1] + 0.5, y[1], label, va='center')
ax.text(1957.5, 15.5, 'Traffic Deaths per 100,000\nPersons in Connecticut,\nMassachusetts, Rhode Island,\nand New York, 1951-1959', va='center')
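Because the other states are there for comparison, it helps to check the arithmetic behind the highlighted segment. A small sketch computing each state’s relative change from 1955 to 1956 (the data list is repeated so the block runs on its own; `percent_change` is a hypothetical helper):

```python
x = list(range(1951, 1960))
data = [
    {"label": "New York", "x": x, "y": [13.8, 13.6, 14.3, 13, 13.6, 13.5, 13.4, 13, 12.6]},
    {"label": "Massachusetts", "x": x, "y": [10.1, 10, 11.5, 10.8, 11.7, 11.50, 10.1, 11.8, 10.7]},
    {"label": "Connecticut", "x": x, "y": [12.7, 10.9, 12.5, 11, 14.3, 12.4, 11.8, 10.2, 9.7]},
    {"label": "Rhode Island", "x": x, "y": [8, 8.2, 8.2, 8, 10.4, 8.2, 8.4, 8.5, 9]},
]


def percent_change(series, years, start, end):
    """Relative change of `series` from year `start` to year `end`, in percent."""
    before = series[years.index(start)]
    after = series[years.index(end)]
    return 100 * (after - before) / before


changes = {d["label"]: percent_change(d["y"], d["x"], 1955, 1956) for d in data}
for label, change in changes.items():
    print(f"{label:>13}: {change:+.1f}%")
```

On these numbers, Rhode Island’s 1955 to 1956 drop is proportionally larger than Connecticut’s, which is exactly the kind of context the comparison states supply.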

Final figure:

In the next post we will look at some other Tufte-style plots, including a minimal boxplot and a dot-dash plot.
