The Rise of the Y-Axis-Zero Fundamentalists

On Friday, I read a Natalie Kitroeff Businessweek.com story on the declining appeal of law school, and was so struck by this chart that I shared it on Twitter:

[Chart: law schools]

The chart tells a dramatic story: all the gains in law school enrollment since the mid-1970s have been wiped out in just three years. Twitter responded to that drama with lots of retweets and favorites — but also with lots of disapproving remarks like this:

And this:

There were many, many more responses like that. A couple of them wielded the name of Edward Tufte, today’s leading authority on the visual presentation of data. Which is interesting, because after about five seconds of Googling I found Tufte’s actual views on the practice:

In general, in a time-series, use a baseline that shows the data not the zero point. If the zero point reasonably occurs in plotting the data, fine. But don’t spend a lot of empty vertical space trying to reach down to the zero point at the cost of hiding what is going on in the data line itself. (The book, How to Lie With Statistics, is wrong on this point.)

For examples, all over the place, of absent zero points in time-series, take a look at any major scientific research publication. The scientists want to show their data, not zero.

The urge to contextualize the data is a good one, but context does not come from empty vertical space reaching down to zero, a number which does not even occur in a good many data sets. Instead, for context, show more data horizontally!
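To make Tufte's advice concrete, here is a minimal sketch in Python of picking a baseline from the data rather than from zero. The function name `data_baseline_limits`, the 5 percent padding, and the index-style numbers are my own assumptions for illustration, not anything from Tufte or the Businessweek.com chart.

```python
def data_baseline_limits(values, pad_fraction=0.05):
    """Return (y_min, y_max) that hug the data, with a little padding."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Flat series: pad relative to the value itself (or 1 if it is zero).
        span = abs(hi) or 1.0
    pad = span * pad_fraction
    return lo - pad, hi + pad


# Hypothetical index values, not the actual enrollment figures.
series = [100, 115, 110, 130, 100]
y_min, y_max = data_baseline_limits(series)
print(y_min, y_max)
```

The limits come out just below the lowest value and just above the highest, so the vertical space goes to the data line instead of to the gap between it and zero.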

Thanks to one of the offended responders on Twitter, Abhinav Agarwal, we can see what the Businessweek.com chart would have looked like with a zero base:

I love that he went to the effort to make that (thanks, Abhinav!) but … it is less informative than the original chart. Yes, in the new version it’s now crystal clear that law-school enrollment hasn’t gone to zero. But who looked at the original chart and thought it had? (Well, this guy says he did, but I think he’s kidding.) And the contrast between the herky-jerky rise of the past four decades and the straight-line drop since 2010 is much less clear in the zero-base chart. It hides the precipitousness of law schools’ change in fortunes.

Such arguments seem to carry little weight, though, among the legions of what BuzzFeed’s Matthew Zeitlin has dubbed y-axis-zero fundamentalists. I had somehow missed out on their rise, I guess because all of my HBR time-series charts over the past few years have for various reasons (the main one being that my Excel skills are so limited that I don’t know how to truncate the axes) featured y-axes that go to zero. But apparently now this is a thing. The Huffington Post’s Ben Walsh reported a similar experience with a recent (non-zero-based) chart on taxi medallions in New York. According to Walsh, “all the responses were like ‘rule violated. i refuse to consider your thesis’.”

When I checked the Twitter bios of the people who objected to the Businessweek.com chart, most of them were software programmers, so I wondered if it was some weird coder obsession. It might be, but a simpler explanation is that prominent programmer Jeff Atwood had retweeted it to his 152,000 followers.

Instead, I think it’s mainly just that more and more people have acquired some amount of statistical literacy, and have learned along the way that not basing your y-axis at zero can be misleading. As Duke sociology professor (and believer in non-zero-based charts) Kieran Healy tweeted when I asked him where he thought the reaction came from:

“Narrow axes can make small and inconsequential changes seem big,” Healy went on, “but—symmetrically—zero-axes can make big and real changes seem small. What matters isn’t some iron rule like ‘Always have a zero-base axis!’, it’s your prior commitment to being honest with the data.”
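Healy's symmetry is easy to put rough numbers on. The sketch below, in Python, measures how much of a chart's height a given change takes up under two different baselines. The helper `visual_share`, the axis limits, and the index-style values are all assumptions of mine for illustration; they are not the real enrollment data.

```python
def visual_share(change, y_min, y_max):
    """Fraction of the plot's height that a change of `change` units occupies."""
    return abs(change) / (y_max - y_min)


# Hypothetical index values for illustration only.
peak, latest = 130, 100
drop = peak - latest

zero_based = visual_share(drop, y_min=0, y_max=140)
truncated = visual_share(drop, y_min=90, y_max=140)

print(f"zero-based axis: drop spans {zero_based:.0%} of the chart height")
print(f"truncated axis:  drop spans {truncated:.0%} of the chart height")
print(f"ratio between the two: {truncated / zero_based:.1f}x")
```

The same drop fills a much larger share of the truncated chart than of the zero-based one. Neither number is the "true" size of the change; whether the bigger or the smaller picture is the honest one depends on whether the change matters, which is Healy's point.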

It is easy enough to find examples of people using broken y axes to mislead. From a Media Matters compendium of Fox News chart outrages:

[Chart: fbn-cavuto-20120731-bushexpire]

This isn’t much of a time series, and I really can’t think of any good reason why the y-axis on a bar chart shouldn’t go to zero. But more important than any simple rule is that this chart was obviously crafted to deceive — there’s really no other reason to draw the chart this way.

The Businessweek.com chart, on the other hand, was crafted to show the data as fully as possible. Facebook “data visualization guru” Andy Kriebel recommends adding a note to any non-zero-based-y-axis chart explaining why you didn’t base it at zero. That’s not a bad idea, but I also think the overwhelming majority of those who read a chart like this one online (as opposed to those who see a chart flitting by on the TV screen) are able to figure out what’s going on. I love that so many people online are on the lookout for dodgy charts. But focusing on the data isn’t really dodgy.

Update: My brilliant colleague Scott Berinato, who is working on a book on data visualization for the HBR Press and created the cool Vision Statement “How to Lie with Charts” in the December issue of HBR, emailed me with his thoughts, which I don’t entirely agree with but which seemed worth sharing given that he knows more than I do:

I have to agree with them about the Y axis. Not because it should be a hard and fast rule but because of the metaphor problem. Our brains create 0 when your line begins or ends at the bottom — a metaphorical zero as in “no one is going to law school because the line’s at the bottom.” This is exacerbated by the headline “Empty Classrooms,” which creates a textual cue that “empty” is what matters. 

There’s also the slope problem. Tufte is right and wrong. He’s right about just show the data but a truncated axis doesn’t actually show the data. The data is not the line, the line divides space that represents the proportion of a (those enrolling) and b (those not). So by truncating the axis we not only create a more severe-looking slope, we literally hide representative space, and more on one side than the other.

Having said all that, this kind of thing is rampant, because of web design. This chart would be very tall otherwise. So we have to think about the tradeoffs. My developing sense for these situations is to go even simpler. The data that matters here is:

’74: low

’74-’10: steady, rolling climb

’10-’13: precipitous fall-off

In theory we could build this same chart with three data points (’74, ’10, ’13) unless those three small humps on the climb matter to the story, which I don’t think they do. Basically start with as few data points as possible then add as necessary. Don’t even connect the lines necessarily; use points.
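Scott's start-with-the-fewest-points idea can be sketched in a few lines of Python. This is my own rough reading of it, not his method: the function `key_points` keeps only the first point, the peak, and the last point of a series, and the (year, value) pairs are hypothetical index values rather than the actual enrollment numbers.

```python
def key_points(series):
    """Reduce a time series to its first point, its peak, and its last point.

    `series` is a list of (year, value) pairs in chronological order.
    """
    first, last = series[0], series[-1]
    peak = max(series, key=lambda point: point[1])
    # A set drops duplicates if the peak is also the first or last point;
    # sorting the (year, value) tuples restores chronological order.
    return sorted({first, peak, last})


# Hypothetical index values for illustration only.
enrollment_index = [
    (1974, 100),
    (1985, 115),
    (1995, 110),
    (2010, 130),
    (2013, 100),
]

for year, value in key_points(enrollment_index):
    print(year, value)
```

With this made-up series, what survives is the start, the 2010 peak, and the 2013 end point, which is the three-point version of the story. If the humps in between turned out to matter, they could be added back one at a time.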

