The problem with data visualizations

You’ve probably heard the skeptical aphorism about statistics: “There are three kinds of lies: lies, damned lies, and statistics.” Unfortunately, I worry that we may soon hear “data visualization” tacked on to that list as the fourth and most deceiving way of communicating information. This is a problem.

A couple months ago, I was reading through some materials about a company’s financial health, and they included a dual axis chart showing the company’s Revenue and EBITDA from 2008 to 2014 (projected).  (See below.)

Dual Axes exampleLooking at the chart, it appears that the two measures increase in a near parallel fashion. The slope of their lines is pretty comparable, particularly in the later years on the chart. Problem is, this misrepresents what is actually happening. The axis on the left increases in increments of 20 while the one on the right does so in increments of 10, which means the relationship that appears between the lines is a misrepresentation.

When we chart the same data on a single axis we see that EBITDA fails to increase nearly as dramatically between 2012 and 2014 as revenue does. (See below.) If I’m evaluating the health of this company and its future prospects, that difference may be important!

Single Axis example

Adding a second axis seems like such a simple, innocuous thing, but it changes how the data might be interpreted and understood. This is just one example of the substantial impact a seemingly small design decision can have.

Why should we care about this? There are two main reasons:

  1. As consumers of more and more visual data, we need to be aware of situations like this where visualization design decisions may obscure (or at least distract from) certain critical pieces of information. Just because it’s data (data never lies!) and you can see it (my eyes would never deceive me!), doesn’t mean it is presented in an objective way.
  2. As more of us are in roles where we create data visualizations, we need to be aware that if we are careless, we run the risk of misleading our audience or imposing (hopefully unintentionally) our own viewpoint on the information we present.

Data visualization likely will be one of those things many of us try to do without any formal training, and I worry that, as a result, a lot of folks will do it badly. Am I being overly paranoid? I hope so. But this particular example doesn’t do much to allay that paranoia.

Advertisements

2 responses to “The problem with data visualizations

  1. Why would one use a dual axis chart?

    • In some cases, they can be helpful in comparing data in two different units of measure. However, Stephen Few suggests that the dual axis chart is never the best option for displaying data (see here), and I tend to agree with him: most of the time there probably is a better way to clearly show your data than a dual axis chart.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s