3 Simple Steps to Avoid Lying with Data Visualization
Visualizing your data is one of the most effective ways to communicate complex information to your audience. However, whether consciously or unconsciously, disregarding some basic rules of data visualization can lead you to create misleading charts. This blog post looks at 3 simple steps you can take to avoid lying with data visualizations.
1. Start the Y axis at 0
When a bar chart, line chart, or scatter chart has large values on the Y axis, spreadsheet programs (or designers) sometimes shrink the chart area by cutting off the bottom of the Y axis. This truncated axis exaggerates the differences between values and makes the chart visually misleading.
In the first chart, comparing the number of tourists in 2010 with the number in 2011 might lead you to think that the increase is close to 80-90%.
However, if you compare the actual values, or simply look at the second chart, where the Y axis starts at 0, you will see that the increase is no more than 40-45%.
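The effect can be measured with a little arithmetic. A bar's visible height is measured from the bottom of the axis, so when the axis starts at some baseline above 0, the visible change between two bars is (new - baseline) / (old - baseline) - 1 instead of the true new / old - 1. The sketch below assumes hypothetical tourist counts of 1,000,000 and 1,420,000 and a truncated axis starting at 500,000; these numbers are illustrative only and are not the data behind the charts above. The function name `apparent_change` is likewise just an illustrative choice.

```python
def apparent_change(old: float, new: float, baseline: float = 0.0) -> float:
    """Relative change between two bar heights when the Y axis starts at `baseline`.

    With baseline=0 this is the true relative change, new / old - 1.
    """
    if old <= baseline or new <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (new - baseline) / (old - baseline) - 1


# Hypothetical tourist counts, not the data behind the charts above.
tourists_2010, tourists_2011 = 1_000_000, 1_420_000

true_change = apparent_change(tourists_2010, tourists_2011)  # axis starts at 0
shown_change = apparent_change(tourists_2010, tourists_2011, baseline=500_000)  # truncated axis

print(f"true increase:  {true_change:.0%}")
print(f"shown increase: {shown_change:.0%}")
print(f"exaggeration:   {shown_change / true_change:.1f}x")
```

With these assumed numbers, a true increase of 42% is drawn as an 84% jump, so the truncated axis doubles the visual effect. Edward Tufte calls this ratio between the effect shown in a graphic and the effect in the data the "lie factor".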
2. Avoid showing size with icons and shapes
Some people try to visualize values using icons, such as the shape of a house or person. But is it always possible to show exact proportions when it comes to icons or circles?
For example, suppose we want to show that in a particular country, smokers make up 10% of all women and 53% of all men. If we use a person-shaped icon filled with tobacco to visualize these numbers, where exactly should the fill stop? How do we expect viewers to tell how much of the shape is filled without reading the accompanying number? Are they supposed to judge the height of the fill or its share of the icon's total area?
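To see why height and area give different answers, consider a simplified icon. The sketch below assumes, purely for illustration, that the person icon is a triangle with its base at the bottom, filled from the bottom up; a real icon has a more irregular outline, which only makes the mismatch harder to predict. Because the unfilled top part is a smaller copy of the whole triangle, its area shrinks with the square of its height, so the filled share of the area is 1 - (1 - h)^2, where h is the filled share of the height. The helper `filled_area_fraction` is hypothetical.

```python
def filled_area_fraction(height_fraction: float) -> float:
    """Share of a triangle's area lying below a given share of its height.

    Assumes a triangle with its base at the bottom and its apex at the top,
    filled from the bottom up. The unfilled top part is a similar triangle,
    so its area scales with the square of its height.
    """
    if not 0.0 <= height_fraction <= 1.0:
        raise ValueError("height_fraction must be between 0 and 1")
    return 1.0 - (1.0 - height_fraction) ** 2


# Smoking rates from the example above, drawn as fill height.
for label, share in [("women", 0.10), ("men", 0.53)]:
    area = filled_area_fraction(share)
    print(f"{label}: filled to {share:.0%} of height -> {area:.0%} of area")
```

Under this assumption, filling the icon to 10% of its height covers about 19% of its area, and filling it to 53% covers about 78%, so a viewer judging by area would read numbers quite different from the data.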
It’s also hard to compare values using shapes. For example, let’s say we want to visualize and compare the floor areas of two houses using a house icon. If the first icon represents 110 square meters, how big should the second one be? Do we compare the height of the icons or the width? To stay accurate, we would need to calculate the exact area of the first icon and scale the second one up or down in proportion to its value. But when calculating the area, do we include the chimney? Or do we leave it out to make our job easier? Calculating the area is challenging for us, and it is just as difficult for our audience to perceive the areas of the icons and compare them visually.
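If icons are used anyway, one common correction is to scale area rather than height: to make one icon represent r times the value of another, multiply both its width and its height by the square root of r, since area grows with the square of the linear size. The sketch below assumes hypothetical floor areas of 110 and 220 square meters and an icon that is scaled uniformly; `linear_scale_for_area` is an illustrative name, not a library function.

```python
import math


def linear_scale_for_area(value: float, reference_value: float) -> float:
    """Factor to apply to BOTH the width and the height of an icon so that its
    area, relative to the reference icon, equals value / reference_value."""
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be non-negative and reference_value positive")
    return math.sqrt(value / reference_value)


# Hypothetical floor areas: the first house icon stands for 110 square meters.
first_house, second_house = 110, 220

value_ratio = second_house / first_house   # 2.0
naive_area_ratio = value_ratio ** 2        # width and height both doubled
scale = linear_scale_for_area(second_house, first_house)

print(f"value ratio:                  {value_ratio:.2f}")
print(f"area ratio if naively scaled: {naive_area_ratio:.2f}")
print(f"correct linear scale:         {scale:.3f}")
print(f"resulting area ratio:         {scale ** 2:.2f}")
```

Doubling the icon's height and width for a value twice as large would quadruple its area, while scaling by about 1.414 keeps the area proportional. Even with the arithmetic right, the chimney question remains, and viewers still have to compare areas by eye.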
3. Avoid using cumulative charts
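When people want to show growth over time, they often use cumulative charts. Unfortunately, because a running total can never go down as long as each yearly value is zero or more, a cumulative chart will always look like growth, even when the underlying data tells a more complicated story.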
For example, let’s look at the number of visitors from Panama to Georgia over time. The cumulative chart visualizes the total number of visitors since 2010 and shows gradual growth in overall numbers.
When we look at the second chart, however, we see that the number of annual visitors has decreased over this period.
So, if you decide to use a cumulative chart, make sure the audience clearly understands that the numbers are cumulative and that the growth trend refers to the total over several years rather than to individual yearly values.
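A quick way to check what a cumulative chart hides is to turn the running totals back into yearly values by subtracting each year's total from the next. The sketch below uses Python's `itertools.accumulate` to build a running total from hypothetical yearly visitor counts that shrink every year (illustrative numbers, not the actual Panama-to-Georgia data), and a hypothetical helper, `annual_from_cumulative`, to reverse it.

```python
from itertools import accumulate


def annual_from_cumulative(cumulative: list[int]) -> list[int]:
    """Turn a running total back into per-year values (hypothetical helper)."""
    if not cumulative:
        return []
    # The first year's value equals the first total; each later year is the
    # difference between two consecutive totals.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical yearly visitor counts that fall every year.
annual = [500, 420, 350, 300, 260]
cumulative = list(accumulate(annual))  # [500, 920, 1270, 1570, 1830]

print("cumulative:", cumulative)
print("annual:    ", annual_from_cumulative(cumulative))
print("cumulative never falls:", all(b >= a for a, b in zip(cumulative, cumulative[1:])))
```

The cumulative series rises every year even though each yearly value is smaller than the one before; only the differences reveal the decline.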
These are 3 basic rules you can follow to minimize the risk of misrepresenting your data. Remember, our job as data communicators is to present data to our audience clearly and accurately, not to confuse, scare, or deceive our readers.