Obesity prevalence graph makeover

Nathan at FlowingData has presented another interesting challenge: to improve the design and clarity of message of the graph below, which displays the results of a study investigating obesity rates at different ages for people born in different cohorts of years.

The primary difficulty in using this display is having to decode eight line series, one for each year-of-birth cohort. As Nathan comments, you can eventually adjust your reading ability to draw some insights from this graph, though it does require a multi-slide presentation to truly impart the key messages.

My re-working below simply takes the raw data, moves the year-of-birth categories onto the x-axis and then applies Excel’s conditional cell formatting to present a colour pattern of obesity rates from 40% (darkest red) down to 0% (pure white). I have used a single hue with tints changing towards white, as this helps our visual system to better judge a decreasing value. I have left the original values in the cells for two reasons: firstly, I can’t for the life of me remember how to switch them off and, secondly, I actually think they add an extra layer of potential insight, since key values can be referenced when explaining the findings of this study.
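For anyone wanting to reproduce the colour scale outside Excel, here is a minimal sketch in Python of a single-hue tint scale running from pure white at 0% to a dark red at 40%. The function name `tint`, the specific dark red (139, 0, 0) and the simple linear interpolation are my assumptions for illustration; Excel's conditional formatting may well compute its shades differently.

```python
def tint(rate, max_rate=40.0, dark=(139, 0, 0)):
    """Return a hex colour between white (rate 0) and `dark` (rate >= max_rate).

    `dark` is an assumed RGB value for the darkest red, not taken from the graphic.
    """
    # Clamp the rate to the 0..max_rate range and scale it to 0..1.
    t = max(0.0, min(rate / max_rate, 1.0))
    # Move each channel linearly from white (255) towards the dark colour.
    r, g, b = (round(255 + (c - 255) * t) for c in dark)
    return f"#{r:02x}{g:02x}{b:02x}"


if __name__ == "__main__":
    # Hypothetical rates, purely to show the scale; not values from the study.
    for rate in (0, 10, 20, 30, 40):
        print(rate, tint(rate))
```

Because only one hue is involved, a higher rate always produces a darker cell, which is the property the heatmap relies on.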


Gavin Kistner, April 29th, 2010 at 12:50 pm

This is very nice. Well done. I have one issue with it, though, and I don’t have a good answer for how it might be overcome:

The stair-stepped top is a limitation in the data. We don’t know the obesity rates for 10 year olds born after 1996 because the study ended before they were alive. This stair-stepping is the most obvious part of the graphic, however. It leads one, I think, to draw bad conclusions.

“Are people getting fatter earlier?”
“Sure! Look at how that top line comes down dramatically!”

Instead you want people to focus on rows and columns. Looking at the first three columns there appears to be a consistent trend where the older you get, the greater the percentage of your peers will be obese. Assuming that holds true, the middle three rows (and perhaps even the bottom two) are where the real supporting data lie.

The graphic is easy to read once you know to look across the rows, but the stair-stepping is distracting. There really wants to be some sort of dotted-line-like appearance that helps you know that the future isn’t yet known. Or perhaps a big diagonal wavy line, a sort of visual ellipsis.

I wonder if re-choosing the axes could help and still be good. (You have three axes of data, two spatial and one in value.) What if the age axis was the color; then you could have a color and cell label indicating n/a.

Andy Kirk, April 29th, 2010 at 1:29 pm

Hi Gavin, many thanks for your feedback.

You are right to point out the observation of the staircase effect and it would be interesting to explore further transpositions and alternative approaches. I don’t really have a satisfactory solution either right now.

Because the heatmap encodes value through colour, the options for diluting the impact are somewhat reduced. Perhaps a grey could be used to block out those cells without readings, but that would still leave the staircase visible. I had started (but didn’t finish) an alternative graph design for this that plotted the age groups at which the key milestones of 10, 20 and 30% obesity rates were reached (as presented on the study’s website), which avoided this effect but was still ultimately incomplete.
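As a rough illustration of the milestone idea, the sketch below finds, for a single birth cohort, the first age group at which each threshold is reached. The function name `milestone_ages`, the layout of the data (age mapped to rate, with `None` for cells without readings) and the example numbers are all hypothetical; they are not taken from the study.

```python
def milestone_ages(rates_by_age, thresholds=(10, 20, 30)):
    """Map each threshold (%) to the first age at which the rate reaches it.

    A threshold that is never reached in the available data maps to None.
    """
    result = {}
    for threshold in thresholds:
        result[threshold] = next(
            (
                age
                for age, rate in sorted(rates_by_age.items())
                # Skip ages with no reading (the "staircase" cells).
                if rate is not None and rate >= threshold
            ),
            None,
        )
    return result


if __name__ == "__main__":
    # Hypothetical cohort: rates at ages 10, 20 and 30, no reading yet at 40.
    cohort = {10: 5.0, 20: 12.0, 30: 22.0, 40: None}
    print(milestone_ages(cohort))
```

A milestone that comes back as `None` means only that the cohort had not reached it within the study period, which is the same gap that produces the staircase.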

Naomi B. Robbins, April 29th, 2010 at 3:14 pm

My current pet peeve is figures with too many percent signs, dollar signs or decimal places. This figure would be less cluttered without the % sign in every cell. The title tells us that the data are in percents.

Andy Kirk, April 29th, 2010 at 3:26 pm

Fair comment, Naomi. Thanks for taking the time to give feedback.