30 November 2009

Charting data with one disproportionally large value

Datavis tweeted about a fun chart he found comparing 'kills' among the top animals. I'm not sure whether the data are meant to be real or what 'kills' means, but the point is made: humans are by far the worst offenders. Part of the emphasis comes from the human number being literally 'off the chart'.

It got me thinking about how to show data like these, where you have one very large number and several much smaller ones. There are two things people may want to know: how much bigger the single value is than the others, and how the others compare with each other. I've thrown together a few ways you could do it:


Top left: default scaling - it certainly gets the point across that one value far eclipses the others, but you can't really compare the others, or put numbers to them.

Top right: log scale - often espoused as a solution, but frankly I can't stand it. Most people don't get log scales, and I find myself looking to the axis to compare values - I may as well just have the values in a table. It's also difficult to compare small differences between bars.
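To make the log-scale objection concrete, here is a minimal Python sketch (nothing to do with the original Excel charts) of how tall bars come out on a log axis. The baseline of 100, the 50 pixels per decade and the two values are hypothetical, chosen only to show the distortion.

```python
import math

def log_bar_height(value, axis_min, px_per_decade):
    """Height in pixels of a bar drawn from axis_min up to value on a log axis."""
    if value <= 0 or axis_min <= 0:
        raise ValueError("a log axis cannot show zero or negative values")
    # Each factor of 10 above the baseline adds px_per_decade pixels.
    return (math.log10(value) - math.log10(axis_min)) * px_per_decade

# Hypothetical settings: a log axis has no true zero, so the bars
# have to start at some arbitrary baseline.
AXIS_MIN = 100
PX_PER_DECADE = 50

small, large = 1_000, 1_000_000  # hypothetical values, not the chart's data
h_small = log_bar_height(small, AXIS_MIN, PX_PER_DECADE)
h_large = log_bar_height(large, AXIS_MIN, PX_PER_DECADE)

print(f"value ratio:      {large / small:,.0f}x")
print(f"bar height ratio: {h_large / h_small:.1f}x")
```

A 1,000x difference in value becomes a 4x difference in bar height, and moving `AXIS_MIN` to 10 turns that into 2.5x. Because the starting point of the bars is arbitrary, their lengths mean little on their own.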

Middle left: breaking the axis - you can't really see how much bigger the human value is, but the point is made that it's much bigger, and you can still compare the smaller values. There is no way in Excel to do this automatically: you can limit the axis maximum, of course, but nothing indicates that the bar continues upwards.
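Since Excel won't mark the break for you, one workaround is to work out the clipped heights yourself. The Python sketch below is purely illustrative: it caps each bar at a chosen axis maximum and flags the bars that were cut, so whatever draws the chart can add a break symbol and the true value. The `clip_bars` helper and the data are hypothetical.

```python
def clip_bars(data, axis_max):
    """Cap each bar at axis_max and flag the ones that were cut off.

    Returns (label, plotted_height, clipped, actual_value) tuples; a
    plotting routine can then draw a break symbol on clipped bars and
    print actual_value next to them.
    """
    bars = []
    for label, value in data:
        clipped = value > axis_max
        bars.append((label, min(value, axis_max), clipped, value))
    return bars

# Hypothetical data: one value far larger than the rest.
data = [("A", 40), ("B", 25), ("C", 5_000), ("D", 30)]

for label, height, clipped, actual in clip_bars(data, axis_max=50):
    bar = "#" * (height // 5)        # one '#' per 5 units, as a text "bar"
    if clipped:
        bar += f"~~ {actual:,}"      # '~~' stands in for the axis break
    print(f"{label}: {bar}")
```

Bar C is drawn at the axis maximum, with the break marker and its real value after it; the other bars keep their true heights and stay easy to compare.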

Middle right: two charts - Arghhh, my eyes, a pie chart and some bevel effects. Actually I quite like this: pie charts are fine for comparing two values if you're not trying to read the exact comparison from them, though the numbers still need to be shown. As for the bevel effect, I blame that on the excesses of Thanksgiving.

Bottom left: aspect ratio - simply enough, make the chart long enough that you can compare the smaller values. Even in a web-only presentation this is a difficult one because of the real estate it uses up, unless you use that space for other content.
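If you do go for the long chart, the arithmetic of how much space it eats is simple on a linear axis that starts at zero: the plot has to be (largest / smallest) times the minimum height you want for the smallest bar. A small Python sketch, with hypothetical values and a hypothetical 10-pixel minimum:

```python
import math

def required_height(values, min_bar_px):
    """Smallest plot height (px) on a linear, zero-based axis such that
    the smallest bar is still at least min_bar_px tall."""
    smallest, largest = min(values), max(values)
    if smallest <= 0:
        raise ValueError("all values must be positive")
    return math.ceil(largest / smallest * min_bar_px)

# Hypothetical values: one outlier and a cluster of small ones.
values = [500_000, 1_000, 2_000, 1_500]
print(required_height(values, min_bar_px=10), "px")
```

With an outlier 500 times the smallest value, that is already 5,000 pixels, which is the real-estate problem in a nutshell.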

Bottom right: just imply a continuation - similar to the broken axis, and similar to the original chart that Datavis tweeted, but with the actual value shown. Needless to say, this is not a default view in Excel.

Any comments on these? Which is your favorite? How would you address this challenge for a dashboard vs. a print publication?

13 comments:

  1. As an engineer, I grew up with log axes. They're no good in a bar chart, because there's no real zero. But in an XY chart, for a technically astute audience, log axes are fine. For a typical PowerPoint slide, log axes are no good.

    I wrote a tutorial about Broken Y Axes. It's a convoluted protocol, unfortunately, and broken axes are not always well understood.

    My preference is to have two charts: a larger chart showing the small values with good resolution, with the large bar extending past the top, and a smaller inset chart showing the full scale of the large value, and tiny small values for the others.

  2. The pie chart is a bad choice for this data set, as the measure values are not a portion of a total.

    Also, in your bar charts, I think sorting descending would be a good change.

    As for a way to represent this data set, I would go with a dot plot on a log scale with the range from 1,000 to 1,000,000. See: http://joemako.com/files/kills.png

    This lets you see Humans and Alligators as outliers, and that the other 4 are in a similar range. The tick marks also help the viewer see that the scale is logarithmic.

    I learned about this approach from Stephen Few's book "Now You See It" p199.

  3. The pie approach is a bit misleading, as readers might expect the slices to represent share of the total, not share of the top 6.
    So I'd use a bar chart like you do, but I'd sort the categories by value.
    If the main message is that one value is much greater than the others, I'd go for the top left one. The actual number of kills in each category is not really important.
    In print, if I have ample space, I might be tempted to use the bottom left one and to wrap the text inside the chart, letting the top category's bar occupy the whole height of the page. The NY Times did this once for an article about war casualties; it was featured in a VisWeek presentation, 2007 I think.
    In a presentation context I'd definitely use motion: scrolling up or zooming out. Something sober, though.

    PS: where's the malaria mosquito?

  4. @Joe: I agree about the sorting; it would be my usual choice, but I left it unsorted for two reasons: the original chart was not sorted, and, if the emphasis of the chart is that one value is much higher than the others, there is something visually alerting about that value sitting in the middle of the axis.

    The log chart is good - you are absolutely right about the dots vs. the bars, and the sorting helps with reading the scale. I am a believer that you shouldn't dumb things down for mass consumption, but I also believe that if this were in a non-scientific or non-engineering publication, a log scale wouldn't work as well for a lot of readers.

  5. @Jerome: Agreed about the top left, but I do like people to be able to take as much as possible out of a chart; it depends on the context of the presentation.

    The dynamic approach, either on the web or in PowerPoint, would work excellently for these data.

    Both you and Joe mention the pie chart - presumably the objection is that there are other animals that also kill - and of course you are right. In a different data set this would work better, but the argument against pie charts is always a good one.

    As for the mosquito, I suspect the original chart, which was found on a 'fun pictures' site, is just fabricated anyway.

  6. @Jon (does anyone whose name doesn't start with a J read this blog?). Log: agreed, lose the bars, but log scales are only for certain situations.

    Nice tutorial on broken axes. On mine above I just inserted shapes - no good for dynamic data, but quick and dirty.

    I think my preference is the bottom left, if the medium lends itself to that; otherwise the two-chart approach works for me.

  7. My sense is that log scales are really best reserved for instances where there's an exponential relationship that can be linearized by the log scale. Typically that's things like population growth, stock charts, etc., but I suppose I can see a place for them where there's a power-law distribution. These data don't have that power-law feel, since you get one big outlier and a bunch of pretty-much-equal values.

    If I were to go with a log scale, maybe the width would vary so that the area of each bar accurately represents its value, even though the log scale is a shorthand for shortening the bars. More than I can do in Excel, though!

  8. @Funky, I can already hear the palpitations of the data visualization experts - a bar chart, but with area as well!

    It's certainly an interesting idea.

  9. Now that I think about it, it's a perfect case for one of those "charts" where one symbol (skull & crossbones?) = 1000 lives and you stack up a bunch of symbols in a grid for each category.

    E.g., fill up a 30x29 grid with these symbols for the human category, but fill a much smaller 5x6 grid for all the rest. This is so often done inappropriately but this might be the paradigmatic use.

    No key/title, but a pretty straightforward version

  10. @Funky: That's my new favorite, I think. You can still compare the lower values nicely, but the human area is so overwhelmingly larger.

    By moving into the second dimension you allow the aspect ratio to be 'normal', but you avoid the 'area of the pie chart represents 100% of all deaths' issue, and the smaller areas are still easy to compare.

  11. Also, what did you use to draw that?

  12. I didn't think I would like FunkyGawy's pictogram until I saw his example. Too often these are used to show data where precision matters, rather than a pattern like the one this data set has.

    Of the precision examples, I think I like Joe's dot plot best.

  13. I did it in Word (hurrah Wingdings!), but I bet Jon could do it in Excel.

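The pictogram idea from comment 9 needs only a little arithmetic to lay out. Here is a rough Python sketch, which is not how the commenter made his version (that was Word and Wingdings): it converts a value into a number of symbols at 1,000 per symbol and picks a roughly square grid. The helper names and the 870,000 and 30,000 figures are hypothetical; 870 is simply the number of symbols a full 30x29 grid holds.

```python
import math

def pictogram_grid(value, per_symbol):
    """Return (symbols, columns, rows) for a roughly square pictogram
    in which each symbol stands for per_symbol units."""
    symbols = round(value / per_symbol)
    if symbols == 0:
        return 0, 0, 0
    cols = math.ceil(math.sqrt(symbols))   # near-square: columns ~ sqrt(n)
    rows = math.ceil(symbols / cols)       # last row may be partly filled
    return symbols, cols, rows

def draw(symbols, cols, glyph="x"):
    """Text rendering of the grid, one glyph per symbol."""
    return "\n".join(glyph * min(cols, symbols - i) for i in range(0, symbols, cols))

# Hypothetical values at the comment's scale of 1 symbol = 1,000.
print(pictogram_grid(870_000, 1_000))
print(pictogram_grid(30_000, 1_000))
print(draw(30, 6))
```

With these inputs the large value lands on a 30 x 29 grid and the small one on 6 x 5, the grids described in the comment give or take orientation.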

Thank you for taking the time to read this blog and commenting.
