7 December 2009

Visualizing data that obscures trends (pie charts are "better")

I've been playing around with Google Trends for a while now. It's an incredibly powerful tool for market research, especially when you're looking at new markets or at how the popularity of a brand is changing. You can see how a particular search term has trended over five years, where the term is most popular (by country and city), how several search terms compare, and how news stories relevant to the search terms may have affected search volume.

The search terms I chose to compare for this example were "pie chart" and "bar chart", partly in response to the friendly back-and-forth that Chandoo and Jon Peltier have had recently about the appropriate use of pie charts. My data add nothing to their conversation beyond showing what the public considers 'search-worthy', so apologies for the misleading blog title. However, the data are perfect for demonstrating how you can better visualize data whose seasonal and other variations hide the overall trends.

Here are the data: the relative search popularity of the two terms, week by week, since 2004. A few things stand out immediately. People search for pie charts more than they search for bar charts, there seems to be seasonal variation within each year, and there seems to be a trend across the years. To investigate these patterns further, I summarized the data by month, year, and so on, by grouping the dates in pivot charts.
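The grouping above was done with spreadsheet pivot charts. For anyone who would rather script it, here is a minimal Python sketch of the same idea. It assumes the weekly export has already been read into a list of (week_start_date, pie, bar) tuples; the function name `average_by` and the sample values are hypothetical, not real Google Trends data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def average_by(rows, key):
    """Group weekly rows by key(date) and average each term within a group.

    rows: iterable of (week_start: date, pie: float, bar: float) tuples.
    key:  function mapping a date to a group label, e.g. its month or year.
    Returns {group: (mean_pie, mean_bar)}, ordered by group.
    """
    groups = defaultdict(lambda: ([], []))
    for week, pie, bar in rows:
        pies, bars = groups[key(week)]
        pies.append(pie)
        bars.append(bar)
    return {g: (mean(p), mean(b)) for g, (p, b) in sorted(groups.items())}

# Made-up sample values, for illustration only.
rows = [
    (date(2008, 7, 6), 60, 30),
    (date(2008, 7, 13), 64, 32),
    (date(2008, 10, 5), 100, 40),
    (date(2009, 7, 5), 70, 34),
]
by_month = average_by(rows, key=lambda d: d.month)  # month of year, all years pooled
by_year = average_by(rows, key=lambda d: d.year)
print(by_month)
print(by_year)
```

Passing the grouping as a `key` function keeps one helper for the month-of-year, year, and day-of-month views.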


Here's the seasonal variation, obtained by averaging the search popularity for each month of the year across all five years. Both terms are searched for less in the summer (for the northern hemisphere, but the data are overwhelmingly dominated by users in this hemisphere) and at the end of the year. This coincides with both school holidays and lower business search volume, as people take vacations.

The lines suggest that "bar chart" is less affected during the summer than "pie chart". To confirm this, I've added another series that compares how the two terms vary relative to each other. If the bar for a month is lower than 1, "bar chart" has dropped less, relative to its own average popularity over the year, than "pie chart" has. So yes, even when you account for the greater absolute variation in "pie chart", "bar chart" does not experience as great a drop-off mid-year.
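The post doesn't spell out how the comparison series is calculated. One reading that matches its interpretation (below 1 means "bar chart" dropped less) is to index each term's monthly average against that term's own average across the months, then divide the "pie chart" index by the "bar chart" index. Here is a small Python sketch under that assumption; `seasonal_ratio` is a hypothetical name and the values are made up.

```python
from statistics import mean

def seasonal_ratio(monthly):
    """Compare how far each term's monthly average sits from its own average.

    monthly: {month: (mean_pie, mean_bar)}.
    ASSUMPTION: each term's "average over the year" is the mean of its monthly
    averages, and the ratio is pie_index / bar_index. A ratio below 1 means
    "pie chart" sits further below its average than "bar chart" does.
    """
    pie_avg = mean(p for p, _ in monthly.values())
    bar_avg = mean(b for _, b in monthly.values())
    return {m: (p / pie_avg) / (b / bar_avg) for m, (p, b) in monthly.items()}

# Made-up values: both terms dip in July, "pie chart" proportionally more.
monthly = {m: (100, 50) for m in range(1, 13)}
monthly[7] = (76, 46)
ratios = seasonal_ratio(monthly)
print(round(ratios[7], 2))
```

Dividing by each term's own average removes the difference in scale between the two terms, which is why the ratio can show that "pie chart" varies more even though it is searched for more overall.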

As a nod to the bar chart vs. pie chart argument, one could argue, tongue in cheek, that "pie chart" is searched for by schoolchildren more than by business users, since its drop-off is larger during the school summer vacation, and that bar charts must therefore be more important to business. No, probably not.

Before we move on, though, a quick comment on what we've lost from the 'raw' data. From our summary chart you could conclude that search volume is about the same in December as in the summer months. However, look at the top chart: in December, search volume drops only in the last two weeks of the month, whereas it is lower for every week of July. For that reason, you must carefully consider the level of summary you choose, whether an average or a sum is better for comparisons, and whether you should also show the raw data.
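To make the point concrete, here is a tiny Python illustration with made-up weekly values (not the real Trends figures): a month that drops only in its last two weeks and a month that is uniformly low can share the same average. Reporting the minimum and maximum alongside the mean is one simple way to keep some of that shape visible.

```python
from statistics import mean

def describe(weeks):
    """Return (mean, min, max) for one month's weekly values."""
    return mean(weeks), min(weeks), max(weeks)

# Made-up weekly values: December drops only in its last two weeks,
# while July is uniformly low. Both months have the same average.
december = [100, 100, 60, 60]
july = [80, 80, 80, 80]
print(describe(december))  # (80, 60, 100)
print(describe(july))      # (80, 80, 80)
```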


Next we have the year-to-year variation. There is a dip in 2006 and 2007 (for which I have no explanation). The pattern is the same for bar chart and pie chart searches. It is likely to reflect real search volume rather than a collection artifact, because two unrelated search terms do not show this dip. I wondered whether "bar chart" was catching up with "pie chart", so the line shows the absolute difference between them. Interestingly, without the average line plotted across the chart, this line really looks like it is dropping, as if "bar chart" were catching up with "pie chart". That impression is misleading: because the other two lines are heading upwards, even a roughly flat line looks as if it is falling. Adding the straight average line reveals that there is little, if any, catching up going on.
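Here is a minimal Python sketch of the difference line and its flat average, assuming yearly averages like those in the chart. As an extra numeric check that the post itself doesn't use, it also fits an ordinary least-squares slope to the difference: a clearly negative slope would mean the gap is narrowing, and a slope near zero agrees with what the average line shows. The function names and the yearly values are hypothetical.

```python
from statistics import mean

def difference_vs_average(yearly):
    """yearly: {year: (mean_pie, mean_bar)}.

    Returns ({year: pie - bar}, average difference). The average is what you
    would plot as a flat reference line across the chart.
    """
    diffs = {y: p - b for y, (p, b) in sorted(yearly.items())}
    return diffs, mean(diffs.values())

def slope(points):
    """Ordinary least-squares slope of {x: y}; negative means a shrinking gap."""
    xs, ys = list(points), list(points.values())
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Made-up values: both terms rise (with a mid-series dip), but the gap stays put.
yearly = {2004: (60, 20), 2005: (62, 22), 2006: (55, 15),
          2007: (56, 16), 2008: (68, 28), 2009: (72, 32)}
diffs, avg = difference_vs_average(yearly)
print(diffs, avg, slope(diffs))
```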

As with the previous chart, you have to be mindful of what you've lost in this summary. If you use an absolute count of searches rather than an average, the results will differ to some degree (and not only because we are missing the December 2009 data).
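A quick Python illustration of why the missing month matters, using made-up monthly values with identical popularity in every month: the yearly sum makes the incomplete year look weaker, while the yearly average does not.

```python
from statistics import mean

# Made-up monthly values: identical popularity every month, but the final
# year is missing December (as the 2009 data in this post are).
monthly = {2008: [50] * 12, 2009: [50] * 11}

totals = {y: sum(v) for y, v in monthly.items()}     # {2008: 600, 2009: 550}
averages = {y: mean(v) for y, v in monthly.items()}  # {2008: 50, 2009: 50}
print(totals, averages)
```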

I also plotted these data by day of the month. Searches are slightly more popular towards the beginning of a month, but I would attribute that to the two main US holidays, Thanksgiving and Christmas, both of which fall in the latter part of their months.
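One way to test the holiday explanation would be to repeat the day-of-month summary with November and December left out and see whether the late-month dip disappears. The sketch below is a suggestion rather than what the post did. It assumes each weekly value is keyed by its week's start date, and it buckets days into early (1-10), middle (11-20) and late (21-31) because weekly data give only a few points per calendar day. The function names and values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def part_of_month(d):
    """Bucket a date into 'early' (1-10), 'middle' (11-20) or 'late' (21-31)."""
    if d.day <= 10:
        return "early"
    if d.day <= 20:
        return "middle"
    return "late"

def average_by_part_of_month(rows, exclude_months=()):
    """rows: iterable of (week_start: date, value) pairs.

    exclude_months drops whole months, e.g. {11, 12} to set aside
    Thanksgiving and Christmas before comparing parts of the month.
    """
    groups = defaultdict(list)
    for week, value in rows:
        if week.month not in exclude_months:
            groups[part_of_month(week)].append(value)
    return {part: mean(vals) for part, vals in groups.items()}

# Made-up values: late-month weeks are low only in December.
rows = [
    (date(2008, 6, 2), 80), (date(2008, 6, 23), 80),
    (date(2008, 12, 1), 80), (date(2008, 12, 22), 40),
]
print(average_by_part_of_month(rows))                           # {'early': 80, 'late': 60}
print(average_by_part_of_month(rows, exclude_months={11, 12}))  # {'early': 80, 'late': 80}
```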

How else could you summarize these data?
