3 May 2010

This blog is moving

I am switching from blogger to using wordpress as a hosted CMS. I will be redirecting this subdomain over in due course. The content and your great comments have already been moved across to: http://www.datadrivenconsulting.com/blog/

Will try to work out subscriptions etc, but anyone who wants to make the switch before I do, please go ahead..

19 April 2010

Data believability

The BBC news website has an article about how western demand for food and goods drives water shortages in developing countries. The interactive visual below is included. The problem I have with it is that the data just doesn't seem believable - I'm not saying it isn't true at all, just that when you deliver information with surprisingly high values (10,850 liters to produce a pair of jeans), you have to delve into that data to stop people instantly dismissing it. For example, how is that volume distributed over growing cotton, manufacture, and transportation? Are you including the water usage of the workers who have some involvement in the trade - is that a fair thing to include?

Without context, information becomes devalued, especially when the data seems outlandish (but may in fact be accurate).




I encountered some skepticism after writing a report concerning excess costs in U.S. shift work operations. The $206 billion annual lost opportunity seemed very high, but the methodology and charts that supported the conclusions made believers, I hope, of the listeners on NPR's Talk of the Nation, and readers of the print publications who picked up the story.

Do you have believability problems with the data in your company - is it being delivered to the decision makers effectively through dashboards and interactive reports? I can solve your data visibility problems and make your business run more efficiently and profitably. 

14 April 2010

Great dashboards: dynamic charts in Excel 2007 and tables

I am working with a client to create dashboards to summarize data. Data is appended on a weekly basis, so any charts that show trending data must either be manually updated to include the new data (not workable), or the range somehow magically updated. As we are working in Excel 2007, I have used the excellent Table functionality as the ability use table references to automatically update formulas to include the new data works very nicely.

As it turns out, any chart that is based on data within a table, even if the table functionality was added to the data after the chart was drawn, causes the chart to automatically include the new data when the table expands.

When you're looking at the chart, this isn't obvious - the series names for the data don't reflect this - they just get automatically updated with the new cell references:

=SERIES("Series name",Sheet1!$A$69:$A$79,Sheet1!$K$69:$K$79,3 increases to

=SERIES("Series name",Sheet1!$A$69:$A$80,Sheet1!$K$69:$K$80,3 ) when a new row is added.

Even if you try to manually change the series references to a table reference:

=SERIES("Series name",Sheet1!$A$69:$A$80,Sheet1!TableName[TableColumn],3), Excel recognizes the table reference, but then renames the series back again anyway.

I find this kind of behavior annoying for two reasons - I should explicitly chose if I want a chart to expand a range when I have chosen data based on cell references. Secondly I spent a while trying to use the methods you needed to in Excel 2003 to create dynamic charts, as I assumed (I know..) that the chart was not going to update because of the use of absolute cell referencing.

If you want to know more about how you can create dynamic charts without table referencing, nip over to Jon Peltier's website.

If you're intrigued by these posts on dashboards, reporting, and charts in Excel, I have the perfect one-day training for you - Jon Peltier and I are hosting a Dashboards in Excel seminar on June 2nd. For more details, see here.

29 March 2010

Dashboard seminar - mark your calendars

I'm sure most readers of my blog will know Jon Peltier's business,  and his excellent Excel/charting blog: Peltier Technical Services. Jon and I are teaming up to present a day-long training session on Excel Charting and Dashboarding. This training will take place on Wednesday, June 2, 2010, at the DoubleTree Hotel in Westborough, MA, in the Route 495 corridor near the Mass Pike.
The training balances Jon’s highly technical expertise in Excel charting and deep understanding of VBA, with my detailed understanding of dashboard design and the business decisions that drive these projects.
Tuition for the all-day, hands-on program is $300, with discounts for early registration. This fee includes meals and breaks during the day plus all course materials. Participants need to bring their own laptops.
For more details, or to register for the class, visit the Excel Charting and Dashboarding home page.

Program Outline

Morning Session

Dashboards 1 – Alex
  • Introduction
  • Data Quality
  • Dashboard Design
  • Dashboard Elements
Advanced Excel Charting 1 – Jon
  • Appropriate Chart Data
  • Chart Types
  • Formatting
  • Combination Charts
  • Dynamic and Interactive Charts

Afternoon Session

Advanced Excel Charting 2 – Jon
  • Conditional Chart Formatting
  • Error Bars
  • Custom Chart Types You Didn’t Think Excel Could Make
Dashboards 2 – Alex
  • Excel as a Dashboard Platform
  • Dealing with Excel Versions
  • Sparkline Add-ins
  • Wrap Up

Program Details

Wednesday, June 2, 2010
8 am to 5 pm
DoubleTree Hotel
5400 Computer Drive
Westborough, MA 01581
Tuition includes conference materials, food, beverages
- $300 base price
– $250 before April 15
– $275 before May 8


23 February 2010

Tableau Public. Cost of raising a child (controlling for inflation)

The Guardian newspaper in the UK often posts interesting data for its readers to mess with and comment on. The latest on the DataBlog ("where facts are sacred") is a data set showing how the cost of raising a child has increased in the UK by 43% since 2003.

The data is from an insurance company, Liverpool Victoria. In neither this data article or the main editorial is the method of data collection described. It's essential to describe this - A lack of visibility into methods, however reliable the reporting source, should quickly lead you to question the findings.

The other issue is that the costs don't seem to have been adjusted to changes in the value of currency (be that through inflation or other methods). Any time monetary values are shown on a time-axis spanning more than a few months (under normal inflation values), the values should be normalized to a single point.

This is my take on the data using Tableau Public, I have presented both the non adjusted costs, and the costs adjusted using the UK's consumer price index. The best normalization probably would be to median wage after tax, as these truly reflect the ability to pay for raising a child, but the CPI will at least give a more balanced view. You can see that the actual increase is about 22% from 2003, and that the only real contributors to this are childcare and education costs because they have increased the most above CPI, and they are the majority of the expenses. The problem with using CPI is that if you used a fine enough detail (e.g. the CPI of providing childcare), the results should, of course, be flat. This is why choosing how to deal with costs and time is far from straightforward.  



Concerning my continued engagement with Tableau Public - it took a while to get the charts how I wanted them - I'm still on an enjoyable learning curve with the new software. There are a few bugs to iron out - for example it doesn't handle null values in an expected way (treats them as zero) - maybe that's still higher up on my learning curve..

22 February 2010

What our kids are learning

Being in the data business, I tend to critique most charts or visualizations I see. I am pleased to note that not only is my son's kindergarten already pushing an understanding of data, but that so far the charts have all been of the bar variety.

While of course one could labor on the excessively strong grid lines reducing the data-ink ratio, I think I'll let it slide this time.

I do hope that pie charts aren't being reserved for 1st grade as they are "more advanced"..

If your business has data that you would like to know more about, maybe you need some bar charts hand crafted by me (or my son) with crayons, or possibly very robust tools like Tableau Public.

17 February 2010

Swivel: First impressions

It would be remiss of me not to also review Swivel, another online, chart sharing solution, similar to Many Eyes, but not as fully-fledged as Tableau Public. One thing I immediately liked were the built-in connections to data sources. Within two clicks I was able to extract data from my Google Analytics account. Swivel then creates a default report from these data - that part wasn't so great - it seemed to be a mash-up of every chart type possible with color distributed with wild abandon.



The default view notwithstanding, there are nice editing options for the charts - the tooltips and visual clues about individual data points are very good. The chart options are the standard basic types, and I feel that Swivel is a quick way to throw a bar chart or pie up onto a website. There are no ways to interact with the data to the same extent you can with Many Eyes (I'm not going to compare Swivel to Tableau - they are completely different animals). The method of embedding is much nicer than Many Eyes, making the chart look much more part of your website and not requiring Java. See below for an example (not my chart).

16 February 2010

Normalizing data: Haiti donations by country using Many Eyes

Two things to talk about in this post - I continue my ramblings about the online viz tool Many Eyes, and discuss how normalizing data can provide radically different insights into data.

The data set I'm using is the donations by countries (government and corporations, but not private) to earthquake relief in Haiti. I've seen a few charts around this showing how the US has provided the most funding, but when normalized per capita, Canada and other countries stand out. On a purely data level, and not to denigrate any country's assistance, is this normalization appropriate when the donation sums do not include donations by the public?

Instead it may be more appropriate to normalize by gross domestic product, especially as governments and corporations greatly influence GDP. However, even that's not straightforward as GDP is affected by exchange rate and does not reflect purchasing power within a country. So we could also normalize based on GDP expressed as purchasing power parity, where differences in cost of living are accounted for.

This yields grossly different results, as shown below. I've made all of the bars in each series relative to the largest in that series. Guyana's contribution of a million dollars is massive compared to its GDP expressed in either way, dwarfing other countries' equivalent contributions. This shows that normalization, which is often appropriate, should be chosen carefully and assessed fully when interpreting charts.



Now onto Many Eyes - I've kept the visualization local again, so apologies for those reading without java. I like the result - it was certainly quick to produce and you can play around a little with it. It's not perfect though - to get appropriate height bars meant messing around - I couldn't get a scale to appear on the y-axis. For simple data, especially text based, I think Many Eyes excels, but this would have worked better in Tableau Public.

What I would love to have done would be to have shown this information as cartograms where the area a country occupies on the map is relative to the data value, not land mass. This would have added a visual geographical spin on where donations were coming from, especially for those of us who may have forgotten where Guyana is exactly..

15 February 2010

Words on my blog: Many Eyes Viz

I'm playing around with Many Eyes - they have some nice text based visualizations. I'll be looking at their charting options as well. Here's a cloud of words on my blog.



I wish that it would appear a little more embedded - i.e. lose the menu at the top and not be grayed out to begin with.

12 February 2010

ER visits due to consumer products: Tableau Public

Tableau Software has just released Tableau Public - a free version of their data visualization tool that will revolutionize how we show information. To showcase its abilities I created the visualization linked below. WARNING: I wanted to push the limits of Tableau - there are 98,000 rows of data shown on the chart, so give it some time to load.

About the data: The NEISS is a database of emergency room visits that involve consumer products collected from 100 hospitals across the country. This is the latest dataset available and covers the entirety of 2008. There were about 370,000 visits in that time frame to these hospitals. As Tableau Public only allows 100,000 rows of data, I used a random function to reduce the dataset.

Using the Visualization: Shown are the 98,000 cases, plotted on the y-axis by age of the patient, and categorized by the type of injury. The lines are colored by body part affected. By default, every product involved is shown. Use the filter on the right to select just a few. For example, deselect "all", and use the magnifying glass to search for just chain saws, select that product, and allow the chart to update. If you mouse over an individual case, you can see the narrative entered by the hospital about the accident.

The chart provides some information about age spread, and the incidence of a particular type of injury, while still allowing you to look at individual cases. I intend to do a lot more with Tableau and this data set - stay tuned for some dashboards. Click the image to interact with the data.


I'm betting your company has data that you'd like to know more about. Data Driven Consulting specializes in collecting, cleaning, and presenting data just like the example above.

7 February 2010

Data visualization challenge: my dashboard design

Finally we get to the choices I made for my dashboard entry into Chandoo's data visualization challenge. The challenge already directed us to make the dashboard focused on the two year performance of the sales people. I'll break this post into the five or so parts of the (single screen) dashboard.

Easily overlooked, but vital, is the title of the dashboard - what is it, what time period does the data cover? Under the title is the most expensive part of the screen real estate - the primary information must go here. If I'm a senior manager looking for sales person information, my first questions will always be: who sold the most, how did those sales vary over my chosen time period, how much was sold compared to what was expected?

From this display we see immediately who sold the most and the least  - give the dollar values, they will be needed - the bar chart gives us information about each person's contribution to the sum. The red markers warn of poor sales performance. The sparklines provide us with time trending information, so often missed from data displays. For data that has some sort of periodicity (as sales data tends to), it can be useful to provide a moving average that better reveals overall trends - for example, the moving average is better at showing that everyone experiences a drop in sales part way through the period, but Hansolo's drop was much more abrupt than James Kirk's.

The Budget/Actual shows that only Hansolo met budget, presumably due to the recovery he experienced in the last six months. By not scaling the bars to be all 100%, we provide additional information about what the sales targets were per sales person. As this makes it difficult to compare sales to target across the sales force, the variance to budget bars clarify this.

The sparklines in the top section are scaled differently to each other - otherwise trends are hidden for the sales people with lower revenue. However it would be easy to predict that the user of the dashboard would want to see the information on one chart. The chart provides this, and an in an effort to minimize colors I added a drop down box to highlight one person compared to the other three. When the manager asks "Why was Hans Solo's performance better than the others?" this chart helps answer that.

I feel that the headlines section is an often overlooked part of a dashboard- 3D pie charts and revving speedometers are sexy, words are not. Often though, pithy statements can make a dashboard much more useful and in 20 seconds can provide you with the most important take-home messages. They are especially great in dynamic dashboards, as long as the information regularly changes.

Finally we begin to get to the other measures that perhaps (hopefully) help us understand the sales issues. The coloration on the data table (again, it is important to sometimes show values) helps us understand the areas that sales people sold in - James Kirk sold almost exclusively in the south, Luke sold across the country. The map provides this information in a slightly different way - for a given region, who sold the most?

The map also provides information about the states that are in each region - anyway that you can make a dashboard as rich as possible is great, but notice that as this is not the most important information, the region boundaries are just a thicker gray, not a highly colored boundary that detracts from the bars.


The bottom two displays are formatted in the same way, so here is just the company size visualization. The stacked bar shows the proportion of sales to each size company - Chewbacca sells to all sizes, Luke is much more focused on enterprise sales. The bars underneath show for a particular size company, how are the sales distributed - again, important, because even though Chewabacca sells to enterprise, his overall contribution to the sales for that size company is completely minimal. That's it - thank you again Chandoo, for the opportunity to create this dashboard.

If by some amazing chance you've made it all the way through this post and are still reading, I'd like to remind you that Data Driven Consulting can help your organization create actionable, strategic, highly useful dashboards and reports that will make your business more successful.

Design (in)considerations

A slideshow of some bad design examples I've been collecting. My favorites are the French translation making you find a 3.17mm drill and the chair that's not a chair..

21 January 2010

Data visualization challenge: the objective

This is the second part of designing my dashboard for the viz challenge at Chandoo.com. With the raw data ready to tell a story, it was time to understand what the objective was. Unfortunately many dashboards/data displays are so immersed in either the technology or the ability to create pretty things (3D pie charts and revving gauges) that this most important part is overlooked. From the challenge our mission was to:
"help a senior manager understand how the sales people have done in the 24 months"
There are three keys here to achieving the goal: senior - someone high up in the company who has little time to be delving into information - they want the information fast and they want to know what caused the issues they are seeing.

Sales people - they want to know about the performance of the sales people - all of the measures and charts should be centric to these sales people, or at least explain the performance of these sales people.The very first measures shown should describe the sales performance alone.

24 months - he/she is clearly interested in the time trending information - how the sales people have performed over the two year period.

Now in most situations, an objective is rarely as well defined as that - it is the job of the product owners to ensure clarity like this is reached. Is this a one-off? Who will be using it? What do they care about in this instance? How timely does the data need to be? From there you can move to the measures displayed..

20 January 2010

Data visualization challenge results, part one: addressing the data


In November I entered a data visualization challenge on Chandoo's excellent Excel and charting blog. I was honored to be voted as having the winning entry (by one vote) - I thought it would be useful to describe the steps I took in designing my entry.

Chandoo provided us with a data file containing two years of raw revenue data for four sales people with information on sales per region, product, and size of company sold to.

Chandoo created the data to show steady growth over the two years, with a little bit of randomness thrown in. As this didn't perhaps allow for an interesting story we were allowed to change the data, but not to add new columns of data (e.g. profit, expenses). Generating data to tell a story is harder than is sounds, especially when you want there to be reasons why one person wasn't performing as well. While this seems a little involved for an online competition, it was very similar to the thinking you would have to do anyway when designing dashboards -  "of the available data, what is the most important to show to the specific end-user for this situation, what can be compared to what to give insight into these data?"

Real data would usually reflect these insights, so I felt I had to make the data more real. I started with stories about the sales, for example: like most sales data, it fluctuates on an X month cycle as sales targets are set and deadlines approach, all the sales people suffered late 2008 due to the recession, some experienced a more abrupt drop in sales, some recovered quicker, Chewbacca (seriously) sells more in the East region, but across all company sizes, Luke Skywalker sells across the country, but mostly to the larger customers.

You can see what I did here (xls 2003) to create the data (second sheet, at the bottom). I created data for the regions that described an overall pattern. From there I created modifiers based on who sold what where, and placed a variable that influenced how strongly these modifiers altered the sales data. With the addition of the Index function, multiple if statements, and randomness thrown in, the data was created. It certainly isn't elegant - I'm sure that it could have been much more concise and still told the stories I wanted, but it worked. Next up: designing the dashboard for the end-user.

19 January 2010

Taming the Data Dragon

Thanks to @infoholic on Twitter for bringing this little gem to my attention: "Daddy, what do you do at work?"


ShareThis