fooled by average #3: and 2 simple tools to avoid it

August 21, 2010

Previously I posted two examples of the common, misleading use of AVERAGE.

Now I will show my favorite tools (well, they are graphs) to keep you from being fooled by somebody’s use of the average.

In fact, these two are my favorite graphs (other than the Pareto diagram): the run chart and the box plot.

Here is an imaginary story about how you can use those two graphs.

Imagine you are a sales manager with 3 sales officers reporting to you: Ed, Jon, and Lisa. You have set them a target of selling at least $16/week. You told them you will send the best seller to the Bvlgari hotel in Bali for a week of vacation with their family.

Now, at the end of the year, each of them comes to you and reports that they have achieved the target. Each of them sends you the full 52 weeks of actual sales results (in a spreadsheet) and, as they said, each of them has sold $16/week on average. Who deserves the nights at the Bvlgari hotel?

Fortunately you are smarter now; you can’t be fooled by the average anymore. Not you.

First, you make a simple run chart, i.e. a simple graph showing the sales of each person over the 52 weeks.

Looking at the run chart, you can see stability and trend. You can also see that, if you use only the average, all of them have the same average of $16 (see the bold red line).

But this run chart tells you that Ed’s performance is very unstable. Lisa had an exceptional early performance, but she is getting worse and worse over time. Jon has shown consistent performance, and he has also been getting better lately.
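The spreadsheet behind the run chart is not reproduced here, so the sketch below uses made-up weekly sales that only mimic the three patterns described. It puts numbers on what the eye reads from a run chart: the standard deviation for stability and a least-squares slope for trend. The data and the helper name `trend_slope` are assumptions for illustration.

```python
from statistics import mean, stdev

# Made-up weekly sales ($/week) for 52 weeks; each series averages exactly 16.
ed = [2, 30, 10, 22] * 13                        # swings wildly from week to week
jon = [15] * 26 + [17] * 26                      # steady, a little better lately
lisa = [28.5 - week // 2 for week in range(52)]  # strong start, steady decline


def trend_slope(sales):
    """Least-squares slope of sales against week number ($ per week)."""
    weeks = list(range(len(sales)))
    mean_week, mean_sales = mean(weeks), mean(sales)
    numerator = sum((w - mean_week) * (s - mean_sales) for w, s in zip(weeks, sales))
    denominator = sum((w - mean_week) ** 2 for w in weeks)
    return numerator / denominator


for name, sales in [("Ed", ed), ("Jon", jon), ("Lisa", lisa)]:
    print(f"{name:4}  mean={mean(sales):5.1f}  stdev={stdev(sales):5.1f}  "
          f"slope={trend_slope(sales):+.2f}")
```

With these invented numbers the three means are identical, while the standard deviation and the sign of the slope separate the three people.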

Next, to compare their results side by side, you make a box plot. This is a simple chart representing the distribution of each person’s performance.

This graph strengthens your analysis of their performance.

Now you are about to pick up the phone to tell the great news to the best salesperson of the year. Bvlgari hotel, Bali… not bad at all…

Readers, do you know which one is the best salesperson based on the above two graphs?
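A box plot is drawn from five numbers per group: minimum, first quartile, median, third quartile, and maximum. The sketch below computes them for made-up weekly sales (the real spreadsheet is not available here). Quartile conventions differ between tools, so the `method="inclusive"` choice is an assumption and a charting program may give slightly different quartiles.

```python
from statistics import quantiles

# Made-up weekly sales ($/week), 52 weeks each, all averaging 16.
sales = {
    "Ed": [2, 30, 10, 22] * 13,
    "Jon": [15] * 26 + [17] * 26,
    "Lisa": [28.5 - week // 2 for week in range(52)],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


summaries = {name: five_number_summary(values) for name, values in sales.items()}
for name, (low, q1, median, q3, high) in summaries.items():
    print(f"{name:4}  min={low:5.1f}  Q1={q1:5.1f}  median={median:5.1f}  "
          f"Q3={q3:5.1f}  max={high:5.1f}  IQR={q3 - q1:5.1f}")
```

The interquartile range (Q3 minus Q1) is the height of the box, so a short box means consistent weekly sales.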

fooled by average #2: “hey boss, we have increased productivity averagely by 50%!”

August 7, 2010

Have you ever heard anyone, a team member or a colleague, report that he increased an important result ON AVERAGE by a significant number, say 50% or more?

I’ve seen enough of it.

That could mislead you.

Here I show an example in the graph below, set in an imaginary situation.

Imagine you are the leader of an important team tasked with improving productivity. You have prepared your team and executed many actions. You tracked the performance “before” vs. “after” the improvement. Then you presented the graph to your boss and all the executives in your company.

You showed them that “before” the improvement, your productivity AVERAGED 10 pcs/day. “After” the improvement, the AVERAGE became 15 pcs/day. It is a 50% improvement! (Refer to the graph.)

You were so excited to report this, your boss could not hide his happiness, and you became the star of the month. Everybody was happy and you got the award. But something looks wrong.

Now, looking at the graph closely, you know something is wrong with the 50% improvement (in this example, it’s easy to spot with the naked eye; in the real world, with thousands of data points, the misleading conclusion is not so easily caught).

What’s wrong?

  1. To choose between the average/mean, the median, or another measure, you need to know the shape of the distribution. The average is a fair summary only when the distribution is roughly symmetric, as a normal distribution is.
  2. In addition to the average, you also need to look at the variance (through the standard deviation).
  3. From the graph, we can see that the “after” improvement performance has some issues that make the AVERAGE misleading:
  • it is not stable (high variance)
  • it has a decreasing TREND. It only increases in the first couple of days; after that it decreases, even to a point lower than before the improvement
  • statistically, we need to check whether it is normally distributed and whether the “after” performance is really different from the “before”.

In a nutshell, to represent a set of data, you have to review at least 2 other things in addition to the average (or median): the shape of the distribution and the variance of the distribution.
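To make this concrete, here is a minimal sketch with made-up daily output that matches the story: 10 pcs/day on average before, 15 pcs/day after. The actual numbers behind the graph are not available, so these are assumptions. It shows how a 50% headline can coexist with a much larger spread and a second half that ends up below the old level.

```python
from statistics import mean, stdev

# Made-up daily output (pcs/day), invented to match the story's averages.
before = [10, 9, 11, 10, 10, 9, 11, 10, 10, 10]
after = [18, 22, 24, 21, 18, 15, 12, 9, 6, 5]

improvement = (mean(after) - mean(before)) / mean(before) * 100
print(f"before: mean={mean(before):.1f}  stdev={stdev(before):.1f}")
print(f"after:  mean={mean(after):.1f}  stdev={stdev(after):.1f}")
print(f"headline improvement: {improvement:.0f}%")

# Crude trend check: compare the first and second half of the "after" period.
half = len(after) // 2
early, late = mean(after[:half]), mean(after[half:])
print(f"after, first half: {early:.1f}  second half: {late:.1f}")
```

Splitting the period in half is only a rough stand-in for looking at the run chart, but it is enough to show that the average hides the decline.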


I deliberately put this at the end:

A box plot can help us show several sets of data side by side as a graphical representation.

There are several statistical tools that can help us test whether two (or more) sets of data are different: a t-test (for a pair of distributions) or analysis of variance (ANOVA) for multiple sets of data.
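As a rough sketch of the t-test idea, the code below computes Welch's t statistic, a variant that does not assume equal variances (my choice for this illustration, since the two periods clearly differ in spread). The data is made up, and `welch_t` is a hypothetical helper name. Turning the statistic into a p-value needs a t-distribution table or a statistics library; SciPy's `scipy.stats.ttest_ind` with `equal_var=False` is one option.

```python
from math import sqrt
from statistics import mean, variance

# Made-up daily output (pcs/day), for illustration only.
before = [10, 9, 11, 10, 10, 9, 11, 10, 10, 10]
after = [18, 22, 24, 21, 18, 15, 12, 9, 6, 5]


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    var_a, var_b = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


t, df = welch_t(after, before)
print(f"t = {t:.2f}, degrees of freedom = {df:.1f}")
```

Keep in mind that a t-test assumes roughly normal, independent observations. Data with a strong trend breaks that assumption, so the test complements the graphs rather than replacing them.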

fooled by average #1: “our company pays averagely the same as our competitor”

August 2, 2010

Are you familiar with this: “our company pays averagely the same as our competitor”?

Regardless of whether the use of “average” in the above sentence is intentional or simply a misunderstanding, it is wrong to use it alone to represent a population or a group of data.

I’m amazed that, even today, many people still use just the “average” or “mean” to represent a set of data. And even more people just accept it without question. With the widespread use of spreadsheets, we have to be better than that.

“Average” could be misleading, because it does not tell us the distribution of the data or population. I’ll show you why.

Imagine there are two companies: Company A and Company B, each with 10 employees (to keep it simple). In the table below, I list the salary of each employee. NOW, you can see that the AVERAGE employee salary at Company A equals that at Company B. But you know they are NOT the SAME, don’t you?

When the sample is only 10 data points, you can see with your own eyes that there is a difference. But if there are more than 200 data points, you need another tool.

In this case, standard deviation can help.

Standard deviation shows the variation of the data within one group. For instance, in Company B, there is one employee (maybe the CEO) who has a very high salary, while several employees are paid less than 4; this means the variance is high, as shown by the higher standard deviation vs. Company A.

Hence, while the average salary is the same, Company B does not pay its employees similarly to Company A.

In summary, whenever you hear someone tell you that average X equals average Y, you need to ask, “What is the standard deviation?” You need to see whether the variance is also similar and whether there is an outlier that drags the average up or down.
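The salary table is not reproduced in this text, so the sketch below uses made-up salaries that follow the description: equal averages, one very high earner in Company B, and several Company B employees paid less than 4. It prints the mean, median, and standard deviation for each company.

```python
from statistics import mean, median, stdev

# Made-up salaries (arbitrary unit), 10 employees per company.
company_a = [4, 5, 5, 5, 5, 5, 5, 5, 5, 6]
company_b = [2, 2, 3, 3, 3, 3, 3, 4, 4, 23]  # one very high earner

for name, salaries in [("A", company_a), ("B", company_b)]:
    print(f"Company {name}: mean={mean(salaries):.1f}  "
          f"median={median(salaries):.1f}  stdev={stdev(salaries):.2f}")
```

With these numbers the means match, but Company B's standard deviation is far larger and its median is lower, which is the outlier dragging the average up.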


Basic principle: you should know the shape/distribution of the data, the standard deviation, and the average/median before drawing even a simple conclusion from data.

working really hard? really?

June 26, 2010

Now, it’ll be obvious that the Pareto diagram is my favorite tool.

I did this about a year ago.

Most of the time, I thought I had been working really hard: spending many hours in the office, in meetings, travelling, doing analysis, and writing reports.

But many times it did not feel right, because the impact of my priority projects and activities was not as good as expected.

Hence, for the sake of curiosity (and a bit of insanity), I tracked the time I spent on ALL my activities in a week. And this is the result*

Focusing and acting on what matters most to you is the key to winning. But this common sense is not common practice. As you see, I spent too much time on irrelevant meetings and on reading/responding to too many emails (anyone with me?)


* Here’s how I made the diagram:

1. I listed my top 3 priorities for the week:

First: Project Tristar (not the real name), “the most important project of the century”, according to my boss.

Second: Cost Reduction project: recently the most popular project everywhere in the world, I think.

Third: Coaching underperforming team members

2. I tracked the actual time spent on each project this week. This includes everything, such as discussions, writing emails, and meetings. If a discussion/meeting is related to your project, put it into that project’s allocation. Otherwise, put it under irrelevant meetings or irrelevant email.

3. I put the data into a Pareto diagram.
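The three steps can be sketched in a few lines. The hours below are invented for illustration (my real numbers are in the diagram), and the category names simply follow the steps above. A Pareto diagram sorts the categories from largest to smallest and adds a cumulative percentage line, which is what the code computes.

```python
# Made-up hours for one week; the categories follow the steps above.
hours = {
    "Project Tristar": 8,
    "Cost Reduction": 7,
    "Coaching": 5,
    "Irrelevant meetings": 18,
    "Irrelevant email": 12,
}

total = sum(hours.values())
# Sort categories from the most hours to the fewest.
ranked = sorted(hours.items(), key=lambda item: item[1], reverse=True)

pareto = []  # rows of (activity, hours, cumulative % of total)
running = 0
for activity, spent in ranked:
    running += spent
    pareto.append((activity, spent, 100 * running / total))

for activity, spent, cumulative in pareto:
    print(f"{activity:20} {spent:3d} h  {cumulative:5.1f}%")
```

The bars of the diagram come from the hours column and the line from the cumulative column; any spreadsheet can draw both from this table.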