Are you familiar with this: “Our company pays, on average, the same as our competitor”?
Whether the use of “average” in that sentence is intentional or simply a misunderstanding, it is wrong to use the average alone to represent a population or a group of data.
I’m amazed that even today, many people still use just the “average” or “mean” to represent a set of data, and even more people accept it without question. With spreadsheets so widely used, we have to do better than that.
An “average” can be misleading because it tells us nothing about the distribution of the data or population. I’ll show you why.
Imagine there are two companies, Company A and Company B, each with 10 employees (to keep it simple). In the table below, I list the salary of each employee. You can see that the AVERAGE employee salary at Company A equals that at Company B. But you know they are NOT the SAME, don’t you?
With only 10 data points, you can see the difference with your own eyes. But with more than 200 data points, you need another tool.
In this case, the standard deviation can help.
Standard deviation shows how much the data vary within one group. For instance, in Company B there is one employee (maybe the CEO) with a very high salary, while several employees are paid less than 4. This means the variance is high, which shows up as a higher standard deviation than Company A’s.
Hence, while the average salary is the same, Company B does not pay its employees the way Company A does.
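If you want to try this yourself outside a spreadsheet, here is a minimal Python sketch. The salary figures are made up for illustration (they are NOT the numbers from the table) and are chosen so that both companies have the same average. I use the population standard deviation (`statistics.pstdev`) on the assumption that the 10 employees are the whole company; for a sample drawn from a larger group, `statistics.stdev` would be the usual choice.

```python
import statistics

# Hypothetical salaries in arbitrary units, invented for this example.
company_a = [4, 5, 5, 5, 5, 5, 5, 5, 5, 6]    # everyone is paid about the same
company_b = [2, 2, 3, 3, 3, 3, 4, 4, 4, 22]   # one very high salary, many low ones

mean_a = statistics.mean(company_a)
mean_b = statistics.mean(company_b)

# Population standard deviation: how far salaries spread around the mean.
sd_a = statistics.pstdev(company_a)
sd_b = statistics.pstdev(company_b)

print(f"Company A: mean = {mean_a}, standard deviation = {sd_a:.2f}")
print(f"Company B: mean = {mean_b}, standard deviation = {sd_b:.2f}")
```

With these invented numbers, both averages come out to 5, but Company B's standard deviation is more than ten times larger than Company A's.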
In summary, whenever someone tells you that average X equals average Y, you need to ask, “What is the standard deviation?” You need to see whether the variance is also similar and whether there is an outlier that drags the average up or down.
Basic principle: before drawing even a simple conclusion from data, you should know its shape/distribution, its standard deviation, and its average/median.
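One way to put this principle into practice is to look at several numbers side by side instead of the average alone. The sketch below is only an illustration: `summarize` is a helper name I made up, and the salaries are hypothetical, not taken from the table.

```python
import statistics

def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical salaries with one very high value, invented for this example.
salaries = [2, 2, 3, 3, 3, 3, 4, 4, 4, 22]

summary = summarize(salaries)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

When the median sits well below the mean, as it does with these made-up numbers, that is a hint that a few high values are dragging the average up.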