The standard deviation is a statistic that tells you how
tightly data are clustered around the mean. When the sizes are tightly clustered
and the distribution curve is steep, the standard deviation is small. When the
sizes are spread apart and the distribution curve is relatively flat, the
standard deviation is relatively large.
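As a rough illustration of the contrast described above, the following sketch compares two made-up samples that share the same mean but differ in spread. The values are invented for the example, and Python's standard `statistics` module is assumed as the calculator.

```python
from statistics import mean, stdev

# Made-up sizes (for example, in mm); both samples have a mean of 50.
tight = [48, 49, 50, 51, 52]    # values clustered close to the mean
spread = [30, 40, 50, 60, 70]   # values spread far from the mean

for name, sizes in (("tight", tight), ("spread", spread)):
    # stdev() returns the sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean(sizes)}, standard deviation = {stdev(sizes):.2f}")
```

The tightly clustered sample gives the smaller standard deviation, even though both means are identical.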
What is a distribution curve?
There are, as is well known, lies, damned lies and statistics.
And within statistics there is the bell curve. This is the shape of the
frequency distribution one gets when conducting measurements of just about
anything in the natural world.
It first came to prominence in the early nineteenth century
when Adolphe Quetelet, the Belgian Astronomer Royal, collected data on the chest
measurements of Scottish soldiers and the heights of French soldiers, and found
that when both sets of measurements were plotted they tended to cluster in a
symmetrical shape around a mean. Or, less technically, most soldiers were in a
height range fairly close to the average. The bell curve became so ubiquitous in
measurements of natural phenomena that it was eventually christened the 'normal
distribution', and it has conditioned our thinking about statistical data ever
since.
The example below simulates how randomly dropping balls build up a bell curve.
At first there does not seem to be any pattern, but after a few minutes the
stacks conform to the superimposed curve.
[Animation: randomly dropping balls create a normal distribution curve]
[Figure: normal distribution curve (bell curve). The colour key gives the % of
population within one, two and three standard deviations of the mean in either
direction on the horizontal axis.]
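The ball-drop idea can be sketched in a few lines of code. This is a minimal simulation under stated assumptions, not the animation's actual method: each ball is assumed to pass 12 rows of pegs and to bounce left or right with equal chance at each one, and 5000 balls are dropped. For a true normal distribution, about 68%, 95% and 99.7% of values lie within one, two and three standard deviations of the mean. Because the slots here are discrete, the simulated shares only approximate those figures.

```python
import random
from collections import Counter
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

ROWS = 12       # assumed number of peg rows each ball passes
BALLS = 5000    # assumed number of balls dropped

def drop_ball(rows=ROWS):
    """Return the final slot: the number of rightward bounces out of `rows`."""
    return sum(random.random() < 0.5 for _ in range(rows))

slots = [drop_ball() for _ in range(BALLS)]
counts = Counter(slots)

# Text histogram: one '#' per 25 balls that landed in each slot.
for slot in range(ROWS + 1):
    print(f"{slot:2d} {'#' * (counts[slot] // 25)}")

# Share of balls within 1, 2 and 3 standard deviations of the mean slot.
m, s = mean(slots), pstdev(slots)
for k in (1, 2, 3):
    share = sum(abs(x - m) <= k * s for x in slots) / BALLS
    print(f"within {k} standard deviation(s) of the mean: {share:.1%}")
```

With only a handful of balls the histogram looks irregular; the bell shape emerges as the number of balls grows.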
Why is it useful?
Smaller standard deviations reflect more clustered data. More clustered data
means fewer extreme values. A data set with fewer extreme values has a more
reliable mean. The standard deviation is therefore a good measure of the
reliability of the mean value. The formula is as follows:
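One common form is the sample standard deviation, which is also what Excel's STDEV function computes. The notation is an assumption made for this sketch: $x_i$ is an individual value, $\bar{x}$ is the mean of the data set, and $n$ is the number of values.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1}}
```

When the data cover an entire population rather than a sample, the sum is divided by $n$ instead of $n - 1$.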
Is there an easy way to calculate it?
The Microsoft Excel programme will calculate the standard deviation
and mean for a set of data listed in a spreadsheet column.
List the data set in a single column
Click on the empty cell below the last data item
Open INSERT menu > FUNCTION > STDEV > click OK
The standard deviation then appears in the empty cell.
The Excel screen example below is for a data set of 3 items.
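For readers without Excel, the same calculation can be sketched by hand in code. The three values below are hypothetical and are not taken from the screen example. The sketch assumes the sample form of the standard deviation (dividing by n - 1), which is what Excel's STDEV computes; newer Excel versions also offer it under the name STDEV.S.

```python
from math import sqrt
from statistics import stdev

data = [4, 8, 12]  # hypothetical three-item data set, not the screenshot values

n = len(data)
mean = sum(data) / n                                  # (4 + 8 + 12) / 3 = 8.0
squared_deviations = [(x - mean) ** 2 for x in data]  # [16.0, 0.0, 16.0]
sample_sd = sqrt(sum(squared_deviations) / (n - 1))   # sqrt(32 / 2) = 4.0

print("mean:", mean)
print("standard deviation (by hand):", sample_sd)
print("standard deviation (statistics.stdev):", stdev(data))
```

Both routes give the same figure, which makes the hand calculation a useful check on a spreadsheet result.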
What are its weaknesses?
The standard deviation does not take into account how close together the means
of two sets of data are. The spread of data at two sample sites could have very
similar standard deviations but very different means. The animation below helps
illustrate this. The mean and standard deviation can both be changed by using
the slider.
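The weakness described above can be seen with a small sketch. The site measurements are invented for the example: the second site is simply the first shifted up by 30, so the spread is unchanged while the mean moves.

```python
from statistics import mean, stdev

# Hypothetical measurements at two sample sites.
site_a = [10, 12, 14, 16, 18]
site_b = [x + 30 for x in site_a]  # same pattern, shifted up by 30

for name, values in (("site A", site_a), ("site B", site_b)):
    print(f"{name}: mean = {mean(values)}, standard deviation = {stdev(values):.2f}")
```

The two standard deviations are identical, so on its own the statistic gives no hint that the means differ by 30.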
Which test can be used to show how close together the means of 2 sets of data are?