Standard Deviation

Standard deviation is a measure of statistical dispersion. It’s the square root of variance. Variance is measured in squared units, so we take the square root to express the spread in the same unit as the data.
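To make the variance-then-square-root step concrete, here is a minimal sketch. The numbers are made up for illustration (pretend they are lengths in cm), and it uses the population formula, dividing by the number of entries.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, e.g. in cm

mean = sum(data) / len(data)

# Variance: the average squared distance from the mean (unit: cm squared).
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Taking the square root brings us back to the unit of the data (cm).
sd = math.sqrt(variance)

print(mean, variance, sd)  # prints: 5.0 4.0 2.0
```

If you want the sample version instead, divide by `len(data) - 1`.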

We commonly use it to measure confidence in statistical conclusions, which is where the margin of error comes from. The standard deviation is the sort of expected fluctuation we’re gonna get from conducting the same poll over and over again. Only effects that fall well outside that fluctuation (by convention, roughly two standard deviations or more) are considered statistically significant.

Here’s an example of two sample populations with the same mean, but different standard deviations. Both have mean 100, but red has SD 10, blue has SD 50.

A bigger SD means more “spread out” data. So when someone tells you “the average is 100”, you could get several completely different pictures; you actually ought to know what the spread is. If everybody were taught statistics, we’d ask to hear the spread of data a lot more. It gives us a much clearer picture of a statistic. Next time somebody says “the average is ____”, ask them “what’s the spread like?” (If you want to be geeky you can ask for the standard deviation.)
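Here is a small sketch of the same idea in code. The two lists are invented for illustration (they are not the data behind the red and blue example), but they have the same mean of 100 and SDs of 10 and 50.

```python
import statistics

# Hypothetical lists: same average, very different spread.
red = [90, 110, 90, 110]
blue = [50, 150, 50, 150]

for name, data in [("red", red), ("blue", blue)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(name, "mean:", mean, "SD:", sd)
```

Both lists report the same average, so only the SD tells you that blue is spread five times as wide as red.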

Standard deviation represents the typical deviation from the mean (the average).

$$\mathrm{confidence} = \frac{\mathrm{signal}}{\mathrm{noise}} \times \sqrt{\mathrm{sample\ size}}$$

| Parameter   | Parameter increases  | Parameter decreases  |
|-------------|----------------------|----------------------|
| Noise       | Confidence decreases | Confidence increases |
| Signal      | Confidence increases | Confidence decreases |
| Sample size | Confidence increases | Confidence decreases |
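The formula and table above can be checked with a few lines of code. This is only a sketch: the function name `confidence` and the input numbers are mine, chosen for illustration, and the result is a unitless score rather than a probability.

```python
import math

def confidence(signal, noise, sample_size):
    """Signal-to-noise ratio scaled by the square root of the sample size."""
    return signal / noise * math.sqrt(sample_size)

# Hypothetical baseline: an effect of 2 units, noise (SD) of 10, 100 samples.
base = confidence(signal=2, noise=10, sample_size=100)

print("baseline:       ", base)
print("more noise:     ", confidence(2, 20, 100))   # goes down
print("more signal:    ", confidence(4, 10, 100))   # goes up
print("bigger sample:  ", confidence(2, 10, 400))   # goes up
```

Note the square root: quadrupling the sample size only doubles the confidence, while doubling the signal (or halving the noise) does the same job directly.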

 

Interesting:

68% and 95% rules
Rule of thumb that holds for many lists of numbers (but not all; it works best when the data is roughly bell-shaped):
– About 68% (2 out of 3) of the entries on a list are within 1 SD from the average. The other 32% are farther away.
– About 95% (19 out of 20) of the entries on a list are within 2 SDs from the average. The other 5% are farther away.
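You can try the rule of thumb on simulated lists. The sketch below is an illustration under my own assumptions: one list is drawn from a bell-shaped (normal) distribution, the other from a flat (uniform) one to show a case where the rule does not hold. The helper name `share_within` is made up.

```python
import math
import random

def share_within(data, k):
    """Fraction of entries that lie within k SDs of the average."""
    mean = sum(data) / len(data)
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable

bell = [rng.gauss(100, 10) for _ in range(100_000)]   # bell-shaped list
flat = [rng.uniform(0, 1) for _ in range(100_000)]    # flat list

print("bell-shaped:", share_within(bell, 1), share_within(bell, 2))
print("flat:       ", share_within(flat, 1), share_within(flat, 2))
```

The bell-shaped list should land close to 68% and 95%. The flat list comes out near 58% within 1 SD and 100% within 2 SDs, which is why the rule is only a rule of thumb.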

This simple picture has helped me understand binomial and normal distributions far quicker and more effectively than any written explanation. It makes sense why you can use a normal distribution to approximate a binomial distribution if the number of trials n is sufficiently high, and it explains how using a continuity correction adds to the accuracy of the approximation.
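The effect of the continuity correction is easy to see numerically. Here is a rough sketch with numbers I picked for illustration (100 fair coin flips, probability of at most 55 heads); it compares the exact binomial answer with the normal approximation, with and without the half-unit correction.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical example: 100 fair coin flips, at most 55 heads.
n, p, k = 100, 0.5, 55

mu = n * p                            # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))    # SD of the binomial
normal = NormalDist(mu, sigma)

exact = binom_cdf(k, n, p)
plain = normal.cdf(k)                 # no continuity correction
corrected = normal.cdf(k + 0.5)       # continuity correction: go half a unit further

print("exact:    ", exact)
print("plain:    ", plain)
print("corrected:", corrected)
```

The binomial only takes whole-number values, so the bar for 55 heads covers the interval from 54.5 to 55.5 under the curve. Stopping the normal curve at 55 cuts that bar in half; stopping at 55.5 includes all of it, which is why the corrected figure sits much closer to the exact one.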