*Statistical testing is not that difficult! Let Dr. Stephen P. Holden guide you through the murky territory of statistics, only to discover that it isn’t murky at all!*

You start by calculating a statistic, which is simply a measure of how far the observed result is from what we might have expected. I think we tend to belabour the problem of statistical significance testing sometimes. The whole process is very intuitive.

Let me take you through a thought-experiment to show you this.

“I’m going to toss a coin 10 times. And to make it interesting, let me put this pineapple (Australian $50 note) on the table and make you this offer:

“If you guess the exact number of heads in the next ten coin tosses that I make, I will give you the $50. If you don’t guess it exactly, I get to keep my $50. Are you willing?”

So assuming that you see that participating in this gamble is a “no-brainer”, you say:

“Yes, sure.”

So what number of coin tosses will be heads? Make your guess now.

**H is for Heads**

Most people guess somewhere around five (5/10). This is, of course, the “null hypothesis”, but more on that in a moment.

I now proceed to toss the coin: I catch it, I look, and I call out the result. Here’s a series of 10 coin tosses that I prepared beforehand:

“Heads … Heads … Heads … Heads … Heads … Heads … Heads … Heads … Heads … Tails”

In a real version of this ‘thought-experiment’ (that I conduct in presentations), people start to laugh at about seven or eight heads. This is very important because this reflects exactly the logic underlying statistical significance testing.

Okay, so let’s break it down.

**H is for Human Intuition**

**1.** There’s a statistic which measures the observed result. It is very simple in this case: it is the number of “Heads” that come up, so let’s just call it the **H-statistic** (H is for heads).

**2.** The distribution of this H-statistic is known in advance: with a fair coin, the expected result is 50% heads or thereabouts. So our expectation, the distribution of the H-statistic under the “null hypothesis”, is as pictured to the right.

**3.** The observed result was 9 out of 10 heads. Based on our understanding of the distribution at #2 above, we can all agree this result (9/10) is possible, but pretty improbable assuming the coin, the tosses and the calls are fair (which is our expectation under the null hypothesis). In fact, according to the distribution (see chart), the probability of exactly 9 heads is about 0.01. The probability of 9 or more heads (that is, 9 or 10) is about 0.01 + 0.001 = 0.011. The probability of an extreme value of the H-statistic, say nine or more or one or less (for a two-tailed test), is about 0.011 + 0.011 = 0.022 (the exact value is 22/1024, or roughly 0.021; the small difference comes from adding rounded numbers).

**4.** The key question here is whether we take an “improbable” result (p = 0.022 or less, for instance) and interpret it as merely a “surprising” result given the expectation, or interpret it as “unlikely to be chance in this case”. The laughter at seven or eight heads attests that many think that getting 9 heads out of 10 is extreme enough to cross the “significance level”. In other words, they are saying, “Sure, the result is possible, but I’m going to call ‘Bullsh!'” or in a statistician’s language “statistical significance.” We reject the explanation that the result was due to chance, and conclude that something else was going on.
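If you would like to check the numbers in steps 2 and 3 yourself, here is a minimal sketch in Python. It assumes the H-statistic follows a binomial distribution with 10 tosses and a 0.5 chance of heads on each toss, which is what “fair coin, fair tosses, fair calls” amounts to. The function names `h_pmf` and `two_tailed_p` are my own illustrative choices, not part of any library.

```python
from math import comb


def h_pmf(k, n=10, p=0.5):
    """Probability of exactly k heads in n tosses when P(heads) = p.

    Assumes independent tosses (a binomial model of the null hypothesis).
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_p(observed, n=10, p=0.5):
    """Probability of an H-statistic at least as far from the expected
    value (n * p) as the observed one, counting both tails."""
    expected = n * p
    distance = abs(observed - expected)
    return sum(
        h_pmf(k, n, p) for k in range(n + 1) if abs(k - expected) >= distance
    )


p_exactly_9 = h_pmf(9)               # exactly 9 heads
p_9_or_more = h_pmf(9) + h_pmf(10)   # one tail: 9 or 10 heads
p_two_tailed = two_tailed_p(9)       # both tails: 0, 1, 9 or 10 heads

print(f"P(exactly 9 heads)  = {p_exactly_9:.4f}")   # 0.0098
print(f"P(9 or more heads)  = {p_9_or_more:.4f}")   # 0.0107
print(f"P(two-tailed, 9/10) = {p_two_tailed:.4f}")  # 0.0215
```

The unrounded values are 10/1024, 11/1024 and 22/1024, which match the rounded figures quoted in step 3. Try `two_tailed_p(7)` or `two_tailed_p(8)` to see how improbable the result already looks at the point where audiences start to laugh.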