In this episode of Normal Curves, Regina and Kristin

unpack what a p-value really is,

explain why researchers have fought about it for a century,

and reveal how that famous 0.05 cutoff became enshrined in science.

Along the way, they share stories from their own papers,

from a Nature feature that helped reshape the debate

to a statistical sleuthing project that uncovered a faulty method in sports science.

The result: a behind-the-scenes look at how one statistical tool has shaped the culture of science itself.

How it starts

Kristin: First let’s set up what a p-value really is

Regina: 🥳

Kristin: The big picture is that p-values are a powerful tool for helping us separate signals from noise

We need that because most data are noisy

Lots of random fluctuation

Regina: And our brains love to see patterns in noise!

That’s how we’re wired

Kristin: Without p-values, it’s really easy to get fooled by noise

Regina: But omg the definition of a p-value is so technical and unsatisfying

Kristin: That might be the understatement of the podcast

Regina: Let’s do one of our p-value examples!

How about that one from our Alcohol episode

Where I asked you to read my mind and tell me the number between 1 and 20 that I was thinking of

Kristin:

I failed miserably! It took me six tries to get the number!

Regina:

We decided no one was going to mistake you for being psychic

Not a very impressive psychic, anyway

Kristin: 😛

I calculated the p-value for that little experiment

It comes out to 30%

p = 0.30

Regina: Kristin ≠ psychic

Kristin: My performance was totally consistent with the hypothesis that I have NO psychic powers!

Regina: Let’s back up

And explain how we got all that . . .

How We Calculated the P-Value

To calculate the p-value in this little mind-reading experiment, we start with what’s called a null hypothesis — the skeptic’s world, where nothing interesting is happening.

It’s the world of no effect, the one you’re trying to knock down.

Tip: Null Distribution Setup

Here, the null hypothesis was simple:

Kristin is not psychic. She was just guessing.

Then we ask:

If that’s really true, and we could repeat the experiment again and again, what kinds of outcomes would we expect?

That’s the thought experiment that gives birth to a p-value — a number that tells us how surprising your result would be if the skeptic’s world were real.

Note: this is the frequentist perspective. It treats probability as how often something would happen in the long run — in other words, as frequencies.

You might even picture this long run: Regina and Kristin in Kristin’s sunny California backyard, tossing the ball for Nibbles the Corgi and playing a thousand rounds of “Read Regina’s Mind.”

But to save time, we can use a computer to simulate those thousand virtual games under the null hypothesis — where Regina thinks of a number between 1 and 20, and Kristin keeps guessing until she gets it right.

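On average, Kristin guesses correctly in about ten tries, which makes sense since there are twenty numbers to choose from: if she never repeats a guess, the right number is equally likely to turn up on any try from the first to the twentieth.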
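
For readers who want to try the thought experiment themselves, here is a minimal Python sketch of those thousand virtual games. It is not the code used for the episode. It assumes Kristin guesses in a random order and never repeats a number, and the names play_game and n_numbers, along with the random seed, are made up for illustration.

```python
import random
import statistics

def play_game(rng, n_numbers=20):
    """Return how many guesses a non-psychic player needs to hit the target."""
    target = rng.randint(1, n_numbers)        # the number Regina is thinking of
    guesses = list(range(1, n_numbers + 1))
    rng.shuffle(guesses)                      # assumption: random order, no repeated guesses
    return guesses.index(target) + 1          # position of the correct guess, counted from 1

rng = random.Random(42)                       # fixed seed so the run is repeatable
tries = [play_game(rng) for _ in range(1000)] # a thousand games in the skeptic's world
mean_tries = statistics.mean(tries)
print(f"Average tries over {len(tries)} games: {mean_tries:.1f}")
```

The list tries is the simulated null distribution: one number of guesses per imaginary game. Under these assumptions its average should land near ten, give or take a little random noise.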

Tip: Evidence Against the Null Hypothesis

But the important question is this: In this imaginary skeptic’s world, how does Kristin’s real performance — six tries — compare to what is typically seen?

When we look at the simulated games, we can see how often Kristin guesses correctly in six or fewer tries.

In about 30% of the simulated games, Kristin guesses the number in six or fewer tries.

That’s our p-value: 30%, or 0.30.
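
The counting step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the episode's own code: it assumes Kristin guesses at random without repeating a number, and the function names tries_needed and simulated_p_value are hypothetical.

```python
import random

def tries_needed(rng, n_numbers=20):
    """Simulate one game under the null: count guesses until the target is hit."""
    target = rng.randint(1, n_numbers)
    # assumption: guesses come in a random order with no repeats
    order = rng.sample(range(1, n_numbers + 1), n_numbers)
    return order.index(target) + 1

def simulated_p_value(observed_tries, n_games=1000, seed=0):
    """Fraction of simulated games finished in observed_tries guesses or fewer."""
    rng = random.Random(seed)
    hits = sum(tries_needed(rng) <= observed_tries for _ in range(n_games))
    return hits / n_games

p_value = simulated_p_value(observed_tries=6)
print(f"Simulated p-value: {p_value:.2f}")
```

Because this is a simulation, the printed value will wobble around 0.30 from run to run and seed to seed. Raising n_games shrinks that wobble.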

Tip: Null Hypothesis Setup

Now for the technical definition: The p-value is the probability, if the null hypothesis were true, of seeing a result at least as surprising as the one we actually observed.

Here, that means 30% of the time, Kristin would get Regina’s number in six tries or fewer if she really had no psychic powers.
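
For this particular game, the simulation can be checked by hand. As a sketch, assume Kristin never repeats a guess, and let T (a symbol introduced here, not used in the episode) be the number of tries she needs. Under the null hypothesis, the right number is equally likely to come up on any of the 20 tries, so:

```latex
P(T \le 6 \mid H_0) = \sum_{k=1}^{6} \frac{1}{20} = \frac{6}{20} = 0.30,
\qquad
E[T \mid H_0] = \frac{1 + 20}{2} = 10.5
```

Both numbers agree with the simulated results: a p-value of about 0.30 and an average of about ten tries.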

The smaller the p-value, the more surprising the result.

The larger the p-value, the less surprising it is.

So a big p-value — like our 0.30 — simply says our observed result isn’t surprising at all.

It’s perfectly consistent with the null hypothesis of Kristin, sadly, not having any psychic powers.

Kristin:

P-values are always more fun when you bring mind-reading games to the party

Regina:

Just wait until you see the example I have about Paul the Fortune-Telling German Octopus . . .

Kristin:

Psychic octopus?

Now this I am looking forward to