Group A averaged 72, group B averaged 75. Is B really better — or did random luck just hand them a few good days? The t-test is how statisticians answer 'is this difference real?'
A t-test checks whether the difference between two means (or between a sample mean and a target value) is bigger than you'd expect from random chance alone. It matters most with small samples, where the spread has to be estimated from the data itself and the normal distribution understates the uncertainty.
A/B testing, clinical trials, psychology experiments, quality comparisons, before/after studies — the t-test is the everyday workhorse for 'did the change actually do something?'
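A t-test comparing two teaching methods gives p = 0.03. What's the right interpretation?
If the two methods really worked equally well, a gap at least this large would show up only about 3% of the time. It does not mean there is a 97% chance that one method is better.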
The logic in four steps
- Null hypothesis: assume there's no real difference.
- t-statistic: (difference in means) ÷ (standard error of that difference). Big |t| = the gap is large relative to the noise.
- p-value: the probability of seeing a t at least this extreme if the null were true.
- Decide: if p falls below a threshold chosen in advance (often 0.05), reject the null; the difference is then called 'statistically significant'.
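The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the scores are made up to match the 72 vs 75 example, the test used is Welch's variant (which does not assume equal variances), and the helper names `t_pdf`, `two_sided_p`, and `welch_t` are hypothetical. The p-value comes from numerically integrating the t density, so it is an approximation.

```python
import math
import statistics

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t: float, df: float, steps: int = 20_000) -> float:
    """P(|T| >= |t|), using Simpson's rule on the area between 0 and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area from 0 to |t|
    return max(0.0, 1 - 2 * central_half)

def welch_t(a, b):
    """Return (t, degrees of freedom) for mean(b) - mean(a), Welch's version."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical scores: group A averages 72, group B averages 75.
group_a = [70, 74, 68, 75, 73, 71, 72, 73]
group_b = [76, 73, 78, 74, 77, 72, 75, 75]

# Step 1: null hypothesis = no real difference between the groups.
# Step 2: t-statistic = difference in means / standard error.
t_stat, df = welch_t(group_a, group_b)
# Step 3: p-value = chance of a t at least this extreme under the null.
p = two_sided_p(t_stat, df)
# Step 4: compare against the threshold chosen in advance.
alpha = 0.05
print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p:.3f}, significant: {p < alpha}")
```

In practice a statistics library would replace the hand-rolled integration; it is spelled out here only so that each step stays visible.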
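When do you use a t-test instead of a z-test?
When the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample is small. With large samples the two tests give nearly the same answer.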
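One way to see why the z-test cutoff is unreliable for small samples is a quick simulation. The sketch below draws many samples from a population where the null is true, standardises each sample mean using the sample's own SD, and counts how often the result lands beyond the normal-distribution cutoff of 1.96. The function name `false_alarm_rate`, the sample sizes, and the trial count are illustrative assumptions.

```python
import math
import random

def false_alarm_rate(n: int, cutoff: float = 1.96,
                     trials: int = 20_000, seed: int = 0) -> float:
    """Share of null-true samples of size n whose |t| exceeds the z cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with n - 1 in the denominator.
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        standard_error = math.sqrt(var / n)
        if abs(mean / standard_error) > cutoff:
            hits += 1
    return hits / trials

small = false_alarm_rate(5)    # tiny samples
large = false_alarm_rate(50)   # moderately large samples
print(f"n = 5:  {small:.3f} of samples beyond 1.96")
print(f"n = 50: {large:.3f} of samples beyond 1.96")
```

If the normal cutoff were appropriate, both rates would sit near 0.05. With five observations the rate comes out well above that, which is the extra uncertainty the t-distribution's heavier tails account for.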
Name the common flavours of t-test.
One-sample (is this sample's mean different from a known value?), two-sample / independent (do two separate groups differ?), and paired (before-vs-after on the *same* subjects — e.g. blood pressure pre- and post-treatment).
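The three flavours differ only in what goes into the numerator and the standard error. A minimal sketch, using made-up blood-pressure readings and hypothetical helper names (`one_sample_t`, `independent_t`, `paired_t`); the independent version shown is Welch's, which does not assume equal variances:

```python
import math
import statistics

def one_sample_t(sample, target):
    """Is this sample's mean different from a known value?"""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - target) / standard_error

def independent_t(a, b):
    """Do two separate groups differ? (Welch's t for mean(a) - mean(b).)"""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def paired_t(before, after):
    """Paired test: a one-sample test on per-subject differences against 0."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)

# Hypothetical readings for the same six subjects, pre- and post-treatment.
before = [142, 138, 150, 145, 139, 148]
after = [136, 135, 144, 140, 136, 141]

print(f"one-sample (before vs 140): t = {one_sample_t(before, 140):.2f}")
print(f"independent (wrong here):   t = {independent_t(after, before):.2f}")
print(f"paired (right here):        t = {paired_t(before, after):.2f}")
```

On these numbers the paired t is far larger in magnitude than the independent one, because pairing removes the subject-to-subject variation that otherwise inflates the standard error. Treating paired data as two independent groups throws that advantage away.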
'Statistically significant' ≠ 'important'. With a huge sample, a tiny, meaningless difference can be 'significant'. And p > 0.05 doesn't prove no difference — it just means you couldn't detect one. Report effect sizes, not just p-values.
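A small calculation shows how sample size alone can manufacture significance. The sketch below holds a tiny gap fixed and only changes n. It assumes two equal-sized groups with the same SD, uses Cohen's d as the effect size, and the numbers and function names (`cohens_d`, `t_equal_groups`) are purely illustrative.

```python
import math

def cohens_d(mean_a: float, mean_b: float, sd: float) -> float:
    """Effect size: the gap between means, measured in standard deviations."""
    return (mean_b - mean_a) / sd

def t_equal_groups(mean_a: float, mean_b: float, sd: float, n: int) -> float:
    """t-statistic for two groups of n each sharing one SD (simplifying assumption)."""
    standard_error = sd * math.sqrt(2 / n)
    return (mean_b - mean_a) / standard_error

# Hypothetical: a half-point gap on a scale whose SD is 10.
d = cohens_d(72.0, 72.5, 10.0)
for n in (20, 2_000, 200_000):
    t = t_equal_groups(72.0, 72.5, 10.0, n)
    print(f"n per group = {n:>7}: d = {d:.2f}, t = {t:.2f}")
```

The effect size stays at 0.05 throughout, a gap most people would call negligible, while t climbs from roughly 0.16 to roughly 15.8. The p-value tracks t, so it reflects how much data you collected as much as how big the difference is.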
The t-distribution was published in 1908 by William Sealy Gosset under the pen name 'Student'. Guinness, his employer, did not allow staff to publish under their own names, so he wrote under a pseudonym. Hence 'Student's t-test'.
- Tests whether a difference in means is bigger than chance would produce.
- Use when samples are small and the population SD is unknown.
- p < 0.05 → 'significant', but significance ≠ importance — always look at the effect size.