You don't drink the whole pot to check the soup — you taste a spoonful. But only if you stirred first. Sampling lets you learn about millions from a handful — as long as the handful is fair.
Sampling means studying a subset of a population to draw conclusions about the whole. Done right (randomly, representatively), a small sample is astonishingly accurate; done wrong, no sample size can save you.
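To put a rough number on "astonishingly accurate", here is a minimal sketch of the standard margin-of-error approximation for a proportion. It assumes a simple random sample from a large population and a 95% confidence level (z = 1.96); the function name `margin_of_error` is just an illustrative choice.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n (normal approximation).
    p=0.5 is the worst case; z=1.96 corresponds to ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 10_000):
    print(n, round(margin_of_error(n), 3))
```

Under these assumptions, about 1,000 randomly chosen respondents give an error of roughly 3 percentage points. The population size does not appear in the formula, which is why a spoonful can stand in for the pot. The formula says nothing about bias: it only holds if the sample really is random.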
In polls, quality control, medical trials, census estimates, market research, and A/B testing, you almost never measure everyone, so the quality of your sample is everything.
A magazine surveys its own readers about whether people read magazines. The result will be biased because...
Good sampling methods
- Simple random: every member has an equal chance of being chosen; the gold standard.
- Stratified: split the population into groups (strata), then randomly sample within each group in proportion to its size.
- Systematic: take every kth member from a list, starting from a random position.
- Cluster: randomly pick whole groups (e.g. schools), then survey everyone in them.
- Avoid: convenience samples ('whoever's nearby'), which bake in bias.
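The four methods in the list can be sketched with Python's standard `random` module. This is an illustration under stated assumptions, not a survey-grade implementation: the population of 1,000 students in 10 schools is made up, and the helper names (`simple_random`, `stratified`, `systematic`, `cluster`) are hypothetical.

```python
import random

def simple_random(population, n, rng):
    # Every member has the same chance of being picked.
    return rng.sample(population, n)

def stratified(population, n, key, rng):
    # Group members by stratum, then sample each group in proportion
    # to its size. Rounding can make the total differ slightly from n.
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample

def systematic(population, n, rng):
    # Every kth member after a random start; assumes 0 < n <= len(population).
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def cluster(population, num_clusters, key, rng):
    # Randomly pick whole groups, then keep everyone inside them.
    clusters = {}
    for item in population:
        clusters.setdefault(key(item), []).append(item)
    chosen = rng.sample(list(clusters), num_clusters)
    return [item for c in chosen for item in clusters[c]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 students, 100 in each of 10 schools.
population = [{"id": i, "school": i % 10} for i in range(1000)]
by_school = lambda student: student["school"]

srs = simple_random(population, 50, rng)
strat = stratified(population, 50, by_school, rng)
syst = systematic(population, 50, rng)
clus = cluster(population, 2, by_school, rng)

print(len(srs), len(strat), len(syst), len(clus))
```

Note the trade-off visible in the sizes: the cluster sample surveys 200 students but only 2 schools, so it is cheaper to collect and usually less precise than a simple random sample of the same size.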
Why was the famous 1936 'Literary Digest' poll — which predicted Landon would beat Roosevelt in a landslide — so wrong despite 2.4 million responses?
Name the main sampling biases to watch for.
Selection bias (sampling frame excludes part of the population), non-response bias (who refuses differs from who answers), voluntary-response bias (only the motivated reply — often the angry ones), and survivorship bias (you only see the survivors).
A bigger sample does not fix a biased sample. It just gives you a more precise estimate of the wrong number. Randomness and representativeness come first; size second.
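A small simulation makes the point concrete. It is a sketch with invented numbers: a population where 40% say "yes", a random sample of 500, and a much larger voluntary-response sample in which "yes" holders are assumed to reply three times as often as everyone else.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

N = 200_000
TRUE_SHARE = 0.40
# Hypothetical population: True means the person would answer "yes".
population = [i < N * TRUE_SHARE for i in range(N)]

def share_yes(sample):
    return sum(sample) / len(sample)

# Small but fair: 500 people chosen at random.
small_random = rng.sample(population, 500)

# Large but biased: everyone is asked, but "yes" holders reply with
# probability 0.30 and "no" holders with probability 0.10 (assumed rates).
big_biased = [p for p in population if rng.random() < (0.30 if p else 0.10)]

print("true share:       ", TRUE_SHARE)
print("random, n=500:    ", round(share_yes(small_random), 3))
print("biased, n=%d:  " % len(big_biased), round(share_yes(big_biased), 3))
```

With these assumed reply rates the biased sample is tens of thousands strong yet settles near 0.67, not 0.40, while the 500-person random sample lands close to the truth. Collecting more biased replies only tightens the estimate around 0.67.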
Before trusting any survey, ask: *who was sampled, how were they chosen, and who didn't respond?* Those three questions catch most bad polls.
- Sampling = studying a subset to learn about the whole.
- Random + representative beats large-but-biased every time.
- Watch for selection, non-response, voluntary-response, and survivorship bias.