Measure the heights of 1,000 people and the raw list of numbers tells you almost nothing. But sort them into bins (150-155 cm, 155-160 cm, ...) and draw a bar for each, and the shape of human height appears: a bell.
A histogram groups continuous data into intervals (bins) and shows how many values fall in each. It reveals the *shape* of a distribution — where it peaks, how it spreads, whether it's skewed.
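To make the binning step concrete, here is a minimal sketch in plain Python. The ten heights, the helper name `histogram_counts`, and the choice of half-open bins `[lo, lo + width)` are all illustrative assumptions, not part of the lesson's data.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each half-open bin [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)  # which bin this value lands in
        if 0 <= idx < n_bins:            # values outside the range are skipped
            counts[idx] += 1
    return counts


# Made-up sample of heights in cm (for illustration only).
heights_cm = [152.0, 158.5, 161.2, 163.0, 164.9, 166.1, 167.3, 168.8, 171.4, 176.0]
counts = histogram_counts(heights_cm, start=150, width=5, n_bins=6)

# Draw a sideways text histogram: one row per bin, one '#' per value.
for i, c in enumerate(counts):
    lo = 150 + 5 * i
    print(f"{lo}-{lo + 5} cm | {'#' * c}")
```

Even with ten values, the rows for 160-165 cm and 165-170 cm come out longest, which is the peak of this small sample.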
Common uses: quality control, exam-score analysis, image processing (brightness histograms), and checking for skew before choosing between the mean and the median.
Histogram vs bar graph
- Bars touch (no gaps) — the variable is continuous.
- X-axis is a number line of intervals, not category labels.
- Bin width matters — too wide hides detail, too narrow looks like noise.
- With unequal bin widths, area carries the meaning, not height: plot frequency density (count ÷ bin width) so that each bar's area is proportional to its count.
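The point about unequal bins can be checked with a little arithmetic. The sketch below uses made-up exam-score bins and computes frequency density (count divided by bin width), the bar height that makes each bar's area match its count. The bin edges and counts are assumptions chosen for illustration.

```python
# Hypothetical exam scores grouped into unequal-width bins: (low, high, count).
bins = [(0, 50, 10), (50, 70, 20), (70, 80, 20), (80, 100, 10)]

densities = []
for low, high, count in bins:
    width = high - low
    density = count / width  # bar height when bins are unequal
    densities.append(density)
    print(f"{low:3d}-{high:3d}: count={count:2d}  width={width:2d}  density={density:.1f}")
```

The 50-70 and 70-80 bins hold the same number of scores, but the 70-80 bin is half as wide, so its bar is drawn twice as tall. Plotting raw counts instead would make the two bins look equally crowded.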
A histogram of test scores has a tall bar at 90-100 and a long thin tail down to 30. Is it skewed? Which way?
Why might a histogram look bumpy with too many bins?
With narrow bins, random fluctuations dominate — you see 'noise' instead of the underlying shape. Wider bins smooth it out. There's an art to choosing bin width.
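One way to see this effect is to bin the same sample twice, once with narrow bins and once with wide ones. The sketch below draws 200 simulated heights from a normal distribution; the mean, spread, range, and bin widths are arbitrary assumptions, and `bin_counts` is a hypothetical helper.

```python
import random


def bin_counts(values, lo, hi, width):
    """Count values in equal-width bins covering [lo, hi)."""
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:  # ignore the rare value outside the plotted range
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(170, 8) for _ in range(200)]

narrow = bin_counts(sample, lo=145, hi=195, width=1)  # 50 bins
wide = bin_counts(sample, lo=145, hi=195, width=5)    # 10 bins

for label, counts, width in (("width 1 cm", narrow, 1), ("width 5 cm", wide, 5)):
    print(label)
    for i, c in enumerate(counts):
        print(f"  {145 + i * width:3d} | {'#' * c}")
```

With only about four values per 1 cm bin on average, neighbouring bars typically jump up and down by chance. The 5 cm version pools those values, so the overall bell shape is easier to see.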
Don't put gaps between histogram bars. Gaps mean 'separate categories' — but a histogram's intervals are continuous, so the bars must touch.
The Galton board above *is* a live histogram: each bin counts balls, and the shape that emerges closely approximates the bell curve of a normal distribution.
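A Galton board is easy to simulate, which makes the link to histograms explicit. This is a rough sketch under simple assumptions: every ball bounces left or right with equal probability at each row of pegs, and its final slot is the number of rightward bounces. The function name, ball count, and row count are illustrative choices.

```python
import random


def galton_board(n_balls, n_rows, rng):
    """Drop balls through n_rows of pegs; return the count in each slot."""
    slots = [0] * (n_rows + 1)
    for _ in range(n_balls):
        # Each peg sends the ball right with probability 0.5.
        rights = sum(rng.random() < 0.5 for _ in range(n_rows))
        slots[rights] += 1
    return slots


rng = random.Random(42)  # seeded generator for a repeatable run
slots = galton_board(n_balls=1000, n_rows=10, rng=rng)

# One '#' per 10 balls, so the text bars stay short.
for slot, count in enumerate(slots):
    print(f"{slot:2d} | {'#' * (count // 10)}")
```

The slot counts follow a binomial distribution, which looks increasingly like a normal bell curve as the number of rows grows.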
- Histogram = distribution of continuous data in bins; bars touch.
- Reveals shape: peak location, spread, skew.
- Bin width is a real choice: too few bins hide detail, too many show noise.