Incomes, house prices, city sizes, file sizes — pile them up and the histogram isn't a tidy bell. It leans, with a long thin tail dragging off to one side. That lean has a name: skewness.
Skewness measures how lopsided a distribution is. Right-skewed (positive): a long tail to the right, and the mean usually sits above the median. Left-skewed (negative): a long tail to the left, and the mean usually sits below the median. Symmetric: mean ≈ median, no skew.
Skew decides whether to report the mean or the median, whether 'normal-distribution' methods apply, and whether your data has a long-tail risk you're underestimating.
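To put a number on that lopsidedness, one common choice is the moment coefficient of skewness: the average cubed deviation from the mean, divided by the standard deviation cubed. The sketch below is a minimal version of that formula with no bias correction. The function name `skewness` and the three small datasets are made up for illustration.

```python
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical data, chosen only to show the sign of the result.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]   # mirror image: the tail now points left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
print(skewness(symmetric))     # zero
```

The sign is the part to read first: positive means the tail points right, negative means it points left. Statistical libraries often apply a small-sample correction, so their values can differ slightly from this one.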
Reading the skew
- Tail points right → right-skewed (positive); mean > median. (Incomes, house prices.)
- Tail points left → left-skewed (negative); mean < median. (Exam scores when most do well.)
- Tails balanced on both sides, symmetric → mean ≈ median ≈ mode. (Heights, measurement errors.)
- Rule of thumb: the mean is dragged toward the long tail.
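The rule of thumb in the list above can be turned into a quick check: compare the mean with the median and see which side the mean was pulled to. This is a heuristic sketch, not a formal test, and the helper name `skew_direction` and both datasets are invented for the example.

```python
from statistics import mean, median

def skew_direction(values):
    """Guess the skew direction from mean vs. median (a heuristic only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right (positive)"
    if m < med:
        return "left (negative)"
    return "none apparent"

# Hypothetical data.
incomes = [22, 25, 27, 30, 31, 34, 38, 250]  # in thousands; one very high earner
scores = [35, 78, 82, 85, 88, 90, 92, 95]    # most students did well, one did not

print(skew_direction(incomes))          # right (positive)
print(skew_direction(scores))           # left (negative)
print(skew_direction([1, 2, 3, 4, 5]))  # none apparent
```

The comparison can be fooled by unusual shapes, such as data with several peaks, so treat it as a first look and confirm with a histogram.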
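A dataset has mean 50 and median 35. Is it skewed? Which way?
Probably yes, to the right: the mean sits well above the median, which is what a long right-hand tail does to it.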
Why does the news report 'median home price', not 'mean home price'?
Home prices are heavily right-skewed — a handful of multimillion-pound mansions yank the mean far above what a typical house costs. The median sits at the middle house, immune to those few extremes.
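A small sketch makes the point concrete. The prices below are hypothetical, not real market data: seven ordinary houses, then the same street with two mansions added.

```python
from statistics import mean, median

# Hypothetical sale prices in pounds.
ordinary = [210_000, 240_000, 250_000, 265_000, 280_000, 300_000, 320_000]
with_mansions = ordinary + [4_500_000, 7_000_000]

for label, prices in [("ordinary only", ordinary), ("with mansions", with_mansions)]:
    print(f"{label}: mean = {mean(prices):,.0f}, median = {median(prices):,.0f}")
```

Adding the two mansions moves the median from 265,000 to 280,000, while the mean jumps to 1,485,000, more than five times what the middle house costs.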
Common mistake: applying 'normal-distribution' rules to skewed data. The 68-95-99.7 rule, symmetric confidence intervals, and the mean as a 'typical' value all assume rough symmetry. On strongly skewed data they mislead, sometimes badly.
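To see how far off the 68-95-99.7 rule can be, the sketch below builds a right-skewed dataset and counts how many values really fall within 1, 2, and 3 standard deviations of the mean. The data is an assumption for illustration: evenly spaced quantiles of an Exponential(1) distribution, used here as a stand-in for any strongly right-skewed quantity. Building it from quantiles rather than random draws keeps the result repeatable.

```python
import math
from statistics import fmean, pstdev

n = 10_000
# Deterministic stand-in for a right-skewed sample: quantiles of Exponential(1).
data = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

mu = fmean(data)
sigma = pstdev(data)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    lo, hi = mu - k * sigma, mu + k * sigma
    return sum(lo <= x <= hi for x in data) / n

for k, normal_share in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(f"within {k} sd: {share_within(k):.3f} (normal rule says about {normal_share})")

# The symmetric interval also reaches into values the data can never take.
print(f"mean - 2 sd = {mu - 2 * sigma:.2f}, smallest value = {min(data):.5f}")
```

For this shape, roughly 86% of values sit within one standard deviation rather than 68%, and fewer than 99% sit within three. The lower end of the two-standard-deviation interval is negative even though every value is positive, which is the kind of nonsense a symmetric interval produces on lopsided data.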
Memory hook: the skew direction is the direction the tail points (not where the bump is). 'Right-skewed' = tail trails off to the right, even though the hump is on the left.
- Skewness = lopsidedness; the tail's direction names it (right/positive or left/negative).
- The mean gets dragged toward the long tail — so mean > median when right-skewed.
- For skewed data, prefer the median and don't trust symmetric-normal shortcuts.