Ice cream sales and drowning deaths rise together every summer. Does ice cream cause drowning? No — hot weather drives both. Correlation spots a pattern; it never proves a cause.
Correlation measures how strongly two variables move together along a straight line, from −1 (perfect opposite) through 0 (no linear relationship) to +1 (perfect same-direction).
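To make the number concrete, here is a minimal sketch of how r can be computed, assuming the usual Pearson coefficient is what's meant: the sum of products of deviations divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the sample data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]   # rises in lockstep with x
opposite = [10, 8, 6, 4, 2]         # falls in lockstep with x

print(pearson_r(x, same_direction))  # close to +1
print(pearson_r(x, opposite))        # close to -1
```

Note that r is undefined when either variable never changes, which is why the sketch raises an error in that case instead of returning 0.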
In science, economics, medicine, machine learning, and marketing, finding which variables relate is step one of almost every data investigation. Confusing correlation with causation is the cardinal sin.
Example trend line: y ≈ −0.93x + 8.71. The slope is negative, so r is negative too.
r runs from −1 (perfect down) through 0 (no linear link) to +1 (perfect up). Correlation isn't causation!
Reading the correlation coefficient (r)
- r near +1 — strong positive: as one goes up, so does the other.
- r near −1 — strong negative: as one goes up, the other goes down.
- r near 0 — little or no *linear* relationship.
- r ≠ causation — ever.
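The reading rules above can be sketched as a tiny helper. The cut-offs (0.7 for "strong", 0.3 for "little or no") are illustrative assumptions, not standards from the text, and conventions differ between fields. `describe_r` is a hypothetical name.

```python
def describe_r(r, strong=0.7, weak=0.3):
    """Turn r into a rough verbal label. Thresholds are illustrative assumptions."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, +1]")
    size = abs(r)  # magnitude = strength
    if size < weak:
        return "little or no linear relationship"
    strength = "strong" if size >= strong else "moderate"
    direction = "positive" if r > 0 else "negative"  # sign = direction
    return f"{strength} {direction}"

# Made-up values for illustration only.
for r in (0.95, -0.9, 0.4, 0.05):
    print(f"r = {r:+.2f}: {describe_r(r)}")
```

The label describes the pattern only. Nothing in it says which variable, if either, is doing the causing.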
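Hours studied vs exam score gives r = 0.82. Hours of TV vs exam score gives r = −0.65. What do these mean?
The first is a strong positive association: students who studied more tended to score higher. The second is a moderate negative association: students who watched more TV tended to score lower. Neither number shows that studying or TV caused the scores.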
Name a famous spurious correlation.
US per capita cheese consumption correlates at about r = 0.95 with the number of people who died tangled in their bedsheets. It is pure coincidence: with enough variables, some will line up by chance.
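A quick simulation shows how chance alone produces impressive-looking matches. This is a sketch under stated assumptions: every series is independent random noise, ten points stand in for ten yearly observations, and the counts, seed, and `pearson_r` helper are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation (hypothetical helper, no input checks)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)     # fixed seed so the run is repeatable
n_points = 10      # e.g. ten yearly observations
n_series = 1000    # number of unrelated candidate variables

target = [random.gauss(0, 1) for _ in range(n_points)]
candidates = [
    [random.gauss(0, 1) for _ in range(n_points)]
    for _ in range(n_series)
]

# Every candidate is pure noise, yet the best match still looks striking.
best_abs_r = max(abs(pearson_r(target, c)) for c in candidates)
print(f"strongest |r| among {n_series} unrelated series: {best_abs_r:.2f}")
```

Short series and many candidates are the recipe: the fewer the data points and the more variables you search through, the easier it is to find a high r that means nothing.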
Correlation ≠ causation. Correlation ≠ causation. Correlation ≠ causation. Also: r only measures *linear* relationships. A perfect U-shape can have r ≈ 0 despite a tight relationship.
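The U-shape claim is easy to check. In this sketch, y is completely determined by x (y = x²), with x values chosen symmetric around 0; the data and the `pearson_r` helper are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation (hypothetical helper, no input checks)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = list(range(-5, 6))     # -5, -4, ..., 5 (symmetric around 0)
y = [v ** 2 for v in x]    # perfect U-shape: y depends entirely on x

r = pearson_r(x, y)
print(f"r = {r:.3f}")      # essentially zero despite the exact relationship
```

The left half of the U pulls r down exactly as much as the right half pulls it up, so the two cancel. A scatter plot would reveal the pattern instantly, which is a good reason to plot the data before trusting r.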
To claim causation you need more: a controlled experiment, a plausible mechanism, ruling out lurking variables, and consistent replication.
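Ruling out a lurking variable can be sketched with synthetic data modeled on the ice cream example. Everything here is assumed for illustration: temperature drives both simulated series, the coefficients and noise levels are invented, and `partial_r` applies the standard first-order partial correlation formula. It only removes the *linear* effect of a confounder you have actually measured, so it supplements a controlled experiment and does not replace one.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation (hypothetical helper, no input checks)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(1)  # fixed seed so the run is repeatable
n = 2000
# Synthetic data: temperature influences both series; they never touch each other.
temperature = [random.uniform(10, 35) for _ in range(n)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 2) for t in temperature]

raw = pearson_r(ice_cream, drownings)
controlled = partial_r(
    raw,
    pearson_r(ice_cream, temperature),
    pearson_r(drownings, temperature),
)
print(f"raw r (ice cream vs drownings): {raw:.2f}")
print(f"r after controlling for temperature: {controlled:.2f}")
```

In this setup the raw correlation is high, and it collapses toward zero once temperature is accounted for, which is what you would expect when a third variable explains the whole pattern.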
- Correlation r ∈ [−1, +1]: sign = direction, magnitude = strength.
- Measures linear association only.
- Never implies causation on its own.