Correlation vs Causation

Balancing a long black between my knees, I attempt a slightly late lane change on my drive to work. In my rear-view mirror, the woman behind me gesticulates wildly in protest.

At the next traffic light she is still not happy, so I lock my doors, shrink into my seat and change the radio to Adele’s newest power ballad.

Suddenly, I notice the lady’s gestures match up perfectly with the song. Instead of cursing my driving, she is actually in the middle of an all-out 8am karaoke jam session.


Image: road rage as depicted in the Oklahoma Driver’s Manual

Guess What?

Humans are intrinsically wired to seek patterns and create meaning from data. We tend to hear and see the information that suits our respective narratives.

But correlation does not always equal causation, and this golden rule of statistics has important implications for those looking to gain insight from large volumes of data.

Margarine / Divorce

In Maine, USA, from 2000 to 2009, margarine consumption and divorce rates returned a correlation of .99. Squared, that is about .98, meaning roughly 98% of the variation in divorce rates can be statistically accounted for by margarine consumption.
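For anyone who wants to check that kind of arithmetic, here is a minimal sketch of how a correlation coefficient (r) and the share of variation it accounts for (r squared) are calculated. The function name and both lists of numbers are my own inventions for illustration; they are not the Maine figures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented numbers for illustration only, NOT real consumption or divorce data.
series_a = [10.0, 9.0, 8.0, 7.0, 6.0]
series_b = [5.0, 4.8, 4.5, 4.3, 4.0]

r = pearson_r(series_a, series_b)
r_squared = r ** 2  # share of variation in one series accounted for by the other

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

The point to notice is that r and r squared are different numbers: a correlation of .99 squares to about .98. Neither says anything about which series, if either, is driving the other.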

But does margarine consumption really have a bearing on divorce rates?

Maybe, but probably not.

The fact is, datasets which appear to be shadowing each other could simply be doing so by coincidence, much like my assumption about the lady’s road rage.
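To get a feel for how easily coincidence produces a convincing match, here is a rough simulation sketch. It builds pairs of completely unrelated random series, each with ten points (think ten years of annual figures), and counts how often they end up strongly correlated anyway. Everything in it is an assumption of mine: the random-walk model, the 0.8 cut-off, the number of trials and the function names.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def random_walk(rng, steps):
    """A series that drifts at random: each value is the last plus noise."""
    position, path = 0.0, []
    for _ in range(steps):
        position += rng.gauss(0, 1)
        path.append(position)
    return path

def share_of_strong_correlations(trials=2000, steps=10, threshold=0.8, seed=1):
    """Fraction of unrelated series pairs whose |r| exceeds the threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    strong = 0
    for _ in range(trials):
        a = random_walk(rng, steps)
        b = random_walk(rng, steps)  # generated independently of a
        if abs(pearson_r(a, b)) > threshold:
            strong += 1
    return strong / trials

share = share_of_strong_correlations()
print(f"Unrelated pairs with |r| > 0.8: {share:.1%}")
```

The two series in each pair never influence each other, so any strong correlation the script reports is pure chance. Scan enough datasets and a margarine-and-divorce pairing is bound to turn up sooner or later.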

Take home.

Think critically, think broadly, and ask yourself: is the whole picture available?

Next time you’re analysing a dataset, think of that misunderstood Adele fan and save yourself from a mournful power ballad of wasted data potential.