Confusing Stats Terms Explained: Residual

When I hear the word "residual", the pulp left over after I drink my orange juice pops into my brain, or perhaps the film left on the car after a heavy rain. However, when my regression model spits out an estimate of my model's residual, I'm fairly confident it isn't referring to OJ or automobile gunk...right? Not so fast: that imagery is closer to its statistical meaning than you might initially think.

In statistics, a residual refers to the amount of variability in a dependent variable (DV) that is "left over" after accounting for the variability explained by the predictors in your analysis (often a regression). Right about now you are probably thinking: "This guy likes the word 'variability' way too much; he should buy a thesaurus already!"

Let me try again: when you include predictors (independent variables, or IVs) in a regression, you are making a guess (or prediction) that they are associated with the DV; a residual is a numeric value for how wrong that prediction was. The smaller the residuals (in absolute size, since a residual can be positive or negative), the more accurate the predictions in your regression are, indicating your IVs are related to (predictive of) the DV.

Keep in mind that each person in your sample will have their own residual score. This is because a regression model provides a "predicted value" for every individual, which is estimated from that individual's values on the IVs in the regression. Each person's residual score is the difference between their predicted score (determined by the values of the IVs) and the score on the DV that was actually observed for that individual. That "left-over" value is a residual.
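To make that concrete, here is a minimal sketch in Python of one residual per person. The numbers are made up purely for illustration (hours studied as the IV, exam score as the DV), the variable names are my own, and I'm assuming a simple one-predictor regression fit with NumPy's polyfit. I'm also using the common convention of observed minus predicted, so a positive residual means the model guessed too low.

```python
import numpy as np

# Hypothetical data, invented for illustration only:
# one IV (hours studied) and one DV (exam score) for five people.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 61.0, 70.0, 74.0])

# Fit a straight line (simple linear regression) by least squares.
# np.polyfit with degree 1 returns the slope first, then the intercept.
slope, intercept = np.polyfit(hours, score, 1)

# The model's predicted score for each person, based on their IV value.
predicted = intercept + slope * hours

# Each person's residual: observed score minus predicted score.
residuals = score - predicted

for h, obs, pred, res in zip(hours, score, predicted, residuals):
    print(f"hours={h:.0f}  observed={obs:.1f}  predicted={pred:.1f}  residual={res:+.1f}")
```

Every person gets their own row, and therefore their own residual. Some residuals come out positive and some negative, which is why it's their size rather than their sign that tells you how far off each prediction was.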

Like the orange pulp, a statistical residual is simply what's left over from your regression model. Residuals can be used for many things, such as estimating the accuracy of your model and checking its assumptions, but that is a chat for another time...