Foundation of Data Science: Unit III: Describing Relationships

Regression Towards the Mean

Data Science

Regression toward the mean refers to a tendency for scores, particularly extreme scores, to shrink toward the mean.

Regression Towards the Mean

• Regression toward the mean refers to a tendency for scores, particularly extreme scores, to shrink toward the mean. Regression toward the mean appears among subsets of extreme observations for a wide variety of distributions.

• The rule goes that, in any series with complex phenomena that are dependent on many variables, where chance is involved, extreme outcomes tend to be followed by more moderate ones.

• The effects of regression to the mean can frequently be observed in sports, where the effect causes plenty of unjustified speculations.

• It basically states that if a variable is extreme the first time we measure it, it will be closer to the average the next time we measure it. In technical terms, it describes how a random variable that is outside the norm eventually tends to return to the norm.

• For example, our odds of winning on a slot machine stay the same. We might hit a "winning streak" which is, technically speaking, a set of random variables outside the norm. But play the machine long enough and the random variables will regress to the mean (i.e. "return to normal") and we shall end up losing.

• Consider a sample taken from a population. The value of the variable will be some distance from the mean. For instance, we could take a sample of people, it could be just one measure their heights and then determine the average height of the sample. This value will be some distance away from the average height of the entire population of people, though the distance might be zero.

• Regression to the mean usually happens because of sampling error. A good sampling technique is to randomly sample from the population. If we asymmetrically sampled, then results may be abnormally high or low for the average and therefore would regress back to the mean. Regression to the mean can also happen because we take a very small, unrepresentative sample.

Regression fallacy

• Regression fallacy assumes that a situation has returned to normal due to corrective actions having been taken while the situation was abnormal. It does not take into consideration normal fluctuations.

• An example of this could be a business program failing and causing problems which is then cancelled. The return to "normal", which might be somewhat different from the original situation or a situation of "new normal" could fall into the category of regression fallacy. This is considered an informal fallacy.

Foundation of Data Science: Unit III: Describing Relationships : Tag: : Data Science - Regression Towards the Mean