Regression to the Mean
There is a lot of confusion over the concept of regression to the mean, so I thought I would try to explain what it is and what it isn't. Let's start with a very simple example. Suppose you have ten six-sided dice. Assume that these dice are all fair, so that the expected value of any given roll is 3.5. You roll each die once. Then you select the five dice that rolled the highest and set them aside. The average of those top five dice is probably higher than 3.5, simply because you selected the highest rollers. Now roll those five dice a second time. The average of the second roll is probably lower than the first: something closer to 3.5. That is regression to the mean. If you select a sample based on the measurement of a random variable X, the mean of X within the sample is a biased estimator of the mean of X. Future measurements of X on the same sample will tend to "regress" to the mean of X. No physical process is involved in this. It is simply the removal of sample bias by a second measurement or experime
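
If you would rather see the dice example than take it on faith, here is a rough simulation sketch in Python. The function name simulate, its parameters, and the number of trials are my own choices for illustration, not anything standard. It repeats the experiment many times: roll ten fair dice, keep the five highest, roll those five again, and compare the two averages.

```python
import random

def simulate(trials=50_000, n_dice=10, n_keep=5, seed=0):
    """Return (mean of the selected dice on roll 1, mean of the same dice on roll 2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    first_total = 0.0
    second_total = 0.0
    for _ in range(trials):
        # First roll: every die is rolled once.
        first = [rng.randint(1, 6) for _ in range(n_dice)]
        # Selection step: keep the n_keep highest values.
        top = sorted(first, reverse=True)[:n_keep]
        # Second roll: the selected dice are fair, so each is just a fresh roll.
        second = [rng.randint(1, 6) for _ in range(n_keep)]
        first_total += sum(top) / n_keep
        second_total += sum(second) / n_keep
    return first_total / trials, second_total / trials

first_mean, second_mean = simulate()
print(f"selected dice, first roll:  {first_mean:.2f}")
print(f"same dice, second roll:     {second_mean:.2f}")
```

The first number should come out well above 3.5 and the second should sit very close to 3.5. Nothing happened to the dice between the two rolls; the only difference is that the first average was computed after selecting on the outcome, and the second was not.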