Correlation and Regression
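Correlation and regression are intimately related. The sample correlation coefficient between X and Y is

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} $$

When Y is regressed on X, the regression coefficient of X is

$$ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} $$

Therefore, the regression coefficient is the correlation coefficient multiplied by the ratio of the standard deviations:

$$ b_1 = r \, \frac{s_Y}{s_X} $$

where s_X and s_Y are the sample standard deviations of X and Y.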
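The relationship between the slope and the correlation coefficient can be checked numerically. The sketch below computes both quantities from their definitional sums and compares the slope with the correlation coefficient times the ratio of standard deviations. The data values and the function name `slope_and_correlation` are made up for illustration and do not come from the text.

```python
import math

def slope_and_correlation(x, y):
    """Return (b1, r, s_x, s_y) computed from the definitional sums."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx                   # least-squares slope of Y on X
    r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
    s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of X
    s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of Y
    return b1, r, s_x, s_y

# Hypothetical data, chosen only to illustrate the identity.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

b1, r, s_x, s_y = slope_and_correlation(x, y)
# The two printed values should agree up to floating-point rounding.
print(b1, r * s_y / s_x)
```

Because both standard deviations share the same divisor (n - 1), it cancels in the ratio, so the identity holds whichever divisor is used.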
Since the ratio of standard deviations is always positive, testing whether the population regression coefficient is 0 is equivalent to testing whether the population correlation coefficient is 0. That is, the test of H0: β1 = 0 is equivalent to the test of H0: ρ = 0.
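One way to see this equivalence concretely, assuming the usual t tests on n - 2 degrees of freedom for simple linear regression, is to compute the t statistic for the slope and the t statistic for the correlation coefficient and observe that they coincide. The sketch below does so for made-up data; the function name `t_statistics` and the data values are hypothetical.

```python
import math

def t_statistics(x, y):
    """Return (t for the slope, t for the correlation) from the same data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)

    b1 = sxy / sxx                           # estimated slope
    r = sxy / math.sqrt(sxx * syy)           # sample correlation

    sse = syy - b1 * sxy                     # residual sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope

    t_slope = b1 / se_b1
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t_slope, t_corr

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

t_slope, t_corr = t_statistics(x, y)
# Both statistics should match up to floating-point rounding.
print(t_slope, t_corr)
```

Since the two statistics are algebraically identical and share the same degrees of freedom, they lead to the same P value.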
While correlation and regression are intimately related, they are not
equivalent. The regression equation can be estimated whenever the Y
values result from random sampling. The Xs can result from random
sampling or they can be specified by the investigator. For example, crop
yield can be regressed on the amount of water crops are given regardless
of whether the water is rainfall (random) or the result of turning on an
irrigation system (by design). The correlation coefficient is a
characteristic of the joint distribution of X and Y. In order to
estimate the correlation coefficient, both variables must be the result
of random sampling. It makes sense to talk about the correlation between
yield and rainfall, but it does not make sense to talk about the
correlation between yield and amounts of water under the researcher's
control. This latter correlation will vary according to the specific
amounts used in the study. In general, all else being equal, the
magnitude of the correlation coefficient grows as the range of predictor
values widens and shrinks as it narrows.
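A small simulation can illustrate the dependence of the correlation coefficient on the range of the predictor. The sketch below generates Y from the same straight line with the same error standard deviation for two sets of investigator-chosen X values, one narrow and one wide. The intercept, slope, noise level, ranges, sample size, and function names are all hypothetical choices made for this illustration. The estimated slopes should be similar, while the correlation should be noticeably larger for the wider range.

```python
import math
import random

def fit(x, y):
    """Return (slope, correlation) computed from the definitional sums."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

def evenly_spaced(low, high, n):
    """Design points fixed by the investigator, not randomly sampled."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

def simulate(x_values, rng, intercept=2.0, slope=0.5, noise_sd=1.0):
    """Generate Y = intercept + slope * X + normal error, then fit."""
    y = [intercept + slope * xi + rng.gauss(0.0, noise_sd) for xi in x_values]
    return fit(x_values, y)

rng = random.Random(0)  # fixed seed so the run is reproducible

narrow = evenly_spaced(4.0, 6.0, 200)   # narrow range of X
wide = evenly_spaced(0.0, 10.0, 200)    # wide range of X

b_narrow, r_narrow = simulate(narrow, rng)
b_wide, r_wide = simulate(wide, rng)

print("narrow range: slope =", round(b_narrow, 3), " r =", round(r_narrow, 3))
print("wide range:   slope =", round(b_wide, 3), " r =", round(r_wide, 3))
```

The regression slope estimates the same underlying quantity in both designs, whereas the correlation reflects the particular X values chosen, which is why it is not meaningful as a population characteristic when X is set by design.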