Consider a randomized, controlled experiment in which measurements are made before and after treatment.
One way to analyze the data is by comparing the treatments with respect to their post-test measurements. [figure]
Even though subjects are assigned to treatment at random, there may be some concern that any difference in the post-test measurements might be due to a failure in the randomization. Perhaps the groups differed in their pre-test measurements.* [figure]
One way around the problem is to compare the groups on the differences between post-test and pre-test, sometimes called change scores or gain scores. [figure] The test can be carried out in a number of equivalent ways; the most direct is a two-sample t test comparing the mean change in the two groups (see the sketch below).
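For concreteness, here is a minimal sketch of the change-score comparison in Python with SciPy, using simulated data; the sample sizes, means, and treatment effect are illustrative assumptions, not values from the text.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated randomized experiment: the treatment adds about 3 units on average.
    pre_ctrl = rng.normal(50, 10, 40)
    post_ctrl = pre_ctrl + rng.normal(0, 5, 40)
    pre_trt = rng.normal(50, 10, 40)
    post_trt = pre_trt + 3 + rng.normal(0, 5, 40)

    # Compare the groups on their change (gain) scores with a two-sample t test.
    t, p = stats.ttest_ind(post_trt - pre_trt, post_ctrl - pre_ctrl)
    print(f"t = {t:.2f}, p = {p:.4f}")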
However, there is another approach that could be used--analysis of covariance, in which the post-test measurements are compared after adjusting for the pre-test measurements by regressing POST on PRE.
The problem was first stated by Lord (1967: Psych. Bull., 68, 304-305) in terms of a dietician who measures students' weight at the start and end of the school year to determine sex differences in the effects of the diet provided in the university's dining halls. The data are brought to two statisticians. The first, analyzing the differences (weight changes), claims there is no difference in weight gain between men and women. The second, using analysis of covariance, finds a difference in weight gain. Lord's conclusion was far from optimistic:
[W]ith the data usually available for such studies, there is simply no logical or statistical procedure that can be counted on to make proper allowances for uncontrolled pre-existing differences between groups. The researcher wants to know how the groups would have compared if there had been no pre-existing uncontrolled differences. The usual research study of this type is attempting to answer a question that simply cannot be answered in any rigorous way on the basis of available data.
Lord was wrong. His confusion is evident in the phrase "make proper allowances for uncontrolled pre-existing differences between groups." The two procedures, t-test and ANCOVA, test different hypotheses! For Lord's problem, the t-test on the weight changes asks whether men and women gain different amounts of weight, on average, over the school year; the analysis of covariance asks whether a man and a woman who start the year at the same weight are expected to end it at the same weight.
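The point is easy to see in a simulation built in the spirit of Lord's example (the means, standard deviations, and correlation below are illustrative assumptions, not Lord's data). Each sex keeps the same weight distribution over the year, so neither gains weight on average and the first statistician finds nothing; but because individuals regress toward their own group mean, the second statistician's ANCOVA finds a sex difference at any fixed initial weight.

    import numpy as np
    import pandas as pd
    from scipy import stats
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n, rho = 200, 0.6          # per-group sample size and PRE/POST correlation

    def simulate(mean, n):
        pre = rng.normal(mean, 15, n)
        # Final weight regresses toward the group mean; the marginal mean is unchanged.
        post = mean + rho * (pre - mean) + rng.normal(0, 15 * np.sqrt(1 - rho**2), n)
        return pre, post

    pre_m, post_m = simulate(160, n)   # men
    pre_w, post_w = simulate(120, n)   # women

    # First statistician: two-sample t test on the weight changes.
    t, p = stats.ttest_ind(post_m - pre_m, post_w - pre_w)
    print(f"t test on changes: t = {t:.2f}, p = {p:.3f}")   # typically not significant

    # Second statistician: ANCOVA of final weight on initial weight and sex.
    df = pd.DataFrame({
        "pre": np.concatenate([pre_m, pre_w]),
        "post": np.concatenate([post_m, post_w]),
        "sex": ["M"] * n + ["W"] * n,
    })
    print(smf.ols("post ~ pre + C(sex)", data=df).fit().params)   # sex coefficient typically far from 0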
Campbell and Erlebacher have described a problem that arises in attempts to evaluate gains due to compensatory education in lower-class populations.
Because randomization is considered impractical, the investigators seek a control group among children who are not enrolled in the compensatory program. Unfortunately, such children tend to be from somewhat higher social-class populations and tend to have relatively greater educational resources. If a technique such as analysis of covariance, blocking, or matching (on initial ability) is used to create treatment and control groups, the posttest scores will regress toward their population means and spuriously cause the compensatory program to appear ineffective or even harmful. Such results may be dangerously misleading if they are permitted to influence education policy. [Bock, p. 496]
Now, consider a case where two teaching methods are being compared in a randomized trial. Since subjects are randomized to method, we should be asking the question, "Are subjects with the same initial value expected to have the same final value irrespective of method?" Even if there is an imbalance in the initial values, the final values should nevertheless follow the regression line of POST on PRE. A test for a treatment effect, then, would involve fitting separate regression lines with common slope and testing for different intercepts. But this is just the analysis of covariance.
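In code, the test for different intercepts with a common slope is a comparison of two nested regressions. The sketch below assumes a pandas data frame with columns named pre, post, and method; the column names are placeholders, not from the text.

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def ancova_treatment_test(df):
        # One regression line for everyone versus parallel lines
        # (common slope on PRE, separate intercepts for the two methods).
        common = smf.ols("post ~ pre", data=df).fit()
        parallel = smf.ols("post ~ pre + C(method)", data=df).fit()
        # The F test comparing the two fits is the ANCOVA test for a treatment effect.
        return sm.stats.anova_lm(common, parallel)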
If the pre- and post-test measurements are highly correlated, so that the common regression slope is near 1, the ANCOVA and the change-score t-test will give nearly identical results. (Adjusting with a slope of exactly 1 amounts to analyzing the change scores.)
The analysis could be taken one step further to see whether the ANCOVA lines are parallel. If they are not, the treatment effect is not constant; it varies with the initial value, and this should be reported. There may be a range of covariate values within which the two groups have not been shown to differ significantly. The Johnson-Neyman technique can be used to identify that range (see the sketch below).
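A sketch of both steps follows, with the same assumed column names (pre, post, method) as above: the interaction term tests whether the lines are parallel, and a simple grid scan in the spirit of the Johnson-Neyman technique finds the covariate values at which the group difference is not statistically significant.

    import numpy as np
    from scipy import stats
    import statsmodels.formula.api as smf

    def johnson_neyman_scan(df, alpha=0.05, n_grid=200):
        fit = smf.ols("post ~ pre * C(method)", data=df).fit()
        # Pick out the group and interaction terms, whatever the second level is called.
        g_term = [t for t in fit.params.index if t.startswith("C(method)") and ":" not in t][0]
        i_term = [t for t in fit.params.index if ":" in t][0]
        print("p value for non-parallel lines:", fit.pvalues[i_term])

        grid = np.linspace(df["pre"].min(), df["pre"].max(), n_grid)
        V = fit.cov_params()
        # Estimated between-group difference at each covariate value, and its standard error.
        diff = fit.params[g_term] + grid * fit.params[i_term]
        se = np.sqrt(V.loc[g_term, g_term]
                     + grid**2 * V.loc[i_term, i_term]
                     + 2 * grid * V.loc[g_term, i_term])
        tcrit = stats.t.ppf(1 - alpha / 2, fit.df_resid)
        # Covariate values where the groups have not been shown to differ.
        return grid[np.abs(diff / se) < tcrit]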
-----------------
* This is actually a thorny problem. It is generally a
bad idea to adjust for baseline values solely on the basis of a
significance test.
However, there is a good reason, other than imbalance in the initial
values, for taking the initial values into account. In most studies
involving people, analyses that involve the initial values are typically
more powerful because they eliminate much of the between-subject
variability from the treatment comparison.