What Does Multiple Linear Regression Look Like? (Part 2)
This note considers the case of multiple linear regression with two
predictors, where one of the predictors is an indicator variable. It will be
coded 0/1 here, but these results do not depend on the two values used.
Here, men and women are placed on a treadmill. When they can no longer
continue, duration (DUR) and maximum oxygen usage (VO2max) are recorded. The
purpose of this analysis is to predict VO2max from sex (M0F1 = 0 for males, 1
for females) and DUR.
When the model
VO2max = β₀ + β₁ DUR + β₂ M0F1 + ε
is fitted to the data, the result is
VO2max = 1.3138 + 0.0606 DUR - 3.4623 M0F1
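As a rough illustration of how such a fit can be computed, the sketch below builds a design matrix with a column of ones, DUR, and the 0/1 indicator, then solves the least-squares normal equations in plain Python. The data are made up: hypothetical durations, with VO2max generated exactly from the fitted equation above and no noise, so the fit simply recovers those coefficients. The helper names solve and fit_ols are illustrative and are not part of the original analysis.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# ASSUMPTION: made-up, noise-free data generated from the fitted equation
# in the text; these are not the original treadmill measurements.
durations = [400, 500, 600, 700, 800]
data = [(dur, sex) for sex in (0, 1) for dur in durations]
y = [1.3138 + 0.0606 * dur - 3.4623 * sex for dur, sex in data]

# Design matrix: intercept column, DUR, and the 0/1 indicator M0F1.
X = [[1.0, float(dur), float(sex)] for dur, sex in data]

b0, b1, b2 = fit_ols(X, y)
print(f"VO2max = {b0:.4f} + {b1:.4f} DUR + {b2:.4f} M0F1")
```

The indicator enters the design matrix like any other numeric column; its coefficient is the vertical shift between the two groups at a fixed DUR.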
When the data are plotted in
three dimensions, it is seen that they lie along two slices--one slice
for each of the two values of M0F1. The regression surface is once
again a flat plane. This follows from our choice of a model.
The data in each
slice can be plotted as VO2max against DUR and the two plots can be
superimposed. The two lines are the pieces of the plane corresponding to
M0F1=0 and M0F1=1. The lines are parallel because they are parallel
strips from the same flat plane. This also follows directly from the
model. The fitted equation can be rewritten conditional on the two values
of M0F1. When M0F1=0, the model is
VO2max = 1.3138 + 0.0606 DUR - 3.4623 * 0, or
VO2max = 1.3138 + 0.0606 DUR
When M0F1=1, the model is
VO2max = 1.3138 + 0.0606 DUR - 3.4623 * 1, or
VO2max = -2.1485 + 0.0606 DUR.
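The arithmetic behind the two conditional lines can be checked with a few lines of Python. This is only a sketch using the fitted coefficients quoted above; the function name line_for is made up for illustration.

```python
# Fitted coefficients quoted in the text.
b0, b_dur, b_sex = 1.3138, 0.0606, -3.4623


def line_for(m0f1):
    """Return (intercept, slope) of VO2max against DUR for a given M0F1 value."""
    return b0 + b_sex * m0f1, b_dur


male_intercept, male_slope = line_for(0)
female_intercept, female_slope = line_for(1)

print(f"M0F1=0: VO2max = {male_intercept:.4f} + {male_slope:.4f} DUR")
print(f"M0F1=1: VO2max = {female_intercept:.4f} + {female_slope:.4f} DUR")
```

Only the intercept changes with M0F1; the slope on DUR is shared, which is why the two lines are parallel.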
A more complicated
model can be fitted that does not force the lines to be parallel. This
is discussed in the note on interactions. Those
lines are fitted in the picture to the left. The test for
whether the lines are parallel has an observed significance level of
0.102. Thus, the two slopes are within sampling variability
of each other, and the lines are within sampling variability of what one
would expect of parallel lines. In general, we prefer simpler models because
they are more easily described, in keeping with Occam's Razor: use the
simplest model that is consistent with the data. The parallel
slopes model says that men are expected to have a VO2max 3.4623 units higher
than women who last on the treadmill for the same DURation. When the lines
are not parallel, the expected difference in VO2max between a male and female
with the same DURation depends on the value of DURation. In the picture to the
left, the expected difference in VO2max increases with DURation. However, as
already noted, there is not enough evidence to claim that this change in
difference is real.
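The contrast between the two models can be sketched numerically. The code below assumes the non-parallel model adds the usual product term, VO2max = b0 + b_dur DUR + b_sex M0F1 + b_inter DUR*M0F1; that form is an assumption here, since this note does not write it out. The parallel-model coefficient is the one quoted above, while the interaction coefficients are entirely hypothetical and chosen only to show a difference that grows with DUR.

```python
def male_minus_female(dur, b_sex, b_inter=0.0):
    """Expected VO2max difference (M0F1=0 minus M0F1=1) at a common DUR.

    Assumed model:
        VO2max = b0 + b_dur*DUR + b_sex*M0F1 + b_inter*DUR*M0F1
    The b0 and b_dur terms cancel when the two groups share the same DUR.
    """
    return -(b_sex + b_inter * dur)


# Parallel-slopes model: coefficient taken from the fitted equation in the text.
PARALLEL_B_SEX = -3.4623

# HYPOTHETICAL interaction-model coefficients, for illustration only;
# the note does not report the fitted values.
HYP_B_SEX, HYP_B_INTER = -1.0, -0.004

for dur in (400, 600, 800):  # made-up durations
    parallel = male_minus_female(dur, PARALLEL_B_SEX)
    nonparallel = male_minus_female(dur, HYP_B_SEX, HYP_B_INTER)
    print(f"DUR={dur}: parallel {parallel:.4f}, non-parallel {nonparallel:.4f}")
```

With no product term the difference is the same constant at every DUR; with one, it is a linear function of DUR.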
Copyright © 2001 Gerard E. Dallal