Which Predictors Are More Important?
Gerard E. Dallal, Ph.D.
When a multiple regression is fitted, it is not uncommon for someone to ask which predictors are more important. This is a reasonable question. There have been some attempts to come up with a purely statistical answer, but they are unsatisfactory. The question can be answered only in the context of a specific research question by using subject matter knowledge.
To focus the discussion, consider the regression equation for predicting HDL cholesterol presented earlier.
The REG Procedure
Dependent Variable: LHCHOL

Parameter Estimates

           Parameter  Standard              Pr >  Standardized
Variable    Estimate     Error       T       |t|      Estimate
Intercept    1.16448   0.28804    4.04    <.0001             0
AGE         -0.00092   0.00125   -0.74    0.4602      -0.05735
BMI         -0.01205   0.00295   -4.08    <.0001      -0.35719
BLC          0.05055   0.02215    2.28    0.0239       0.17063
PRSSY       -0.00041   0.00044   -0.95    0.3436      -0.09384
DIAST        0.00255   0.00103    2.47    0.0147       0.23779
GLUM        -0.00046   0.00018   -2.50    0.0135      -0.18691
SKINF        0.00147   0.00183    0.81    0.4221       0.07108
LCHOL        0.31109   0.10936    2.84    0.0051       0.20611

The predictors are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The regression coefficients range from 0.0004 to 0.3111 in magnitude.
One possibility is to measure the importance of a variable by the magnitude of its regression coefficient. This approach fails because the regression coefficients depend on the underlying scale of measurements. For example, the coefficient for AGE measures the expected difference in response for each year of difference in age. If age were recorded in months instead of years, the regression coefficient would be divided by 12, but surely the change in units does not change a variable's importance.
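The effect of a change of units is easy to check numerically. The sketch below uses a small made-up data set (the ages and responses are hypothetical, not the HDL cholesterol data) and a hypothetical helper, `ols_slope`, that computes the least-squares slope for a single predictor. Recording age in months rather than years leaves the fit unchanged but divides the slope by 12.

```python
# Hypothetical data: ages in years and a made-up response.
# These are NOT the HDL cholesterol data from the example above.
age_years = [30, 40, 50, 60, 70]
response = [1.9, 1.7, 1.8, 1.5, 1.4]


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# The same ages, recorded in months instead of years.
age_months = [12 * a for a in age_years]

slope_per_year = ols_slope(age_years, response)
slope_per_month = ols_slope(age_months, response)

# The per-month slope is the per-year slope divided by 12.
print(slope_per_year, slope_per_month, slope_per_year / 12)
```

The same reasoning applies to each coefficient in a multiple regression: rescaling one predictor rescales only that predictor's coefficient.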
Another possibility is to measure the importance of a variable by its observed significance level (P value). However, the distinction between statistical significance and practical importance applies here, too. Even if the predictors are measured on the same scale, a small coefficient that can be estimated precisely will have a small P value, while a large coefficient that is not estimated precisely will have a large P value.
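The point can be illustrated with the parameter estimates shown earlier. The t statistic is the estimate divided by its standard error, so it reflects precision as well as size. The sketch below recomputes the ratios from the printed (rounded) values, so they may differ slightly in the second decimal from the T column, and compares the ordering by coefficient magnitude with the ordering by |t|. The predictors here are not on a common scale, so this shows only that the two orderings need not agree.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates output.
estimates = {
    "AGE": (-0.00092, 0.00125),
    "BMI": (-0.01205, 0.00295),
    "BLC": (0.05055, 0.02215),
    "PRSSY": (-0.00041, 0.00044),
    "DIAST": (0.00255, 0.00103),
    "GLUM": (-0.00046, 0.00018),
    "SKINF": (0.00147, 0.00183),
    "LCHOL": (0.31109, 0.10936),
}

# t = estimate / standard error. Inputs are rounded as printed, so the
# ratios are approximations to the T column of the output.
t_ratio = {name: b / se for name, (b, se) in estimates.items()}

# Order the predictors by coefficient magnitude and by |t|.
by_coefficient = sorted(estimates, key=lambda v: abs(estimates[v][0]), reverse=True)
by_t = sorted(t_ratio, key=lambda v: abs(t_ratio[v]), reverse=True)

print("by |coefficient|:", by_coefficient)
print("by |t|:          ", by_t)
```

LCHOL has by far the largest coefficient, yet BMI, whose coefficient is much smaller but more precisely estimated, has the largest |t|.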
In an attempt to solve the problem of units of measurement, many regression programs provide standardized regression coefficients. Before fitting the multiple regression equation, all variables--response and predictors--are standardized by subtracting the mean and dividing by the standard deviation. The standardized regression coefficients, then, represent the change in response, measured in standard deviations of the response, for a change of one standard deviation in a predictor. Some programs, like SPSS, report them automatically, labeling them "Beta" while the ordinary coefficients are labeled "B". Others, like SAS, provide them as an option and label them "Standardized Estimate", as in the output above.
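A standardized coefficient can be obtained either by refitting with standardized variables or by rescaling the ordinary coefficient: multiply it by the SD of the predictor and divide by the SD of the response. The sketch below checks that the two routes agree, and that the result does not depend on the predictor's units. It uses a single predictor and made-up data (both hypothetical, chosen only to keep the example short); `ols_slope` and `standardize` are illustrative helpers, not functions from any statistical package.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def standardize(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data, not the HDL cholesterol data.
x_years = [30, 40, 50, 60, 70]
y = [1.9, 1.7, 1.8, 1.5, 1.4]

# Route 1: refit after standardizing both variables.
beta_refit = ols_slope(standardize(x_years), standardize(y))

# Route 2: rescale the ordinary slope by SD(x) / SD(y).
b = ols_slope(x_years, y)
beta_rescaled = b * stdev(x_years) / stdev(y)

# Changing the predictor's units leaves the standardized slope unchanged.
x_months = [12 * v for v in x_years]
beta_months = ols_slope(standardize(x_months), standardize(y))

print(beta_refit, beta_rescaled, beta_months)
```

With only one predictor the standardized slope equals the correlation coefficient. With several predictors the rescaling identity still holds for each coefficient, but the values are no longer simple correlations.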
Advocates of standardized regression coefficients point out that the coefficients are the same regardless of a predictor's underlying scale of units. They also suggest that this removes the problem of comparing years with mm Hg since each regression coefficient represents the change in response per standard unit (one SD) change in a predictor. However, this is illusory. There is no reason why a change of one SD in one predictor should be equivalent to a change of one SD in another predictor. Some variables are easy to change--the amount of time watching television, for example. Others are more difficult--weight or cholesterol level. Others are impossible--height or age.
The answer to which variable is most important depends on the specific context and why the question is being asked. The investigator and the analyst should consider specific changes in each predictor and the effect they'd have on the response. Some predictors cannot be changed, regardless of their coefficients. This is not an issue if the question asks what most determines the response, but it is critical if the point of the exercise is to develop a public policy to effect a change in the response. When predictors can be modified, investigators will have to decide what changes are feasible and what changes are comparable. Cost will also enter into the discussion. For example, suppose a change in response can be obtained by either a large change in one predictor or a small change in another predictor. Depending on the circumstances, it might prove more cost-effective to attempt the large change than the small change.
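One way to make "specific changes" concrete is to multiply each coefficient by a change the investigator considers feasible. In the sketch below the coefficients come from the output shown earlier. The proposed changes are hypothetical values chosen only for illustration, expressed in whatever units each predictor was recorded in. The calculation also assumes the other predictors are held fixed and that the fitted associations would persist under intervention, which is a substantive assumption rather than a statistical result.

```python
# Coefficients copied from the Parameter Estimates output (response: LHCHOL).
coefficient = {"BMI": -0.01205, "DIAST": 0.00255, "GLUM": -0.00046}

# Hypothetical changes judged feasible, in each predictor's recorded units.
# These numbers are illustrative assumptions, not recommendations.
proposed_change = {"BMI": -2.0, "DIAST": -5.0, "GLUM": -10.0}

# Predicted change in LHCHOL = coefficient * change, other predictors fixed.
predicted_effect = {
    name: coefficient[name] * proposed_change[name] for name in proposed_change
}

# List the predictors from largest to smallest predicted effect in magnitude.
for name in sorted(predicted_effect, key=lambda v: abs(predicted_effect[v]), reverse=True):
    print(f"{name}: change {proposed_change[name]:+g} -> LHCHOL {predicted_effect[name]:+.5f}")
```

A different set of feasible changes would give a different ordering, which is the point: the ranking comes from the chosen changes and their costs, not from the coefficients alone.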