Why SAS's PROC MIXED Can Seem So Confusing

Gerard E. Dallal, Ph.D.

[Early draft subject to change.]

[The technical details are largely a restatement of the Technical Appendix of Littell RC, Henry PR, and Ammerman CB (1998), "Statistical Analysis of Repeated Measures Data Using SAS Procedures", Journal of Animal Science, 76, 1216-1231.]

Abstract

The random and repeated statements of SAS's PROC MIXED have different roles. The random statement identifies random effects. The repeated statement specifies the structure of the within-subject errors. They are not interchangeable. However, some overspecified models can be fit correctly by using either a random or a repeated statement alone. Unfortunately, one such model is the commonly encountered repeated measures model with compound symmetry. This can lead to confusion over the proper use of the two types of statements.

The simple answer to why SAS's PROC MIXED can seem so confusing is that it's so powerful, but there's more to it than that. Early on, many guides to PROC MIXED present an example of fitting a compound symmetry model to a repeated measures study in which subjects (ID) are randomized to one of several treatments (TREAT) and then measured at multiple time points (PERIOD). The command language to analyze these data can be written either as

proc mixed;
  class id treat period;
  model y=treat period treat*period;
  repeated period/sub=id(treat) type=cs;
run;

or as

proc mixed;
  class id treat period;
  model y=treat period treat*period;
  random id(treat);
run;

Because both sets of command language produce the correct analysis, confusion immediately arises over the roles of the repeated and random statements. To sort this out, the underlying mathematics must be reviewed. Once the reason for the equivalence is understood, the purposes of the repeated and random statements will be clear.

PROC MIXED is used to fit models of the form

y = Xβ + ZU + e where

y is a vector of responses
X is a known design matrix for the fixed effects
β is a vector of unknown fixed-effect parameters
Z is a known design matrix for the random effects
U is a vector of unknown random-effect parameters
e is a vector of (normally distributed) random errors.

The random statement identifies the random effects. The repeated statement specifies the structure of the within subject errors.
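
In matrix terms, the division of labor between the two statements can be sketched as follows. The symbols G and R do not appear in the text above; they are the names the SAS documentation uses for the covariance matrices of U and e, and the zero-mean normality of both is the standard PROC MIXED assumption.

```latex
\begin{aligned}
U &\sim N(0, G), \qquad e \sim N(0, R), \qquad \operatorname{cov}(U, e) = 0, \\
\operatorname{var}(y) &= Z G Z^{\top} + R .
\end{aligned}
```

The random statement determines Z and G; the repeated statement determines R. When no repeated statement is given, R is taken to be a constant times the identity matrix, that is, independent errors with a common variance.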

For the repeated measures example,

y_ijk = μ + α_i + γ_k + (αγ)_ik + u_ij + e_ijk where

y_ijk is the response at time k for the j-th subject in the i-th group
μ, α_i, γ_k, and (αγ)_ik are fixed effects
u_ij is the random effect corresponding to the j-th subject in the i-th group
e_ijk is the random error

The variance of y_ijk is

var(y_ijk) = var(u_ij + e_ijk)

The variance of the u_ij is typically assumed to be constant (denoted σ_u²). The errors e_ijk are typically assumed to be independent of the random effects u_ij. Therefore,

var(y_ijk) = σ_u² + var(e_ijk)

The covariance between any two observations is

cov(y_ijk,y_lmn) = cov(u_ij,u_lm) + cov(u_ij,e_lmn) + cov(u_lm,e_ijk) + cov(e_ijk,e_lmn)

Observations from different subjects are typically considered to be independent of each other. Therefore, the covariance between two observations will be 0 unless i=l and j=m, in which case

cov(y_ijk,y_ijn) = cov(u_ij,u_ij) + cov(e_ijk,e_ijn)
                 = σ_u² + cov(e_ijk,e_ijn)

Under the assumption of compound symmetry, cov(e_ijk,e_ijn) is σ_e²+σ for k=n (that is, the variance of e_ijk) and σ_e² otherwise. It therefore follows that

var(y_ijk) = σ_u² + σ_e² + σ and cov(y_ijk,y_ijn) = σ_u² + σ_e².
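
Stacking the measurements on one subject makes the structure easier to see. As a sketch with three periods (the identity matrix I_3 and the all-ones matrix J_3 are notation introduced here, not in the original), the covariance matrix of (y_ij1, y_ij2, y_ij3) is

```latex
V = (\sigma_u^2 + \sigma_e^2)\, J_3 + \sigma\, I_3
  = \begin{pmatrix}
      \sigma_u^2 + \sigma_e^2 + \sigma & \sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 \\
      \sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 + \sigma & \sigma_u^2 + \sigma_e^2 \\
      \sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 + \sigma
    \end{pmatrix}
```

Every off-diagonal entry is the same sum, and the diagonal adds only σ.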

The model is redundant because σ_u² and σ_e² occur only in the sum σ_u² + σ_e², so the sum can be estimated, but σ_u² and σ_e² cannot be estimated individually. The command language file with the random statement resolves the redundancy by introducing the u_ij into the model and treating the within-subject errors as independent. The command language file with the repeated statement resolves the redundancy by removing the u_ij from the model.
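
The two resolutions can be written side by side. This is a sketch in the notation used so far (I for the identity matrix and J for the all-ones matrix are introduced here), with the constraint that each command file effectively imposes shown explicitly.

```latex
\begin{aligned}
\text{random only } (\sigma_e^2 = 0): \quad & V = \sigma_u^2\, J + \sigma\, I, \\
\text{repeated only } (\sigma_u^2 = 0): \quad & V = \sigma_e^2\, J + \sigma\, I .
\end{aligned}
```

Both forms are a single common covariance plus a diagonal term, so, provided the common covariance is not negative, they describe the same matrices. For instance, taking σ_u² = 3 in the first form or σ_e² = 3 in the second, each with σ = 2, gives variances of 5 and covariances of 3 in both cases.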

Littell et al. point out that a similar redundancy exists for the unstructured covariance matrix (TYPE=UN), but there is no redundancy for an auto-regressive covariance structure (TYPE=AR(1)). In the latter case, both random and repeated statements should be used. See their article for additional details.
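
For the auto-regressive case, the two statements might be combined as in the following sketch. It assumes the same variable names (id, treat, period, y) as the earlier example and uses type=ar(1), which is how PROC MIXED spells the first-order auto-regressive structure. It is an illustration, not code taken from Littell et al.

```sas
proc mixed;
  class id treat period;
  model y=treat period treat*period;
  /* between-subject variation: random subject effect u_ij with variance sigma_u^2 */
  random id(treat);
  /* within-subject errors: AR(1) correlation across periods within each subject */
  repeated period/sub=id(treat) type=ar(1);
run;
```

Here the random statement contributes a constant covariance shared by every pair of measurements on a subject, while the repeated statement contributes a correlation that decays as the periods get farther apart, so the two contributions can be told apart.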
