[To explain the concepts in a straightforward manner, I'm being a bit loose with my language. I am using the word "similar" to mean "not shown to be statistically different" or "within sampling variability of each other."]
When statistical program packages report the results of a multiple comparisons procedure, the output is usually in the form of a list of pairwise comparisons along with an indication of whether each comparison is statistically significant. When these results are summarized for publication, standard practice is to present a table of means with various superscripts attached and a comment such as,
This procedure is widely used. Nevertheless, at the time of this writing (November 2007; the last version was written in 2003 and, before that, March 2000!), none of the major statistical packages--SAS, SPSS, SYSTAT--provides the superscripts automatically. The analyst must deduce them from the table of P values. The one exception is the MEANS statement of SAS's GLM procedure, which can be used only when the number of observations is the same for each group or treatment. Since the computer software refuses to do the work, the analyst is left to translate the list of pairwise differences into a set of superscripts so that those not judged different from each other share a superscript while those judged different do not have a superscript in common.
By way of example, consider a set of four groups--A,B,C,D--where A was judged different from B and B was judged different from D. A brute force approach might use a different superscript for each possible comparison, eliminating those superscripts where the pair is judged significantly different. There are six possible comparisons--AB, AC, AD, BC, BD, CD--so the brute force approach would start with six superscripts
This is a true description of the differences between the groups, but it is awkward when you consider that the same set of differences can be written
In both cases, A & B do not share a superscript, nor do B & D. However, every other combination does share a superscript. The second expression is much easier to interpret because
There is a straightforward way to obtain the simpler expression. A computer program to generate the superscripts is now available. The procedure takes sets of similar treatments and divides them if they contain pairs of treatments that have been shown to be different.
Example: Consider the situation described earlier: four treatments A, B, C, D where the pairwise differences A&B and B&D have been judged statistically significant.
With four treatments A,B,C,D, the initial similar set is ABCD.
There are two dissimilar pairs (A,B) and (B,D).
Start with (A,B). Since ABCD contains (A,B), replace ABCD by rewriting it twice, once without A and once without B, to get BCD and ACD.
Next, consider (B,D). Since the similar set ACD does not contain (B,D), leave it alone. The similar set BCD does contain (B,D). Therefore, replace BCD by rewriting it twice, once without B and once without D, to get CD and BC alongside the untouched ACD.
CD is eliminated because it is contained in ACD, leaving ACD and BC.
Thus, two marks/superscripts are needed. One is attached to the means of A, C, and D. The other is attached to the means of B and C.
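The splitting procedure in this example can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the program mentioned above: the function name `similar_sets` and the representation of treatments as single-character labels are hypothetical choices made for the sketch.

```python
def similar_sets(treatments, different_pairs):
    """Split the full set of treatments until no set contains a dissimilar pair.

    treatments      -- iterable of treatment labels, e.g. "ABCD"
    different_pairs -- pairs judged significantly different, e.g. [("A", "B")]
    (Both names are assumptions made for this sketch.)
    """
    sets = [frozenset(treatments)]  # the initial similar set holds everything
    for a, b in different_pairs:
        next_sets = []
        for s in sets:
            if a in s and b in s:
                # Rewrite the set twice: once without a, once without b.
                next_sets.extend([s - {a}, s - {b}])
            else:
                next_sets.append(s)
        # Drop duplicates and any set wholly contained in another set.
        unique = set(next_sets)
        sets = [s for s in unique if not any(s < t for t in unique)]
    # Sort only so that the result is printed in a stable order.
    return sorted(sets, key=lambda s: sorted(s))


groups = similar_sets("ABCD", [("A", "B"), ("B", "D")])
print(["".join(sorted(g)) for g in groups])  # ['ACD', 'BC']
```

Each set returned corresponds to one superscript. Pruning contained sets after every pair, rather than only at the end, keeps the working list short without changing the final answer.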
(Never!) Attaching Superscripts To Singletons
Some researchers have attached unique superscripts to single means that are judged to be different from all other means. For example, suppose when comparing four treatment means,
I find superscripts affixed to a single mean to be the worst
kind of visual clutter. They invite the reader to look for matches that
don't exist. It's similar to reading an article that includes a symbol
indicating a footnote and being unable to find the footnote! Without such
superscripts, unique means stand unadorned and the absence of any
superscript trumpets a mean's uniqueness. For this reason, I never use
superscripts that would be attached to only one mean.
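One way to apply this rule when attaching letters is to skip any similar set that has only one member. The sketch below is a minimal illustration under my own assumptions: the function name `assign_superscripts` is hypothetical, the similar sets are supplied by hand, and the example data (A different from everything, with B, C, and D similar to each other) is invented for the demonstration.

```python
from string import ascii_lowercase


def assign_superscripts(treatments, sets):
    """Give each treatment the letters of the similar sets it belongs to.

    Sets with a single member get no letter, so a mean that differs
    from all the others is left unadorned.
    """
    labels = {t: "" for t in treatments}
    shared = [s for s in sets if len(s) > 1]  # never label a singleton
    # zip stops at 26 letters, which is ample for a table of means.
    for letter, members in zip(ascii_lowercase, shared):
        for t in members:
            labels[t] += letter
    return labels


# Hypothetical case: A differs from every other mean; B, C, D are similar.
print(assign_superscripts("ABCD", [{"A"}, {"B", "C", "D"}]))
# {'A': '', 'B': 'a', 'C': 'a', 'D': 'a'}
```

Applied to the sets ACD and BC from the earlier example, the same function attaches one letter to A, C, and D and a second letter to B and C.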