Nonparametric Statistics
Gerard E. Dallal, Ph.D.
Before discussing nonparametric techniques, we should consider why the methods we usually use are called parametric. Parameters are indices. They index (or label) individual distributions within a particular family. For example, there are an infinite number of normal distributions, but each normal distribution is uniquely determined by its mean (μ) and standard deviation (σ). If you specify all of the parameters (here, μ and σ), you've specified a unique normal distribution.
Most commonly used statistical techniques are properly called parametric because they involve estimating or testing the value(s) of parameter(s)--usually, population means or proportions. It should come as no surprise, then, that nonparametric methods are procedures that work their magic without reference to specific parameters.
The precise definition of nonparametric varies slightly among authors.[1] You'll see the terms nonparametric and distribution-free. They have slightly different meanings, but are often used interchangeably--like arteriosclerosis and atherosclerosis.
Ranks
Many nonparametric procedures are based on ranked data. Data are ranked by ordering them from lowest to highest and assigning them, in order, the integer values from 1 to the sample size. Ties are resolved by assigning tied values the mean of the ranks they would have received if there were no ties, e.g., 117, 119, 119, 125, 128 becomes 1, 2.5, 2.5, 4, 5. (If the two 119s were not tied, they would have been assigned the ranks 2 and 3. The mean of 2 and 3 is 2.5.)
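The ranking rule just described can be sketched in a few lines of Python. This is only an illustration: the function name midranks is my own, and the quadratic-time approach is chosen for readability rather than speed.

```python
def midranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    s = sorted(values)
    # s.index(v) is the 0-based position of the first copy of v, so the copies
    # of v occupy ranks index+1 ... index+count, whose mean is
    # index + (count + 1) / 2.
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


# The example from the text: the two 119s share the mean of ranks 2 and 3.
print(midranks([117, 119, 119, 125, 128]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

Ranks are returned in the order of the original observations, so they can be paired back with group labels or with a second variable.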
For large samples, many nonparametric techniques can be viewed as the usual normal-theory-based procedures applied to ranks. The following table contains the names of some normal-theory-based procedures and their nonparametric counterparts. For smaller sample sizes, the same statistic (or one mathematically equivalent to it) is used, but decisions regarding its significance are made by comparing the observed value to special tables of critical values.[2]
Some Commonly Used Statistical Tests
Normal theory based test | Corresponding nonparametric test | Purpose of test |
t test for independent samples | Mann-Whitney U test; Wilcoxon rank-sum test | Compares two independent samples |
Paired t test | Wilcoxon matched pairs signed-rank test | Examines a set of differences |
Pearson correlation coefficient | Spearman rank correlation coefficient | Assesses the association between two variables (linear for Pearson, monotonic for Spearman) |
One way analysis of variance (F test) | Kruskal-Wallis analysis of variance by ranks | Compares three or more groups |
Two way analysis of variance | Friedman two way analysis of variance | Compares groups classified by two different factors |
Some nonparametric procedures
The Wilcoxon signed rank test is used to test whether the median of a symmetric population is 0. First, the data are ranked without regard to sign. Second, the signs of the original observations are attached to their corresponding ranks. Finally, the one sample z statistic (mean / standard error of the mean) is calculated from the signed ranks. For large samples, the z statistic is compared to percentiles of the standard normal distribution. For small samples, the statistic is compared to the results that would be likely if each rank were equally likely to have a + or - sign affixed.
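Here is a minimal sketch of those three steps, plus the small-sample comparison against every possible assignment of signs. The function names and the data are hypothetical, and dropping observations that are exactly 0 is my assumption (the text doesn't say what to do with them). The z statistic follows the description above (mean / SEM of the signed ranks) rather than the textbook formula that uses the null-hypothesis variance.

```python
import itertools
import math
import statistics


def midranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def signed_ranks(data):
    """Rank |x|, then reattach the signs. Assumption: exact zeros are dropped."""
    nonzero = [x for x in data if x != 0]
    ranks = midranks([abs(x) for x in nonzero])
    return [r if x > 0 else -r for r, x in zip(ranks, nonzero)]


def signed_rank_z(data):
    """One-sample z statistic (mean / SEM) computed from the signed ranks."""
    sr = signed_ranks(data)
    sem = statistics.stdev(sr) / math.sqrt(len(sr))
    return statistics.mean(sr) / sem


def signed_rank_exact_p(data):
    """Two-sided P-value from all 2**n equally likely sign assignments."""
    sr = signed_ranks(data)
    ranks = [abs(r) for r in sr]
    observed = abs(sum(sr))
    hits = 0
    for signs in itertools.product((1, -1), repeat=len(ranks)):
        # Count assignments whose rank sum is at least as extreme as observed.
        if abs(sum(s * r for s, r in zip(signs, ranks))) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical differences, for illustration only.
diffs = [1.2, -0.4, 2.5, 3.1, -0.9, 1.8, 2.2, 0.7]
print(signed_rank_z(diffs), signed_rank_exact_p(diffs))
```

The exact enumeration is practical only for small samples, since the number of sign assignments doubles with every added observation.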
The Wilcoxon rank sum test (also known as the Mann-Whitney U test or the Wilcoxon-Mann-Whitney test) is used to test whether two samples are drawn from the same population. It is most appropriate when the likely alternative is that the two populations are shifted with respect to each other. The test is performed by ranking the combined data set, dividing the ranks into two sets according to the group membership of the original observations, and calculating a two sample z statistic, using the pooled variance estimate. For large samples, the statistic is compared to percentiles of the standard normal distribution. For small samples, the statistic is compared to what would result if the data were combined into a single data set and assigned at random to two groups having the same number of observations as the original samples.
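The large-sample recipe can be sketched as follows. The function names and the data are made up for illustration, and the statistic is the pooled-variance two-sample statistic applied to ranks, as described above, not the usual U statistic (which is mathematically related to it).

```python
import math
import statistics


def midranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def rank_sum_z(a, b):
    """Two-sample z statistic on the ranks of the combined data (pooled variance)."""
    ranks = midranks(list(a) + list(b))
    ra, rb = ranks[:len(a)], ranks[len(a):]
    pooled_var = (
        (len(ra) - 1) * statistics.variance(ra)
        + (len(rb) - 1) * statistics.variance(rb)
    ) / (len(ra) + len(rb) - 2)
    se = math.sqrt(pooled_var * (1 / len(ra) + 1 / len(rb)))
    return (statistics.mean(ra) - statistics.mean(rb)) / se


# Hypothetical samples, for illustration only.
group_a = [12.1, 14.3, 11.8, 13.5, 12.9]
group_b = [15.2, 16.8, 14.9, 17.1, 15.5]
print(rank_sum_z(group_a, group_b))
```

With samples this small, the statistic would be referred to the randomization distribution rather than to the normal percentiles.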
Spearman's rho (Spearman rank correlation coefficient) is the nonparametric analog of the usual Pearson product-moment correlation coefficient. It is calculated by converting each variable to ranks and calculating the Pearson correlation coefficient between the two sets of ranks. For small sample sizes, the observed correlation coefficient is compared to what would result if the ranks of the X- and Y-values were random permutations of the integers 1 to n (sample size).
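As a small sketch of that definition (the function names and the data are my own, for illustration), Spearman's rho is just the Pearson coefficient computed after both variables are replaced by their ranks:

```python
import math


def midranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Pearson correlation between the ranks of x and the ranks of y."""
    return pearson(midranks(x), midranks(y))


# A perfectly monotone but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(pearson(x, y), spearman_rho(x, y))
```

Because the relationship is monotone, the ranks agree exactly and rho comes out at 1 (up to rounding), while the Pearson coefficient on the raw values falls short of 1.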
Since these nonparametric procedures can be viewed as the usual parametric procedures applied to ranks, it is reasonable to ask what is gained by using ranks in place of the raw data.
Advantages of nonparametric procedures
(1) Nonparametric tests make less stringent demands of the data. For standard parametric procedures to be valid, certain underlying conditions or assumptions must be met, particularly for smaller sample sizes. The one-sample t test, for example, requires that the observations be drawn from a normally distributed population. For two independent samples, the t test has the additional requirement that the population standard deviations be equal. If these assumptions/conditions are violated, the resulting P-values and confidence intervals may not be trustworthy.[3] However, normality is not required for the Wilcoxon signed rank or rank sum tests to produce valid inferences about whether the median of a symmetric population is 0 or whether two samples are drawn from the same population.
(2) Nonparametric procedures can sometimes be used to get a quick answer with little calculation.
Two of the simplest nonparametric procedures are the sign test and median test. The sign test can be used with paired data to test the hypothesis that differences are equally likely to be positive or negative (or, equivalently, that the median difference is 0). For small samples, an exact test of whether the proportion of positives is 0.5 can be obtained by using a binomial distribution. For large samples, the test statistic is
(plus - minus)² / (plus + minus)
where plus is the number of positive values and minus is the number of negative values. Under the null hypothesis that the positive and negative values are equally likely, the test statistic follows the chi-square distribution with 1 degree of freedom. Whether the sample size is small or large, the sign test provides a quick test of whether two paired treatments are equally effective simply by counting the number of times each treatment is better than the other.
Example: 15 patients are given both treatments A and B to test the hypothesis that they perform equally well. If 13 patients prefer A to B and 2 patients prefer B to A, the test statistic is (13 - 2)² / (13 + 2) [= 8.07] with a corresponding P-value of 0.0045. The null hypothesis is therefore rejected.
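The arithmetic in this example is easy to check. The sketch below (the function names are mine) computes the large-sample statistic with its chi-square P-value, and also the exact binomial P-value mentioned earlier for small samples. It uses the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def sign_test_chi2(plus, minus):
    """Large-sample sign test: statistic and chi-square (1 df) P-value."""
    stat = (plus - minus) ** 2 / (plus + minus)
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def sign_test_exact(plus, minus):
    """Exact two-sided binomial P-value for H0: P(plus) = 0.5."""
    n = plus + minus
    k = min(plus, minus)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


stat, p_large = sign_test_chi2(13, 2)
p_exact = sign_test_exact(13, 2)
print(round(stat, 2), round(p_large, 4), round(p_exact, 4))
```

With only 15 patients the exact P-value (about 0.0074) is somewhat larger than the chi-square approximation, but the conclusion is the same.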
The median test is used to test whether two samples are drawn from populations with the same median. The median of the combined data set is calculated and each original observation is classified according to its original sample (A or B) and whether it is less than or greater than the overall median. The chi-square test for homogeneity of proportions in the resulting 2-by-2 table tests whether the population medians are equal.
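A minimal sketch of the median test follows. The data and function name are hypothetical. Two details are my assumptions, since the text doesn't specify them: observations equal to the overall median are left out of the table, and no continuity correction is applied to the chi-square statistic.

```python
import math
import statistics


def median_test(a, b):
    """Classify each sample around the combined median; chi-square on the 2-by-2 table."""
    m = statistics.median(list(a) + list(b))
    # Rows are samples A and B; columns are counts below and above the median.
    table = [[sum(x < m for x in g), sum(x > m for x in g)] for g in (a, b)]
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m, table, chi2, p


# Hypothetical samples, for illustration only.
sample_a = [1, 2, 3, 4, 5, 6]
sample_b = [4.5, 7, 8, 9, 10, 11]
m, table, chi2, p = median_test(sample_a, sample_b)
print(m, table, round(chi2, 3), round(p, 4))
```

With counts this small, an exact test on the 2-by-2 table would be preferable to the chi-square approximation.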
(3) Nonparametric methods provide an air of objectivity when there is no reliable (universally recognized) underlying scale for the original data and there is some concern that the results of standard parametric techniques would be criticized for their dependence on an artificial metric. For example, patients might be asked whether they feel extremely uncomfortable / uncomfortable / neutral / comfortable / very comfortable. What scores should be assigned to the comfort categories and how do we know whether the outcome would change dramatically with a slight change in scoring? Some of these concerns are blunted when the data are converted to ranks.[4]
(4) A historical appeal of rank tests is that it was easy to construct tables of exact critical values, provided there were no ties in the data. The same critical value could be used for all data sets with the same number of observations because every data set is reduced to the ranks 1,...,n. However, this advantage has been eliminated by the ready availability of personal computers.[5]
(5) Sometimes the data do not constitute a random sample from a larger population. The data in hand are all there are. Standard parametric techniques based on sampling from larger populations are no longer appropriate. Because there are no larger populations, there are no population parameters to estimate. Nevertheless, certain kinds of nonparametric procedures can be applied to such data by using randomization models.
From Dallal (1988):
Consider, for example, a situation in which a company's workers are assigned in haphazard fashion to work in one of two buildings. After yearly physicals are administered, it appears that workers in one building have higher lead levels in their blood. Standard sampling theory techniques are inappropriate because the workers do not represent samples from a large population--there is no large population. The randomization model, however, provides a means for carrying out statistical tests in such circumstances. The model states that if there were no influence exerted by the buildings, the lead levels of the workers in each building should be no different from what one would observe after combining all of the lead values into a single data set and dividing it in two, at random, according to the number of workers in each building. The stochastic component of the model, then, exists only in the analyst's head; it is not the result of some physical process, except insofar as the haphazard assignment of workers to buildings is truly random.
Of course, randomization tests cannot be applied blindly any more than normality can automatically be assumed when performing a t test. (Perhaps, in the lead levels example, one building's workers tend to live in urban settings while the other building's workers live in rural settings. Then the randomization model would be inappropriate.) Nevertheless, there will be many situations where the less stringent requirements of the randomization test will make it the test of choice. In the context of randomization models, randomization tests are the ONLY legitimate tests; standard parametric tests are valid only as approximations to randomization tests.[6]
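The randomization model in the lead levels example can be sketched directly: pool the values, form every possible division into two groups of the original sizes, and see how often the difference in means is at least as large as the one observed. The function name and the numbers below are hypothetical, and using the difference in means as the test statistic is my choice for illustration.

```python
import itertools


def randomization_test(a, b):
    """Two-sided P-value for the difference in means over all possible regroupings."""
    combined = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(combined)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    extreme = count = 0
    # Every way of choosing which pooled values are labeled as the first group.
    for idx in itertools.combinations(range(len(combined)), n_a):
        sum_a = sum(combined[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical blood lead levels for workers in two buildings.
building_1 = [1, 2, 3]
building_2 = [10, 11, 12]
print(randomization_test(building_1, building_2))  # 2 of 20 regroupings -> 0.1
```

Complete enumeration becomes impractical quickly as the groups grow; in that case a large random sample of regroupings is the usual substitute.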
Disadvantages of nonparametric procedures
Such a strong case has been made for the benefits of nonparametric procedures that some might ask why parametric procedures aren't abandoned entirely in favor of nonparametric methods!
The major disadvantage of nonparametric techniques is contained in their name. Because the procedures are nonparametric, there are no parameters to describe and it becomes more difficult to make quantitative statements about the actual difference between populations. (For example, when the sign test says two treatments are different, there's no confidence interval and the test doesn't say by how much the treatments differ.) However, it is sometimes possible with the right software to compute estimates (and even confidence intervals!) for medians and differences between medians. The calculations are often too tedious for pencil and paper; a computer is required. As statistical software goes through its various iterations, such confidence intervals may become readily available, but I'm still waiting![7]
The second disadvantage is that nonparametric procedures throw away information! The sign test, for example, uses only the signs of the observations. Ranks preserve information about the order of the data but discard the actual values. Because information is discarded, nonparametric procedures can never be as powerful (able to detect existing differences) as their parametric counterparts when parametric tests can be used.
How much information is lost? One answer is given by the asymptotic relative efficiency (ARE) which, loosely speaking, describes the ratio of sample sizes required (parametric to nonparametric) for a parametric procedure to have the same ability to reject a null hypothesis as the corresponding nonparametric procedure. When the underlying distributions are normal (with equal population standard deviations for the two-sample case), the AREs are:
Procedure | ARE |
sign test | 2/π = 0.637 |
Wilcoxon signed-rank test | 3/π = 0.955 |
median test | 2/π = 0.637 |
Wilcoxon-Mann-Whitney U test | 3/π = 0.955 |
Spearman correlation coefficient | 0.91 |
Thus, if the data come from a normally distributed population, the usual z statistic requires only 637 observations to demonstrate a difference when the sign test requires 1000. Similarly, the t test requires only 955 to the Wilcoxon signed-rank test's 1000. It has been shown that the ARE of the Wilcoxon-Mann-Whitney test is always at least 0.864, regardless of the underlying population. Many say the AREs are so close to 1 for procedures based on ranks that they are the best reason yet for using nonparametric techniques!
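The sample-size comparison is simple arithmetic on the AREs for the normal case. The sketch below (the variable names are mine) just multiplies the nonparametric sample size by the ARE to get the number of observations the parametric procedure needs for comparable performance.

```python
import math

# AREs under normality.
are_sign = 2 / math.pi       # sign test and median test
are_wilcoxon = 3 / math.pi   # Wilcoxon signed-rank and Wilcoxon-Mann-Whitney

n_nonparametric = 1000
n_for_z = round(are_sign * n_nonparametric)
n_for_t = round(are_wilcoxon * n_nonparametric)

print(round(are_sign, 3), n_for_z)      # 0.637 637
print(round(are_wilcoxon, 3), n_for_t)  # 0.955 955
```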
Other procedures
Nonparametric statistics is a field of specialization in its own right. Many procedures have not been touched upon here. These include the Kolmogorov-Smirnov test for the equality of two distribution functions, Kruskal-Wallis one-way analysis of variance, Friedman two-way analysis of variance, and the logrank test and Gehan's generalized Wilcoxon test for comparing two survival distributions. It would not be too much of an exaggeration to say that for every parametric test there is a nonparametric analogue that allows some of the assumptions of the parametric test to be relaxed. Many of these procedures are discussed in Siegel (1956), Hollander and Wolfe (1973) and Lee (1992).
Ellis et al. (1986) report in summary form the retinyl ester concentrations (mg/dl) of 9 normal individuals and 9 type V hyperlipoproteinemic individuals. Although all of the normal individuals have lower concentrations than those of the abnormals, these data just miss statistical significance at the 0.05 level according to the t test using Satterthwaite's approximation for unequal variances. But even the lowly median test points to substantial differences between the two groups.
Retinyl ester concentrations (mg/dl)
Normal | Type V hyperlipoproteinemic |
1.4 | 30.9 |
2.5 | 134.6 |
4.6 | 13.6 |
0.0 | 28.9 |
0.0 | 434.1 |
2.9 | 101.7 |
1.9 | 85.1 |
4.0 | 26.5 |
2.0 | 44.8 |
[Character histograms of each group, plotted from minimum to maximum, omitted.]
Summary statistics
Statistic | Normal | Type V hyperlipoproteinemic |
mean | 2.1444 | 100.0222 |
SD | 1.5812 | 131.7142 |
SEM | .5271 | 43.9048 |
sample size | 9 | 9 |
Test | Statistic | P-value | df |
t (separate) | -2.23 | .0564 | 8.0 |
t (pooled) | -2.23 | .0405 | 16 |
F (variances) | 6938.69 | .0000 | 8, 8 |
Median test
Group | < median | > median |
Group 1 (normal) | 9 | 0 |
Group 2 (type V) | 0 | 9 |
P-value (exact) = .0000
Wilcoxon-Mann-Whitney test: P-value = .0000
Pitman randomization test: P-value = .0000
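The summary figures for the Ellis et al. data can be re-derived from the 18 observations. The sketch below is only a check on the arithmetic, not the program that produced the original listing, and the variable names are mine. Because the two groups are completely separated and of equal size, the exact two-sided P-value for the median test table reduces to 2 divided by the number of ways of choosing 9 observations out of 18. That shortcut is specific to this data set.

```python
import math
import statistics

normal = [1.4, 2.5, 4.6, 0.0, 0.0, 2.9, 1.9, 4.0, 2.0]
type_v = [30.9, 134.6, 13.6, 28.9, 434.1, 101.7, 85.1, 26.5, 44.8]

n1, n2 = len(normal), len(type_v)
m1, m2 = statistics.mean(normal), statistics.mean(type_v)
v1, v2 = statistics.variance(normal), statistics.variance(type_v)

# t statistic with separate variances, and Satterthwaite's approximate df.
se_sq = v1 / n1 + v2 / n2
t_separate = (m1 - m2) / math.sqrt(se_sq)
df_satterthwaite = se_sq ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

# Median test table: counts below and above the median of the combined data.
overall_median = statistics.median(normal + type_v)
table = [
    [sum(x < overall_median for x in g), sum(x > overall_median for x in g)]
    for g in (normal, type_v)
]

# Complete separation with equal group sizes: only the observed table and its
# mirror image are this extreme, out of comb(18, 9) equally likely splits.
p_exact = 2 / math.comb(n1 + n2, n1)

print(round(m1, 4), round(math.sqrt(v1), 4), round(m2, 4), round(math.sqrt(v2), 4))
print(round(t_separate, 2), round(df_satterthwaite, 1))
print(table, p_exact)
```

The exact P-value is about 0.00004, which is why the listing shows .0000 for the median, Wilcoxon-Mann-Whitney, and Pitman randomization tests alike.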
Fisher and van Belle (1993, p. 306): A family of probability distributions is nonparametric if the distributions of the family cannot be conveniently characterized by a few parameters. [For example, all possible continuous distributions.] Statistical procedures that hold or are valid for a nonparametric family of distributions, are called nonparametric statistical procedures.
Bradley (1968, p. 15): The terms nonparametric and distribution-free are not synonymous . . . Popular usage, however, has equated the terms . . . Roughly speaking, a nonparametric test is one which makes no hypothesis about the value of a parameter in a statistical density function, whereas a distribution-free test is one which makes no assumptions about the precise form of the sampled population.
Lehmann (1975, p. 58): . . . distribution-free or nonparametric, that is, free of the assumption that [the underlying distribution of the data] belongs to some parametric family of distributions.