SAMPLE SIZE CALCULATIONS
SIMPLIFIED
Controlled Trials
Most sample size calculations involve estimating the number of observations needed to compare two means by using Student's t test for independent samples or two proportions by using Pearson's chi-square test. Standard practice is to determine the sample size that gives an 80% chance of rejecting the hypothesis of no difference at the 0.05 level of significance.
Two Means
The sample size estimate depends on the difference between means and the within-group variability of individual measurements. A formula for the approximate per group sample size is
n = 16 s²/d² + 1
where 'd' is the expected difference between means and 's' is the within-group standard deviation of the individual measurements. For example, if the difference between means is expected to be 18 mg/dl and the within-group standard deviation is 30 mg/dl, the required sample size is approximately 46 (= 16 × 30²/18² + 1, rounded up) per group. (The exact answer is 45.)
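The arithmetic in the worked example can be checked with a few lines of Python. This is only a sketch of the rule of thumb 16 s²/d² + 1 used above; the function name and the choice to round up to a whole subject are illustrative assumptions, not part of the original text.

```python
import math

def per_group_n_two_means(d, s):
    """Approximate per-group sample size, 16 * s^2 / d^2 + 1, rounded up.

    d: expected difference between the two means
    s: within-group standard deviation of individual measurements
    (80% power, two-sided 0.05 level, as in the text)
    """
    if d == 0:
        raise ValueError("the expected difference d must be nonzero")
    raw = 16 * s**2 / d**2 + 1
    # Strip floating-point noise before rounding up to a whole subject.
    return math.ceil(round(raw, 9))

# Worked example from the text: d = 18 mg/dl, s = 30 mg/dl
print(per_group_n_two_means(d=18, s=30))  # 46
```

Because d enters the formula squared, halving the expected difference roughly quadruples the required sample size.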
Many Means
Sometimes a study involves the comparison of many treatments. The statistical methods are discussed in detail under Analysis of Variance (ANOVA). Historically, the analysis of many groups begins by asking whether all means are the same. There are formulas for calculating the sample size necessary to reject this hypothesis according to the particular configuration of population means the researchers expect to encounter. These formulas are usually a bad way to choose a sample size because the purpose of the experiment is rarely (never?) to see whether all means are the same. Rather, it is to catalogue the differences. The sample size that may be adequate to demonstrate that the population means are not all the same may be inadequate to demonstrate exactly where the differences occur.
When many means are compared, statisticians worry about the problem of multiple comparisons, that is, the possibility that some comparison may be called statistically significant simply because so many comparisons were performed. Common sense says that if there are no differences among the treatments but six comparisons are performed, then the chance that something reaches the level of statistical significance is a lot greater than 0.05. There are special statistical techniques such as Tukey's Honestly Significant Differences (HSD) that adjust for multiple comparisons, but there are no easily accessible formulas or computer programs for basing sample size calculations on them. Instead, sample sizes are calculated by using a Bonferroni adjustment to the size of the test, that is, the nominal size of the test is divided by the number of comparisons that will be performed. When there are three means, there are three possible comparisons (AB, AC, BC). When there are four means, there are six possible comparisons (AB, AC, AD, BC, BD, CD), and so on. Thus, when three means are to be compared at the 0.05 level, the two-group sample size formula is used, but the size of each individual comparison is taken to be 0.05/3 (=0.0167). When four means are compared, the size of each individual comparison is 0.05/6 (=0.0083). The approximate per group sample size when three means are compared at the 0.05 level is
n = 21 s²/d² + 1
while for four means it is
n = 24 s²/d² + 1
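The Bonferroni bookkeeping described above can be sketched in Python. The sketch assumes the usual normal-approximation multiplier 2(z(1 - α/2) + z(power))², which gives roughly 16 for a single comparison at the 0.05 level; the function names are illustrative, and the familywise error figure additionally assumes independent comparisons, which pairwise comparisons among the same groups are not, so it is only a rough guide.

```python
from math import comb
from statistics import NormalDist

def n_comparisons(k):
    """Number of pairwise comparisons among k means: k choose 2."""
    return comb(k, 2)

def familywise_error(m, alpha=0.05):
    """Chance of at least one false positive in m comparisons,
    ASSUMING the comparisons are independent (a simplification)."""
    return 1 - (1 - alpha) ** m

def multiplier(alpha, power=0.80):
    """Normal-approximation multiplier of s^2/d^2 for a two-sided test
    of size alpha with the stated power (assumed form, not from the text)."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2

for k in (2, 3, 4):
    m = n_comparisons(k)
    alpha_each = 0.05 / m  # Bonferroni-adjusted size of each comparison
    print(k, m, round(alpha_each, 4), round(multiplier(alpha_each)))
```

Under these assumptions the multiplier grows from about 16 for two means to about 21 for three and 24 for four, so the price of the adjustment is a sample size increase of roughly one third to one half.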
Comparing Changes
Often, the measurement is change (change in cholesterol level, for example). Estimating the difference in mean change is usually not a problem. Typically, one group has an expected change of 0 while the other has an expected change determined by the expected effectiveness of the treatment.
When the measurement is change, sample size formulas require the within-group standard deviation of individual changes. Often, it is unavailable. However, the within-group standard deviation of a set of individual measurements at one time point is usually larger than the standard deviation of change and, if used in its place, will produce a conservative (larger than necessary) sample size estimate. The major drawback is that the cross-sectional standard deviation may be so much larger than the standard deviation of change that the resulting estimate may be useless for planning purposes. The hope is that the study will prove feasible even with this inflated sample size estimate.
For example, suppose the primary response in a comparative trial is change in ADL score (activities of daily living). It is expected that one group will show no change while another group will show an increase of 0.6 units. There are no data reporting the standard deviation of change in ADL score over a period comparable to the length of the study, but it has been reported in a cross-sectional study that ADL scores had a standard deviation of 1.5 units. Using the standard deviation of the cross-section in place of the unknown standard deviation of change gives a sample size of 101 (= 16 × 1.5²/0.6² + 1) per group.
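To see how conservative the cross-sectional substitute might be, the same two-group rule of thumb can be evaluated over a range of standard deviations. In the sketch below only the 1.5 and 0.6 figures come from the example; the smaller standard deviations of change are hypothetical values chosen purely for illustration.

```python
import math

def per_group_n(d, s):
    """Rule-of-thumb per-group sample size, 16 * s^2 / d^2 + 1, rounded up."""
    raw = 16 * s**2 / d**2 + 1
    # Strip floating-point noise before rounding up to a whole subject.
    return math.ceil(round(raw, 9))

expected_difference = 0.6   # expected difference in mean change (from the text)
cross_sectional_sd = 1.5    # reported cross-sectional SD (from the text)

# Conservative estimate using the cross-sectional SD in place of the SD of change
print(per_group_n(expected_difference, cross_sectional_sd))  # 101

# HYPOTHETICAL standard deviations of change, for illustration only
for sd_change in (1.2, 0.9, 0.6):
    print(sd_change, per_group_n(expected_difference, sd_change))
```

If the true standard deviation of change were well below 1.5, the study would be substantially overpowered at 101 per group, which is the sense in which the estimate is conservative.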
Two Proportions
The appended chart gives the per group sample size needed to compare proportions. The expected proportions for the two groups are located on the row and column margins of the table, and the sample size is obtained from the corresponding table entry. For example, if it is felt that the proportion will be 0.15 in one group and 0.25 in the other, 270 subjects per group are needed to have an 80% chance of rejecting the hypothesis of no difference at the 0.05 level.
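For proportions that fall between the tabled values, a calculation can stand in for the chart. The text does not say which formula produced the table; the sketch below assumes the normal-approximation sample size with a continuity correction, which appears to reproduce the entries checked here (270, 474 and 1605) but should be treated as a guess at the method rather than a statement of it. The function name is illustrative.

```python
import math
from statistics import NormalDist

def per_group_n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two proportions.

    ASSUMED method: normal approximation followed by a continuity
    correction; the source does not state how its table was computed.
    """
    if p1 == p2:
        raise ValueError("the two proportions must differ")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided test of size alpha
    z_power = z(power)
    d = abs(p1 - p2)
    p_bar = (p1 + p2) / 2
    # Uncorrected normal-approximation sample size
    n0 = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d**2
    # Continuity correction
    n = n0 / 4 * (1 + math.sqrt(1 + 4 / (n0 * d))) ** 2
    return math.ceil(n)

# Example from the text: 0.15 in one group, 0.25 in the other
print(per_group_n_two_proportions(0.15, 0.25))  # 270
```

The required sample size depends on where the proportions sit as well as on their difference: a difference of 0.05 needs far fewer subjects near 0.05 than near 0.50.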
Points to Consider
The calculations themselves are straightforward. A statistician reviewing sample size estimates will have two concerns: (1) Are the estimates of within-group variability valid, and (2) are the anticipated effects biologically plausible? If I were the reviewer, I would seek the opinion of a subject matter specialist. As long as you can say to yourself that you would not question the estimates had they been presented to you by someone else, outside reviewers will probably not find fault with them, either.
Per group sample size required for an 80% chance of rejecting the hypothesis of equal proportions at the 0.05 level of significance when the true proportions are as specified by the row and column labels
        0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  0.45  0.50
------------------------------------------------------------------
0.05 |     0   474   160    88    59    43    34    27    22    19
0.10 |   474     0   726   219   113    72    51    38    30    25
0.15 |   160   726     0   945   270   134    83    57    42    33
0.20 |    88   219   945     0  1134   313   151    91    62    45
0.25 |    59   113   270  1134     0  1291   349   165    98    66
0.30 |    43    72   134   313  1291     0  1417   376   176   103
0.35 |    34    51    83   151   349  1417     0  1511   396   183
0.40 |    27    38    57    91   165   376  1511     0  1574   408
0.45 |    22    30    42    62    98   176   396  1574     0  1605
0.50 |    19    25    33    45    66   103   183   408  1605     0
0.55 |    16    20    26    35    48    68   106   186   412  1605
0.60 |    14    17    22    28    36    49    70   107   186   408
0.65 |    12    15    18    22    28    37    49    70   106   183
0.70 |    11    13    15    19    23    29    37    49    68   103
0.75 |     9    11    13    16    19    23    28    36    48    66
0.80 |     8    10    11    13    16    19    22    28    35    45
0.85 |     7     9    10    11    13    15    18    22    26    33
0.90 |     7     8     9    10    11    13    15    17    20    25
0.95 |     6     7     7     8     9    11    12    14    16    19

        0.50  0.55  0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95
------------------------------------------------------------------
0.05 |    19    16    14    12    11     9     8     7     7     6
0.10 |    25    20    17    15    13    11    10     9     8     7
0.15 |    33    26    22    18    15    13    11    10     9     7
0.20 |    45    35    28    22    19    16    13    11    10     8
0.25 |    66    48    36    28    23    19    16    13    11     9
0.30 |   103    68    49    37    29    23    19    15    13    11
0.35 |   183   106    70    49    37    28    22    18    15    12
0.40 |   408   186   107    70    49    36    28    22    17    14
0.45 |  1605   412   186   106    68    48    35    26    20    16
0.50 |     0  1605   408   183   103    66    45    33    25    19
0.55 |  1605     0  1574   396   176    98    62    42    30    22
0.60 |   408  1574     0  1511   376   165    91    57    38    27
0.65 |   183   396  1511     0  1417   349   151    83    51    34
0.70 |   103   176   376  1417     0  1291   313   134    72    43
0.75 |    66    98   165   349  1291     0  1134   270   113    59
0.80 |    45    62    91   151   313  1134     0   945   219    88
0.85 |    33    42    57    83   134   270   945     0   726   160
0.90 |    25    30    38    51    72   113   219   726     0   474
0.95 |    19    22    27    34    43    59    88   160   474     0
As of February 15, 2005, some useful sample size calculators for a wide range of situations may be found at