Scenario 1
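The null and alternative hypotheses are specified as:
$$H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \neq p_2. \qquad (1)$$
The specification in equation 1 is the same as:
$$H_0: p_1 - p_2 = 0 \quad \text{vs} \quad H_1: p_1 - p_2 \neq 0.$$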
Although we can test the null hypothesis once the data have been collected, the specification in equation 1 does not give us enough information to calculate the sample size, because the alternative hypothesis says nothing specific about the treatment effect. Both the null and alternative hypotheses in equation 1 are composite. In fact, any $p_1$ and $p_2$ are potential candidates under the alternative hypothesis as long as they are not equal to each other. However, the power to reject the null hypothesis depends on the true success rates in the two groups when they differ. For example, two different pairs $(p_1, p_2)$ with $p_1 \neq p_2$ both satisfy the alternative hypothesis, yet the power to reject the hypothesis of equal success rates generally differs between the two cases.
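The dependence of power on the true success rates can be illustrated with a short sketch. It uses a generic normal approximation with an unpooled standard error, which is an assumption made for illustration and not necessarily the test used in this paper. The function name `approx_power`, the success rates and the group size are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n, kappa=1.0, alpha=0.05):
    """Approximate power of a two-sided z-test for p1 = p2.

    Assumption (not from the paper): unpooled standard error and a
    normal approximation. Group 1 has n subjects, group 2 has n * kappa.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / (n * kappa))
    shift = abs(p1 - p2) / se  # standardized true difference
    # Probability of landing in either rejection tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Two hypothetical pairs that both satisfy p1 != p2, with 100 subjects per group
power_small_gap = approx_power(0.5, 0.4, n=100)
power_large_gap = approx_power(0.5, 0.2, n=100)
print(f"(0.5, 0.4): power ~ {power_small_gap:.3f}")
print(f"(0.5, 0.2): power ~ {power_large_gap:.3f}")
```

Both pairs satisfy the alternative hypothesis in equation 1, but with the same number of subjects the second pair is rejected far more often than the first.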
Scenario 2
The hypotheses are specified as:
$$H_0: p_1 - p_2 = 0 \quad \text{vs} \quad H_1: p_1 - p_2 = \Delta, \qquad (2)$$
where $\Delta$ is a prespecified known constant. Although both the null and alternative hypotheses are still composite, the alternative hypothesis in equation 2 is much simpler than that in equation 1. It turns out that we still do not have sufficient information to determine the sample size. For example, consider the following two special cases:
Case 1. The hypotheses are:
Case 2. The hypotheses are:
In both cases, the difference between the two success rates is the same under the alternative hypothesis. However, as the following sections show, the required sample sizes in these two cases are different. For a given difference in success rates, it is usually much easier to reject the null hypothesis in case 1 than in case 2.
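A small numerical sketch shows why fixing the difference alone is not enough. It uses the familiar unpooled-variance normal approximation for a two-sided test, which is an assumption made here for illustration. The function name `n_group1_unpooled` and the two pairs of success rates are hypothetical and are not the two cases above.

```python
from math import ceil
from statistics import NormalDist


def n_group1_unpooled(p1, p2, kappa=1.0, alpha=0.05, power=0.80):
    """Approximate group 1 sample size for a two-sided test of p1 = p2.

    Assumption (illustrative): the variance of the estimated difference is
    p1(1 - p1)/n + p2(1 - p2)/(n * kappa), where group 2 has n * kappa subjects.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance_term = p1 * (1 - p1) + p2 * (1 - p2) / kappa
    return (z_alpha + z_beta) ** 2 * variance_term / (p1 - p2) ** 2


# Same difference of 0.1, different baseline rates (hypothetical values)
n_near_zero = ceil(n_group1_unpooled(0.15, 0.05))
n_near_half = ceil(n_group1_unpooled(0.55, 0.45))
print(n_near_zero, n_near_half)
```

With rates near 0.5 the binomial variance is close to its maximum, so the same difference of 0.1 requires far more subjects than it does with rates near 0.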
Scenario 3
The null and alternative hypotheses are:
$$H_0: p_1 = p_2 = p \quad \text{vs} \quad H_1: p_1 = p + \Delta,\ p_2 = p, \qquad (3)$$
where $p$ and $\Delta$ are prespecified constants. Without loss of generality, we assume that $\Delta > 0$ in the following discussion. It turns out that we can uniquely determine the sample size in this case.
Sample size formula
We derive a sample size formula based on the hypotheses specified in equation 3 using large sample theory.4 The typical approach is to first derive the asymptotic distribution of a test statistic under the null and alternative hypotheses, and then to solve an equation to obtain the sample size formula for the given significance level and power (see, eg, Tu et al5).
Although the treatment and control groups have the same sample size in many studies, this is not necessary in practice. Some studies intentionally assign more patients to one group. Suppose the sample sizes in groups 1 and 2 are $n$ and $n\kappa$, respectively, where $\kappa$ is a prespecified positive constant. Group 2 has more (fewer) subjects than group 1 if $\kappa > 1$ ($\kappa < 1$). If $\kappa = 1$, the two groups have equal sample sizes.
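Let $\hat{p}_1 = X_1/n$ and $\hat{p}_2 = X_2/(n\kappa)$ denote the estimates of $p_1$ and $p_2$, where $X_1$ ($X_2$) denotes the number of successes in group 1 (group 2). According to the central limit theorem,6
$$\hat{p}_1 \approx N\left(p_1, \frac{p_1(1-p_1)}{n}\right), \qquad \hat{p}_2 \approx N\left(p_2, \frac{p_2(1-p_2)}{n\kappa}\right)$$
when $n$ is large enough.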
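Under the null hypothesis $H_0$, the two groups share a common success rate $p$, so the variances of $\hat{p}_1$ and $\hat{p}_2$ are $p(1-p)/n$ and $p(1-p)/(n\kappa)$, respectively. To test the null hypothesis $H_0$, we consider the following test statistic:
$$T = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1-p)\left(\dfrac{1}{n} + \dfrac{1}{n\kappa}\right)}}.$$
Then, under $H_0$, $T$ converges in distribution to $N(0, 1)$ as $n$ grows unbounded.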
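Let $\Phi$ be the cumulative distribution function of the standard normal distribution. For each $\gamma \in (0, 1)$, let $z_\gamma$ be such that $\Phi(z_\gamma) = 1 - \gamma$, that is, $z_\gamma$ is the $100(1-\gamma)$th percentile of the standard normal distribution. Given the significance level $\alpha$, we reject the hypothesis $H_0$ if the test statistic $T$ satisfies $|T| > z_{\alpha/2}$. Note that:
$$P(|T| > z_{\alpha/2}) = P(T > z_{\alpha/2}) + P(T < -z_{\alpha/2}).$$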
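Let
$$\sigma_0^2 = p(1-p)\left(1 + \frac{1}{\kappa}\right), \qquad \sigma_1^2 = p_1(1-p_1) + \frac{p_2(1-p_2)}{\kappa},$$
where $p$ is the common success rate under $H_0$ and $p_1$, $p_2$ are the success rates under $H_1$. We have
$$P(T > z_{\alpha/2}) \approx \Phi\left(\frac{\sqrt{n}\,\Delta - z_{\alpha/2}\,\sigma_0}{\sigma_1}\right), \qquad P(T < -z_{\alpha/2}) \approx \Phi\left(\frac{-\sqrt{n}\,\Delta - z_{\alpha/2}\,\sigma_0}{\sigma_1}\right).$$
In most studies, $P(T < -z_{\alpha/2}) \approx 0$ for a large sample size, since $p_1 - p_2 = \Delta > 0$ under $H_1$. Under the hypothesis $H_1$, to make the test have power $1 - \beta$, we let:
$$\Phi\left(\frac{\sqrt{n}\,\Delta - z_{\alpha/2}\,\sigma_0}{\sigma_1}\right) = 1 - \beta.$$
Solving this equation, we obtain the required sample size in group 1:
$$n = \frac{\left[z_{\alpha/2}\sqrt{p(1-p)\left(1 + \dfrac{1}{\kappa}\right)} + z_\beta\sqrt{p_1(1-p_1) + \dfrac{p_2(1-p_2)}{\kappa}}\right]^2}{\Delta^2}. \qquad (6)$$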
This formula is the basis of sample size calculation based on other indices (see the next two sections).
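Note that formula (4.2.2) in Chow and colleagues,3 written with group 1 of size $n$ and group 2 of size $n\kappa$, is:
$$n = \frac{(z_{\alpha/2} + z_\beta)^2}{\Delta^2}\left[p_1(1-p_1) + \frac{p_2(1-p_2)}{\kappa}\right]. \qquad (7)$$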
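The sample sizes in equations 6 and 7 are equal if and only if $p(1-p)(1 + 1/\kappa) = p_1(1-p_1) + p_2(1-p_2)/\kappa$, where $p$ is the common success rate under the null hypothesis and $p_1$, $p_2$ are the success rates under the alternative hypothesis. If the left-hand side is larger (smaller) than the right-hand side, then $n$ in equation 6 is larger (smaller) than that in equation 7.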
Figure 1 shows the sample sizes given by equations 6 and 7. Note that the sample size calculation of Chow and colleagues3 does not use the fact that $p_1 = p_2$ under the null hypothesis when calculating the variance of $\hat{p}_1 - \hat{p}_2$.
Figure 1. Sample sizes based on formulas (6) and (7).
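The comparison between the two formulas can also be sketched numerically. The code below assumes that equation 6 uses the variance under the null hypothesis with a common success rate (called `p_null` here) for the significance-level term, and that equation 7 uses the unpooled variance throughout. Both forms, the function names and the input values are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def z_upper(gamma):
    """Return z_gamma such that Phi(z_gamma) = 1 - gamma."""
    return NormalDist().inv_cdf(1.0 - gamma)


def n_null_variance(p_null, p1, p2, kappa=1.0, alpha=0.05, power=0.80):
    """Group 1 size using the null variance for the alpha term (assumed form of equation 6)."""
    sd0 = sqrt(p_null * (1 - p_null) * (1 + 1 / kappa))  # SD scale under H0
    sd1 = sqrt(p1 * (1 - p1) + p2 * (1 - p2) / kappa)    # SD scale under H1
    return (z_upper(alpha / 2) * sd0 + z_upper(1 - power) * sd1) ** 2 / (p1 - p2) ** 2


def n_unpooled(p1, p2, kappa=1.0, alpha=0.05, power=0.80):
    """Group 1 size using the unpooled variance throughout (assumed form of equation 7)."""
    var1 = p1 * (1 - p1) + p2 * (1 - p2) / kappa
    return (z_upper(alpha / 2) + z_upper(1 - power)) ** 2 * var1 / (p1 - p2) ** 2


# Hypothetical inputs: common null rate 0.3, alternative rates 0.4 and 0.3
print(n_null_variance(0.3, 0.4, 0.3), n_unpooled(0.4, 0.3))

# When the null and alternative variances coincide, the two formulas agree
print(n_null_variance(0.45, 0.55, 0.45), n_unpooled(0.55, 0.45))
```

In the first call the null variance is smaller than the alternative variance, so the null-variance formula gives the smaller sample size. In the second call the two variances coincide and the two results agree up to floating-point error.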