A Controversial Issue in Clinical Investigation for Rare Disease Drug Development
Shein-Chung Chow, PhD, Duke University School of Medicine
For clinical evaluation of a test treatment under investigation, the use of the p-value (the current regulatory standard) may not be feasible, especially for rare disease drug development. Alternatively, one may consider a confidence interval approach based on precision analysis, which can address the limitations of the p-value.

For approval of a test drug product under investigation, the United States (US) Food and Drug Administration (FDA) requires that substantial evidence regarding the safety and efficacy of the test treatment be provided. Such substantial evidence can only be obtained through the conduct of adequate and well-controlled clinical trials (21CFR514.4). For rare disease drug development, FDA (2023) indicates that the same regulatory standards apply as for drug products for common conditions. In practice, however, it is very difficult for a rare disease drug product to meet the regulatory requirements (e.g., a p-value less than 0.05 with at least 80% power) because (i) a rare disease has a relatively small patient population, so only a limited number of subjects is available, and (ii) clinical outcomes are usually associated with large variability. Rare disease drug development therefore faces a dilemma: the regulatory standard is unchanged, but the patient population needed to meet it is not available. In recent years, this controversial issue has received much attention among the pharmaceutical industry, academia, and regulatory agencies (NASEM, 2024; Chow et al., 2024).
For review and approval of drug products, the current regulatory process relies on testing a null hypothesis (H_0) that the test drug is not effective (safe) against an alternative hypothesis (H_a) that the test drug is effective (safe). Most regulatory agencies, such as the US FDA and the European Union (EU) European Medicines Agency (EMA), would reject the null hypothesis in favor of the alternative if the observed p-value is less than 0.05, provided that the statistical test can achieve at least 80% power. As a result, under the hypothesis testing framework, “p-value less than 0.05 with at least 80% power” has become a gold standard for review and approval of drug products under investigation.
For rare disease drug development, however, the feasibility and acceptability of this hypothesis testing procedure have been questioned. In particular, the following concerns have been raised. First, an adequate and well-controlled rare disease clinical trial may require a much larger sample size due to the large variability associated with the response (or heterogeneity across different disease subtypes). Second, the trial may have insufficient power due to the small patient population available. Third, the p-value does not reflect the sample size and the variability associated with the response (clinical outcome) of the intended clinical trial (NASEM, 2024). Most importantly, the observed p-value may not be reproducible if the same study were to be conducted repeatedly under the same or similar experimental conditions. Thus, under the hypothesis testing framework, the feasibility, acceptability, and validity of the use of the p-value for clinical investigation are questionable.
As an alternative to the criterion “p-value less than 0.05 with at least 80% power”, we propose a confidence interval approach based on precision analysis for clinical investigation of the test drug under study. Under the confidence interval approach, the regulatory decision rule depends on whether the constructed confidence interval falls within a desirable margin of precision (the regulatory standard) for clinical evaluation of the test product under investigation. Unlike the hypothesis testing procedure, the confidence interval approach is able to address the issue of reproducibility.
In the next section, hypothesis testing (i.e., power analysis controlling the type I error rate) is briefly described, including the corresponding regulatory decision rule, feasibility, and acceptability, for comparison with the confidence interval approach (i.e., precision analysis). Section 3 introduces the use of the confidence interval approach for evaluation of the safety and effectiveness of a test drug under study. Some concluding remarks are given in the last section of this article.
Hypothesis Testing for Clinical Investigation
The current regulatory review and approval process adopts a hypothesis testing procedure for clinical investigation of a test drug under study. That is, the procedure is employed to test a null hypothesis (H_0) that the test drug is not effective against an alternative hypothesis (H_a) that the test drug is effective. The aim is to reject H_0 in favor of H_a with a desired power (say 80%) at a pre-specified level of significance (say 5%).
Regulatory Decision Rule – Under the hypothesis testing framework, if the observed p-value is less than 0.05, then we claim that the test drug product is effective provided that (i) there is no safety/tolerability concern and (ii) there is at least 80% power for correctly detecting the anticipated clinically meaningful difference. This regulatory decision rule has become a gold standard for most regulatory agencies worldwide including EU EMA and US FDA.
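To illustrate why this rule is demanding when variability is large, the following is a minimal sketch of a pre-study power calculation. It assumes a two-arm parallel design analyzed by a two-sided z-test with known common standard deviation; the design, the function name per_arm_sample_size, and the numerical inputs are illustrative assumptions and are not taken from the article.

```python
from math import ceil
from statistics import NormalDist

def per_arm_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided, two-sample z-test (assumed design).

    delta: clinically meaningful difference to detect
    sigma: known common standard deviation of the response
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the 5% level
    z_beta = std_normal.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Doubling the standard deviation roughly quadruples the required sample size.
print(per_arm_sample_size(delta=0.5, sigma=1.0))
print(per_arm_sample_size(delta=0.5, sigma=2.0))
```

Under these assumptions the required number of subjects grows with the square of sigma/delta, which is the mechanism behind the feasibility concern for small patient populations.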
Limitations of Hypothesis Testing – In practice, clinical investigation for evaluation of the safety and efficacy of a test product based on the hypothesis testing procedure is widely accepted worldwide. However, the following limitations have been raised. First, a pre-study power calculation is often performed based on a single primary study endpoint. A wrongly selected primary endpoint may result in a negative study (i.e., the observed p-value is greater than 0.05). Thus, endpoint selection has a great impact on the success of the planned clinical study when the hypothesis testing procedure is used. Second, if co-primary endpoints or multiple endpoints are used, the pre-study power calculation could lead to a much larger sample size, regardless of whether an α adjustment for multiplicity is made. Consequently, the probability of success of the intended clinical trial may decrease in the sense that (i) there is insufficient power (i.e., power is less than 80%) and (ii) the trial is unable to achieve statistical significance after α adjustment for multiplicity. Third, the observed p-value (i.e., the degree of substantial evidence provided) may be misleading because it does not reflect the sample size and the variability associated with the response (clinical outcome). In addition, the observed p-value may not be reproducible in the sense that if we repeat the same trial, we may not reach the same significance level. In other words, we may achieve statistical significance in the current study (e.g., p-value = 0.049) but fail to achieve it (e.g., p-value = 0.051) if we repeat the same trial.
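The reproducibility concern can be illustrated with a small simulation. The sketch below assumes a trial designed with exactly 80% power for a two-sided 5% z-test and draws the test statistic of each repeated trial directly from its sampling distribution; the function name fraction_significant and the simulation settings are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

def fraction_significant(power=0.80, alpha=0.05, n_trials=20000, seed=1):
    """Share of identical repeated trials that reach p < alpha."""
    std_normal = NormalDist()
    # Mean of the z-statistic when the design has the stated power
    # (two-sided test; wrong-direction rejections are negligible here).
    shift = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        z = rng.gauss(shift, 1.0)                   # statistic of one repeated trial
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        hits += p_value < alpha
    return hits / n_trials

# The share of significant repeats is close to the design power, not to 1.
print(f"{fraction_significant():.3f}")
```

Even when the drug truly works and the trial is adequately powered, roughly one repeat in five is expected to miss p < 0.05 under these assumptions.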
Distribution of observed p-value – As indicated by Murdoch et al. (2008), the observed p-value is a random variable (see also Klammer et al., 2009) and, under the null hypothesis, follows a uniform distribution on [0,1] with mean 0.5 and standard deviation 1/√12, which can be verified as follows.
Under the null hypothesis, suppose the test statistic T has the distribution F(t) (e.g., standard normal). It can be verified that the p-value P = F(T) has the following probability distribution:
$\Pr(P < p) = \Pr(F(T) < p) = \Pr(T < F^{-1}(p)) = F(F^{-1}(p)) = p, \quad 0 \le p \le 1.$
In other words, P is uniformly distributed. This holds as long as F(⋅) is invertible, a necessary condition of which is that T is not a discrete random variable.
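The uniformity result can be checked numerically. The sketch below assumes a standard normal test statistic under the null hypothesis and computes P = F(T) for many simulated draws; the function name null_p_values, the seed, and the number of draws are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, pstdev

def null_p_values(n_draws=100_000, seed=2024):
    """Simulate P = F(T) for T drawn from a standard normal null distribution."""
    std_normal = NormalDist()
    rng = random.Random(seed)
    return [std_normal.cdf(rng.gauss(0.0, 1.0)) for _ in range(n_draws)]

p_values = null_p_values()
# Empirical mean and standard deviation of the simulated p-values.
print(round(mean(p_values), 3), round(pstdev(p_values), 3))
# Theoretical standard deviation of a uniform distribution on [0, 1].
print(round(1 / 12 ** 0.5, 3))
```

The empirical mean and standard deviation should be close to 0.5 and 1/√12, in line with the derivation above.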

Confidence Interval Approach
For development of rare disease drugs (or drug products for conditions with extremely low incidence rates), the typical hypothesis testing procedure may not be feasible because (i) a small patient population often results in insufficient power and (ii) there is large variability and/or heterogeneity across disease subtypes, which requires a much larger sample size to achieve statistical significance at a pre-specified significance level (say 5%). In this section, as an alternative to the hypothesis testing procedure, we propose the following confidence interval approach with a desired precision for clinical investigation of a test drug.
Let $y_1, y_2, \ldots, y_n$ be independent and identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$. When $\sigma^2$ is known, a $(1-\alpha)100\%$ confidence interval for $\mu$ can be obtained as
$\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = (L, U)$, where $z_{\alpha/2}$ is the upper $(\alpha/2)$th quantile of the standard normal distribution and $L$ and $U$ are the lower and upper confidence bounds of the $(1-\alpha)100\%$ confidence interval for $\mu$. In this case, the maximum error, denoted by $E$, in estimating the value of $\mu$ that one is willing to accept is the half-width of the interval: $|\bar{y} - \mu| \le E = z_{\alpha/2}\,\sigma/\sqrt{n}$, which holds with probability $1-\alpha$.
In practice, the maximum error E can be selected as a desirable precision for estimating μ.
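Solving E = z_(α/2) σ/√n for n gives the sample size needed for a chosen precision, n = (z_(α/2) σ/E)^2 rounded up. The following is a minimal sketch of that calculation under the known-variance setting above; the function name precision_sample_size and the numerical inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def precision_sample_size(sigma, max_error, alpha=0.05):
    """Smallest n such that z_(alpha/2) * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper (alpha/2)th quantile
    return ceil((z * sigma / max_error) ** 2)

# Required n for a 95% interval with maximum error 0.5.
print(precision_sample_size(sigma=1.0, max_error=0.5))
print(precision_sample_size(sigma=2.0, max_error=0.5))
```

Unlike a power calculation, this computation needs no assumed treatment effect; it depends only on the variability, the confidence level, and the precision one is willing to accept.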
Regulatory Decision Rule – Let $\delta$ be the desirable precision. Also, let $L_\delta$ and $U_\delta$ be the lower and upper bounds of the desirable precision interval for $\mu$. That is, if the sample mean $\bar{y}$ falls within $(L_\delta, U_\delta)$, we claim that the estimate of $\mu$ (i.e., $\bar{y}$) is within the desirable precision of $\delta$. Thus, under the confidence interval approach, the regulatory decision rule is given by $(L, U) \subset (L_\delta, U_\delta)$.
If the constructed $(1-\alpha)100\%$ confidence interval $(L, U)$ lies entirely within the precision interval $(L_\delta, U_\delta)$, then the test drug product is claimed to be safe and effective. Note that the hypothesis testing procedure based on the p-value is often treated as operationally equivalent to the confidence interval approach. However, this equivalence does not hold in general (see, e.g., Chow and Zheng, 2019).
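The decision rule can be sketched in a few lines. The example below assumes the known-variance interval defined earlier; the function names, the sample values, and the precision intervals are made up for illustration and do not come from any trial.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """(1 - alpha)100% interval for the mean when sigma is known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(len(sample))
    center = mean(sample)
    return center - half_width, center + half_width

def within_precision(ci, precision_interval):
    """True when (L, U) lies strictly inside (L_delta, U_delta)."""
    lower, upper = ci
    lower_delta, upper_delta = precision_interval
    return lower_delta < lower and upper < upper_delta

# Hypothetical responses with assumed known sigma = 0.2.
sample = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.05, 0.95]
ci = z_confidence_interval(sample, sigma=0.2)
print(ci)
print(within_precision(ci, (0.8, 1.2)))  # wider precision interval
print(within_precision(ci, (0.9, 1.1)))  # narrower precision interval
```

The same data can pass or fail depending on the precision interval, so the choice of (L_δ, U_δ) plays the role that the significance level plays in hypothesis testing.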
Note that both methods (the hypothesis testing procedure and the confidence interval approach), when based on a single data set from the intended clinical trial, fail to address the issue of reproducibility. To address this issue, it is suggested that the regulatory decision rule be modified as follows: $p = P\{(L, U) \subset (L_\delta, U_\delta)\} \ge p_0$, where $p_0$ is a regulatory standard for clinical evaluation of the test drug under investigation. That is, the modified decision rule guarantees that there is a relatively high probability that $(L, U) \subset (L_\delta, U_\delta)$, i.e., $p \ge p_0$ (the regulatory standard for acceptance).
Merits of the Confidence Interval Approach – The proposed confidence interval approach for clinical evaluation of a test drug has the following merits. First, it avoids the problem of “p-value less than 0.05 with at least 80% power”. Second, it can help to address the issue of reproducibility through proper selection of the regulatory standard $p_0$: the approach can be modified to guarantee that $p = P\{(L, U) \subset (L_\delta, U_\delta)\} \ge p_0$, where $p_0$ is considered a regulatory standard for clinical evaluation of the efficacy (safety) of a test drug under investigation. Note that, with a desired precision $\delta$ and a given regulatory standard $p_0$, we can solve this inequality for the required sample size.
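As a sketch of how the sample size might be solved for, note that with known σ the interval half-width h = z_(α/2) σ/√n is fixed, so (L, U) ⊂ (L_δ, U_δ) is equivalent to L_δ + h < ȳ < U_δ − h, and ȳ is normal with mean μ and variance σ²/n. The code below evaluates this probability and searches for the smallest n; it assumes a value for the true mean μ and uses hypothetical function names and inputs, none of which are specified in the article.

```python
from math import sqrt
from statistics import NormalDist

def containment_probability(n, mu, sigma, lower, upper, alpha=0.05):
    """P{(L, U) inside (lower, upper)} for a known-sigma interval, true mean mu."""
    std_normal = NormalDist()
    half_width = std_normal.inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    lo, hi = lower + half_width, upper - half_width  # admissible range for ybar
    if lo >= hi:
        return 0.0  # interval is wider than the precision interval
    se = sigma / sqrt(n)
    return std_normal.cdf((hi - mu) / se) - std_normal.cdf((lo - mu) / se)

def smallest_sample_size(mu, sigma, lower, upper, p0, alpha=0.05, n_max=100_000):
    """Smallest n with containment probability >= p0, or None if not reached."""
    for n in range(1, n_max + 1):
        if containment_probability(n, mu, sigma, lower, upper, alpha) >= p0:
            return n
    return None

# Assumed scenario: true mean at the center of the precision interval (-0.5, 0.5).
print(smallest_sample_size(mu=0.0, sigma=1.0, lower=-0.5, upper=0.5, p0=0.80))
```

The required n is noticeably larger than what a half-width of 0.5 alone would need, because the interval must fall inside the precision interval with high probability rather than merely be narrow enough.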
Concluding Remarks
The hypothesis testing procedure (i.e., the requirement of a p-value less than 0.05 with 80% power) for clinical investigation of a test drug may not be feasible for rare disease drug development and/or drug products for conditions with extremely low incidence rates. Instead, the use of a confidence interval in conjunction with a precision analysis may be more appropriate: it not only avoids the hurdle of “p-value less than 0.05 with 80% power” but is also able to address the issue of reproducibility. More importantly, the results can be extended to cover either co-primary or multiple study endpoints with minimal modification.
References
Chow, S.C., Pong, A., and Chow, S.S. (2024). Novel design and analysis of rare disease drug development. Mathematics, 12, 631. https://doi.org/10.3390/math12050631
Chow, S.C. and Zheng, J. (2019). The use of 90% CI or 95% CI in biosimilar product development – a controversial issue? Journal of Biopharmaceutical Statistics, 29, 834-844.
FDA (2023). Guidance for Industry – Rare Diseases: Considerations for Drug and Biological Products. The United States Food and Drug Administration, Silver Spring, MD.
Klammer, A.A., Park, C.Y., and Stafford Noble, W. (2009). Statistical calibration of the SEQUEST XCorr function. Journal of Proteome Research, 8(4), 2106-2113.
Murdoch, D., Tsai, Y., and Adcock, J. (2008). P-values are random variables. The American Statistician, 62, 242-245.
NASEM (2024). Regulatory processes for rare disease drugs in the United States and European Union: Flexibilities and collaborative opportunities. National Academies of Sciences, Engineering, and Medicine. The National Academies Press. Washington DC. September 12, 2024.