Journal of Clinical Oncology  

Journal of Clinical Oncology, Vol 25, No 23 (August 10), 2007: pp. 3482-3487
© 2007 American Society of Clinical Oncology.
DOI: 10.1200/JCO.2007.11.3670


Statistical Power of Negative Randomized Controlled Trials Presented at American Society for Clinical Oncology Annual Meetings

Philippe L. Bedard, Monika K. Krzyzanowska, Melania Pintilie, Ian F. Tannock

From the Division of Medical Oncology and Hematology, Biostatistics, Princess Margaret Hospital and University of Toronto, Toronto, Ontario, Canada

Address reprint requests to Ian F. Tannock, MD, PhD, Division of Medical Oncology and Hematology, Princess Margaret Hospital, 610 University Ave, Toronto, Ontario, M5G 2M9, Canada; e-mail: ian.tannock@uhn.on.ca


    ABSTRACT
 
Purpose To investigate the prevalence of underpowered randomized controlled trials (RCTs) presented at American Society of Clinical Oncology (ASCO) annual meetings.

Methods We surveyed all two-arm phase III RCTs presented at ASCO annual meetings from 1995 to 2003 for which negative results were obtained. Post hoc calculations were performed using a power of 80% and an α level of .05 (two sided) to determine sample sizes required to detect small, medium, and large effect sizes. For studies reporting a proportion or time-to-event as primary end point, effect size was expressed as an odds ratio (OR) or hazard ratio (HR), respectively, with a small effect size defined as OR/HR ≥ 1.3, medium effect size defined as OR/HR ≥ 1.5, and large effect size defined as OR/HR ≥ 2.0. Logistic regression was used to identify factors associated with lack of statistical power.

Results Of 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 233 (55.1%) had adequate sample size to detect small, medium, and large effect sizes, respectively. Only 35 negative RCTs (7.1%) reported a reason for inadequate sample size. In a multivariable model, studies that were presented at oral sessions (P = .0038), multicenter studies supported by a cooperative group (P < .0001), and studies with time to event as primary outcome (P < .0001) were more likely to have adequate sample size.

Conclusion More than half of negative RCTs presented at ASCO annual meetings do not have an adequate sample to detect a medium-size treatment effect.


    INTRODUCTION
 
New treatments in clinical oncology are accepted on the basis of efficacy or decreased toxicity demonstrated in randomized controlled trials (RCTs). Despite promising results in earlier phase II studies, many treatment regimens do not show statistically significant gains when tested in larger RCTs.1

The power of a study is its probability of detecting a clinically important effect of the experimental treatment, compared with the control arm, if a difference actually exists. If a clinical trial fails to show a statistically significant benefit in favor of the experimental treatment, an investigator may erroneously conclude that the experimental treatment is of no benefit, even if the trial did not include enough participants to demonstrate reliably a clinically meaningful effect. Previous studies of negative RCTs reported in the general medical and specialty literature show that many trials are inadequately powered to detect a meaningful difference between the arms.2,3,5-11 Such underpowered RCTs have been criticized as unethical because they expose participants to the toxicities of experimental treatments but are unable to determine whether those treatments are effective.12-15

The objective of this study was to investigate the prevalence of underpowered RCTs published in abstract form in the Proceedings of the American Society of Clinical Oncology (ASCO) Annual Meetings from 1995 to 2003, and the factors associated with lack of statistical power.


    METHODS
 
Identification of Studies
To identify a large cohort of negative clinical trials, the Proceedings of the ASCO Annual Meetings from 1995 to 2003 were reviewed to identify randomized phase III clinical trials. Superiority trials with two-group parallel design for both dichotomous and continuous primary outcomes were included. Abstracts with explicit statements of negative results were classified as negative studies. If there was no explicit statement within the text of the abstract, a study was considered negative if it did not show a statistically significant benefit (P > .05 or CI including 1.0) in favor of the experimental treatment arm for the primary outcome measure. If a study did not explicitly state its primary outcome measure, the primary outcome was considered the first end point reported in the abstract.16 Abstracts reporting preliminary results before the completion of patient accrual, phase II studies, meta-analyses, equivalence studies, overviews, pooled data from two or more studies, and secondary analyses were excluded.

Data Abstraction
Information on trial characteristics was summarized using a pretested data abstraction form. For each abstract, the following information was collected: year of meeting; cancer type; primary end point (stated or first reported); number of participants; number of participants in each group; sample size calculation (if reported); type of primary outcome (mean, proportion, time to event); event rate and standard deviation of primary outcome in the control and experimental treatment groups; cooperative group involvement; pharmaceutical sponsorship; and format of presentation at meeting (oral, poster, or published only).

Data from 30 RCTs were extracted independently by two investigators (P.L.B. and M.K.K.). After resolution of minor differences, studies were identified and data abstraction was performed on the remaining sample by one author (P.L.B.).

Statistical Analysis
To assess the level of agreement between investigators, we calculated the proportion of studies for which the data abstracted were concordant.

The main outcome of this study was the proportion of negative RCTs that were underpowered. Post hoc sample size calculations were performed on all abstracts included in our cohort (Appendix and Tables A1 and A2, online only). Studies were considered to be underpowered if the total number of assessable participants was less than the sample size needed to detect a prespecified difference in outcome with 80% power and an α level of .05 (two sided). The sample size calculations were performed to determine if the study was sufficiently powered to detect a small, medium, or large effect size in favor of the experimental arm, compared with the standard treatment arm. For studies reporting a mean as a primary end point, effect size was assessed as a multiple of the standard deviation (SD), with a small effect size 0.2 SD, medium effect size 0.5 SD, and a large effect size 0.8 SD.17 For studies reporting a proportion as a primary end point, effect size was expressed as an odds ratio (OR), with a small effect size OR ≥ 1.3, medium effect size OR ≥ 1.5, and a large effect size OR ≥ 2. For studies reporting time to event as a primary end point, effect size was expressed as a hazard ratio (HR), with a small effect size HR ≥ 1.3, medium effect size HR ≥ 1.5, and large effect size HR ≥ 2.0 (Appendix).
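The classification rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the required totals are hypothetical and are not taken from the article; only the value 189 (the median number of assessable participants reported in the Results) comes from the text.

```python
def effect_sizes_detectable(n_assessable, required_totals):
    """For each effect size label, report whether the number of assessable
    participants meets the required total sample size."""
    return {label: n_assessable >= needed
            for label, needed in required_totals.items()}

# Hypothetical required totals for 80% power and two-sided alpha = .05.
# These numbers are illustrative placeholders, not values from the article.
hypothetical_required = {"small": 800, "medium": 130, "large": 50}

# 189 is the median number of assessable participants reported in the Results.
print(effect_sizes_detectable(189, hypothetical_required))
```

Under these illustrative thresholds, a trial of median size would be classed as underpowered for a small effect size only.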

Logistic regression analysis was performed to identify factors associated with lack of power to detect a medium effect size. Factors assessed in both univariable and multivariable analyses included year of publication, cancer type, format of presentation (oral, poster, or published only), whether the primary end point was identified, and whether the study was multicenter, involved a cooperative group and/or was sponsored by the pharmaceutical industry, and the type of primary end point (mean, proportion, or time to event). All statistical analyses were carried out using SAS version 9 (SAS Institute Inc, Cary, NC).


    RESULTS
 
Study Population
We identified 514 abstracts that met inclusion criteria. Twenty-two abstracts were excluded subsequently (11 interim reports, one four-arm trial, one trial with an unknown number of participants, and nine trials for which the type of outcome was unclear). Characteristics of the remaining 492 abstracts are summarized in Table 1. Abstracts describing negative clinical trials were published at a nearly uniform rate from 1995 to 2003, with the largest number published in 2001 (71 trials) and the fewest published in 2003 (48 trials). Among this cohort of negative studies, the most common tumor site was breast (24%). The median sample size in the trials was 210, with a median of 189 participants assessable for the primary end point. A primary end point was stated explicitly in only 168 trials (34%). Fewer than 10% of trials reported sample size considerations. A time-to-event variable was identified as the primary end point in 263 trials, with 216 trials expressing their primary result as a proportion and 13 trials expressing it as a mean. Most studies were multicenter (78%), and only 25% indicated pharmaceutical sponsorship. Involvement of a cooperative group was identified in almost half of the trials.



 
Table 1. Baseline Characteristics of 492 Negative Randomized Clinical Trials Published in Proceedings of ASCO Annual Meetings 1995-2003

 
Data Abstraction
Of the 16 items used in the analysis, nine had a concordance proportion of 90% or more, four had a concordance proportion between 80% and 90%, and three had a concordance proportion between 70% and 80%. Differences between investigators were resolved by consultation.

Statistical Power of Negative Randomized Trials
Of the 423 negative RCTs for which post hoc sample size calculations could be performed, 45 (10.6%), 138 (32.6%), and 233 (55.1%) had adequate sample size to detect small, medium, and large effect sizes, respectively.

There were 263 trials for which the primary outcome was expressed as a time to event, 59 of which (22.4%) were excluded because of insufficient information to perform post hoc power calculations. Of the remaining 204 trials, 45 trials (22.1%) were adequately powered to detect a small effect size (HR ≥ 1.3), only 130 trials (63.7%) had adequate sample size to detect a medium effect size (HR ≥ 1.5), and 196 trials (96.1%) had sufficient sample size to detect a large effect size (HR ≥ 2.0; Table 2).



 
Table 2. Negative Randomized Clinical Trials Published in ASCO Annual Meeting Proceedings From 1995 to 2003 With at Least 80% Power to Detect Three Different Effect Sizes Between Experimental and Standard Treatment Groups

 
There were 216 trials for which the primary outcome was expressed as a proportion. Of these, 10 were excluded because of insufficient information to perform post hoc power calculations (4.6%). Of the remaining 206 studies, none had adequate sample size to detect a small effect size (OR ≥ 1.3), only three trials (1.4%) had adequate sample size for a medium effect size (OR ≥ 1.5), and 26 trials (12.6%) were sufficiently powered to detect a large effect size (OR ≥ 2.0; Table 2).

Thirteen trials expressed their primary outcome as a mean. Of these trials, none had adequate sample size for a small effect size (≥ 0.2 SD), five trials (38.4%) had adequate sample size for a medium effect (≥ 0.5 SD), and 11 trials (84.6%) were sufficiently powered to detect a large effect size (≥ 0.8 SD; Table 2).
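As a consistency check, the subgroup counts reported above can be pooled to recover the overall figures. The short Python sketch below uses only numbers stated in this section; the variable and function names are illustrative.

```python
# Counts reported in the Results, per type of primary end point:
# (trials with a post hoc calculation, adequate for small, medium, large).
reported = {
    "time to event": (204, 45, 130, 196),
    "proportion": (206, 0, 3, 26),
    "mean": (13, 0, 5, 11),
}

def pooled_counts(groups):
    """Sum each column across the end point types."""
    return tuple(sum(column) for column in zip(*groups.values()))

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

total, small, medium, large = pooled_counts(reported)
summary = {
    "total": total,
    "small": (small, percent(small, total)),
    "medium": (medium, percent(medium, total)),
    "large": (large, percent(large, total)),
}
print(summary)
```

The pooled counts agree with the overall figures of 45 (10.6%), 138 (32.6%), and 233 (55.1%) of 423 trials.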

Predictors of Lack of Statistical Power
The significant multivariable predictors of lack of statistical power to detect a medium effect size were type of presentation (studies presented in poster or published only were more likely to lack adequate sample size; P < .0037), type of institutional sponsorship (single-center studies and multicenter studies not sponsored by a cooperative group were more likely to lack adequate sample size; P < .0001), and type of primary end point (studies reporting a proportion as a primary end point were more likely to lack adequate sample size; P < .0001; Table 3).



 
Table 3. Factors Associated With Lack of Statistical Power to Detect a Medium Effect Size in Multivariable Analysis

 
Reasons for Premature Termination
In 35 trials, the authors indicated that their studies were terminated prematurely before the attainment of their targeted accrual (Table 4). In 13 trials (37%), the authors attributed the early termination of their studies to slower than anticipated accrual. In nine trials (26%), an interim analysis suggesting lack of efficacy of the experimental treatment was cited as the indication for premature termination.



 
Table 4. Reasons Given in 35 Abstracts for Early Termination of Negative Clinical Trials Prior to Targeted Accrual

 

    DISCUSSION
 
Our survey of 423 negative clinical trials indicates that 67% of trials (285 of 423) had too few participants to detect a medium effect size in favor of the experimental over the standard treatment arm for their primary end point with at least 80% statistical power. Although underpowered negative clinical trials have been widely reported in the general medical and subspecialty literature,2-11 there are few reports relating to trials evaluating treatment of cancer. A review of 22 negative randomized oncology trials published in major general medical or oncology journals during a 1-year period found that 16 trials (73%) lacked adequate statistical power to detect a 50% improvement in median survival in favor of the experimental arm.16 The present study is a more robust survey of research practices in clinical oncology; it is not limited to negative trials that used survival as a primary end point. Furthermore, our study encompasses all negative trials presented at the ASCO Annual Meetings during a 9-year period and includes negative studies that are never published. We have demonstrated previously that RCTs presented at the ASCO Annual Meeting that have nonsignificant findings are less likely to be published than RCTs with significant results.18

A trial may be underpowered because an investigator fails to perform an a priori sample size calculation. In our study, fewer than 10% of abstracts reported sample size calculations. This observation is consistent with the results of other surveys of abstracts published at oncology meetings.19 Many investigators may have performed sample size calculations that were not reported in the abstract. However, given that many negative clinical trials never achieve journal publication, authors of trials with negative results should be obliged to report a brief summary of the sample size calculation in the abstract so that their findings can be evaluated properly.

A trial may be designed with an appropriate sample size calculation, but then fail to accrue its target sample. This may occur because of patient-related factors, such as preference for a specific treatment arm, concerns about random assignment, and practical issues such as distance from the clinic and transportation costs. There may also be clinician and organizational barriers, such as lack of time for recruitment, preference for a particular treatment arm, poor organizational infrastructure, and multiple trials competing for the same patient. Trials may also be terminated before recruitment of their planned sample size because an interim analysis demonstrates lack of efficacy or unexpected toxicity in the treatment arm, because of lack of financial support for continuing the trial, or because new evidence renders the question addressed by the trial no longer of clinical interest or even unethical.

In our study, few authors indicated why their studies were underpowered. In our multivariate model, failure to identify explicitly a primary end point, type of presentation, type of sponsorship, and type of primary end point were the most significant predictors of lack of adequate sample size to detect a medium effect. Negative RCTs reporting a time-to-event variable as a primary end point were more likely to demonstrate adequate sample size to detect a medium effect size than studies reporting a proportion or mean variable as a primary end point. In our cohort, time-to-event studies had larger sample sizes than studies with a proportion or mean variable as a primary end point. Moreover, time-to-event studies were more likely to explicitly identify a primary end point (P = .0007), involve multiple centers or a cooperative group (P < .0001), and present results in oral sessions at the ASCO Annual Meeting (P = .0002). Time-to-event studies were also less likely to be sponsored by the pharmaceutical industry than studies reporting a proportion or mean variable as a primary end point (P = .0007). This suggests that trials that report a time-to-event primary end point are more heavily funded and may be more likely to involve a statistician in the research design phase to perform a priori sample size calculations.

Some authors have characterized underpowered clinical trials as unethical, given that they expose patients to the risks of research without providing a reasonable opportunity for the outcome to contribute to scientific knowledge.12-15 They suggest that investigators should perform appropriate sample size calculations when designing trials and anticipate potential recruitment problems that might threaten the statistical power of their trial design. Other authors have challenged this doctrine, suggesting that well-conducted underpowered studies may provide valuable point estimates and CIs of treatment effect and can be synthesized with other studies in meta-analyses to perform valid treatment comparisons.20-22 Some have suggested that underpowered trials are unavoidable for rare cancers; however, in our study, 60% of trials underpowered for a medium effect size were for common tumor sites, such as breast, lung, and GI cancer.

Although power is only one variable that determines the validity of a clinical trial result, our findings indicate that most negative trials in clinical oncology lack an adequate sample size to detect at least a medium effect size for their primary end point. In contrast, most clinical trials in oncology with positive results demonstrate much smaller effect sizes. For example, a review of clinical trials conducted by two major cooperative groups in the United States that achieved their targeted accrual during a 15-year period showed that the average effect size was 1.20, or a relative improvement of 20% in favor of the experimental versus the standard treatment arm.23 The average effect size for clinical improvements in the advanced-disease setting, as opposed to the adjuvant setting, may be even smaller.24

There are several limitations of our study. We used a power value of 80% with an α level of .05 to differentiate adequately powered from inadequately powered studies. This is an arbitrary cutoff point, given that statistical power, much like a P value, exists on a continuum.25 However, a power of 80% with an α level of .05 is widely recognized as a minimum standard for sample size calculations in clinical trials. In analyzing the abstracts for our study, we made assumptions that might have influenced our results, and we recognize that abstracts cannot provide the same level of detail as full, published reports. For instance, when a primary end point was not explicitly identified, we used the first reported end point as a surrogate for the primary end point. In so doing, we may have falsely classified studies as underpowered, when they were appropriately powered for an end point that was not the first reported end point. When the exact numbers of participants in the standard and experimental groups were not reported, we assumed that the random assignment was 1:1 and performed our analysis with half of the total number of participants in each group.

In summary, more than half of negative RCTs published at ASCO Annual Meetings are underpowered to detect a medium-sized treatment effect. We propose that abstracts that report clinical trials in oncology should identify explicitly a primary end point; provide a brief summary of the sample size calculation; and indicate the statistical power, α level, and anticipated treatment effect size on which an analysis is undertaken. Authors should also provide a clear explanation as to why a trial is underpowered if it fails to reach its target sample size, so that the findings of negative clinical trials can be interpreted appropriately.


    AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
 
The author(s) indicated no potential conflicts of interest.


    AUTHOR CONTRIBUTIONS
 
Conception and design: Philippe L. Bedard, Monika K. Krzyzanowska, Melania Pintilie, Ian F. Tannock

Administrative support: Philippe L. Bedard, Monika K. Krzyzanowska, Ian F. Tannock

Collection and assembly of data: Philippe L. Bedard, Monika K. Krzyzanowska

Data analysis and interpretation: Philippe L. Bedard, Monika K. Krzyzanowska, Melania Pintilie, Ian F. Tannock

Manuscript writing: Philippe L. Bedard, Monika K. Krzyzanowska, Ian F. Tannock

Final approval of manuscript: Philippe L. Bedard, Monika K. Krzyzanowska, Melania Pintilie, Ian F. Tannock


    Appendix
 
Definition of effect size. Classification of effect size as small, medium, or large was defined differently for studies reporting a mean, proportion, or time-to-event variable as a primary end point.

Mean as primary end point. For studies reporting a mean as a primary end point, effect size was assessed as a multiple of the standard deviation of the sample in the standard arm (SD) with a small effect size 0.2 SD, medium effect size 0.5 SD, and a large effect size 0.8 SD. These criteria are based on the published guidelines of Cohen.17

Proportion as primary end point. For studies with a proportion (eg, an odds ratio [OR]) as a primary end point, we could not find a published definition of small, medium, and large effect sizes. To establish criteria, we considered a hypothetical study with an event rate of 50% in the standard treatment arm. In this scenario, the calculated ORs for the following observed treatment effects are listed in Table A1. These absolute differences provide a reasonable definition of small, medium, and large effect sizes, respectively, and we have therefore defined a small effect size as OR ≥ 1.3, a medium effect size as OR ≥ 1.5, and a large effect size as OR ≥ 2.
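The arithmetic behind this definition can be illustrated with a short Python sketch that converts each odds ratio threshold into the implied event rate in the experimental arm for the hypothetical 50% event rate in the standard arm. The function names are illustrative, and the printed values are computed here rather than copied from Table A1.

```python
def odds(p):
    """Odds corresponding to a proportion p."""
    return p / (1.0 - p)

def odds_ratio(p_standard, p_experimental):
    """Odds ratio of the experimental arm relative to the standard arm."""
    return odds(p_experimental) / odds(p_standard)

def experimental_rate(p_standard, target_or):
    """Event rate in the experimental arm implied by a target odds ratio."""
    o = target_or * odds(p_standard)
    return o / (1.0 + o)

for label, target in (("small", 1.3), ("medium", 1.5), ("large", 2.0)):
    p2 = experimental_rate(0.5, target)
    print(f"{label}: OR {target} -> rate {p2:.3f}, absolute difference {p2 - 0.5:.3f}")
```

With a 50% rate in the standard arm, an OR of 1.5 corresponds to a 60% rate in the experimental arm, an absolute difference of 10 percentage points.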

Time-to-event as primary end point. For studies with a time-to-event variable (eg, a hazard ratio [HR]) as a primary end point, we could not find a published definition of small, medium, and large effect sizes. To establish criteria, we considered a hypothetical study in which the survival at 2 years was 50% in the standard treatment arm. In this scenario, the calculated HRs for the observed treatment effects are listed in Table A2. These absolute differences provide a reasonable definition of small, medium, and large effect sizes, respectively, and we have therefore defined a small effect size as HR ≥ 1.3, a medium effect size as HR ≥ 1.5, and a large effect size as HR ≥ 2. This approach is supported by a review of 93 phase III clinical trials conducted by two cooperative groups in the United States during a 15-year period, which demonstrated that mean HR for the entire cohort was 1.20 (Joffe S, Harrington DP, George SL, et al: BMJ 328:1463, 2003). If trials that obtained negative results are excluded, the median effect size reported in this cohort was 1.26 to 1.50, which would correspond to a medium effect size using our definition.
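A corresponding sketch for hazard ratios is given below. It assumes exponential survival (the assumption stated later in this Appendix) and defines the HR as the hazard in the standard arm divided by the hazard in the experimental arm, so that values above 1 favor the experimental arm. The function names are illustrative, and the values are computed here rather than copied from Table A2.

```python
import math

def hazard_ratio(s_standard, s_experimental):
    """HR = hazard(standard) / hazard(experimental), from survival
    proportions observed at the same time point."""
    return math.log(s_standard) / math.log(s_experimental)

def experimental_survival(s_standard, theta):
    """Survival in the experimental arm when its hazard is the standard
    arm's hazard divided by theta."""
    return s_standard ** (1.0 / theta)

for label, theta in (("small", 1.3), ("medium", 1.5), ("large", 2.0)):
    s2 = experimental_survival(0.5, theta)
    print(f"{label}: HR {theta} -> 2-year survival {s2:.3f}, absolute difference {s2 - 0.5:.3f}")
```

Under these assumptions, an HR of 1.5 corresponds to an improvement in 2-year survival from 50% to about 63%.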



 
Table A2. Calculated Hazard Ratios for Treatment Effects

 
Post hoc sample size calculations. Post hoc sample size calculations were performed in the following manner for studies with a mean, proportion, or time-to-event variable as a primary end point.

Mean as primary end point. The effect size is given as a multiple of the standard deviation.

The total sample size is given by the following equation:


[Equation 1; formula image not preserved] (1)
[Devore JL: Probability and Statistics for Engineering and Sciences (ed 2). Belmont, CA, Brooks/Cole, 1987], where s is the standard deviation in the standard arm, δ is the effect size as defined earlier, and z_{1−α/2} and z_{1−β} are the quantiles of the standard normal distribution, using z_{1−α/2} = 1.96 and z_{1−β} = 0.84.

To perform post hoc power calculations, the following assumptions were made: the data are normally distributed, the standard deviation is the same in both arms, and there was the same number of patients in each of the two arms (ie, 1:1 randomization).
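The equation itself does not appear in this text. As an illustration only, the sketch below uses the standard normal-approximation formula for comparing two means with 1:1 randomization, N = 4(z_{1−α/2} + z_{1−β})² s² / Δ², where Δ is the absolute difference. When Δ is written as d standard deviations, s cancels. This is an assumed form that is consistent with the symbol definitions above; it is not confirmed to be the authors' Equation 1.

```python
import math

Z_ALPHA = 1.96  # z_{1-alpha/2} for two-sided alpha = .05, as stated in the Appendix
Z_BETA = 0.84   # z_{1-beta} for 80% power, as stated in the Appendix

def total_sample_size_means(d, z_alpha=Z_ALPHA, z_beta=Z_BETA):
    """Total participants (both arms, 1:1) needed to detect a difference of
    d standard deviations, using the assumed normal-approximation formula."""
    per_arm = 2.0 * (z_alpha + z_beta) ** 2 / d ** 2
    # round() removes floating-point noise before rounding up to a whole participant
    return 2 * math.ceil(round(per_arm, 6))

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(label, total_sample_size_means(d))
```

Under this assumed formula, a small effect size (0.2 SD) requires 784 participants in total, well above the median trial size of 210 reported in the Results.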

Proportion as primary end point. The effect size is expressed in terms of the OR. If an increase in OR is anticipated (p2 > p1), then

[Equation 2; formula image not preserved] (2)
If p2 < p1, then

[Equation 3; formula image not preserved] (3)
We define qi = 1 − pi, where i = 1, 2 (the arms of the study).

The total sample size is given by

[Equation 4; formula image not preserved] (4)
(Machin D, Campbell MJ: Statistical Tables for the Design of Clinical Trials. Blackwell Scientific Publications, 1987), where δ = p2 − p1, z_{1−α/2} = 1.96, and z_{1−β} = 0.84.
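The formulas do not appear in this text, so the following Python sketch substitutes a widely used normal-approximation formula for comparing two proportions with equal arms. It should be read as an assumption: the exact form given by Machin and Campbell may differ, for example in continuity correction. The function names are illustrative, and only the case p2 > p1 is handled.

```python
import math

def p2_from_or(p1, odds_ratio):
    """Experimental-arm proportion implied by an odds ratio > 1 (p2 > p1)."""
    o = odds_ratio * p1 / (1.0 - p1)
    return o / (1.0 + o)

def total_sample_size_proportions(p1, odds_ratio, z_alpha=1.96, z_beta=0.84):
    """Total participants (both arms, 1:1) under an assumed normal-approximation
    formula without continuity correction."""
    p2 = p2_from_or(p1, odds_ratio)
    q1, q2 = 1.0 - p1, 1.0 - p2
    p_bar = (p1 + p2) / 2.0
    delta = p2 - p1
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    per_arm = numerator / delta ** 2
    # round() removes floating-point noise before rounding up
    return 2 * math.ceil(round(per_arm, 6))

# 0.5 is the hypothetical standard-arm rate used in the Appendix; real trials differ.
for label, target in (("small", 1.3), ("medium", 1.5), ("large", 2.0)):
    print(label, total_sample_size_proportions(0.5, target))
```

The required total falls steeply as the odds ratio increases, which is the pattern that drives the classification of proportion-based trials.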

Time-to-event as a primary end point. The effect size is expressed in terms of the HR. The following effect sizes are considered: small (θ = 1.3), medium (θ = 1.5), and large (θ = 2). The total necessary number of events is

[Equation 5; formula image not preserved] (5)
where θ is as defined earlier, z_{1−α/2} = 1.96, and z_{1−β} = 0.84.
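The formula for the number of events does not appear in this text. One widely used approximation for a 1:1 trial is Schoenfeld's formula, 4(z_{1−α/2} + z_{1−β})² / (ln θ)². The sketch below uses that form as an assumption; other approximations exist, and the authors' equation may differ.

```python
import math

def required_events(theta, z_alpha=1.96, z_beta=0.84):
    """Total events needed to detect hazard ratio theta with 1:1 allocation,
    using Schoenfeld's approximation (an assumed form)."""
    events = 4.0 * (z_alpha + z_beta) ** 2 / math.log(theta) ** 2
    # round() removes floating-point noise before rounding up
    return math.ceil(round(events, 6))

for label, theta in (("small", 1.3), ("medium", 1.5), ("large", 2.0)):
    print(label, required_events(theta))
```

Under this assumption, roughly 456, 191, and 66 events would be needed for small, medium, and large effect sizes, respectively.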

On the basis of the information obtained from the abstract, the total number of events over the duration of the study was estimated by the following methods.

(1) For studies in which competing risks were not present, the accrual and follow-up time were provided, along with an estimate of the percent survival in the standard arm at a point in time or the median survival.

(a) The hazard rate for the standard arm can be calculated as either λ1 = −ln(S)/t, where S is survival at time t for the standard arm, or λ1 = −ln(0.5)/t1, where t1 is the median survival for the standard arm.

(b) The hazard rate for the experimental arm can be calculated as λ2 = λ1/θ.

(c) For each arm, the probability of event during the study is

[Equation 6; formula image not preserved] (6)
for i = 1, 2 (the arms of the study), where a = accrual time (the time the study was open for accrual; for example, if the study accrued between 1990 and 1992, a = 3 years); and f = follow-up time (the time between ending accrual and 6 months before the submission of the abstract to the American Society of Clinical Oncology). Given that the deadline for abstract submission is usually in December, follow-up is considered to end at the end of October of the year before the abstract was published. For example, if the accrual ended at the end of 1992 and the year of the abstract is 1997, the follow-up time starts in January 1993 and ends in October 1996, and f = 3.83 years.

(d) The total number of events is n1·p1 + n2·p2, where the pi values are calculated in (c), and n1 and n2 are the total numbers of patients randomly assigned to the standard and experimental arms, respectively.

(2) For studies in which competing risks might have been present, the total number of events, if provided in the abstract, was used.

(3) For studies in which competing risks were not present, an estimate of percent survival in the standard arm at a given point in time was provided or the median survival was provided. In these studies, the accrual and follow-up times were not known but the median follow-up time was provided. To calculate sample size, the procedure was similar to that described in section (1), except that (c) was replaced by [formula image not preserved], where m is the median follow-up time.

(4) For studies in which competing risks were not present, an estimate of percent survival for the standard arm at a point in time was provided or the median survival was provided. If neither the accrual and follow-up times nor the median follow-up time were provided but the total number of events was explicitly stated, then this number of events was used.

(5) For studies in which competing risks might have been present and the total number of events was not provided, but all the information from either sections (1) or (3) or (4) was provided, then the same procedures as outlined in the respective paragraphs were used.

The total number of events observed in the study is contrasted with the number of events necessary, calculated as nev (equation 1).

To perform post hoc power calculations, the following assumptions were made: it was considered that the time to event was exponentially distributed and the accrual was uniform over time; when the end point was described using words such as "relapse," "progression," or "failure," unless specifically defined, it was considered that the competing risks might have been present.
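Procedure (1) can be sketched as follows. The probability of an event is not given explicitly in this text, so the sketch uses the standard closed form that follows from the two stated assumptions (exponential event times and uniform accrual): p = 1 − [exp(−λf) − exp(−λ(a + f))] / (λa). This is an assumed form. The trial size, survival figure, and θ are hypothetical; the accrual and follow-up times are those of the Appendix example.

```python
import math

def hazard_from_survival(survival, time):
    """Hazard rate lambda = -ln(S)/t under exponential survival."""
    return -math.log(survival) / time

def event_probability(hazard, accrual, follow_up):
    """Probability that a participant has an event by the analysis date,
    assuming exponential event times and uniform accrual (assumed form)."""
    late = math.exp(-hazard * follow_up)
    early = math.exp(-hazard * (accrual + follow_up))
    return 1.0 - (late - early) / (hazard * accrual)

def expected_events(n_standard, n_experimental, hazard_standard, theta,
                    accrual, follow_up):
    """Expected total events, n1*p1 + n2*p2, with lambda2 = lambda1 / theta."""
    hazard_experimental = hazard_standard / theta
    return (n_standard * event_probability(hazard_standard, accrual, follow_up)
            + n_experimental * event_probability(hazard_experimental, accrual, follow_up))

accrual_years = 3.0             # accrual from 1990 through 1992 (Appendix example)
follow_up_years = 3 + 10 / 12   # January 1993 through October 1996
hazard = hazard_from_survival(0.5, 2.0)  # hypothetical: 50% survival at 2 years
events = expected_events(100, 100, hazard, 1.5, accrual_years, follow_up_years)
print(round(follow_up_years, 2), round(events, 1))
```

The estimated number of events would then be compared with the number of events required for each effect size.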




 
Table A1. Calculated Odds Ratios for Treatment Effects

 


    ACKNOWLEDGMENTS
 
We thank Ida Lee for assistance with data entry and Mel Giovinazzo for administrative support.


    NOTES
 
Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.


    REFERENCES
 
1. Zia MI, Siu LL, Pond GR, et al: Comparison of outcomes of phase II studies and subsequent randomized control studies using identical chemotherapeutic regimens. J Clin Oncol 23:6982-6991, 2005

2. Freiman JA, Chalmers TC, Smith H, et al: The importance of beta, the type II error, and sample size in the design and interpretation of the randomized controlled trial: A survey of 71 ‘negative’ trials. N Engl J Med 299:690-694, 1978

3. Moher D, Dulberg CS, Wells GA: Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 272:122-124, 1994

4. Reference deleted by author.

5. Hebert B, Wright S, Dittus R, et al: Prominent medical journals often provide insufficient information to assess the validity of studies with negative results. J Negat Results Biomed 1:1, 2002

6. Brown CG, Kelen GD, Ashton JJ, et al: The beta error and sample size determination in clinical trials in emergency medicine. Ann Emerg Med 16:183-187, 1987

7. Edmund MJ, Overall JE, Rhoades HM: Beta, or type II error in psychiatric controlled clinical trials. J Psychiatr Res 19:563-567, 1985

8. Mengel MB, Davis AB: The statistical power of family practice research. Fam Pract Res J 13:105-111, 1993

9. Dimick JB, Diener-West M, Lipsett PA: Negative results of randomized clinical trials published in the surgical literature: Equivalency or error? Arch Surg 136:796-800, 2001

10. Williams HC, Seed P: Inadequate size of ‘negative’ clinical trials in dermatology. Br J Dermatol 128:317-326, 1993

11. Keen HI, Pile K, Hill CL: The prevalence of under-powered randomized clinical trials in rheumatology. J Rheumatol 32:2083-2088, 2005

12. Halpern S, Karlawish JH, Berlin JA: The continuing unethical conduct of underpowered clinical trials. JAMA 288:358-362, 2002

13. Altman DG: Statistics and ethics in medical research: III. How large a sample? BMJ 281:1336-1338, 1980

14. Newell DJ: Type II errors and ethics. BMJ iv:1789, 1978

15. Altman DG: The scandal of poor medical research. BMJ 308:283-284, 1994

16. Martins RG, Finkelstein DM, Selden MV: The importance of beta, type II error, in negative trials in oncology. Proc Am Soc Clin Oncol 15:415a, 1997 (abstr 1482)

17. Cohen J: Statistical power analysis for the behavioral sciences (ed 2). Hillsdale, NJ, Erlbaum, 1988

18. Krzyzanowska MK, Pintilie M, Tannock IF: Factors associated with large randomized trials presented at an annual oncology meeting. JAMA 290:495-501, 2003

19. Krzyzanowska MK, Pintilie M, Brezden-Masley C, et al: Quality of abstracts describing randomized trials in the proceedings of American Society of Clinical Oncology meetings: Guidelines for improved reporting. J Clin Oncol 22:1993-1999, 2004

20. Edwards SJL, Lilford RJ, Braunholtz D, et al: Why "underpowered" trials are not necessarily unethical. Lancet 350:804-807, 1997

21. Knapp TR: The overemphasis on power analysis. Nurs Res 45:379-381, 1996

22. Lilford R, Stevens AJ: Underpowered studies. Br J Surg 89:129-131, 2002

23. Joffe S, Harrington DP, George SL, et al: Satisfaction of the uncertainty principle in cancer clinical trials: Retrospective cohort analysis. BMJ 328:1463, 2003

24. Chlebowski RT, Lillington LM: A decade of breast cancer clinical investigation: Results as reported in the program/Proceedings of the American Society of Clinical Oncology. J Clin Oncol 12:1789-1795, 1994

25. Janosky JE: The ethics of underpowered clinical trials. JAMA 288:2118-2119, 2002

Submitted February 19, 2007; accepted May 29, 2007.








Copyright © 2007 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X