© 2002 American Society of Clinical Oncology
Use and Abuse of Statistics in Evidence-Based Medicine
for the Eastern Cooperative Oncology Group, Southwestern Oncology Group, Cancer and Leukemia Group B, and the US Melanoma Intergroup
To the Editor: Lens and Dawes recently published an article [1] presenting a meta-analysis of published, individually statistically significant Eastern Cooperative Oncology Group and Intergroup trials [2,3], drawing negative conclusions regarding the adjuvant effects of high-dose interferon alfa-2b in melanoma. Meta-analyses are generally used to aggregate the results of underpowered studies that are individually incapable of supporting positive conclusions. The selection of data to be analyzed and the choice of statistical tests are critical. Lens and Dawes have omitted the largest and most positive of the trials [4], which demonstrates significant prolongation of survival and relapse interval by high-dose interferon alfa-2b (United States [US] Intergroup trial E1694), and they have compounded this omission by using a statistical tool that is inappropriate for analysis of the two other studies, E1684 and Intergroup E1690. We would like to discuss these errors for the readership, on behalf of the US cooperative groups that conducted the trials.
The largest and most positive trial demonstrating survival benefit from high-dose interferon is the US Intergroup trial E1694. This trial tested the efficacy of high-dose interferon in relation to a vaccine, the GM2-KLH-QS21 vaccine (GMK; Progenics, Inc, Tarrytown, NY), which was selected as the most promising vaccine for study in 1995 by the US Intergroup.
This study was closed early at a median follow-up of 1.3 years because of the highly significant superiority of high-dose interferon alfa-2b in terms of both primary end points, survival (P1 = .009) and relapse interval (P1 = .002, efficacy population) [4]. This study was published in May 2001, within the window of the survey of Lens and Dawes, and has recently been updated at 2.1 years of median follow-up [5], showing no change in the statistically significant survival and relapse interval benefits from high-dose interferon. The reduction in the hazard of relapse and of death in the E1694 trial is 28% to 33%. There is no evidence of any adverse impact on the GM2 vaccine arm, for which the outcome is similar to that of the observation arm of the most recent prior Intergroup trial, E1690 [3], and superior to that of the observation arm of E1684 [2], the pivotal trial for US and worldwide regulatory approval of high-dose interferon alfa-2b.
We would like to note for the readership that all Eastern Cooperative Oncology Group and Intergroup adjuvant trials of interferon have been designed to be analyzed by the log-rank test, a robust test that evaluates the hazard function for outcomes of relapse and death over time and that accounts appropriately for follow-up interval and censoring. Lens and Dawes draw negative conclusions from our positive studies by inappropriately applying Fisher's exact test to only the published event data, when these trials were prospectively planned by the US cooperative groups and the National Cancer Institute to be analyzed by the log-rank test, incorporating the temporal distribution of the outcome data. Lens and Dawes base their analysis on Fisher's exact test, using only the numbers of events for each treatment, although these outcomes are time-dependent.
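The contrast drawn here between a count-based test and a time-to-event test can be illustrated with a small sketch. The data below are entirely hypothetical (they are not taken from E1684, E1690, or E1694), and the helper names `fisher_exact_two_sided` and `logrank_p` are written for this illustration only. The log-rank P value uses the usual chi-square approximation with 1 degree of freedom.

```python
from math import comb, erfc, sqrt

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided log-rank P value (chi-square approximation, 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, arm A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, arm B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)

# Hypothetical arms of 20 patients each: 16 events and 4 patients
# censored at 60 months in both arms, but events occur much later in arm B.
times_a = list(range(1, 17)) + [60] * 4
times_b = list(range(31, 47)) + [60] * 4
events = [1] * 16 + [0] * 4

p_fisher = fisher_exact_two_sided(16, 4, 16, 4)
p_logrank = logrank_p(times_a, events, times_b, events)
print(f"Fisher's exact P = {p_fisher:.3f}, log-rank P = {p_logrank:.3f}")
```

In this constructed example both arms have 16 events among 20 patients, so the 2 x 2 table is perfectly balanced and Fisher's exact test returns P = 1, whereas the log-rank statistic responds to the roughly 30-month delay in events in the second arm. The example is deliberately extreme and says nothing about the size of the effect in any real trial.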
Their conclusions are derived from analyses of our published data, not from the actual events and event times used for the analyses presented in the original publications and in the US Food and Drug Administration's review of the pivotal study E1684. Such analyses are inappropriate because they do not take into account the actual survival times and censoring times that are available for each of the studies as they were planned, and on the basis of which the sample sizes were calculated to detect the outcomes of relapse and death with adequate power. The log-rank test is the appropriate tool for analysis of these data, but it requires the raw data. These data are available and have already been provided to the US Food and Drug Administration for separate confirmatory analyses. The log-rank test is more powerful than Fisher's exact test because it uses the actual temporal distribution of events and accommodates right-censored event times.
As an example, we consider their analysis of the E1684 data. This study was designed to detect a 50% improvement in relapse-free and overall survival using the log-rank test with a one-sided significance level of .05. For the end point of overall survival, the observed hazard ratio for the observation arm over the high-dose interferon arm was 1.32, with a 90% confidence interval of 1.03 to 1.69, and the corresponding one-sided P value for the log-rank test was .023. A 90% confidence interval for the hazard ratio corresponds to a one-sided test at the .05 level. We note here that the results are clearly significant using the statistical tests specified in the design of the trial: the one-sided log-rank P value is less than .05, and the 90% confidence interval for the hazard ratio does not include 1. Based on the E1684 analysis with 6.9 years of median follow-up, as first published in 1996, there is a clear and statistically significant benefit of high-dose interferon over observation.
Such a result is not reached with the 2 x 2 table analysis of Lens and Dawes, which ignores the available time distribution of the data. In general, Fisher's exact test will be less powerful than the log-rank test for the analysis of time-to-event data, because Fisher's exact test uses only the raw numbers of events in each arm and ignores the follow-up that is available for each patient.
Evidence-based medicine ought to be based on all of the evidence, recognizing that the highest level of evidence is found in individual randomized clinical trials of adequate power analyzed by methods that make optimal use of all information germane to the trial outcomes. Meta-analyses that select some but not other trials and use weaker statistical tests that do not incorporate time-dependent variables are destined to mislead us.
REFERENCES
1. Lens MB, Dawes M: Interferon alfa therapy for malignant melanoma: A systematic review of randomized controlled trials. J Clin Oncol 20: 1818-1825, 2002
2. Kirkwood JM, Strawderman MH, Ernstoff MS, et al: Interferon alfa-2b adjuvant therapy of high-risk resected cutaneous melanoma: The Eastern Cooperative Oncology Group Trial EST 1684. J Clin Oncol 14: 7-17, 1996
3. Kirkwood JM, Ibrahim JG, Sondak VK, et al: High- and low-dose interferon alfa-2b in high-risk melanoma: First analysis of intergroup trial E1690/S9111/C9190. J Clin Oncol 18: 2444-2458, 2000
4. Kirkwood JM, Ibrahim J, Sosman JA, et al: High-dose interferon alfa-2b significantly prolongs relapse-free and overall survival compared with the GM2-KLH/QS-21 vaccine in patients with resected stage IIB-III melanoma: Results of intergroup trial E1694/S9512/C509801. J Clin Oncol 19: 2370-2380, 2001
5. Kirkwood JM, Ibrahim JG, Sondak VK, et al: Interferon alfa-2a for melanoma metastases. Lancet 359: 978-979, 2002 (letter)
Response
University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
In Reply: Kirkwood et al have criticized a meta-analysis in a systematic review [1]. For reasons made clear in the article, we did not undertake a meta-analysis but confined ourselves to a systematic review. Meta-analysis is a statistical method for combining the results from two or more independent studies (quantitative synthesis), whereas a systematic review describes any review of a body of data that uses clearly defined methods and criteria [2,3]. Furthermore, we stated explicitly in the Discussion of our article that the data from the included individual studies were not pooled.
The fact that our systematic review omitted the results from the United States (US) Intergroup trial E1694 [4], published within the window of our review, should not be considered an error. The US Intergroup trial E1694 simply did not fulfill our inclusion criteria. It is clearly stated in the Methods section of our article that studies were excluded if they used combination therapy or compared interferon-alfa treatment with some other form of systemic therapy. Trial E1694 compared interferon-alfa therapy with the GM2-KLH/QS vaccine. Although Kirkwood et al emphasized the importance of the positive results from this trial, these results should be considered in context with other studies that compare interferon with other systemic therapies.
We agree with Kirkwood et al that use of the raw data in the analysis is very important. However, although we had requested that the authors of the included trials provide us with raw data, the trialists decided not to make their data available.
Therefore, we were limited to including only published data and to adhering to one of the principal recommendations of the Quality of Reporting of Meta-Analyses (QUOROM) statement: to present simple summary results from each treatment group in each trial, for each primary outcome [5]. Some authors have advocated the importance of publishing the raw data used in medical research articles [6,7].
We recognize that choosing an adequate statistical method when analyzing the data is important. The log-rank test may be preferable, but in the absence of the raw data, we were not able to use it. However, to be precise, we did not run Fisher's exact test on the published data. We calculated odds ratios with 95% confidence intervals, as clearly described in the Methods section of the article. Although both methods can be used with 2 x 2 tables when the question is about an association, Fisher's exact test is a significance test whose two-tailed P value leads one to reject or not reject the null hypothesis. In contrast, the odds ratio is an effect estimate reported with its 95% confidence interval (there is no null hypothesis for confidence intervals).
Kirkwood et al gave an example using data from the E1684 trial, stressing the importance of evidence obtained from an arbitrary division of results into "significant" and "nonsignificant" according to the commonly used P value threshold (P = .05). Kirkwood et al used one-sided P values, which are rarely appropriate. Only a small number of one-sided t tests are found in published articles. Kirkwood et al stated only in their letter that the one-sided tests were preplanned, but from the original publication of the trial [8], the reader cannot determine whether this decision was made before the data were analyzed or whether it depended on what the results were. Furthermore, the authors did not justify why they considered that a real difference could occur in only one direction.
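The distinction made above between an effect estimate and a significance test can be sketched as follows. The counts are hypothetical, and the Woolf (logit) interval is assumed here purely for illustration; the Methods section of the review, not this sketch, defines how the published odds ratios were calculated. The function name `odds_ratio_ci` is likewise illustrative.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf (logit) confidence interval for [[a, b], [c, d]].

    a, b: events and non-events in the treated arm
    c, d: events and non-events in the control arm
    z:    normal quantile (1.959964 gives a two-sided 95% interval)
    All four cells must be greater than zero.
    """
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical table: 80 deaths / 60 survivors (treated),
# 90 deaths / 50 survivors (control). Not data from any cited trial.
or_est, ci_low, ci_high = odds_ratio_ci(80, 60, 90, 50)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The output is an estimate with a range of plausible values rather than a reject-or-not decision. Like any 2 x 2 summary, it uses only the counts and carries no information about when the events occurred.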
We would like to note that, taking the end point of overall survival as an example, the corresponding two-sided P value for the log-rank test would be .047. The validity of P values is important, but P values smaller than .05 cannot be considered proof of conclusive, or even strong, evidence against the null hypothesis. Reporting of medical research should continue to move from the idea that results are significant or nonsignificant to the interpretation of findings in the context of the type of study and other available evidence [9].
The validity of the statistical analysis used in a randomized trial and the clinical relevance of the results are important parameters when assessing the quality of trials, but the design, conduct, and quality of reporting are as important, if not more so [10]. It is essential that randomized trials are reported adequately. Proper methodology should be used and reported explicitly so that readers do not have to infer what methodology was probably used; they should see that it has been used [11]. There is no described analysis that can compensate for biases introduced through poor trial design. The achievement of allocation concealment is crucial when conducting and reporting randomized trials [12]. Inadequate or unclear concealment of treatment allocation may overestimate the treatment effect by as much as 30% [13]. Although the Eastern Cooperative Oncology Group and Intergroup trials were planned by US cooperative groups and the National Cancer Institute, the published reports of the trials included in our review (E1684 and E1690) did not describe the study design sufficiently to suggest that adequate concealment of allocation had taken place.
It is better to address weaknesses in research design and to consider their possible effects on the results and their interpretation than to ignore them in the hope that they will not be noticed [14]. To avoid biased findings in systematic reviews, there is a need to improve the quality of the included trials and to conduct sensitivity analyses routinely to assess the methodologic quality of controlled trials [13]. Highlighting the influence that trial design and analysis can have on results is one of the major roles of evidence-based medicine.
REFERENCES
1. Lens MB, Dawes M: Interferon alfa therapy for malignant melanoma: A systematic review of randomized controlled trials. J Clin Oncol 20: 1818-1825, 2002
2. Egger M, Smith GD: Meta-analysis: Potentials and promise. BMJ 315: 1371-1374, 1997
3. Chalmers I, Altman DG: Foreword, in Chalmers I, Altman DG (eds): Systematic Reviews. London, United Kingdom, BMJ Publishing, 1995
4. Kirkwood JM, Ibrahim J, Sosman JA, et al: High-dose interferon alfa-2b significantly prolongs relapse-free and overall survival compared with the GM2-KLH/QS-21 vaccine in patients with resected stage IIB-III melanoma: Results of intergroup trial E1694/S9512/C509801. J Clin Oncol 19: 2370-2380, 2001
5. Moher D, Cook DJ, Eastwood S, et al: Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-Analyses. Lancet 354: 1896-1900, 1999
6. Hutchon DJR: Publishing raw data and real time statistical analysis on e-journals. BMJ 322: 530, 2001
7. Eysenbach G, Sa ER: Code of conduct is needed for publishing raw data. BMJ 323: 166, 2001 (letter)
8. Kirkwood JM, Strawderman MH, Ernstoff MS, et al: Interferon alfa-2b adjuvant therapy of high-risk resected cutaneous melanoma: The Eastern Cooperative Oncology Group trial EST 1684. J Clin Oncol 14: 7-17, 1996
9. Sterne JA, Smith GD, Cox DR: Sifting the evidence: What's wrong with significance tests? BMJ 322: 226-231, 2001
10. Juni P, Altman DG, Egger M: Assessing the quality of controlled clinical trials, in Egger M, Smith GD, Altman DG (eds): Systematic Reviews in Health Care: Meta-Analysis in Context (ed 2). London, United Kingdom, BMJ Books, 2001
11. Altman DG: Better reporting of randomised controlled trials: The CONSORT statement. BMJ 313: 570-571, 1996
12. Moher D, Schulz KF, Altman DG: The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357: 1191-1194, 2001
13. Juni P, Altman DG, Egger M: Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ 323: 42-46, 2001
14. Altman DG, Gore SM, Gardner MJ, et al: Statistical guidelines for contributors to medical journals, in Altman DG, Machin D, Bryant TN, et al (eds): Statistics With Confidence (ed 2). London, United Kingdom, BMJ Books, 2000
Copyright © 2002 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X