Journal of Clinical Oncology, Vol 26, No 22 (August 1), 2008: pp. 3715-3720 © 2008 American Society of Clinical Oncology. DOI: 10.1200/JCO.2007.14.1044
Phase II Stopping Rules That Employ Response Rates and Early Progression
From the Juravinski Cancer Centre, McMaster University, Hamilton; and the National Cancer Institute of Canada Clinical Trials Group and Department of Mathematics and Statistics, Queen's University, Kingston, Ontario, Canada.
Corresponding author: John R. Goffin, Juravinski Cancer Centre, 699 Concession St, Hamilton, Ontario L8V 5C2; e-mail: john.goffin{at}hrcc.on.ca
Purpose Phase II oncology trials traditionally have used response rate (RR) as the primary end point, but newer targeted agents require the consideration of alternative end points. High rates of early progressive disease (EPD) suggest inadequate drug activity and may be useful in the early stopping of trials. This study used a simulation to define a set of rules to assess a combined end point of RR and EPD.
Methods The simulation assumed a two-stage trial with a specified …
Results Thresholds for nr and np that satisfied the specified error rates were generated. There was at least an 89% likelihood that a study would be stopped at the first stage of accrual if r and epd were uninteresting.
Conclusion The simulation combined the RR and the EPD rate to establish stopping rules that achieved the desired error rates. High rates of early stopping suggest that this design could shorten phase II trials of inactive agents.
Drug development in oncology has become increasingly resource-intensive, even as more agents come under study.1,2 The phase II study has a limited ability to identify drugs that are suitable for phase III study,3,4 and improving the efficiency of phase II studies could accelerate drug development.
The conventional use of response rate (RR) in phase II trials rests on the assumption that responses are associated with better survival. Data generally support this association,5-8 but some drugs induce minimal responses in particular settings while still improving survival.9,10 In addition, stable disease is likely to be associated with improved survival.6,11-14
Recent drug development has been directed toward more targeted therapies. Some of these drugs are likely to cause cell death. Other agents, termed cytostatic, may be more likely to induce only stabilization of disease, so assessing their activity will require different considerations.15 To monitor for at least a minimal signal of disease stabilization, one can employ early progressive disease (EPD) as an end point in phase II trials. EPD may be defined as progression of disease at the first remeasurement of disease after treatment. Drugs that allow a substantial proportion of patients to progress early are potentially unattractive.
Traditional two-stage phase II designs with one binary end point, such as those developed by Fleming,16 Gehan,17 and others,18 can be used to assess either the RR or the proportion of EPD. Zee et al19 proposed a design that incorporates both the RR and the proportion of EPD as end points in a phase II trial. The goal was to increase the likelihood of early termination of trials of agents that showed excessive early progression and a limited rate of response. The alternate hypothesis was designed to identify drugs with either an RR greater than a specified minimum or an EPD rate less than a specified maximum.
Although this design was attractive in concept, its authors later found from simulations that the power of studies using its decision rule was lower than the power set in the design.20 Chang et al21 considered a two-stage design for phase II trials that used both the RR and the EPD rate in the context of a window-of-opportunity study. Their design employed an "and" in the alternate hypothesis. This reflected the expectation that, to be attractive in a previously untreated population, a drug should have both a good RR and a minimal EPD rate. They indicated that switching the null and alternate hypotheses would allow their design to assess cytostatic agents or previously treated patients. However, their design allows early acceptance of seemingly interesting drugs at the end of the first stage, which is not standard practice in phase II oncology trials.
This article develops stopping rules for phase II trials within the same framework as Zee et al.19 It employs both the RR and the EPD rate and can be used to assess drugs in the setting of advanced disease. A simulation-based approach was adopted.
Let r be the true response rate of the agent being studied and epd its true rate of early progressive disease. The simulation assumes that two pairs of parameters, (rnul, epdnul) and (ralt, epdalt), can be specified. A drug is uninteresting for further development if r is less than or equal to rnul and epd is greater than or equal to epdnul. It is interesting for further development if r is greater than or equal to ralt or epd is less than or equal to epdalt. That is, a phase II trial tests the null hypothesis that r is less than or equal to rnul and epd is greater than or equal to epdnul, versus the alternate hypothesis that r is greater than or equal to ralt or epd is less than or equal to epdalt.
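In symbols, the hypotheses above read as follows. This is only a restatement of the text using its own parameter names:

```latex
\begin{aligned}
H_0 &:\; r \le r_{\mathrm{nul}} \ \text{and}\ \mathit{epd} \ge \mathit{epd}_{\mathrm{nul}},\\
H_1 &:\; r \ge r_{\mathrm{alt}} \ \text{or}\ \mathit{epd} \le \mathit{epd}_{\mathrm{alt}}.
\end{aligned}
```

The "and" in the null and the "or" in the alternate are what separate this design from one that requires a drug to excel on both end points.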
This article considers the following two-stage procedure for testing these hypotheses. In the first stage, n1 patients are entered. Let n1r and n1p be, respectively, the number of patients who responded and the number who had early progression among these n1 patients. The trial is stopped at this stage if n1r is less than or equal to n1r-nul and n1p is greater than or equal to n1p-nul. Here, n1r-nul and n1p-nul are two thresholds determined during the design of the study. Otherwise, n2 additional patients are entered in the second stage. Let n2r and n2p be, respectively, the number of patients who responded and the number who had early progression among these n2 patients. The drug will be declared interesting at the end of the second stage if n1r + n2r …
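The two-stage rule can be sketched in Python as a minimal illustration. This is not the authors' TreeAge program, and the class and method names are hypothetical. The sketch allows several stage-1 threshold pairs, because the worked example in the Results lists two. The stage-2 rule is cut off in this copy, so the acceptance test used here is an assumption modeled on that worked example: cumulative responders at least nr_alt, or cumulative early progressors at most np_alt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStageRule:
    """Hypothetical container for the thresholds of a two-stage RR + EPD design."""

    n1: int  # patients entered in stage 1
    n2: int  # additional patients entered in stage 2
    # Stage-1 rejection pairs (n1r_nul, n1p_nul): stop if n1r <= n1r_nul
    # and n1p >= n1p_nul for ANY listed pair.
    stage1_pairs: tuple[tuple[int, int], ...]
    nr_alt: int  # ASSUMED: declare interesting if total responders >= nr_alt
    np_alt: int  # ASSUMED: ... or if total early progressors <= np_alt

    def stop_after_stage1(self, n1r: int, n1p: int) -> bool:
        """True if the stage-1 counts fall in the futility (rejection) region."""
        return any(n1r <= r_max and n1p >= p_min for r_max, p_min in self.stage1_pairs)

    def declare_interesting(self, nr_total: int, np_total: int) -> bool:
        """Assumed stage-2 rule on cumulative counts over n1 + n2 patients."""
        return nr_total >= self.nr_alt or np_total <= self.np_alt


if __name__ == "__main__":
    # Thresholds mirroring the worked example (n1 = n2 = 15), with the EPD
    # condition read as "greater than or equal to", as in the stage-1 rule.
    rule = TwoStageRule(n1=15, n2=15, stage1_pairs=((4, 4), (5, 5)), nr_alt=12, np_alt=6)
    print(rule.stop_after_stage1(3, 7))    # True: few responses, many early progressions
    print(rule.stop_after_stage1(6, 7))    # False: six responders, continue to stage 2
    print(rule.declare_interesting(9, 5))  # True: a low EPD count alone suffices
```

Listing the stage-1 pairs as a tuple lets one rule hold any number of rejection pairs. The region they define is the union of the individual pairs.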
Simulations were performed with TreeAge Pro Healthcare software (Williamstown, MA) to determine thresholds (program available on request). For each simulation, the following parameters were prespecified: design parameters (rnul, epdnul) and (ralt, epdalt), desired power and …
For simulations of the alternate hypothesis, the full-space method used a weighted probability to randomly select a value for r or epd from within the range of interest. The weighting was based on the space defined by the selected (ralt, epdalt) parameters. Thus, either r greater than or equal to ralt or epd less than or equal to epdalt is generated. The remaining parameter is then randomly selected from the values that satisfy r + epd … For simulations of the null hypothesis, the same method was used, but r and epd were both required to satisfy r less than or equal to rnul and epd greater than or equal to epdnul.
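A minimal sketch of the full-space sampling step follows, under several stated assumptions. The function names are hypothetical. The text says only that a weighted probability based on the (ralt, epdalt) space was used. This sketch takes the precedence weights to be the areas of the two "interesting" regions inside the feasible triangle r ≥ 0, epd ≥ 0, r + epd ≤ 1. All draws within a range are uniform. The bound r + epd ≤ 1 is used because a patient cannot both respond and progress early; the text's own constraint is cut off. The null sampler is simplified to drawing r first and then epd.

```python
import random


def sample_full_space_alt(r_alt, epd_alt, rng):
    """Draw one hypothetical drug (r, epd) with r >= r_alt or epd <= epd_alt."""
    # ASSUMED precedence weights: areas of the two "interesting" regions
    # inside the feasible triangle r >= 0, epd >= 0, r + epd <= 1.
    area_r = (1.0 - r_alt) ** 2 / 2        # region with r >= r_alt
    area_epd = epd_alt - epd_alt ** 2 / 2  # region with epd <= epd_alt
    if rng.random() < area_r / (area_r + area_epd):
        r = rng.uniform(r_alt, 1.0)        # precedent parameter: r
        epd = rng.uniform(0.0, 1.0 - r)    # nonprecedent, keeps r + epd <= 1
    else:
        epd = rng.uniform(0.0, epd_alt)    # precedent parameter: epd
        r = rng.uniform(0.0, 1.0 - epd)    # nonprecedent, keeps r + epd <= 1
    return r, epd


def sample_full_space_null(r_nul, epd_nul, rng):
    """Draw one hypothetical drug with r <= r_nul and epd >= epd_nul.

    Simplified: r first, then epd; requires r_nul + epd_nul <= 1.
    """
    r = rng.uniform(0.0, r_nul)
    epd = rng.uniform(epd_nul, 1.0 - r)
    return r, epd


if __name__ == "__main__":
    rng = random.Random(42)
    alt_cohort = [sample_full_space_alt(0.4, 0.2, rng) for _ in range(5)]
    null_cohort = [sample_full_space_null(0.2, 0.4, rng) for _ in range(5)]
    print(alt_cohort)
    print(null_cohort)
```

The r-precedent and epd-precedent regions overlap, so this scheme does not sample the alternate region uniformly. Rejection sampling uniformly over the triangle would be a different, equally simple choice.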
The borderline-value method assumed that extremely desirable values of r and epd were unlikely. It also assumed that including such values in the assessed populations would underpower a study to detect drugs with borderline r or epd characteristics. In the borderline-value method, cohorts for assessing the alternate hypothesis were generated by randomly assigning either r = ralt or epd = epdalt. The nonprecedent parameter was then randomly assigned a value within its noninteresting range (ie, r less than ralt or epd greater than epdalt) …
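The borderline-value cohorts can be sketched in the same way. The function names are hypothetical. Three choices are assumptions of this sketch: an equal (50/50) chance of fixing r or epd, a uniform draw for the nonprecedent parameter, and the upper bound r + epd ≤ 1. The borderline null drug (r = rnul, epd = epdnul) is simply a fixed point.

```python
import random


def sample_borderline_alt(r_alt, epd_alt, rng):
    """Draw (r, epd) with one parameter fixed at its borderline-interesting value.

    Assumes r_alt + epd_alt < 1 so that the nonprecedent range is nonempty.
    Range endpoints can occur only with negligible probability.
    """
    if rng.random() < 0.5:  # ASSUMED equal chance of fixing r or epd
        r = r_alt
        # Nonprecedent epd lies in its noninteresting range (> epd_alt),
        # capped so that r + epd <= 1.
        epd = rng.uniform(epd_alt, 1.0 - r_alt)
    else:
        epd = epd_alt
        r = rng.uniform(0.0, r_alt)  # noninteresting range: r < r_alt
    return r, epd


def borderline_null(r_nul, epd_nul):
    """Borderline-uninteresting drug: r = r_nul and epd = epd_nul."""
    return r_nul, epd_nul


if __name__ == "__main__":
    rng = random.Random(7)
    print([sample_borderline_alt(0.4, 0.2, rng) for _ in range(3)])
    print(borderline_null(0.2, 0.4))
```

Every alternate-hypothesis drug drawn this way sits on the edge of the interesting region. Thresholds tuned against such a cohort therefore aim at marginally active drugs rather than the easy, highly active ones.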
Thresholds Created by the Full-Space Method
Table 1 gives the patient thresholds required to accept or reject the null hypothesis after stages 1 and 2. These thresholds assume that drugs of interest have r greater than or equal to ralt or epd less than or equal to epdalt. Power was specified at .80 and error at .05, with 15 patients assessed in each stage.
Results are read as follows. Consider the parameters ralt 0.4, rnul 0.2, epdalt 0.2, and epdnul 0.4. The drug would be rejected and the study stopped after stage 1 of accrual under either of two conditions:
- the number of responders was less than or equal to four of 15 and the number of patients with EPD was greater than or equal to four of 15; or
- the number of responders was less than or equal to five of 15 and the number of patients with EPD was greater than or equal to five of 15.
If neither pair of criteria was met, the study would proceed to the second stage of accrual. Early stopping was not permitted for drugs that met the criteria for the alternate hypothesis, although these pairs were listed. At the end of stage 2, the drug would be accepted as active if the number of responders was greater than or equal to 12 of 30 or if the number of patients with EPD was less than or equal to six of 30. In practice, it would otherwise be rejected. Strictly, however, the program was designed to reject drugs if they fell within the ranges of response and progressive disease that were considered uninteresting. The listed criteria for drug rejection were therefore the basis for the error calculation. A few outcome pairs in Table 1 fall between the acceptance and rejection criteria at stage 2. As a result, the error in practice would be slightly lower (ie, better) than the error stated by the program.
In all assessments of these thresholds by the full-space method, the achieved error rates were better than required. Table 1 also gives the expected average study size for drugs that met either the criteria of interest or those of disinterest. As expected with the low …
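Each patient's outcome is a response, an early progression, or neither. The counts among n patients therefore follow a trinomial (multinomial) distribution with probabilities r, epd, and 1 - r - epd. This means the stage-1 stopping probability, the probability of declaring activity, and the expected study size can be computed exactly for any fixed (r, epd) instead of simulated. The sketch below uses hypothetical function names and returns these three quantities. Expected study size is n1 + n2 × (1 - P[stop at stage 1]). Averaging it over a cohort of (r, epd) values gives an expected average study size. The default thresholds follow the worked example, with the stage-1 EPD condition read as "greater than or equal to", as in the stopping rule of the Methods. The stage-2 acceptance form is assumed, and the outputs are not claimed to reproduce Table 1.

```python
from math import comb


def trinomial_pmf(n, r, epd):
    """P(responders = a, early progressors = b) among n patients, as a dict."""
    other = 1.0 - r - epd  # neither response nor early progression
    return {
        (a, b): comb(n, a) * comb(n - a, b) * r**a * epd**b * other ** (n - a - b)
        for a in range(n + 1)
        for b in range(n - a + 1)
    }


def operating_characteristics(r, epd, n1=15, n2=15,
                              stage1_pairs=((4, 4), (5, 5)), nr_alt=12, np_alt=6):
    """Return (P(stop after stage 1), P(declared active), expected sample size)."""
    def stops(a, b):
        return any(a <= r_max and b >= p_min for r_max, p_min in stage1_pairs)

    stage1 = trinomial_pmf(n1, r, epd)
    stage2 = trinomial_pmf(n2, r, epd)  # stage-2 counts are independent of stage 1
    p_stop = sum(p for (a, b), p in stage1.items() if stops(a, b))
    p_active = sum(
        p1 * p2
        for (a, b), p1 in stage1.items() if not stops(a, b)
        for (c, d), p2 in stage2.items()
        if a + c >= nr_alt or b + d <= np_alt  # ASSUMED stage-2 acceptance form
    )
    expected_n = n1 + n2 * (1.0 - p_stop)  # stage 2 accrues only if not stopped
    return p_stop, p_active, expected_n


if __name__ == "__main__":
    for r, epd in [(0.2, 0.4), (0.4, 0.3), (0.1, 0.2)]:
        p_stop, p_active, expected_n = operating_characteristics(r, epd)
        print(f"r={r}, epd={epd}: P(stop)={p_stop:.3f}, "
              f"P(active)={p_active:.3f}, E[N]={expected_n:.1f}")
```

Exact enumeration is cheap here: at most 136 outcome pairs per 15-patient stage. It can serve as a check on simulated estimates for any single (r, epd) point.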
Thresholds Created by the Borderline-Value Method
However, the power and …
The thresholds in Table 2 assume that uninteresting drugs have both r less than or equal to rnul and epd greater than or equal to epdnul, with values taken from the full space of possibility. However, drugs of borderline disinterest can also be assessed. Table 3 lists thresholds that detect interesting drugs of borderline value (as in Table 2) and also detect uninteresting drugs with both r equal to rnul and epd equal to epdnul. This more challenging situation produces thresholds of interest and disinterest that are immediately adjacent. These thresholds fail to achieve the desired error rate in all instances. Using a trial size of n = 45 achieved only slight improvements (data not shown).
Table 4 presents a sensitivity analysis based on the borderline-value method of Table 2. Decreasing the difference between epdnul and epdalt, or altering the stage size, produced only small changes in error rates.
The EPD rate provides another means of assessing drug activity and offers the potential to stop studies early when the rate is undesirably high. This article develops a set of rules that incorporate both the RR and the EPD rate.
Two methods for determining patient thresholds for response and EPD are described. The full-space method simulates drugs whose r and epd values are drawn from throughout the ranges relevant to the null or alternate hypothesis. It is attractive because it carries no bias or presupposition about the characteristics of the drugs under study. As shown in Table 1, the thresholds it generated met the specified error rates. The achieved power exceeds the required power, so the thresholds detect drugs with either a favorable RR or a favorable EPD rate. A lower power could have been chosen but would have brought no gain, as the …
The thresholds generated by the full-space method (Table 1) may dismiss drugs of marginal activity that still meet the alternate hypothesis. In contrast, the borderline-value method creates thresholds that assume marginal activity for a drug, and it achieves the specified power and …
In addition, Table 3 lists thresholds generated from drugs with borderline characteristics in both the cohort of interest and the cohort of disinterest. The rules fail to meet the specified … Although extremely good drugs are unlikely to exist, poor drugs are encountered more frequently. RRs near zero are observed in early trials, and the EPD rate can be high in advanced or pretreated disease. The value of rules designed to sensitively detect drugs of only borderline disinterest is therefore doubtful. The utility of the parameters in Table 3 lies in showing the limits of such rules. Thresholds could also be generated from other drug cohorts whose distributions favor particular portions of the space of interest or disinterest. The difficulty lies in knowing which distribution of RR or EPD rate is likely. For practical reasons, we favor the thresholds capable of detecting drugs of borderline interest, as generated for Table 2.
The initial paper that added EPD to the assessment of phase II trials was later found by its authors to have poorer power than intended.19,20 That paper differs from the present one in employing multiple pairs of threshold criteria for rejecting the null hypothesis. Because the alternate hypothesis is designed to detect drugs with either interesting parameter, a single pair suffices. However, that initial paper highlighted the importance of detecting drugs with minimal but sufficient activity, which supports the use of the rules in Table 2.
Unlike this article, Chang et al21 designed a trial that allows a drug to be accepted after the first stage of accrual. This may be appropriate for extremely active drugs or when resources are limited. In a heterogeneous population, many investigators prefer to accrue additional patients. Doing so modestly narrows the confidence intervals around outcome rates and allows better planning of phase III studies for drugs with marginal activity. Correspondingly, desired error rates may be difficult to achieve after only the first stage of accrual; after stage 1, in Table 2, …
The consequences of rejecting the null hypothesis at stage 1 can be illustrated. Chang et al21 indicate that their hypotheses can be reversed and give one example with the hypotheses set as in the present paper. By using specified parameters (power = .8; alpha = .05; n1 = 21, n2 = 21; null hypothesis, r …
Dent et al22 applied the rules of Zee et al19 to 39 completed phase II trials that had been assessed by RR only. They found that rules incorporating EPD were more likely to stop apparently ineffective drugs at the first stage of accrual. Conversely, after stage 2, the method of Zee et al19 suggested activity in two instances in which the method of Fleming16 did not. The disputed drugs were not studied further, so their activity could not be confirmed. The study of Dent et al22 demonstrates the potential for more frequent early stopping of inactive drugs. It also raises the unconfirmed possibility that rules incorporating EPD may be more sensitive to drug activity. Zee et al19 suggested another potential benefit of using EPD: it can be determined at the first tumor measurement, even if responses cannot be fully assessed. This information may allow stage 2 of accrual to begin promptly after stage 1 is completed, avoiding the delay of waiting for potential responses.
Although the described method capably halts the study of inactive drugs after stage 1, the "or" alternate hypothesis may be best applied where only modest drug activity is expected. Cytostatic agents, or populations with poor expected response rates (eg, pancreatic cancer), may still derive benefit from nonprogressive disease, and low EPD rates may indicate drugs worthy of further consideration. Alternatively, drugs active against specific biologic subsets of disease (eg, trastuzumab) may produce some responses but little stable disease when assessed in unselected populations.
In the above situations, the demands of an "and" alternate hypothesis may cause a drug to be discarded. Conversely, the ethics of window-of-opportunity studies demand that only the most active drugs continue to stage 2, which supports the use of an "and" hypothesis.
The assessment of EPD in addition to response could shorten phase II studies through early stopping when the EPD rate is undesirably high. EPD may also increase the sensitivity of phase II studies to drug activity, but this must be confirmed and will depend on the criteria of the alternate hypothesis. This article develops rules for an alternate hypothesis under which a drug is considered interesting if it has either a sufficiently high RR or a sufficiently low EPD rate. It is hoped that such rules may improve drug development.
The author(s) indicated no potential conflicts of interest.
Conception and design: John R. Goffin, Dongsheng Tu
Collection and assembly of data: John R. Goffin
Data analysis and interpretation: John R. Goffin, Dongsheng Tu
Manuscript writing: John R. Goffin, Dongsheng Tu
Final approval of manuscript: John R. Goffin, Dongsheng Tu
Detailed Methods Section of Phase II Stopping Rules That Employ Response Rates and Early Progression
Let r be the true response rate of the agent under study and epd its true rate of early progressive disease. Assume that one can specify two pairs of parameters, (rnul, epdnul) and (ralt, epdalt). A drug is uninteresting for further development if r is less than or equal to rnul and epd is greater than or equal to epdnul. It is interesting for further development if r is greater than or equal to ralt or epd is less than or equal to epdalt. That is, a phase II trial tests the null hypothesis (r less than or equal to rnul and epd greater than or equal to epdnul) versus the alternate hypothesis (r greater than or equal to ralt or epd less than or equal to epdalt).
This article considers the following two-stage procedure for testing the above hypotheses. In the first stage, n1 patients are entered. Let n1r and n1p be, respectively, the number of patients who responded and the number who had early progression among these n1 patients. The trial is stopped at this stage if n1r is less than or equal to n1r-nul and n1p is greater than or equal to n1p-nul. Here, n1r-nul and n1p-nul are two thresholds determined during the design of the study. Otherwise, n2 additional patients are entered in the second stage of the study. Let n2r and n2p be, respectively, the number of patients who responded and the number who had early progression among these n2 patients. The drug will be declared interesting at the end of the second stage if n1r + n2r …
Simulations were performed with TreeAge Pro Healthcare software (Williamstown, MA) to determine thresholds (program available on request). For each simulation, the following parameters were prespecified: design parameters (rnul, epdnul) and (ralt, epdalt), desired power and …
In the full-space method, simulations of the alternate hypothesis randomly select a value for either r or epd from within the range of interest. The parameter that takes precedence (r or epd) was assigned randomly. The assignment was weighted by a frequency assessment of the overall potential space in which a drug would have r greater than or equal to ralt and/or epd less than or equal to epdalt. Thus, either r greater than or equal to ralt or epd less than or equal to epdalt is generated. The remaining (nonprecedent) parameter was then selected randomly from the values that satisfy r + epd …
For simulations of the null hypothesis, the same method was used, but both r and epd must satisfy r less than or equal to rnul and epd greater than or equal to epdnul.
The borderline-value method assumed that extremely desirable values of r and epd were unlikely. It also assumed that including such values in the assessed populations would underpower a study to detect drugs with borderline r or epd characteristics. In the borderline-value method, cohorts for assessing the alternate hypothesis were generated by random assignment of r = ralt or epd = epdalt. The nonprecedent parameter was then randomly assigned a value within its noninteresting range (ie, r < ralt or epd > epdalt), such that r + epd …
Successive patient thresholds for nr-alt and np-alt, and for nr-nul and np-nul, were run through 10^6 simulations of randomly generated r and epd values (using either the full-space or the borderline-value method) to determine the power or …
More than one pair could be taken for the … After the thresholds were determined for stages 1 and 2, cohorts were run through a two-stage trial using 10^6 simulations.
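The Monte Carlo step can be sketched as follows. This is not the authors' TreeAge program. The function names, the small grid of candidate stage-2 thresholds, and the inline samplers are all hypothetical. Each patient's outcome is drawn independently, and the stage-2 acceptance form is assumed as in the worked example. Passing a sampler for the alternate hypothesis estimates power; passing a null-hypothesis sampler estimates the type I error. The paper used 10^6 simulations per evaluation. The usage below uses far fewer for speed, so its estimates carry more Monte Carlo error.

```python
import random


def run_trial(r, epd, n1, n2, stage1_pairs, nr_alt, np_alt, rng):
    """Simulate one two-stage trial; return True if the drug is declared interesting."""
    def accrue(n):
        responders = progressors = 0
        for _ in range(n):
            u = rng.random()
            if u < r:
                responders += 1
            elif u < r + epd:  # early progression; otherwise neither outcome
                progressors += 1
        return responders, progressors

    a, b = accrue(n1)
    if any(a <= r_max and b >= p_min for r_max, p_min in stage1_pairs):
        return False  # stopped for futility after stage 1
    c, d = accrue(n2)
    return a + c >= nr_alt or b + d <= np_alt  # ASSUMED stage-2 acceptance form


def declared_interesting_rate(sampler, n_sims, rng, **design):
    """Fraction of simulated trials that declare the drug interesting.

    With an alternate-hypothesis sampler this estimates power; with a
    null-hypothesis sampler it estimates the type I error rate.
    """
    hits = 0
    for _ in range(n_sims):
        r, epd = sampler(rng)  # each simulated drug gets its own (r, epd)
        hits += run_trial(r, epd, rng=rng, **design)
    return hits / n_sims


if __name__ == "__main__":
    rng = random.Random(2008)

    def null_sampler(g):
        return 0.2, 0.4  # borderline null: r = rnul, epd = epdnul

    def alt_sampler(g):
        # Borderline alternate: fix r = ralt or epd = epdalt (assumed 50/50).
        if g.random() < 0.5:
            return 0.4, g.uniform(0.2, 0.6)
        return g.uniform(0.0, 0.4), 0.2

    # Hypothetical grid search over stage-2 thresholds with stage 1 held fixed.
    for nr_alt in range(11, 14):
        for np_alt in range(5, 8):
            design = dict(n1=15, n2=15, stage1_pairs=((4, 4), (5, 5)),
                          nr_alt=nr_alt, np_alt=np_alt)
            power = declared_interesting_rate(alt_sampler, 2_000, rng, **design)
            alpha = declared_interesting_rate(null_sampler, 2_000, rng, **design)
            print(f"nr_alt={nr_alt}, np_alt={np_alt}: "
                  f"power~{power:.3f}, alpha~{alpha:.3f}")
```

A search like this would keep only candidate thresholds whose estimated power and error meet the prespecified targets. With 2,000 trials per estimate, the standard error near a true rate of .05 is roughly .005, which is why far larger runs are needed to certify a design.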
Supported by a grant from the Amgen Career Development Award (J.R.G.). Presented in part at the 43rd Annual Meeting of the American Society of Clinical Oncology, June 1-5, 2007, Chicago, IL. Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.
1. DiMasi JA, Hansen RW, Grabowski HG: The price of innovation: New estimates of drug development costs. J Health Econ 22:151-185, 2003
2. Booth B, Glassman R, Ma P: Oncology's trials. Nat Rev Drug Discov 2:609-610, 2003
3. Goffin J, Baral S, Tu D, et al: Objective responses in patients with malignant melanoma or renal cell cancer in early clinical studies do not predict regulatory approval. Clin Cancer Res 11:5928-5934, 2005
4. DiMasi JA, Grabowski HG: Economics of new oncology drug development. J Clin Oncol 25:209-216, 2007
5. Buyse M, Thirion P, Carlson RW, et al: Relation between tumour response to first-line chemotherapy and survival in advanced colorectal cancer: A meta-analysis. Meta-Analysis Group in Cancer. Lancet 356:373-378, 2000
6. Graf W, Pahlman L, Bergstrom R, et al: The relationship between an objective response to chemotherapy and survival in advanced colorectal cancer. Br J Cancer 70:559-563, 1994
7. Markman M: Why does a higher response rate to chemotherapy correlate poorly with improved survival? J Cancer Res Clin Oncol 119:700-701, 1993
8. Paesmans M, Sculier JP, Libert P, et al: Response to chemotherapy has predictive value for further survival of patients with advanced non–small-cell lung cancer: 10 years experience of the European Lung Cancer Working Party. Eur J Cancer 33:2326-2332, 1997
9. Burris HA III, Moore MJ, Andersen J, et al: Improvements in survival and clinical benefit with gemcitabine as first-line therapy for patients with advanced pancreas cancer: A randomized trial. J Clin Oncol 15:2403-2413, 1997
10. Shepherd FA, Dancey J, Ramlau R, et al: Prospective randomized trial of docetaxel versus best supportive care in patients with non–small-cell lung cancer previously treated with platinum-based chemotherapy. J Clin Oncol 18:2095-2103, 2000
11. Cesano A, Lane SR, Poulin R, et al: Stabilization of disease as a useful predictor of survival following second-line chemotherapy in small-cell lung cancer and ovarian cancer patients. Int J Oncol 15:1233-1238, 1999
12. Howell A, Mackintosh J, Jones M, et al: The definition of the 'no change' category in patients treated with endocrine therapy and chemotherapy for advanced carcinoma of the breast. Eur J Cancer Clin Oncol 24:1567-1572, 1988
13. Murray N, Coppin C, Coldman A, et al: Drug delivery analysis of the Canadian multicenter trial in non–small-cell lung cancer. J Clin Oncol 12:2333-2339, 1994
14. Rapp E, Pater JL, Willan A, et al: Chemotherapy can prolong survival in patients with advanced non–small-cell lung cancer: Report of a Canadian multicenter randomized trial. J Clin Oncol 6:633-641, 1988
15. Roberts TG Jr, Lynch TJ Jr, Chabner BA: The phase III trial in the era of targeted therapy: Unraveling the "go or no go" decision. J Clin Oncol 21:3683-3695, 2003
16. Fleming TR: One-sample multiple testing procedure for phase II clinical trials. Biometrics 38:143-151, 1982
17. Gehan EA: The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent. J Chronic Dis 13:346-353, 1961
18. Simon R: Optimal two-stage designs for phase II clinical trials. Control Clin Trials 10:1-10, 1989
19. Zee B, Melnychuk D, Dancey J, et al: Multinomial phase II cancer trials incorporating response and early progression. J Biopharm Stat 9:351-363, 1999
20. Freidlin B, Dancey J, Korn EL, et al: Multinomial phase II trial designs. J Clin Oncol 20:599, 2002
21. Chang MN, Devidas M, Anderson J: One- and two-stage designs for phase II window studies. Stat Med 26:2604-2614, 2007
22. Dent S, Zee B, Dancey J, et al: Application of a new multinomial phase II stopping rule using response and early progression. J Clin Oncol 19:785-791, 2001
Submitted August 21, 2007; accepted January 3, 2008.
Copyright © 2008 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X