Originally published as JCO Early Release 10.1200/JCO.2004.03.950 on May 10 2004 © 2004 American Society of Clinical Oncology.
Trawling for Genes That Predict Response to Breast Cancer Adjuvant Therapy

1 Siteman Cancer Center, Washington University, St Louis, MO
2 Mayo Clinic Cancer Center, Mayo Clinic College of Medicine, Rochester, MN

Randomized clinical trials often set the standard of care for the treatment of early-stage breast cancer. However, the low event rates in these studies often require large sample sizes and long-term follow-up to demonstrate significant improvements in outcomes. This has translated into a glacial timeline for progress that can frustrate both patients and physicians. As progress is made toward the goal of a cure for all patients, the stepwise approach of comparing standard therapy with standard therapy plus something new has generated trials involving large populations, with the possibility of massive overtreatment and ballooning health care costs. How can we focus on the decreasing minority of women who are not cured with standard multimodality breast cancer therapy, while not imposing on the majority who are?

The key to this problem lies in the development of tools to individualize treatment so that treatment plans can be carefully matched to both the risk of relapse and the responsiveness of the tumor. To achieve this goal, we need effective biomarkers to replace the old standards of stage, grade, and hormone receptor status. Genomic profiling, therefore, is central to our current vision of individually tailored therapy; confidence is rising that we will be able to track down the constellations of genes that orchestrate patient fate.1 Recent gene expression profiling studies have clearly shown that we can differentiate between tumors with limited potential for systemic spread and those that are disconcertingly metastatic even at a very early stage.2 The goal for new prognostic tools is a triage strategy: treating according to risk to produce the same overall outcome but with fewer patients receiving unnecessary therapy.

While avoiding toxicity and expense is a laudable goal, predicting outcome without systemic intervention (prognosis) will not greatly increase the overall number of patients cured of their breast cancer. For this, we must understand the biology underlying responsiveness to systemic therapy and develop new and effective treatments through predictive strategies. The article by Ayers et al3 in this issue of the Journal of Clinical Oncology provides an opportunity to discuss progress in this area and to consider ways to improve our approaches to clinical trial designs that focus on predictive biomarker analysis.

The goal of this study was to examine the feasibility of developing a multigene predictor of pathologic complete response (pCR) to sequential weekly paclitaxel, fluorouracil, doxorubicin, and cyclophosphamide (T/FAC) neoadjuvant chemotherapy for breast cancer. The specimens were collected from 42 patients. The first 24 samples were used to develop a multigene predictor for pCR, and the remaining 18 were used to validate the results of the predictor. The model successfully identified only three of seven instances of pCR, based on the expression pattern of the primary tumor before treatment. The study therefore fell short of presenting a definitive result, concluding that transcriptional profiling has "the potential to identify a gene expression pattern in breast cancer that may lead to clinically useful predictors of pCR to T/FAC neoadjuvant therapy." Is this really the case, or are there formidable barriers that we must resolve before any real progress can be made? Unquestionably, the article serves to illustrate the scientific advantages of treating patients with systemic agents while the primary tumor is intact so that antitumor efficacy can be monitored and the molecular and cellular basis for response can be carefully investigated.

pCR as an end point has been validated by large randomized trials that have shown a statistically robust relationship between the ability to eradicate the primary breast cancer and the risk of relapse and death from the disease.4 However, pCR is not a perfect surrogate for cure, as patients in this category are still at substantial risk for metastatic disease, albeit with a reduced frequency. Perhaps the most significant advantage for primary systemic therapy, aside from improvements in surgical outcomes, is the efficiency of this approach as a drug development strategy. There are an increasing number of small randomized primary systemic therapy trials of both endocrine5 and chemotherapy agents6 that nicely parallel the final outcome of adjuvant studies, foretelling likely successful strategies years before the final answer.

However, comparing outcomes between groups of patients treated in a uniform way is not the same as predicting outcome for a single individual. The challenge of forecasting individual outcomes is the focus of the article by Ayers et al, who highlight an important new objective for primary systemic therapy trials: to search gene expression profiles for predictive biomarkers for systemic therapy. The methodological approaches necessary for a successful discovery effort are still in development, and the article nicely highlights many unresolved issues.

From a bioinformatics viewpoint, the article is a trawling exercise to "catch" a manageable number of genes for further examination as chemotherapy-response genes. This is an inherently difficult problem because the number of expressed mRNA transcripts far exceeds the number of samples. It is like trawling for halibut in a sea teeming with thousands of species of fish. The fisherman's approach is to repeatedly sample different areas of the sea to find out where the halibut like to live.
There is no doubt that a statistical model would increase the catch considerably, and there is a similar need for a robust statistical approach in the analysis of gene expression data. The key issues include knowing how much sampling to do (to know where the halibut are) and confirming that we have the genes of interest (because unlike halibut fishermen, we are not really sure what types of genes we are looking for).

Before discussing bioinformatics further, it is worth considering some controversial points regarding trial design for such investigations. From the viewpoint of tailored therapy, predicting response to individual agents is more appealing than predicting the outcome of multiagent treatment, some of which an individual patient may not have needed. A cursory look at the list of genes in this article in comparison with other studies of this type7 underscores the potential for genes to be selectively involved in response to different agents. To assess this, it might be worth exploring sequential single-agent therapy.8 Response could be assessed by an imaging tool, such as magnetic resonance imaging, between regimens.9 Another point of emphasis is that the study generates a biomarker profile that predicts an imperfect intermediate surrogate end point (pCR), not a true clinical end point (relapse and death). The focus on pCR as the validation end point is therefore misplaced, and the final value of the exercise will be revealed only when the gene sets are tested for their predictive abilities in the adjuvant setting.

The study also highlights the thorny problem of response definition. Response is a continuous variable that we divide into the categories of complete remission, partial response (PR), stable disease (SD), and progressive disease (PD) for ease of analysis. The trouble is that the underlying biology does not fit neatly within these rules.
After all, the mechanisms underlying pCR must be similar to those underlying very good PR; yet in the Ayers study, PR is classified the same as nonresponse. This may lead to misclassification because, for example, a large cancer may be just as responsive as a small one, but may simply require more cycles of therapy to achieve pCR. A significant level of misclassification of this type will render the final model useless.

One of the rationales for choosing pCR as a study end point is that it is the extreme end of the response spectrum. Logically, one might get a more informative set of genes if one also went to the other end of the response spectrum and compared the profile of tumors that underwent pCR with tumors exhibiting SD or PD. A focus on extreme phenotype comparison is similar to approaches in gene profiling for prognostic purposes in which selected groups of patients who never relapse are compared with a group of patients who relapse and die within a given time frame.

A final caveat is that clinical trials enter patients with a broad spectrum of breast cancer subtypes. In the Ayers et al study, there is concern that genes that predict T/FAC efficacy for one molecular subtype (for example, estrogen receptor-positive disease) do not predict outcomes for other types because the gene in question is expressed in a cell lineage-dependent manner.

From the bioinformatics standpoint, the article takes two quite distinct approaches to analysis. The first was to search for subtypes of tumors that are related through similarities in gene expression. This is termed "unsupervised analysis," which aims to identify tumors that are closely related through the patterns of genes that they express. Response rates were then compared between tumor subtypes. The problem with unsupervised analysis for response prediction is that if differences in response are determined by a relatively small number of genes, these genes will not have much influence on global measurements of similarity.
The result is likely to be cluster identification with little or no association with response status. This was the case for the hierarchical clustering approach taken by Ayers et al. The authors correctly concluded that the lack of association with pCR was not because there are no genes that can be used to predict response, but rather because unsupervised cluster analysis techniques are not the right tools for developing small gene sets for multigene predictors.

A supervised analysis, or class prediction, was the next statistical approach used by the authors. Typically, binary external information, like alive and relapse-free versus dead from disease, is used to identify genes that correlate with this distinction. In an initial group of 24 patients (the training set), profiles were compared between tumors that achieved a pCR and those that did not. This produced a list of genes that, as a set, could distinguish between the two tumor groups. This list formed the basis for a predictive model that was tested for accuracy on the next 18 prospectively accrued patients (the validation set). Although the merits of the methodology used by Ayers et al could be debated, the approach used to develop their predictor is reasonable. However, it has been found that, in general, simpler prediction algorithms perform as well as, or better than, more complex algorithms.10 Furthermore, it is always prudent to compare the effectiveness of a new, complex algorithm with other methods (on the same data set) before adopting it.

The most important requirement of a predictor is its performance or accuracy. Applying the multigene predictor created from a training set to an independent and representative validation set is the best way to assess performance. Adequately assessing the performance of predictors developed from gene expression microarray data is particularly crucial because of the real danger of overfitting the training data (netting random species of irrelevant fish).

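The danger of overfitting when genes vastly outnumber samples can be illustrated with a small simulation. The sketch below is not the method used by Ayers et al; it assumes a simple nearest-centroid rule, 1,000 simulated "genes" that are pure noise, and the same 24/18 training/validation split. The function names (`simulate`, `average_accuracies`) and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate(n_genes=1000, n_train=24, n_valid=18, n_keep=20, seed=0):
    """Fit and test a nearest-centroid predictor on data that are pure noise."""
    rng = random.Random(seed)

    def noise_profiles(n_samples):
        # Every "gene" is standard normal noise, unrelated to any label.
        return [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
                for _ in range(n_samples)]

    train, valid = noise_profiles(n_train), noise_profiles(n_valid)
    half = n_train // 2
    train_y = [1] * half + [0] * (n_train - half)          # arbitrary labels
    valid_y = [rng.randint(0, 1) for _ in range(n_valid)]  # arbitrary labels

    # Gene selection uses the training set only: rank genes by the absolute
    # difference between the two class means and keep the top n_keep.
    pos = [x for x, y in zip(train, train_y) if y == 1]
    neg = [x for x, y in zip(train, train_y) if y == 0]
    ranked = []
    for g in range(n_genes):
        m1 = sum(x[g] for x in pos) / len(pos)
        m0 = sum(x[g] for x in neg) / len(neg)
        ranked.append((abs(m1 - m0), g, m1, m0))
    top = sorted(ranked, reverse=True)[:n_keep]

    def predict(x):
        # Assign the class whose centroid is closer over the selected genes.
        d1 = sum((x[g] - m1) ** 2 for _, g, m1, m0 in top)
        d0 = sum((x[g] - m0) ** 2 for _, g, m1, m0 in top)
        return 1 if d1 < d0 else 0

    def accuracy(samples, labels):
        hits = sum(predict(x) == y for x, y in zip(samples, labels))
        return hits / len(labels)

    return accuracy(train, train_y), accuracy(valid, valid_y)


def average_accuracies(n_reps=10):
    """Average training and validation accuracy over several random seeds."""
    results = [simulate(seed=s) for s in range(n_reps)]
    train_acc = sum(r[0] for r in results) / n_reps
    valid_acc = sum(r[1] for r in results) / n_reps
    return train_acc, valid_acc


if __name__ == "__main__":
    train_acc, valid_acc = average_accuracies()
    print(f"mean training accuracy:   {train_acc:.2f}")
    print(f"mean validation accuracy: {valid_acc:.2f}")
```

Because no simulated gene carries real information, any apparent accuracy on the training samples comes from selecting genes on those same samples; on the independent samples the predictor should hover near chance.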
Use of an independent validation set is a way to protect against overfitting: a predictor composed of genes biologically relevant to the outcome will have high accuracy, whereas predictors that contain genes that pertain to peculiarities specific to the training set will have lower accuracy. Ayers et al correctly used an independent validation set, but the amount of information it provided was scanty. Specifically, the overall accuracy measured for the predictor was 78% (14 of 18). The (exact two-sided binomial) 95% CI ranges from 52% to 94%; hence, the true accuracy could be considerably lower than 78%. Furthermore, the goal of the study was to predict pCR, implying that the error rate among pCR samples is likely of greater interest. Three (43%) of the seven pCR tumors in the validation set were correctly identified, yielding a 95% CI for the accuracy in this group ranging from 10% to 82%. Thus, for pCR tumors, there is no evidence that the predictor is better than guessing. Although the authors state that the predictor seems to be promising, the small size of the independent validation set provides little information to support its usefulness.

We would not want to end on a negative note. The Ayers et al study should be viewed as an instructive pilot effort. The most obvious improvement is to increase the sample size for both the training set and the validation set. Simon et al propose that for determining whether genes are differentially expressed between two groups, one should have at least 25 patients in each group.11 Given that prediction is a considerably more complex problem, this is at the extreme low end of what would be acceptable for the future. In addition, the model should ultimately be validated by the prediction of treatment failure in the adjuvant setting. Predicting pCR is only an intermediate step.

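The exact two-sided binomial intervals quoted for the validation set (14 of 18 overall, 3 of 7 among pCR tumors) can be checked with a short calculation. The sketch below assumes the Clopper-Pearson construction, which is the usual meaning of an "exact" binomial interval, and solves for the limits by bisection using only the Python standard library; the helper names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, level=0.95):
    """Clopper-Pearson (exact) two-sided confidence interval for k successes in n."""
    alpha = 1.0 - level

    def root(f):
        # Bisection for f(p) = 0, where f is increasing in p on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


if __name__ == "__main__":
    # The commentary reports 52% to 94% for 14 of 18 and 10% to 82% for 3 of 7.
    for k, n in [(14, 18), (3, 7)]:
        lo, hi = exact_ci(k, n)
        print(f"{k}/{n}: observed {k / n:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
```

Calling `exact_ci` with a larger `n` at a similar observed proportion (for example, 78 of 100) shows how the interval narrows as the validation set grows, which is the practical argument for larger studies.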
The transition from preliminary models to predicting end points in adjuvant studies has recently been made much more straightforward through the development of techniques to reliably examine mRNA expression starting with archival formalin-fixed, paraffin-embedded tissue. With the recent introduction of a commercial test that measures 16 genes and five controls from standard pathology blocks to estimate the risk of relapse for node-negative patients who receive tamoxifen, it is reasonable to believe that tools are now in place to make tailored treatment planning a reality.12 However, it will be a huge undertaking to transform clinical practice with these technologies, and a rigorous and critical approach will be essential. After all, owning a trawler is no guarantee that you will make a profit catching fish.

Authors' Disclosures of Potential Conflicts of Interest

The authors indicated no potential conflicts of interest.

Acknowledgment

We thank Hope Rugo, Larry Norton, Hyman Muss, and Charles Perou for stimulating discussion and review of our commentary.

REFERENCES
1. Sorlie T, Tibshirani R, Parker J, et al: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100:8418-8423, 2003
2. van't Veer LJ, Dai H, van de Vijver MJ, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530-536, 2002
3. Ayers M, Symmans WF, Stec J, et al: Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer. J Clin Oncol 22:2284-2293, 2004
4. Fisher B, Bryant J, Wolmark N, et al: Effect of preoperative chemotherapy on the outcome of women with operable breast cancer. J Clin Oncol 16:2672-2685, 1998
5. Ellis MJ, Coop A, Singh B, et al: Letrozole is more effective neoadjuvant endocrine therapy than tamoxifen for ErbB-1- and/or ErbB-2-positive, estrogen receptor-positive primary breast cancer: Evidence from a phase III randomized trial. J Clin Oncol 19:3808-3816, 2001
6. Smith IC, Heys SD, Hutcheon AW, et al: Neoadjuvant chemotherapy in breast cancer: Significantly enhanced response with docetaxel. J Clin Oncol 20:1456-1466, 2002
7. Chang JC, Wooten EC, Tsimelzon A, et al: Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 362:362-369, 2003
8. Stearns V, Singh B, Tsangaris T, et al: A prospective randomized pilot study to evaluate predictors of response in serial core biopsies to single agent neoadjuvant doxorubicin or paclitaxel for patients with locally advanced breast cancer. Clin Cancer Res 9:124-133, 2003
9. Wasser K, Klein SK, Fink C, et al: Evaluation of neoadjuvant chemotherapeutic response of breast cancer using dynamic MRI with high temporal resolution. Eur Radiol 13:80-87, 2003
10. Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77-87, 2002
11. Simon R, Radmacher MD, Dobbin K: Design of studies using DNA microarrays. Genet Epidemiol 23:21-36, 2002
12. Paik S, Shak S, Tang G, et al: Multi-gene RT-PCR assay for predicting recurrence in node negative breast cancer patients: NSABP studies B-20 and B-14. Breast Cancer Res Treat 82:S10, 2003 (suppl 1, abstr 16)
Copyright © 2004 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X