Journal of Clinical Oncology, Vol 23, No 15 (May 20), 2005: pp. 3526-3535
© 2005 American Society of Clinical Oncology. DOI: 10.1200/JCO.2005.00.695
Molecular Staging for Survival Prediction of Colorectal Cancer Patients
From the Departments of Surgery, Pathology, and Biostatistics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL; The Institute for Genomic Research, Rockville; Department of Statistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD; Department of Biochemistry, George Washington University, Washington, DC; The Molecular Diagnostic Laboratory, Department of Clinical Biochemistry, Aarhus University Hospital, Skejby, Denmark; and Department of Medical Genetics, Biomedicum, Helsinki, Finland
Address reprint requests to Timothy J. Yeatman, MD, H. Lee Moffitt Cancer Center and Research Institute, 12902 Magnolia Dr, SRB-2, Tampa, FL 33612; email: yeatman{at}moffitt.usf.edu
PURPOSE: The Dukes' staging system is the gold standard for predicting colorectal cancer prognosis; however, accurate classification of intermediate-stage cases is problematic. We hypothesized that molecular fingerprints could provide more accurate staging and potentially assist in directing adjuvant therapy.
METHODS: A 32,000 cDNA microarray was used to evaluate 78 human colon cancer specimens, and these results were correlated with survival. Molecular classifiers were produced to predict outcome.
RESULTS: Molecular staging, based on 43 core genes, was 90% accurate (93% sensitivity, 84% specificity) in predicting 36-month overall survival in 78 patients. This result was significantly better than Dukes' staging (P = .03878), discriminated patients into significantly different groups by survival time (P < .001, log-rank test), and was significantly different from chance (P < .001, 1,000 permutations). Furthermore, the classifier was able to discriminate a survival difference in an independent test set from Denmark. Molecular staging identifies patient prognosis (as represented by 36-month survival) more accurately than the traditional clinical staging, particularly for intermediate Dukes' stage B and C patients. The classifier was based on a core set of 43 genes, including osteopontin and neuregulin, which have biologic significance for this disease.
CONCLUSION: These data support further evaluation of molecular staging to discriminate good from poor prognosis patients, with the potential to direct adjuvant therapy.
Colorectal cancer staging is currently based solely on simple clinicopathologic features such as bowel wall penetration and lymph node metastasis. Unfortunately, clinical staging systems often fail to discriminate the biologic behavior of a large number of tumors, resulting in the systematic overtreatment or undertreatment of patients with adjuvant therapies. Devised more than 70 years ago,1 the now modified Dukes' staging system provides adequate prognostic information for patients staged as A or D. However, the intermediate stages of B and C are not extremely useful in discriminating good from poor prognosis patients. Additionally, application of this staging system results in the potential overtreatment or undertreatment of a significant number of patients, and it can only be applied after complete surgical resection rather than after a presurgical biopsy. Recently developed microarray technology has permitted the development of multiorgan cancer classifiers,2,3 identification of tumor subclasses,4-6 discovery of progression markers,7,8 and prediction of disease outcome in many types of cancer.9-11 Unlike clinicopathologic staging, molecular staging has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis. Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures that, at the time of diagnosis, can direct the biologic behavior of the tumor over time.12
The human investigations were performed after approval by the University of South Florida Institutional Review Board and in accord with an assurance filed with and approved by the Department of Health and Human Services. A waiver of informed consent was filed.
Tumor Samples (Moffitt Cancer Center)
Samples were microdissected (> 80% tumor cells) by frozen section guidance, and RNA was extracted using Trizol reagent (Invitrogen, Carlsbad, CA) followed by secondary purification on RNAEasy columns. The samples were profiled on The Institute for Genomic Research's (TIGR) 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts (23,936 unique TIGR tentative consensus [TC] and 6,913 expressed sequence tags [EST]), 10 exogenous controls printed 36 times, and four negative controls printed 36 to 72 times.
Significance Analysis of Microarrays Survival Analysis
Classifier Construction and Evaluation
The molecular classifier was composed of two distinct steps: gene selection using a t test and classification using a neural network. Both steps were performed after the test sample was left out in leave-one-out cross-validation (LOOCV), so that the gene selection step could not bias the result. For each cross-validation step, we used the top 50 genes ranked by the absolute value of the t statistic. A feed-forward back-propagation neural network16 with a single hidden layer of 10 units, a learning rate of 0.05, and a momentum of 0.2 was constructed. Training occurred for a maximum of 500 epochs or until a zero misclassification error was achieved on the training set. Our experience indicates that neural networks are extremely robust to both the number of genes selected and the level of noise in these genes. We have successfully used this classifier in earlier multiplatform gene expression classification experiments.3
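The leave-one-out procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, all function names are hypothetical, a Welch t statistic is assumed because the t-test variant is not stated, and a nearest-centroid rule stands in for the neural network to keep the example short. The point it shows is that gene selection is repeated inside every fold, using only the training samples.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Welch t statistic between two lists of expression values (assumed variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, k):
    """Indices of the k genes with the largest |t| between the two outcome classes."""
    scores = []
    for g in range(len(X[0])):
        good = [row[g] for row, label in zip(X, y) if label == 1]
        poor = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(t_statistic(good, poor)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X, y, genes, sample):
    """Stand-in classifier (NOT the neural network used in the paper)."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [mean([r[g] for r in rows]) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k=50):
    """Leave one sample out, select genes on the rest, then classify the held-out sample."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, k)  # selection never sees sample i
        correct += nearest_centroid(X_train, y_train, genes, X[i]) == y[i]
    return correct / len(X)


# Synthetic data (an assumption for illustration): 30 samples, 100 genes,
# of which the first 5 are shifted upward in the "good prognosis" class (label 1).
rng = random.Random(0)
n_samples, n_genes, n_informative = 30, 100, 5
y = [i % 2 for i in range(n_samples)]
X = [[rng.gauss(4.0 * label if g < n_informative else 0.0, 1.0)
      for g in range(n_genes)] for label in y]

accuracy = loocv_accuracy(X, y, k=10)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

Selecting genes once on all samples and then cross-validating only the classifier would let the held-out sample influence the gene list, which is the bias the text warns against.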
Statistical Significance
Identification of Prognosis-Related Genes
We used SAM survival analysis to identify a set of genes most correlated with censored survival time. A set of 53 genes was found, corresponding to a median expected false discovery rate of 28%. These genes are listed in Table 1 and include several genes that we believe to be biologically significant, such as osteopontin and neuregulin (see Discussion). Figure 1A presents a graphical representation of these 53 SAM-selected genes as a clustered heat map. The figure uses only the Dukes' stage B and C patients, whose outcome Dukes' staging predicts poorly (Fig 1D). Because we clustered using only genes correlated with survival, the clusters should correspond to different prognosis groups. The SAM-selected genes are also arranged by annotated Dukes' stage (B and C stages only) in Figure 1B, which shows little discrimination based on gene expression.
Figure 1C shows the Kaplan-Meier plot for the two dominant gene expression clusters of stage B and C samples. Clearly, these 53 genes separate the patients into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1; P < .001, log-rank test) as expected because the genes were selected using SAM. Figure 1D presents a Kaplan-Meier plot of the survival times of patients with Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference and demonstrating the weak potential for discrimination of these patients by Dukes' staging. Here, we demonstrate that gene expression profiles can separate good and poor prognosis patients better than Dukes' staging. This suggests that a gene expression-based classifier could be more accurate at predicting patient prognosis than the traditional Dukes' staging.
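For readers unfamiliar with the survival statistics used here, the following sketch shows how a Kaplan-Meier curve and a two-group log-rank test can be computed from censored follow-up data. It is illustrative only: the function names and the toy follow-up times are assumptions and are not taken from the study data.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed = expected = var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)               # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)  # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e)         # deaths at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed += d_a
        expected += d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


# Toy data: survival falls to 5/6, 2/3, and 4/9 at months 6, 12, and 20.
curve = kaplan_meier([6, 12, 12, 20, 30, 36], [1, 1, 0, 1, 0, 0])
print(curve)

# Hypothetical poor-prognosis cluster (all deaths) vs good-prognosis cluster (all censored).
chi2, p = logrank([2, 4, 6, 8, 10], [1] * 5, [30, 32, 34, 36, 38], [0] * 5)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```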
Performance of a 43-Gene cDNA-Based Colorectal Cancer Survival Classifier
Evaluation of an Independent Test Set From Denmark
To further validate our classifier, we identified an independent Danish colon cancer data set composed of Dukes' stage B and C patients and produced on a U133A Affymetrix GeneChip (Affymetrix, Santa Clara, CA) oligonucleotide-based platform. Because this data set was produced on an oligonucleotide platform instead of our cDNA platform, we first translated our gene signature into available Affymetrix probe sets using the Resourcerer program (TIGR, Rockville, MD; www.TIGR.org). This translation reduced the gene signature from 43 genes to only 26 unique genes. Therefore, we limited the original cDNA classifier to only genes represented on the U133A platform. Using this restricted gene signature derived solely from the Moffitt data set, we found 60 corresponding probe sets on the Affymetrix U133A GeneChip and used these genes to evaluate the survivorship of 95 Danish patients. With this approach, hierarchical clustering was used to find the most significant groups in the data. The survival of these groups was then displayed using Kaplan-Meier survival analysis. Figure 2A shows that the 26 selected genes were able to discriminate good from poor prognosis patients, despite the restrictions imposed by cross-platform analyses. Dukes' staging was incapable of discriminating survivorship in these same patients (Fig 2B). When applied to the Dukes' stage B and C patients separately, the 26-gene signature was capable of discriminating good from poor prognosis subpopulations within each stage (Fig 2C).
The benefit of adjuvant chemotherapy for colorectal cancer seems limited to patients with Dukes' stage C disease, where the cancer has metastasized to lymph nodes at the time of diagnosis. For this reason, the clinicopathologic Dukes' staging system is critical for determining how adjuvant therapy is administered. Unfortunately, it is not very accurate in predicting overall survival, and thus, its application likely results in the treatment of a large number of patients to benefit an unknown few. Alternatively, there are probably a number of patients who might benefit from therapy who do not receive it. Molecular staging may provide more accurate predictions of patient outcome than is currently possible with clinical staging, which may, in fact, misclassify patients. Using a SAM-selected set of genes derived from a genome-wide analysis of gene expression, we were first able to cluster groups of patients with good and bad prognoses, suggesting that outcome-rich information was likely present in this gene expression data set. Subsequently, a supervised learning analysis identified a core set of 43 informative genes that appeared in 75% of the cross-validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients. Although Dukes' staging works well for very good and very poor prognosis patients (Dukes' stage A and D), currently it is not very informative when predicting long-term outcomes of intermediate prognosis patients (Dukes' stage B and C), yet it is the primary means for determining the administration of potentially toxic adjuvant chemotherapy. We hypothesized that molecular staging might be able to identify those Dukes' stage B and C patients for whom chemotherapy might be beneficial. 
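The idea of a core gene set defined by how often a gene is selected across cross-validation iterations can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, the fold contents are invented, and "75% of the cross-validation iterations" is read here as "at least 75%".

```python
from collections import Counter


def core_genes(selected_per_fold, min_fraction=0.75):
    """Genes selected in at least min_fraction of the cross-validation folds."""
    n_folds = len(selected_per_fold)
    # set(fold) guards against counting a gene twice within one fold
    counts = Counter(g for fold in selected_per_fold for g in set(fold))
    return sorted(g for g, c in counts.items() if c / n_folds >= min_fraction)


# Invented selections for four folds; gene_a, gene_b, and gene_c are placeholders.
folds = [
    ["osteopontin", "neuregulin", "gene_a"],
    ["osteopontin", "neuregulin", "gene_b"],
    ["osteopontin", "gene_a", "gene_c"],
    ["osteopontin", "neuregulin", "gene_c"],
]
core = core_genes(folds)
print(core)  # ['neuregulin', 'osteopontin']
```

Genes that survive this frequency filter are those whose selection does not depend on any single left-out sample, which is why such a list is more stable than the gene list from any one fold.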
The production of a cDNA classifier for survival is a first step in the validation process for molecular staging.
With our approach, we were able to determine which genes seemed to be most useful in the classifier based on their frequency of appearance in the classification set. Of these genes, at least two, osteopontin and neuregulin, have reported biologic significance in the context of colorectal cancer. Osteopontin, a secreted glycoprotein17 and ligand for CD44 and
Although cross-platform analysis is challenging because of the paucity of available human data sets, the discrepancies in probes used on competitive platforms, and the differential performance characteristics of these probes, we demonstrated that the genes selected for our cDNA-based classifier were effective in discriminating good from poor prognosis patients using a completely independent data set produced on a Danish population using an oligonucleotide platform (Affymetrix, U133A). These data provide confirmation, under the most rigorous conditions, that there is prognostic value for the identified gene signature. Of interest is a recent report of a gene expression profile, using the U133A Affymetrix platform, to predict recurrence for Dukes' stage B patients.20 The reported 23-gene signature with a performance accuracy of 78% shares no genes in common with the cDNA classifier we have produced. The absence of concordant genes could be related to many different issues, including differences in the microarray platform, the samples selected for analysis, and the analytic tools used to generate the gene signatures, suggesting the need for extensive validation of any promising gene expression signature before clinical implementation. We have produced the first colorectal cancer molecular staging classifier, based on the analysis of all colon cancer stages, for which accuracy exceeds that of Dukes' staging when used to estimate prognosis on the same patients.
These results suggest that a molecular-based method for the discrimination of outcome for intermediate Dukes' stages B and C, where prognosis is currently problematic, may be effective. Our classifier is based on a core set of 43 genes that seem to have biologic significance for human colorectal cancer progression. The data provided support more extensive validation of this prognostic gene signature.
The authors indicated no potential conflicts of interest.
Supported by grants from the National Cancer Institute, Bethesda, MD; also supported by the Biostatistics Core and the Microarray Core at H. Lee Moffitt Cancer Center and services provided by the Research Computing Core, University of South Florida. Authors' disclosures of potential conflicts of interest are found at the end of this article.
1. Dukes C: The classification of cancer in the rectum. J Pathol Bacteriol 35:323, 1932
2. Ramaswamy S, Tamayo P, Rifkin R, et al: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 98:15149-15154, 2001
3. Bloom G, Yang IV, Boulware D, et al: Multi-platform, multi-site microarray based human tumor classification. Am J Pathol 164:9-16, 2004
4. Bhattacharjee A, Richards WG, Staunton J, et al: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 98:13790-13795, 2001
5. Khan J, Wei JS, Ringner M, et al: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673-679, 2001
6. Sorlie T, Tibshirani R, Parker J, et al: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100:8418-8423, 2003
7. Agrawal D, Chen T, Irby R, et al: Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling. J Natl Cancer Inst 94:513-521, 2002
8. Sanchez-Carbayo M, Socci ND, Lozano JJ, et al: Gene discovery in bladder cancer progression using cDNA microarrays. Am J Pathol 163:505-516, 2003
9. Shipp MA, Ross KN, Tamayo P, et al: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8:68-74, 2002
10. van 't Veer LJ, Dai H, van de Vijver MJ, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530-536, 2002
11. van de Vijver MJ, He YD, van't Veer LJ, et al: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347:1999-2009, 2002
12. Ramaswamy S, Ross KN, Lander ES, et al: A molecular signature of metastasis in primary solid tumors. Nat Genet 33:49-54, 2003
13. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98:5116-5121, 2001
14. de Hoon MJ, Imoto S, Nolan J, et al: Open source clustering software. Bioinformatics 20:1453-1454, 2004
15. Dyrskjot L, Thykjaer T, Kruhoffer M, et al: Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet 33:90-96, 2003
16. Fahlman SE: Faster-Learning Variations on Back-Propagation: An Empirical Study, Proceedings of the 1988 Connectionist Models Summer School. Los Altos, CA, Morgan-Kaufmann, 1988
17. Fedarko NS, Jain A, Karadag A, et al: Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer. Clin Cancer Res 7:4060-4066, 2001
18. Yeatman TJ, Chambers AF: Osteopontin and colon cancer progression. Clin Exp Metastasis 20:85-90, 2003
19. Carraway KL III, Weber JL, Unger MJ, et al: Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases. Nature 387:512-516, 1997
20. Wang Y, Jatkoe T, Zhang Y, et al: Gene expression profiles and molecular markers to predict recurrence of Dukes' B colon cancer. J Clin Oncol 22:1564-1571, 2004
Submitted August 31, 2004; accepted December 13, 2004.
Copyright © 2005 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X