Journal of Clinical Oncology

Journal of Clinical Oncology, Vol 26, No 6 (February 20), 2008: pp. 877-883
© 2008 American Society of Clinical Oncology.
DOI: 10.1200/JCO.2007.13.1516


Non-Overlapping and Non–Cell-Type–Specific Gene Expression Signatures Predict Lung Cancer Survival

Zhifu Sun, Dennis A. Wigle, Ping Yang

From the Department of Health Sciences Research and Division of General Thoracic Surgery, College of Medicine, Mayo Clinic, Rochester, MN

Corresponding author: Zhifu Sun, MD, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905; e-mail: sun.zhifu@mayo.edu


    ABSTRACT
Purpose Gene expression profiling for outcome prediction of non–small-cell lung cancer (NSCLC) remains clouded by heterogeneous and unvalidated results. This study applied multivariate approaches to identify and evaluate value-added gene expression signatures in two types of NSCLC.

Materials and Methods Two NSCLC oligonucleotide microarray data sets of adenocarcinoma and squamous cell carcinoma were used as training sets to select prognostic genes independent of conventional predictors. The top 50 genes from each set were used to predict the outcomes of two independent validation data sets of 84 and 91 NSCLC cases.

Results Adenocarcinomas classified as high risk by the 50-gene adenocarcinoma signature had a 2.4-fold increased mortality in both validation data sets (95% CI, 1.3 to 4.4 and 1.0 to 5.8, respectively) after adjustment for conventional predictors. Squamous cell carcinomas with this high-risk signature had an adjusted risk of 1.1 (95% CI, 0.4 to 3.2) in one data set and 2.5 (95% CI, 1.1 to 5.8) in another data set restricted to stage I tumors. Adenocarcinomas classified as high risk by the 50-gene squamous cell carcinoma signature had an elevated risk of 3.5 (95% CI, 1.4 to 9.0) after adjustment for conventional predictors; squamous cell carcinomas with this high-risk signature had an adjusted risk of 1.8 (95% CI, 0.7 to 4.6). Despite the lack of overlap in individual genes, the two gene signatures showed significant functional connectedness in molecular pathways.

Conclusion Two non-overlapping but functionally related gene expression signatures provide consistently improved survival prediction for NSCLC regardless of histologic cell type. Multiple gene sets with predictive value may exist for NSCLC, but clinical translation will require sets whose predictive value is independent of clinical predictors.


    INTRODUCTION
Correlations between gene expression profiles and clinical outcome have been studied extensively in non–small-cell lung cancer (NSCLC), with end points ranging from tumor recurrence potential after treatment1,2 to metastatic status3-5 to chemotherapy response6 to disease-free or overall survival.2,7-12 Although a number of these findings are promising for clinical translation, problems remain: consensus predictive gene signatures across studies are rare, and many signature-based outcome predictions have not been replicated in independent studies. Multiple confounding factors and analytic issues13 have contributed to this problem. Most importantly, many available gene signatures were selected on the basis of univariate associations with survival, and they add limited clinical value once conventional predictors are considered.14,15

For NSCLC, TNM stage, age, sex, and histologic cell type (particularly bronchoalveolar carcinoma) are well-established prognostic factors. However, these factors appear to have reached the limit of the prognostic information they can provide, and they do not explain the large variation in outcome among patients with similar characteristics. A critical clinical question remains: Given the status of known predictive variables, can we further subclassify individual NSCLC patient populations for optimized treatment using gene expression biomarkers? To answer this question, we conducted a study using several large-scale microarray data sets that are adequate for model-based multivariate analysis. We first selected survival-related gene signatures while adjusting for conventional predictors in two training data sets of adenocarcinoma and squamous cell carcinoma, and then evaluated their predictive ability in multiple independent sets of NSCLC patients. The performance of these signatures, and the improvement in prediction they provide beyond conventional predictors, were assessed with multivariate models and time-dependent receiver operating characteristic (ROC) curves.


    MATERIALS AND METHODS
Data Sources
Training data set 1. Gene expression and clinical data for 86 cases of primary lung adenocarcinoma were obtained from Beer et al.7 The microarray experiment was conducted using the Affymetrix HU6800 (HuGeneFL; Santa Clara, CA) chip, and the data were preprocessed by a trimmed mean algorithm.7

Training data set 2. This data set has 129 cases of lung squamous cell carcinoma hybridized on the Affymetrix U133A chip.8 Data were retrieved from the Gene Expression Omnibus (GSE4573; http://www.ncbi.nlm.nih.gov/geo/) and preprocessed with Affymetrix MAS software.

Validation data set 1. Two hundred three samples of primary lung cancer were profiled using the Affymetrix HG-U95Av2 chip.16 Expression data were preprocessed with Affymetrix GeneChip software. From this study, 84 adenocarcinoma cases with complete clinical information and good chip quality were used.

Validation data set 2. This data set consists of 45 adenocarcinoma and 46 squamous cell carcinoma cases. Raw microarray data (Affymetrix HG-U133Plus2) were preprocessed with the RMA algorithm implemented in the R statistical package.17

Patient clinical characteristics and enrollment criteria for the four data sets are summarized in Table A1 (online only).

Data Analysis

Selection of Two Prognostic Gene Signatures
Adenocarcinoma. Gene expression data with 4,966 probe sets for 86 adenocarcinoma samples from training data set 1 were first merged with the clinical data. A Cox proportional hazards model was applied to evaluate each gene's independent association with survival after adjusting for patient age, sex, tumor stage, and tumor subtype (bronchoalveolar carcinoma v other types of adenocarcinoma). The significance level for the adjusted association was set at P < .01. To increase the reliability of the selected genes, each gene was further assessed in 1,000 bootstrap samples, and genes were ranked by the number of samples in which P < .01. The optimal number of genes for constructing a prediction signature was determined by varying the number of top-ranked genes taken from the list and comparing the resulting prediction errors. The top-ranked genes, up to the number with the lowest prediction error, were chosen as the prediction signature (Fig A1, online only).
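A minimal sketch of the per-gene adjusted screen and bootstrap ranking described above, written in Python with NumPy. The Cox fit is hand-rolled (Newton-Raphson on the partial likelihood, Breslow handling of ties, two-sided Wald P values) only so the example is self-contained; the authors used SAS and R, and every function name here is hypothetical.

```python
import math

import numpy as np


def _score_and_information(X, time, event, beta):
    """Score vector and observed information of the Cox partial likelihood.

    Assumption: ties use the Breslow approximation, i.e. every death at time t
    is compared against the full risk set {j : time[j] >= t}.
    """
    eta = X @ beta
    w = np.exp(eta - eta.max())  # rescaling cancels in every risk-set ratio
    p = X.shape[1]
    score = np.zeros(p)
    info = np.zeros((p, p))
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        wr, Xr = w[at_risk], X[at_risk]
        s0 = wr.sum()
        mean = wr @ Xr / s0                      # risk-weighted covariate mean
        second = (Xr * wr[:, None]).T @ Xr / s0  # risk-weighted second moment
        score += X[i] - mean
        info += second - np.outer(mean, mean)
    return score, info


def cox_fit(X, time, event, max_iter=50, tol=1e-8):
    """Newton-Raphson Cox fit; returns (coefficients, standard errors)."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        score, info = _score_and_information(X, time, event, beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = _score_and_information(X, time, event, beta)
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def adjusted_gene_p(gene, clinical, time, event):
    """Two-sided Wald P value for one gene in a Cox model that also
    contains the clinical covariates (age, sex, stage, subtype, ...)."""
    beta, se = cox_fit(np.column_stack([gene, clinical]), time, event)
    return math.erfc(abs(beta[0] / se[0]) / math.sqrt(2.0))


def bootstrap_rank(expr, clinical, time, event, n_boot=1000, alpha=0.01, seed=0):
    """Rank genes (columns of expr) by how many bootstrap resamples give P < alpha."""
    expr, clinical = np.asarray(expr, float), np.asarray(clinical, float)
    time, event = np.asarray(time, float), np.asarray(event, bool)
    rng = np.random.default_rng(seed)
    n, n_genes = expr.shape
    hits = np.zeros(n_genes, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        for g in range(n_genes):
            p = adjusted_gene_p(expr[idx, g], clinical[idx], time[idx], event[idx])
            if p < alpha:
                hits[g] += 1
    return np.argsort(-hits, kind="stable"), hits
```

With 1,000 resamples and thousands of probe sets this loop is slow; in practice a vetted survival package would replace `cox_fit`, and only the bootstrap bookkeeping would be kept.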

Squamous cell carcinoma. From training data set 2, we first selected 12,990 probe sets with high variation and expression across all samples using a standard deviation greater than 0.5 and an expression value greater than 5 on the log2 scale among 50% of the samples. The same procedures as described for adenocarcinoma were carried out to select the gene signature for squamous cell carcinoma (Fig A2).
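The probe-set filter can be written as a short Python function. This is a sketch under two assumptions the text leaves open: the standard deviation is the sample standard deviation, and "among 50% of the samples" is read as at least half of the samples. The function name and any probe-set IDs passed to it are hypothetical.

```python
import statistics


def filter_probe_sets(expr, sd_min=0.5, level_min=5.0, frac_min=0.5):
    """Keep probe sets that are both variable and expressed.

    expr maps a probe-set ID to its log2 expression values across samples.
    Assumption: sample SD (n - 1 denominator) and ">= frac_min" for the
    fraction of samples whose value exceeds level_min.
    """
    kept = []
    for probe, values in expr.items():
        variable = statistics.stdev(values) > sd_min
        expressed = sum(v > level_min for v in values) / len(values) >= frac_min
        if variable and expressed:
            kept.append(probe)
    return kept
```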


Fig A2. Prediction performance by a number of genes in training data set 2.

 
Evaluation of Two Gene Signatures in Independent Samples
To evaluate the predictive value of the gene signatures selected from the training data sets, we first mapped the signature genes to the validation data sets using either array comparison files or Locus ID (Table A2, online only). We then calculated a risk index for each case in the validation data sets as a linear combination of the log2 gene expression values weighted by their estimated regression coefficients from the Cox models. Patients were classified into a low- or high-risk group on the basis of a predefined percentile of the risk scores. We evaluated the performance of cutoff points from the 50th to the 70th percentile in validation data set 1 (Table A3, online only) and found that the 60th or 65th percentile provided the best prediction. For consistency and ease of comparison with results in the literature,7 the 60th percentile was selected for all validations. The Cox proportional hazards model was applied to assess the independent value of the risk prediction along with the conventional predictors of age, sex, stage, and cell type. To estimate the improvement in prediction provided by the gene signature, time-dependent ROC curves were generated, and the area under the curve (AUC, measured by the C index) was calculated over a 5-year follow-up period.18 Figure 1 illustrates the process flow of the study. All analyses were conducted using SAS v9 (SAS Institute, Cary, NC) or R statistical packages (www.r-project.org).
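The risk index and the percentile split can be sketched in Python as follows. The percentile uses linear interpolation, and patients strictly above the cutoff are labeled high risk; both choices are assumptions, as are all names in the block. In real use the coefficients would come from the training-set Cox models.

```python
import math


def risk_scores(samples, coefs):
    """Risk index per sample: sum over signature genes of Cox coefficient x log2 expression."""
    return [sum(beta * sample[gene] for gene, beta in coefs.items()) for sample in samples]


def percentile(values, pct):
    """Linear-interpolation percentile (assumption: same rule as NumPy's default)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify(scores, pct=60):
    """Label scores strictly above the pct-th percentile as high risk (assumption)."""
    cutoff = percentile(scores, pct)
    return ["high" if s > cutoff else "low" for s in scores]
```

With a 60th-percentile cutoff, roughly the top 40% of scores fall into the high-risk group.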


Fig 1. Graphic view of prognostic gene selection and validation. AD, adenocarcinoma; SQC, squamous cell carcinoma; ROC, receiver operating characteristic curves.

 

    RESULTS
Survival-Related Gene Signatures
Training data set 1 (adenocarcinoma). After multivariate adjustment, 90 genes were significant at P < .01. Consistent with previous reports,7,8 we found that the prediction accuracy in the training set reached a plateau at approximately 50 genes in leave-one-out cross validation (Fig A1). Table A5 (online only) lists these top 50 genes as the prediction signature from adenocarcinoma. These genes are involved in apoptosis, cell adhesion, signaling, and transcription. Only three genes (STX1A, FUT3, and PDE7A) overlap with the 50-gene signature derived from univariate analysis in the original publication.7


Fig A1. Prediction performance by a number of genes in training data set 1.

 
Training data set 2 (squamous cell carcinoma). One hundred thirty-nine genes were independently associated with survival. Following the same procedure as described for training data set 1, we selected the top 50 genes as the prediction signature for squamous cell carcinoma (Table A6, online only). Ten of the 50 probe sets (nine unique genes: ABCC4, EDG2, HLF, CASK, LGALS8, SPAST, PELI2, MARK1, and IL8) were the same as those identified in the original publication.8

Independent Validation of the Adenocarcinoma 50-Gene Signature From Training Data Set 1
Validation data set 1. The risk prediction by the 50-gene signature was evaluated along with patient age, sex, tumor stage, and cell type (Table 1). Patients in the high-risk group as predicted by the gene expression profile had a significantly increased risk of death, with an adjusted hazard ratio (HR) of 2.4 (95% CI, 1.3 to 4.4). The signature was the second most significant predictor after tumor stage. Inclusion of the signature in the clinical model increased the prediction performance over 5 years from 0.70 to 0.73, as measured by the C index (Fig 2A).
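The C index values quoted above summarize how often a model ranks patients in the correct order of death. As an illustration only, the sketch below computes Harrell's concordance index with follow-up truncated at a horizon (for example, 60 months); this is an assumption, since the study used the time-dependent ROC approach of reference 18, which can give somewhat different values. The function name is hypothetical.

```python
def harrell_c(time, event, risk, horizon=None):
    """Harrell's concordance index; a higher risk score should mean earlier death.

    Assumption: if horizon is given, follow-up is truncated there, so deaths
    after the horizon are treated as censored at the horizon.
    """
    if horizon is not None:
        event = [bool(e) and t <= horizon for t, e in zip(time, event)]
        time = [min(t, horizon) for t in time]
    concordant, usable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # only an observed death can anchor a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:  # patient j is known to have outlived patient i
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the changes reported here (for example, 0.70 to 0.73) are modest shifts on that scale.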


Table 1. Prediction Evaluation of Two 50-Gene Signatures Selected From Adenocarcinoma in an Independent Set of Adenocarcinoma Patients of Validation Data Set 1 (n = 84)

 

Fig 2. Independent validation of the 50-gene signature from adenocarcinoma. (A) Area under the curve (AUC) for validation data set 1 with (blue line) or without (yellow line) the gene signature. (B) AUC for validation data set 2 (45 adenocarcinomas) with (blue) or without (yellow) the gene signature. (C) Kaplan-Meier curves for combined adenocarcinoma from validation data sets 1 and 2. (D) AUC for combined adenocarcinoma from validation data sets 1 and 2.

 
Validation data set 2. The validation was conducted first on all cases and then separately on the adenocarcinoma and squamous cell carcinoma subtypes. As shown in Table 2, the only significant predictors in this mixed data set were tumor stage and the 50-gene signature. Analysis stratified by cell type showed that the significant prediction was observed mainly in adenocarcinoma: adenocarcinomas with the high-risk gene signature had a 2.4-fold higher risk of death than those with the low-risk gene signature, and prediction performance improved from 0.61 to 0.67 (Fig 2B). The signature did not have significant predictive value for squamous cell carcinoma (HR = 1.1; 95% CI, 0.4 to 3.2).


Table 2. Prediction Evaluation of the 50-Gene Signature Selected From Adenocarcinoma in an Independent Set of Patients of Validation Data Set 2

 
Stage I adenocarcinoma from combined validation data sets 1 and 2. We combined all cases of stage I adenocarcinoma (IA or IB) from validation data sets 1 and 2 to create a larger set of 91 patients (Table A4). As shown in Figure 2C, the high- and low-risk groups as predicted by the 50-gene signature had significantly different survival on Kaplan-Meier analysis (median survival time, 86 months in the low-risk group v 31 months in the high-risk group; P < .01). After adjusting for age, sex, stage (IA or IB), and data source, patients in the high-risk group had a 2.4-fold higher risk of death (95% CI, 1.2 to 4.6) than those in the low-risk group. Incorporation of the gene signature increased the prediction accuracy from 0.63 to 0.67 (Fig 2D).
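The median survival times quoted above are read off Kaplan-Meier curves. A minimal pure-Python sketch of the product-limit estimator and the conventional median (the first event time at which estimated survival falls to 0.5 or below) is shown below; it is illustrative only, with hypothetical function names, and is not the software the authors used.

```python
def kaplan_meier(time, event):
    """Return the Kaplan-Meier curve as (event time, survival) pairs."""
    data = sorted(zip(time, event))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied follow-up times
            deaths += int(bool(data[i][1]))
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


def median_survival(curve):
    """Smallest event time with survival <= 0.5 (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```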


Table A4. Characteristics of Stage I Adenocarcinoma of Two Validation Data Sets

 
Independent Validation of the Squamous Cell Carcinoma 50-Gene Signature From Training Data Set 2
We did not evaluate this signature in validation data set 1 because close to half of the genes (23 of 50) in the signature are not represented on the U95Av2 chip. When all 91 cases of adenocarcinoma and squamous cell carcinoma in validation data set 2 were evaluated together, the 50-gene signature showed significant predictive value, with an HR of 2.3 (95% CI, 1.2 to 4.4; Table 3). Adding this signature to the conventional prediction model increased the prediction accuracy from 0.62 to 0.66 (Fig 3A-B). When the two cell types were analyzed separately, the prediction was statistically significant only in adenocarcinoma (HR = 3.5; 95% CI, 1.4 to 9.0) and not in squamous cell carcinoma (HR = 1.8; 95% CI, 0.7 to 4.6; adjusted P = .22; Table 3). For adenocarcinoma, the prediction performance increased from 0.60 to 0.68 (Fig 3C-D); for squamous cell carcinoma, it increased from 0.63 to 0.66.


Table 3. Prediction Evaluation of the 50-Gene Signature Selected From Squamous Cell Carcinoma in an Independent Set of Patients of Validation Data Set 2 (n = 91)

 

Fig 3. Independent validation of the 50-gene signature from squamous cell carcinoma in validation data set 2. (A) Area under the curve (AUC) for validation data set 2 with (blue) or without (yellow) the gene signature. (B) Kaplan-Meier curves for validation data set 2 by the 50-gene signature. (C) AUC for 45 adenocarcinomas with (blue) or without (yellow) the gene signature. (D) Kaplan-Meier curves for 45 adenocarcinomas by the 50-gene signature.

 
Evaluation of Adenocarcinoma Signature in Training Data Set 2
Our finding that the gene signature from squamous cell carcinoma could predict survival in adenocarcinoma prompted us to assess the predictive value of the adenocarcinoma signature in training data set 2, in which all 129 cases are squamous cell carcinoma. When the signature was evaluated in all cases, it was marginally significant in predicting 5-year survival (adjusted HR = 1.6; 95% CI, 0.9 to 2.8), whereas when the analysis was limited to stage I tumors, the signature was significantly predictive (HR = 2.5; 95% CI, 1.1 to 5.8).

Functional Relationship Between the Two 50-Gene Signatures
The two signatures from training data set 1 (adenocarcinoma) and training data set 2 (squamous cell carcinoma) have no genes in common, yet both can predict survival. This observation led us to hypothesize that the two sets of genes may be functionally related despite their lack of overlap. To test this hypothesis, we performed a pathway-based analysis by uploading the 100 genes into the Ingenuity Pathways Analysis application (http://www.ingenuity.com). Ninety-three of the 100 genes could be mapped to the Ingenuity database, and 73 of the 93 clustered in five functional gene networks (Table A7, online only). Genes from adenocarcinoma (43 genes) and squamous cell carcinoma (30 genes) are intermingled in these networks to an extent unlikely to be explained by chance alone, suggesting functional relationships in the broad biologic categories of cell movement, cell death, cell cycle, and signaling.
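The Ingenuity network scoring itself is not reproduced here. A common way to ask whether a network holds more signature genes than chance would predict is a hypergeometric (one-sided Fisher's exact) tail probability, sketched below in Python. The function name and the toy numbers in any call are hypothetical, not values from this study.

```python
from math import comb


def hypergeom_sf(k, universe, focus, network):
    """P(X >= k) for the number of signature genes in a network.

    Assumption: the network's `network` genes are drawn at random from
    `universe` genes, of which `focus` are signature genes.
    """
    total = comb(universe, network)
    upper = min(focus, network)
    return sum(
        comb(focus, x) * comb(universe - focus, network - x)
        for x in range(k, upper + 1)
    ) / total
```

Small tail probabilities indicate that the observed number of signature genes in a network would be unusual under random assignment.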


    DISCUSSION
Evaluation of gene signatures in the context of conventional predictors is essential to develop new molecular testing tools for refined outcome prediction, which in turn could assist in choosing treatment options. Gene expression biomarkers selected by univariate analysis are typically confounded by other factors, most importantly the known predictors of survival in NSCLC. This is supported by our evaluation of two signatures reported in the literature that were selected from the same two training data sets used in this study: a 50-gene signature from adenocarcinoma7 and a 50-gene signature from squamous cell carcinoma.8 Both signatures are strong predictors of survival in univariate assessment; however, their predictive power diminishes when conventional predictors are included in the same models. For the 50-gene signature from adenocarcinoma, the HR drops from 2.5 (P = .004) to 1.7 (P = .11; Table 1), and adding the gene signature to a clinical model does not appear to improve outcome prediction (C index from 0.70 to 0.71). For the 50-gene signature from squamous cell carcinoma, the adjusted HR is 1.7 (P = .09; Table 3).

Consistent with the literature,7,8,19 our results support the notion that a gene signature selected from one cell type is predictive for that cell type. Whether gene signatures from different cell types can be predictive for each other in NSCLC is a separate, unexplored question with implications for how gene expression biomarkers might be translated into clinical use. Our finding of cross-prediction by the two gene signatures for adenocarcinoma and squamous cell carcinoma suggests that a prognostic signature may not be cell type specific and that a universal signature reflecting tumor aggressiveness and subsequent clinical outcome may exist across histologic cell types. This would be clinically important because a unified gene signature would greatly simplify outcome evaluation for different or unspecified types of carcinoma.

The finding that neither gene signature performed well in predicting survival for squamous cell carcinoma in validation data set 2 needs verification in further studies. However, available evidence suggests that the prognosis of squamous cell carcinoma may in fact be less influenced by variable tumor biology than that of adenocarcinoma. First, when Larsen et al20 evaluated the nine metagenes from Potti et al2 in their series of 51 squamous cell carcinomas and 48 adenocarcinomas, the metagenes predicted outcome with moderate accuracy (75%) in adenocarcinoma but with essentially no better than chance accuracy (53%) in squamous cell carcinoma. Second, when we evaluated the same nine metagenes in the three data sets other than validation data set 2 (the data set used to select the metagenes), significant prediction was observed only in adenocarcinoma (validation data set 1; P = .02) and not in the 129 cases of squamous cell carcinoma (P = .38). Third, in our evaluation of the optimal number of genes for predicting survival in training data sets 1 and 2, the maximum rate of correct prediction exceeded 85% in adenocarcinoma but reached only 72% in squamous cell carcinoma (Figs A1 and A2). Fourth, unlike adenocarcinoma, which has many subtypes, some of which (such as bronchoalveolar carcinoma) demonstrate different biologic behaviors and clinical outcomes, squamous cell carcinoma is relatively homogeneous, and its histologic variations have not been found to be clinically relevant.

The non-overlapping yet comparably predictive gene signatures suggest that multiple sets of gene expression biomarkers useful for outcome prediction may exist in tumors. These genes may participate in similar molecular processes related to tumor aggressiveness, which may explain some of the heterogeneity of NSCLC gene expression profiles observed to date in the literature. For example, a comparison among nine published gene lists1,2,7-11,19,21 reveals only one gene in common between the 50-gene signature of squamous cell carcinoma8 and the 129 metagenes of mixed cell types,2 and only one between the 50-gene signature from adenocarcinoma7 and the 96-gene signature of squamous cell carcinoma.19

Another important observation from our work is that we did not observe the dramatic prediction enhancement from gene signatures beyond conventional predictors reported in a number of published studies. There are several potential explanations for this discrepancy. First, our validation was conducted in completely independent data sets, which generally lowers the measured performance of a signature but estimates its true performance more accurately. Second, there is considerable overlap in prognostic information between gene expression profiles and histologic phenotypes. In our prognostic marker selection, we adjusted for tumor stage and subtypes of adenocarcinoma, the two most significant predictors of clinical outcome in NSCLC. This facilitates patient substratification within these well-established clinical predictors. Controlling for these factors removes the confounding they introduce, and as a consequence, the predictive power of the derived gene signature is expected to be lower. Third, we purposely used gene expression data preprocessed by the original authors with various approaches. This allowed us to mimic the "real life" scenario and to evaluate the predictive stability of gene signatures across research centers, platforms, and preprocessing algorithms, all of which can profoundly affect the final results and their interpretation.12,22-24 A gene signature that performs consistently across multiple different tests despite the associated variability is more likely to be robust and useful, and the predictive performance of the two signatures in our study might be even higher if all data were processed uniformly with the same software package and algorithm. Fourth, unlike studies that set an arbitrary time point and used only polarized cases with or without an event by that time,8,19 we did not preselect cases for validation. The inclusion of all cases may explain some of the discrepancy with results reported in the literature in which a subset of patients was used. Fifth, unlike other studies that choose a single time point in follow-up, we evaluated the performance of each gene signature over a standard 5-year period after initial diagnosis. This allowed a more thorough assessment of how a gene signature performs for the time-dependent end point of survival; arbitrary selection of different time points for performance evaluation may also contribute to the inconsistencies in published results. Last, inappropriate data analyses, which seem not uncommon in the literature, may create unrealistic expectations about the power of gene expression profiling in cancer outcome prediction.13

The two gene expression signatures derived in this study provide moderate yet consistently improved survival prediction for NSCLC beyond conventional predictors. Although moderate, their predictive value is comparable to that of TNM stage. Multiple sets of gene expression biomarkers that can be used for outcome prediction appear to exist in NSCLC, and a gene signature selected from adenocarcinoma or squamous cell carcinoma may be predictive for both histologic subtypes. These results suggest the potential for adding molecular classifiers to the existing classification of NSCLC patients by TNM stage and histologic evaluation.


    AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
The author(s) indicated no potential conflicts of interest.


    AUTHOR CONTRIBUTIONS
Conception and design: Zhifu Sun, Dennis A. Wigle, Ping Yang

Financial support: Ping Yang

Collection and assembly of data: Zhifu Sun

Data analysis and interpretation: Zhifu Sun

Manuscript writing: Zhifu Sun, Dennis A. Wigle, Ping Yang

Final approval of manuscript: Zhifu Sun, Dennis A. Wigle, Ping Yang


    Appendix
Gene Mapping Across Platforms
Validation data set 1 was generated on the Affymetrix HG-U95Av2 chip, which has 12,600 probe sets. Validation data set 2 was generated on the Affymetrix HG-U133Plus2 chip, which contains 54,675 probe sets. To maximize coverage of the signature genes identified from the two training sets, no preselection or filtering of probe sets was performed on these two data sets. The genes in the two signatures were mapped to the two validation data sets by two approaches: (1) Affymetrix-released array comparison files across platforms and (2) gene Locus ID for platforms without array comparison files. The first approach maps each probe set exactly from a previous generation of a chip to a newer generation. This approach was used for mapping genes from HuGeneFL to HG-U95Av2, and 43 of 50 probe sets were found for the 50-gene adenocarcinoma signature. From HuGeneFL to HG-U133Plus2 or U133A, the Locus ID was used. When multiple probe sets were returned for a specific locus or gene, the average expression of those probe sets was used for that gene. For the 50 signature markers selected from HG-U133A, direct probe set mapping was used to find the corresponding probe sets in the HG-U133Plus2 data set (validation data set 2). Table A2 lists the numbers of genes or probe sets mapped among the different chips.
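The Locus ID step, in which several probe sets for one gene are averaged, can be sketched in Python as below. The function name, probe-set IDs, and Locus IDs are hypothetical, and averaging on the log2 scale is an assumption; the text does not state the scale.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_loci(sample_expr, probe_to_locus):
    """Average one sample's probe-set values that map to the same Locus ID.

    sample_expr:    probe-set ID -> expression value (assumption: log2 scale)
    probe_to_locus: probe-set ID -> Locus ID; unmapped probe sets are dropped
    """
    by_locus = defaultdict(list)
    for probe, value in sample_expr.items():
        locus = probe_to_locus.get(probe)
        if locus is not None:
            by_locus[locus].append(value)
    return {locus: mean(values) for locus, values in by_locus.items()}
```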


Table A2. Number of Probe Sets/Genes Mapped for Gene Signatures

 
Determination of the Optimal Number of Genes in Survival Prediction
We evaluated the optimal number of genes for outcome prediction by leave-one-out cross validation in training data sets 1 and 2, in five-gene increments along the two gene lists we selected. Figure A1 illustrates the results for training data set 1. As the number of genes increases from 5 to 90, the best performance is seen at 45, 50, and 55 genes. We selected 50 because this is the same number reported by Beer et al (Nat Med 8:816-824, 2002), so that the two signatures can be easily compared. Figure A2 shows the results for training data set 2. As the number of genes increases, prediction performance improves, with the best performance observed at approximately 50 genes. Adding more genes beyond this point does not substantially improve prediction accuracy; therefore, the top 50 genes were chosen for this data set as well.
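The gene-count search can be sketched as a leave-one-out loop over the top k genes of a fixed ranked list. The text does not say which classifier produced the "correct prediction" rates, so the nearest-centroid rule and the binary outcome labels below are hypothetical placeholders, as are all function names.

```python
def nearest_centroid(train_rows, train_labels, genes):
    """Hypothetical placeholder classifier: per-class mean of each gene."""
    centroids = {}
    for label in set(train_labels):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroids[label] = {g: sum(r[g] for r in members) / len(members) for g in genes}
    return centroids


def predict(centroids, row, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((row[g] - c[g]) ** 2 for g in genes)
    return min(centroids, key=lambda label: dist2(centroids[label]))


def loocv_error(rows, labels, ranked_genes, k):
    """Leave-one-out error rate using the top-k genes of a fixed ranked list."""
    genes = ranked_genes[:k]
    wrong = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]       # hold out case i
        train_labels = labels[:i] + labels[i + 1:]
        model = nearest_centroid(train_rows, train_labels, genes)
        wrong += predict(model, rows[i], genes) != labels[i]
    return wrong / len(rows)


def best_gene_count(rows, labels, ranked_genes, counts):
    """Smallest gene count with the lowest leave-one-out error."""
    errors = {k: loocv_error(rows, labels, ranked_genes, k) for k in counts}
    return min(counts, key=lambda k: (errors[k], k)), errors
```

Following the five-gene increments described above, `counts` would be `range(5, 95, 5)`. Re-ranking the genes inside each held-out fold is a stricter variant of this design.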

Risk Score Cutoff Determination in a Validation Data Set
Table A3 lists the log-rank P values at different percentile cutoff points of the calculated risk scores in the adenocarcinoma validation cohort (validation data set 1). The P values at the 60th and 65th percentiles are essentially the same. We selected the 60th percentile for two considerations: (1) It is one of the most significant cutoff points, and (2) using the same cutoff as previously published (Beer et al) will make the two signatures more comparable and interpretable.
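The cutoff scan in Table A3 can be sketched in pure Python as a two-group log-rank test repeated at each percentile. The linear-interpolation percentile, the strict "above the cutoff" rule for high risk, and all function names are assumptions; this is not the authors' code.

```python
import math


def percentile(values, pct):
    """Linear-interpolation percentile (assumption: NumPy's default rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def logrank_p(time, event, high):
    """Two-group log-rank test; returns the chi-square (1 df) P value."""
    diff, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(time, event) if e}):
        n = sum(1 for ti in time if ti >= t)
        n1 = sum(1 for ti, h in zip(time, high) if ti >= t and h)
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)
        d1 = sum(1 for ti, e, h in zip(time, event, high) if ti == t and e and h)
        diff += d1 - d * n1 / n  # observed minus expected deaths in the high group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0  # no information to compare the groups
    return math.erfc(math.sqrt(diff ** 2 / var / 2.0))


def scan_cutoffs(time, event, scores, pcts=(50, 55, 60, 65, 70)):
    """Log-rank P value for each percentile cutoff (scores above it = high risk)."""
    results = {}
    for pct in pcts:
        cutoff = percentile(scores, pct)
        results[pct] = logrank_p(time, event, [s > cutoff for s in scores])
    return results
```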


Table A3. Log-Rank P Values at Different Cutoff Points of Risk Scores

 

Table A1. Clinical Characteristics of Four Data Sets Used in the Study

 
Table A5. Top 50 Genes Selected by Multivariable Adjustment in Adenocarcinoma

 
Table A6. Top 50 Genes Selected by Multivariable Adjustment in Squamous Cell Carcinoma

 
Table A7. Signature Genes Mapped Into the Same Functional Networks

 


    ACKNOWLEDGMENTS
 
We thank Jason Wampfler and Matthew Maurer at Mayo Clinic for their technical assistance in data analysis, Susan Ernst for her technical assistance with the manuscript, and the authors who deposited in the public domain the microarray data used in this study.


    NOTES
 
Supported by Grants No. CA80127, CA84354, and CA115857 (P.Y.) from the National Cancer Institute, and Mayo Foundation Funds.

Authors’ disclosures of potential conflicts of interest and author contributions are found at the end of this article.


    REFERENCES
1. Wigle DA, Jurisica I, Radulovich N, et al: Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Res 62:3005-3008, 2002

2. Potti A, Mukherjee S, Petersen R, et al: A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med 355:570-580, 2006

3. Kikuchi T, Daigo Y, Katagiri T, et al: Expression profiles of non-small cell lung cancers on cDNA microarrays: Identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene 22:2192-2205, 2003

4. Takada M, Tada M, Tamoto E, et al: Prediction of lymph node metastasis by analysis of gene expression profiles in non-small cell lung cancer. J Surg Res 122:61-69, 2004

5. Hoang CD, D'Cunha J, Tawfic SH, et al: Expression profiling of non-small cell lung carcinoma identifies metastatic genotypes based on lymph node tumor burden. J Thorac Cardiovasc Surg 127:1332-1342, 2004

6. Oshita F, Ikehara M, Sekiyama A, et al: Genomic-wide cDNA microarray screening to correlate gene expression profile with chemoresistance in patients with advanced lung cancer. J Exp Ther Oncol 4:155-160, 2004

7. Beer DG, Kardia SL, Huang CC, et al: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8:816-824, 2002

8. Raponi M, Zhang Y, Yu J, et al: Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res 66:7466-7472, 2006

9. Lu Y, Lemon W, Liu P-Y, et al: A gene expression signature predicts survival of patients with stage I non-small cell lung cancer. PLoS Med 3:e467, 2006

10. Chen HY, Yu SL, Chen CH, et al: A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 356:11-20, 2007

11. Sun Z, Yang P, Aubry MC, et al: Can gene expression profiling predict survival for patients with squamous cell carcinoma of the lung? Mol Cancer 3:35, 2004

12. Yang P, Sun Z, Aubry MC, et al: Study design considerations in clinical outcome research of lung cancer using microarray analysis. Lung Cancer 46:215-226, 2004

13. Dupuy A, Simon RM: Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst 99:147-157, 2007

14. Sun Z, Yang P: Gene expression profiling on lung cancer outcome prediction: Present clinical value and future premise. Cancer Epidemiol Biomarkers Prev 15:2063-2068, 2006

15. Yang P, Sun Z: Gene-expression profiling in lung cancer: Still early days. Pharmacogenomics 8:129-132, 2007

16. Bhattacharjee A, Richards WG, Staunton J, et al: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 98:13790-13795, 2001

17. Bild AH, Yao G, Chang JT, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439:353-357, 2006

18. Heagerty PJ, Zheng Y: Survival model predictive accuracy and ROC curves. Biometrics 61:92-105, 2005

19. Larsen JE, Pavey SJ, Passmore LH, et al: Expression profiling defines a recurrence signature in lung squamous cell carcinoma. Carcinogenesis 28:760-766, 2007

20. Larsen JE, Fong KM, Hayward NK: Refining prognosis in non-small-cell lung cancer. N Engl J Med 356:190, 2007

21. Zheng Z, Chen T, Li X, et al: DNA synthesis and repair genes RRM1 and ERCC1 in lung cancer. N Engl J Med 356:800-808, 2007

22. Verhaak RG, Staal FJ, Valk PJ, et al: The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies. BMC Bioinformatics 7:105, 2006

23. Shedden K, Chen W, Kuick R, et al: Comparison of seven methods for producing Affymetrix expression scores based on false discovery rates in disease profiling data. BMC Bioinformatics 6:26, 2005

24. Hoffmann R, Seidl T, Dugas M: Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome Biol 3:RESEARCH0033, 2002

Submitted June 25, 2007; accepted October 30, 2007.




This article has been cited by other articles:


S. Dubey and C. A. Powell
Update in Lung Cancer 2008
Am. J. Respir. Crit. Care Med., May 15, 2009; 179(10): 860 - 868.


P. C. Boutros, S. K. Lau, M. Pintilie, N. Liu, F. A. Shepherd, S. D. Der, M.-S. Tsao, L. Z. Penn, and I. Jurisica
Prognostic gene signatures for non-small-cell lung cancer
PNAS, February 24, 2009; 106(8): 2824 - 2828.



Copyright © 2008 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X
HighWire Press™ assists in the publication of JCO Online.