Journal of Clinical Oncology, Vol 25, No 34 (December 1), 2007: pp. 5418-5425
© 2007 American Society of Clinical Oncology. DOI: 10.1200/JCO.2007.12.8033
Quantitative Justification of the Change From 10% to 30% for Human Epidermal Growth Factor Receptor 2 Scoring in the American Society of Clinical Oncology/College of American Pathologists Guidelines: Tumor Heterogeneity in Breast Cancer and Its Implications for Tissue Microarray–Based Assessment of Outcome
From the Departments of Pathology and Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT; Division of Medical Oncology, and Genetic Pathology Evaluation Centre, Department of Pathology, British Columbia Cancer Agency; Department of Pathology, Vancouver General Hospital; and Department of Pathology, University of British Columbia, Vancouver, British Columbia, Canada
Address reprint requests to David L. Rimm, MD, PhD, Department of Pathology, Yale University School of Medicine, 310 Cedar St, PO Box 208023, New Haven, CT 06520-8023; e-mail: david.rimm{at}yale.edu
Purpose
The variability in scoring of immunohistochemistry, whether a result of true heterogeneity or artifacts in preparation, has led to decreased reliability in companion diagnostics and the recommendation for new standards (eg, the American Society of Clinical Oncology/College of American Pathologists [ASCO-CAP] guidelines). The basis of this problem is the amount of tissue required to be representative of an entire tumor. Because protein expression on tissue microarrays (TMAs) can be rigorously measured and one 0.6-mm spot is equivalent to two to three high-power fields, we used TMAs to assess levels of heterogeneity and to determine optimal representation as a function of outcome.
Patients and Methods
We analyzed estrogen receptor (ER), progesterone receptor, and human epidermal growth factor receptor 2 (HER-2) expression in two cohorts (n = 676 and n = 152) on a series of four to five separate TMA cores and assessed heterogeneity by linear regression analysis. Minimum, average, and maximum scores were generated for each set, which were then assessed for prognostic and predictive value.
Results
Each marker shows some heterogeneity, but average r values between 0.7 and 0.8 are seen between TMA spots. Analysis for prognostic value shows that the highest maximum score (of five spots) is the most prognostic for ER, whereas a high HER-2 minimum score is most prognostic for poor outcome and most predictive of response to trastuzumab.
Conclusion
These results suggest that the representativeness required for each biomarker may be a function of its role in tumorigenesis. Furthermore, these results provide scientific basis for the ASCO-CAP guidelines for assessment of HER-2 expression but perhaps suggest that the 30% figure is still too conservative.
The recent American Society of Clinical Oncology/College of American Pathologists (ASCO-CAP) guidelines1 for interpretation of the companion diagnostic for trastuzumab state that the membranous expression pattern for immunohistochemistry (IHC) staining of human epidermal growth factor receptor 2 (HER-2; erbB-2, neu) must be present in 30% of the invasive carcinoma cells. This is a change from the previous threshold of 10% of the tumor cells, which had been the standard since the labeling was defined by the package insert on the HercepTest, the first US Food and Drug Administration–approved IHC-based companion diagnostic test. If the ASCO-CAP guidelines are examined for clues as to why this significant change was made, the reader is directed to Appendix G, which states, "A cutoff of more than 30% reflects the cumulative experience of panel members that usually a high percentage of the cells will be positive if it is a true IHC3+, published reports using cutoff values higher than 10% and the goal of the panel to decrease the incidence of false positive 3+ cases... ."1 The cited reference by Vincent-Salomon et al2 for the GEFPICS group includes analysis of 10% and 60% areas representing the expression level for scoring. They find that the 60% levels result in better concordance with gene amplification as assessed by the fluorescence in situ hybridization assay, but they do not test the 30% figure. Beyond that study, there seems to be no published scientific justification for the 30% figure.
This change of standards for the companion diagnostic to assess response to trastuzumab responds to two difficult issues in IHC analysis. First, it addresses artifacts and variation in laboratory testing.
By increasing the area of the required region of staining, they aim to make the test more specific by removing false positives generated by edge artifact or small regions of strong staining that may not accurately represent the tumors. They cite that prospective substudies from two of the adjuvant randomized trials of trastuzumab versus nil have demonstrated that approximately 20% of HER-2 assays performed in the field (at the primary treatment site's pathology department) were incorrect when the same specimen was re-evaluated in a high-volume, central laboratory.3,4 Thus, the first issue of artifact and laboratory variability is clearly one reason for this change.
However, the second issue that is addressed is that of tumor heterogeneity. This parameter has historically been difficult to measure, with inaccuracies related to methods of measurement as well as confounding caused by preanalytic variables such as time to fixation, extent of fixation, and fixative selected. The guidelines extensively address these variables with a series of requirements for specimen preparation and criteria for sample rejection. But there is less information regarding true biologically based tumor heterogeneity.
Here, we address the question of heterogeneity from the context of the tissue microarray (TMA). TMAs allow for rapid, relatively inexpensive analysis and efficient use of tissue archives in large-population studies. A number of studies have been performed to address the issue of the number of cores required for representative assessment of breast tumors.5,6 One approach to quantitative heterogeneity studies is to analyze the amount of protein present in the histospots from the same patient stained for the same biomarker. We use the automated quantitative protein expression analysis (AQUA) technology to measure protein expression.
AQUA is a fluorescence-based method for rigorous quantitative assessment of protein expression on histology slides. It uses molecular identification of compartments by the use of multiple fluorophores followed by measurement of target biomarker intensity.7 This method has been shown to have accuracy comparable to an enzyme-linked immunosorbent assay8 and has been broadly used for studies on both TMAs9 and whole sections.10
In this article, we describe the assessment of the expression of three standard breast cancer markers (estrogen receptor [ER], progesterone receptor [PR], and HER-2) in four to five separate TMA cores from each tumor. This is similar, in some ways, to examination of four to five midpower (20×) microscopic fields.
Cohort Description and TMA Construction
TMAs were constructed with 669 representative breast core samples (331 node-negative and 338 node-positive samples) selected from hematoxylin and eosin full-section stains, resected between January 1, 1962, and December 31, 1977, and archived in the Yale University Department of Pathology. Follow-up time ranged from 2.4 to 41.5 years, with a median time of 8.3 years. Age at diagnosis ranged from 24 years to 86 years, with a median age of 59 years. Detailed description of the cohort has been previously published.11 Paraffin-embedded blocks containing the cores were tape transferred in 5-µm sections by a tissue microarrayer (Beecher Instruments, Silver Spring, MD), allowing five constructs of the same study population to be created as previously described.7,12 Collection of these materials and the associated clinical information was approved by the Yale Human Investigation Committee under protocol No. 8219 to D.L.R. The TMA from the British Columbia Cancer Agency has been previously described.13
IHC
Image Capture and Quantitative Analysis
Statistical Analysis
Five-fold analysis of each marker allows assessment of pairwise correlations of each set to evaluate both assay reproducibility and heterogeneity. Regression analysis plots are shown for HER-2 in Figure 1 and ER and PR in Figure 2. The assessment of ER and PR shows log AQUA scores from each array, with the Spearman correlation coefficient in the upper left-hand corner of each box. Note that all correlations for ER are greater than 0.75 unless they include array construction 1. A similar pattern is seen for PR. This is an example of detection of systematic error in the construction 1 analysis (our first array construction and an earlier analysis). However, even the best correlations never exceed 0.86. We believe this represents true spot-to-spot heterogeneity. Figure 1 illustrates the same observations for HER-2, but to use both early and recent HER-2 analysis data, we normalized the AQUA scores on a 100-point scale and show the data on a log scale. There are far fewer points than for ER and PR because approximately 85% of the cohort has low expression of HER-2. For all three proteins, we observed that the correlations became greater as time progressed through the coring and construction of these arrays.
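The pairwise comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names and the data layout (a dictionary per array construction, keyed by patient) are assumptions made for the example, and Spearman's coefficient is computed here as the Pearson correlation of average ranks.

```python
from itertools import combinations
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def pairwise_spearman(arrays):
    """Correlate every pair of array constructions on shared patients.

    `arrays` (hypothetical layout) maps a construction name to
    {patient_id: AQUA score}; patients missing from either array are skipped.
    """
    result = {}
    for a, b in combinations(sorted(arrays), 2):
        shared = sorted(set(arrays[a]) & set(arrays[b]))
        result[(a, b)] = spearman(
            [arrays[a][p] for p in shared], [arrays[b][p] for p in shared]
        )
    return result
```

With five constructions this yields ten coefficients, one per pair, which is the layout a grid of regression plots such as Figures 1 and 2 would summarize. A construction with systematic error would show up as consistently lower values in every pair that includes it.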
Because we obtained quantitative scores for five independent 600-µm diameter regions of each tumor, we can assess outcome as a function of heterogeneity. To do this, we plotted the minimum, maximum, and average score for each marker studied. To show only the most informative data, Figure 3 includes only cases in which the maximum score was greater than the cut point for positivity, and it displays the minimum, average, and maximum AQUA scores across the five constructs of the breast array. The cut points are determined from analysis of cell lines for each marker (Moeder et al, manuscript in preparation) and from previous studies using the AQUA technology where these markers were measured.11 For this figure, patients were ranked by their average compartmental AQUA score (scaled to 100 for each marker). As scores increase, so too does the variation between the minimum, average, and maximum. We believe that the range between the minimum and maximum and the variability between cases represent tumor heterogeneity for each marker.
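As a rough sketch of how the per-patient summary behind a plot like Figure 3 could be assembled, the following reduces each patient's core scores to a minimum, average, and maximum, keeps the cases whose maximum exceeds the cut point, and orders them by average. The function names, the use of `None` for an unreadable core, and the input layout are assumptions for illustration only.

```python
from statistics import mean


def summarize_cores(scores):
    """Return (minimum, average, maximum) over the cores that were scored.

    `scores` holds one entry per TMA core; None marks a missing or
    unreadable core (an assumption about how gaps are recorded).
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return None
    return min(valid), mean(valid), max(valid)


def rank_positive_cases(patients, cut_point):
    """Keep cases whose maximum exceeds the cut point, ordered by average.

    `patients` (hypothetical layout) maps patient_id to a list of core scores.
    Each returned row is (patient_id, minimum, average, maximum).
    """
    rows = []
    for patient_id, scores in patients.items():
        summary = summarize_cores(scores)
        if summary is not None and summary[2] > cut_point:
            rows.append((patient_id, *summary))
    return sorted(rows, key=lambda row: row[2])
```

The spread between the second and fourth element of each row (maximum minus minimum) is the per-case quantity the text interprets as heterogeneity.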
To examine the effect of analyzing a patient's minimum score versus the average or maximum score across the five samples, plots were generated for each value. Figure 4 includes Kaplan-Meier curves using the minimum, average, or maximum score generated for all three markers. AQUA cell line or historically derived cut points (ER = 10, PR = 13, and HER-2 = 12) were maintained for all plots. For ER, the maximum score produces the most statistically significant P value (P = .0003). PR produces significant P values for each set of data; the change, however, is minimal. HER-2 dichotomizes the population in each case, but the minimum AQUA scores are more prognostically valuable: the P value rises from P < .0001 in the minimum plot to P = .0962 in the maximum plot. In other words, the 5-year survival rate when the minimum score is greater than the cut point is 45% compared with 80% for HER-2–negative patients, whereas when the maximum score is greater than the cut point, the difference is only 60% versus 70%.
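To make the procedure concrete, here is a minimal sketch of dichotomizing patients at a cut point and estimating survival with the standard Kaplan-Meier product-limit formula. The cut points are the ones quoted above; the function names and data layout are assumptions, and the sketch omits the log-rank test that would produce the reported P values.

```python
# Cut points quoted in the text (AQUA score units).
CUT_POINTS = {"ER": 10, "PR": 13, "HER-2": 12}


def split_by_cut_point(summary_scores, cut_point):
    """Dichotomize patients: True if the summary score exceeds the cut point.

    `summary_scores` (hypothetical layout) maps patient_id to one summary
    value per patient: the minimum, average, or maximum over the cores.
    """
    return {pid: score > cut_point for pid, score in summary_scores.items()}


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    `times` are follow-up durations; `events` are True for a death from
    disease and False for a censored observation.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


def survival_at(curve, t):
    """Read the step function: estimated survival at time t."""
    prob = 1.0
    for time, s in curve:
        if time > t:
            break
        prob = s
    return prob
```

Running `split_by_cut_point` three times per marker (once each with the minimum, average, and maximum scores) and calling `survival_at(curve, 5)` on each resulting group gives the kind of 5-year comparison quoted in the paragraph above.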
To determine whether these differences are statistically significant, univariate analysis for continuous data was performed (Table 1). The risk ratio and 95% CI are shown with the P value for each of the three markers for their minimum, average, and maximum scores. In every case, a significant risk ratio is found, although it seems small because it is expressed per scoring unit. However, the overlap in the 95% CI shows that in no case is the risk ratio of the minimum score significantly different from the risk ratio derived from the maximum score.
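Two small calculations sit behind this paragraph and can be sketched as follows. The rescaling assumes the risk ratio comes from a model that is log-linear in the score (for example, Cox proportional hazards regression); the source does not name the model, so that is an assumption. The overlap check mirrors the informal criterion used in the text and is not a formal test of difference. All names and any numbers passed to these functions are hypothetical.

```python
def rescale_risk_ratio(per_unit_ratio, units):
    """Convert a per-unit risk ratio to the ratio for a change of `units`.

    Under a log-linear model the log risk ratio scales with the score,
    so the ratio for k units is the per-unit ratio raised to the power k.
    """
    return per_unit_ratio ** units


def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    return lo_a <= hi_b and lo_b <= hi_a
```

For instance, a per-unit ratio only slightly above 1 compounds over a 10-point change in score, which is why a ratio that looks small per scoring unit can still be clinically meaningful.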
Data from this cohort allowed only analysis of disease-specific survival, which, although interesting, is less valuable than data showing response to therapy. To test the concept of minimum versus maximum scores for prediction of response to therapy, we analyzed a retrospectively collected cohort of trastuzumab-treated patients from the British Columbia Cancer Agency.13 Analysis was performed in a similar manner to the Yale cohort, but only four TMA cores were available for analysis. In each case, a minimum of three readings was required for inclusion in the data set, and in each, a minimum, maximum, and average score was calculated. Table 2 lists the odds ratios for response to therapy, where response is defined according to the Response Evaluation Criteria in Solid Tumors as complete response or partial response, and nonresponse is defined as stable disease or progressive disease. Similar to the results seen for disease-specific survival in the Yale cohort, the minimum HER-2 AQUA score is most predictive of response to therapy in the British Columbia Cancer Agency cohort.
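The inclusion rule and the response definition above translate directly into code. The sketch below is illustrative only: the category abbreviations (CR, PR, SD, PD), the function names, and the use of `None` for an unreadable core are assumptions, and the 2 × 2 odds ratio is one common way to compute such a figure; the text does not say how the odds ratios in Table 2 were derived.

```python
RESPONDER_CATEGORIES = {"CR", "PR"}      # complete or partial response
NONRESPONDER_CATEGORIES = {"SD", "PD"}   # stable or progressive disease


def eligible(core_scores, min_readings=3):
    """Apply the inclusion rule: at least three readable cores.

    None marks an unreadable core (an assumption about the data layout).
    """
    return sum(s is not None for s in core_scores) >= min_readings


def is_responder(category):
    """Map a response category to responder (True) or nonresponder (False)."""
    if category in RESPONDER_CATEGORIES:
        return True
    if category in NONRESPONDER_CATEGORIES:
        return False
    raise ValueError(f"unknown response category: {category!r}")


def odds_ratio(resp_high, nonresp_high, resp_low, nonresp_low):
    """Odds ratio for response, high-score group versus low-score group.

    Raises ZeroDivisionError if nonresp_high or resp_low is zero.
    """
    return (resp_high * nonresp_low) / (nonresp_high * resp_low)
```

Computing the ratio three times, with "high" defined by the minimum, the average, and then the maximum score, reproduces the structure of the comparison described in the text.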
By analyzing five TMA tissue spots, we show quantifiable evidence of heterogeneity in expression of ER, PR, and HER-2. Although regression analysis shows good agreement between different spots from the same patients, Spearman's values are in the range of 0.7 to 0.8, and the regression plots show outliers. This raises questions regarding which of these different scores is truly most representative of tumor behavior. Because the number of scores was limited, we calculated only a minimum, average, and maximum score for each patient. Surprisingly, the score that most accurately reflected the biologic behavior of the tumor seemed to vary depending on the biomarker. For ER, the highest score was the most prognostic of good outcome, whereas for HER-2, the lowest score was most prognostic and most predictive.
This may be a function of the tumor biology itself. For example, perhaps if any region of the tumor is expressing ER, a better prognosis is inferred because the tumor is better differentiated. Similarly, one can also describe a scenario for HER-2. Because HER-2 is an oncogene, it may be that it is relevant only if the tumor is "addicted"14 to that pathway. The lowest score being most prognostic and predictive implies that every field (even the lowest of the five) shows high-level expression. Translated to traditional slides, we infer that, if significant portions of the tumor do not express HER-2, then it is not driven by that pathway and not likely to behave as aggressively or respond to therapy as well as a uniformly positive tumor. The term uniformly positive was reportedly considered by the authors of the ASCO-CAP guidelines, but ultimately the stipulation of 30% of tumor area was accepted.
The range of expression or tumor heterogeneity seen in this study is commonly observed by pathologists when examining IHC slides. The cause of this heterogeneity may be truly biologically based, or in some cases, it may be an artifact of tissue fixation or preparation. Either way, it represents a significant complication in the interpretation of expression. If it is an artifact that can be detected by other changes in morphology or other staining patterns, then it can be summarily disregarded. However, artifacts are often difficult to detect, and there is significant evidence of heterogeneity of expression of at least some proteins. Heterogeneity makes biologic sense in tumors because blood vessels are less well organized than in normal tissues, so levels of ischemia can vary within a tumor, which can lead to pathways of protein expression that reflect that ischemic condition. This is commonly observed morphologically where some areas are so ischemic as to become necrotic and show obvious heterogeneity. Furthermore, there may be variables other than ischemia that also influence heterogeneity, such as proximity to vessels or stroma, which results in effects on delivery of chemokines, growth factors, or other modulators. We believe that the differences we see between cores in this study represent this sort of heterogeneity.
A limitation of this study is that we did not assess fields on a slide as a pathologist would in routine analysis; rather, we assessed five cores from a TMA. We believe this is similar to a pathologist selecting five medium-power fields for analysis. Most likely this number is significantly less than the number of fields that are viewed by a pathologist when reading an IHC slide. Furthermore, the pathologist does not evenly weight all fields to generate an average, but rather reports the score of the most intensely staining area. In a previous study,10 we looked at as many as 80 fields per slide and saw similar heterogeneity, but we were unable to evaluate its significance with respect to outcome because of the limited number of patients examined. In another previous study,5 we observed that pathologist assessment of 10 cores is comparable to the whole slide assessment, despite the uneven weighting of fields. Thus, even though we only examined five fields here, we believe that we can extrapolate these findings to whole slides.
In some sense, the findings in this study have already been translated into clinical practice. That is, at many sites, ER is called positive if only 5% to 10% of the cells (or sometimes even fewer) show nuclear staining. Similarly, the recent guidelines for HER-2 are a step in the right direction going from 10% to 30% positive because these data support a uniform positive staining pattern. Perhaps most significantly, these data reveal that each biomarker must be assessed uniquely to determine how it should be optimally scored, perhaps as a function of the biologic activity. As new tissue biomarkers are introduced in the future, one can imagine that systematic analyses may be performed to best determine the level of expression and the degree of uniformity required to most accurately predict the biologic impact.
Although all authors completed the disclosure declaration, the following author(s) indicated a financial or other interest that is relevant to the subject matter under consideration in this article. Certain relationships marked with a "U" are those for which no compensation was received; those relationships marked with a "C" were compensated. For a detailed description of the disclosure categories, or for more information about ASCO's conflict of interest policy, please refer to the Author Disclosure Declaration and the Disclosures of Potential Conflicts of Interest section in Information for Contributors.
Employment or Leadership Position: None
Consultant or Advisory Role: Robert L. Camp, HistoRx (C); David L. Rimm, HistoRx (C)
Stock Ownership: David L. Rimm, HistoRx
Honoraria: None
Research Funding: None
Expert Testimony: None
Other Remuneration: None
Conception and design: Jennifer M. Giltnane, Robert L. Camp, David L. Rimm
Financial support: David L. Rimm
Provision of study materials or patients: Andrew Robinson, Karen Gelmon, David Huntsman, David L. Rimm
Collection and assembly of data: Christopher B. Moeder, Jennifer M. Giltnane, Malini Harigopal, David L. Rimm
Data analysis and interpretation: Christopher B. Moeder, Jennifer M. Giltnane, Annette Molinaro, Robert L. Camp, David L. Rimm
Manuscript writing: Christopher B. Moeder, Jennifer M. Giltnane, David L. Rimm
Final approval of manuscript: David L. Rimm
Supported by an Avon-National Cancer Institute (NCI) Progress for Patients grant, NCI Grants No. R33 CA 106709 and R33 CA 110511 (D.L.R.) and K22 KCA123146A (A.M.), and a Medical Scientist Training Program grant (J.M.G.). C.B.M. and J.M.G. contributed equally to this work. Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.
1. Wolff AC, Hammond ME, Schwartz JN, et al: American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. J Clin Oncol 25:118-145, 2007
2. Vincent-Salomon A, MacGrogan G, Couturier J, et al: Calibration of immunohistochemistry for assessment of HER2 in breast cancer: Results of the French multicentre GEFPICS study. Histopathology 42:337-347, 2003
3. Paik S, Bryant J, Tan-Chiu E, et al: Real-world performance of HER2 testing: National Surgical Adjuvant Breast and Bowel Project experience. J Natl Cancer Inst 94:852-854, 2002
4. Roche PC, Suman VJ, Jenkins RB, et al: Concordance between local and central laboratory HER2 testing in the breast intergroup trial N9831. J Natl Cancer Inst 94:855-857, 2002
5. Camp RL, Charette LA, Rimm DL: Validation of tissue microarray technology in breast carcinoma. Lab Invest 80:1943-1949, 2000
6. Torhorst J, Bucher C, Kononen J, et al: Tissue microarrays for rapid linking of molecular changes to clinical endpoints. Am J Pathol 159:2249-2256, 2001
7. Camp RL, Chung GG, Rimm DL: Automated subcellular localization and quantification of protein expression in tissue microarrays. Nat Med 8:1323-1327, 2002
8. McCabe A, Dolled-Filhart M, Camp RL, et al: Automated quantitative analysis (AQUA) of in situ protein expression, antibody concentration, and prognosis. J Natl Cancer Inst 97:1808-1815, 2005
9. Cregger M, Berger AJ, Rimm DL: Immunohistochemistry and quantitative analysis of protein expression. Arch Pathol Lab Med 130:1026-1030, 2006
10. Chung GG, Zerkowski MP, Ghosh S, et al: Quantitative analysis of estrogen receptor heterogeneity in breast cancer. Lab Invest 87:662-669, 2007
11. Dolled-Filhart M, McCabe A, Giltnane J, et al: Quantitative in situ analysis of beta-catenin expression in breast cancer shows decreased expression is associated with poor outcome. Cancer Res 66:5487-5494, 2006
12. Camp RL, Dolled-Filhart M, King BL, et al: Quantitative analysis of breast cancer tissue microarrays shows that both high and normal levels of HER2 expression are associated with poor outcome. Cancer Res 63:1445-1448, 2003
13. Robinson AG, Turbin D, Thomson T, et al: Molecular predictive factors in patients receiving trastuzumab-based chemotherapy for metastatic disease. Clin Breast Cancer 7:254-261, 2006
14. Garraway LA, Sellers WR: Lineage dependency and lineage-survival oncogenes in human cancer. Nat Rev Cancer 6:593-602, 2006
Submitted May 30, 2007; accepted August 30, 2007.
Copyright © 2007 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X