Journal of Clinical Oncology, Vol 22, No 13 (July 1), 2004: pp. 2567-2575
© 2004 American Society of Clinical Oncology. DOI: 10.1200/JCO.2004.11.141
Tree-Based Model for Breast Cancer Prognostication
From the Department of Biostatistics, University of Michigan, Ann Arbor; Barbara Ann Karmanos Cancer Institute and the Center for Healthcare Effectiveness Research, Wayne State University, Detroit, MI; and the Department of Management Science and Statistics, The University of Texas at San Antonio, San Antonio, TX.
Address reprint requests to Mousumi Banerjee, PhD, Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109-2029; e-mail: mousumib{at}umich.edu
PURPOSE: To define prognostic groups for recurrence-free survival in breast cancer, assess relative effects of prognostic factors, and examine the influence of treatment variations on recurrence-free survival in patients with similar prognostic-factor profiles. PATIENTS AND METHODS: We analyzed 1,055 patients diagnosed with stage I-III breast cancer between 1990 and 1996. Variables studied included socioeconomic factors, tumor characteristics, concurrent medical conditions, and treatment. The primary end point was recurrence-free survival (RFS). Multivariable analyses were performed using recursive partitioning and Cox proportional hazards regression.
RESULTS: The most significant difference in prognosis was between patients with fewer than four and those with at least four positive nodes (P < .0001). Four distinct prognostic groups (5-year RFS, 97%, 78%, 58%, and 27%) were developed, defined by the number of positive nodes, tumor size, progesterone receptor (PR) status, differentiation, race, and marital status. Patients with fewer than four positive nodes and tumor
CONCLUSION: Lymph node status, PR status, tumor size, differentiation, race, and marital status are valuable for prognostication in breast cancer. The prognostic groups derived can provide guidance for clinical trial design, patient management, and future treatment policy.
Breast cancer is the most common malignancy diagnosed among women and the second leading cause of cancer mortality in the United States.1 In 2004, approximately 215,990 new cases of female invasive breast cancer will be diagnosed in the United States, and 40,110 women will die of the disease.1 A decline in breast cancer mortality rates has been observed since 1990.2,3 Results from two large population-based studies suggest the decline is due to adjuvant tamoxifen treatment in women older than 50 years and adjuvant chemotherapy in women younger than 50 years.4,5 Still, patients treated with adjuvant chemotherapy or tamoxifen present great heterogeneity in terms of recurrence-free survival, suggesting that patient and/or tumor characteristics may influence outcome more strongly than modifications in standard therapeutic approaches. An analysis of the association of recurrence-free survival (RFS) with patient and tumor characteristics, as well as treatment-related variables, is necessary to assure reliable evaluation of new approaches for treatment of breast cancer. The goals of the current study, which uses a nonparametric statistical technique known as recursive partitioning (RP), are to (1) analyze the relative contributions of patient and tumor-related prognostic factors to the RFS of patients with stage I-III breast cancer; (2) identify patient subgroups with similar outcome within a group but different outcomes between groups (ie, prognostic grouping of patients); and (3) examine influence of treatment variations on RFS in patients with similar prognostic-factor profiles. Prognostic groups are important in assessing disease heterogeneity and for design and stratification of future clinical trials. Because patterns of breast cancer treatment are changing so rapidly, it was important that the results of the present analysis be applicable to contemporary patients. 
Hence, we chose a cohort of patients diagnosed between 1990 and 1996, using RFS as the end point to avoid the inaccuracies inherent in death-certificate reports and to maximize the number of cancer-specific events within the shortest time possible.
RP, or tree-based analysis,6-11 was selected to analyze the data rather than the traditional Cox regression analysis for several reasons. Discovery of interactions is difficult using Cox regression, because the interactions must be specified a priori. In contrast, RP automatically detects important interactions. Furthermore, unlike Cox analysis, RP is adept in uncovering variables that may be largely operative within a specific patient subgroup but may have minimal effect or none in other patient subgroups. Also, RP provides a superior means for prognostic classification.9-11 Rather than fitting a model to the data, RP sequentially divides the patient group into two subgroups based on prognostic factor values (eg, age > 50 years v
Study Cohort and Sources of Information
Eligible women were newly diagnosed patients with stage I, II, or III breast cancer, diagnosed between January 1, 1990, and December 31, 1996, at the Wayne State University-affiliated Harper Hospital (Detroit, MI), and treated at the Karmanos Cancer Institute (KCI), a National Cancer Institute (NCI)-designated comprehensive cancer center. Detailed demographic, clinical, pathologic, and treatment information was obtained from the Surveillance, Epidemiology, and End Results database, Harper Hospital records, clinics of the KCI, individual physicians attending Harper but not affiliated with KCI, and 1990 census data. Follow-up information regarding recurrence was obtained from clinic records or pathologic reports.
Study Variables
Statistical Methods
Although the RP method ensures that the left and right terminal subgroups from the same parent differ significantly in RFS, terminal subgroups from distinct parents may still have similar RFS profiles. Therefore, further amalgamation of terminal subgroups with similar RFS may be required. For amalgamation, we first ordered the terminal subgroups by the hazard ratio of each terminal subgroup relative to the leftmost terminal subgroup of the final tree. This monotone ordering was then coded as a single ordered covariate, and the recursive partitioning algorithm was applied again17,18 to form the final prognostic groups. Cox proportional hazards regression21 analyses were also performed on this patient cohort using the same variables used in the RP analyses. Results obtained using proportional hazards analyses were compared with those generated by RP.
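The split search at the heart of recursive partitioning can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the ordinary (unadjusted) two-sample log-rank statistic, whereas the RP methodology cited in the text adjusts its tests for the search over cut points. The function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def logrank_statistic(times: List[float], events: List[int],
                      in_left: List[bool]) -> float:
    """Unadjusted two-sample log-rank chi-square statistic (1 df)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t, overall and in the left group.
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, in_left) if x >= t and g)
        # Events at time t, overall and in the left group.
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, in_left)
                 if x == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(times: List[float], events: List[int],
               covariate: List[float]) -> Tuple[Optional[float], float]:
    """Try every cut 'covariate <= c' and keep the largest statistic."""
    best_cut, best_stat = None, 0.0
    for cut in sorted(set(covariate))[:-1]:
        in_left = [x <= cut for x in covariate]
        stat = logrank_statistic(times, events, in_left)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Hypothetical toy data: months to recurrence or censoring, event flag
# (1 = recurrence, 0 = censored), and number of positive nodes.
toy_times = [5, 8, 12, 20, 30, 40, 45, 50]
toy_events = [1, 1, 1, 1, 0, 0, 1, 0]
toy_nodes = [6, 5, 4, 7, 1, 0, 2, 3]

cut, stat = best_split(toy_times, toy_events, toy_nodes)
print(f"best cut: nodes <= {cut} (log-rank {stat:.2f})")
```

A full tree would apply `best_split` to every candidate variable at each node, split on the winner, and recurse until no split is significant. The runner-up variable at each node corresponds to what the text calls the competitor.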
Characteristics of Patient Cohort
A total of 1,125 women were diagnosed between January 1990 and December 1996 with stage I-III breast cancer. Seventy were excluded (33 whites and 37 African Americans) for lack of any follow-up information after initial diagnosis, and the remaining 1,055 patients constitute our analysis cohort. Of these, 56% were African American, and 44% were white, reflecting the population served by Harper Hospital and KCI in Detroit. The median age at diagnosis was 59 years. Forty-one percent of these women presented with stage I disease, 48% with stage II disease, and 11% with stage III disease. Reflecting the pattern of care at an NCI-designated comprehensive cancer center, 39% of our patients had lumpectomy, and 61% had mastectomy. Tables 1 and 2 present frequency distributions of sociodemographic, tumor, and treatment characteristics in our cohort. The median follow-up of our cohort was 44 months. Twenty-three percent of the patients had recurrence, and among them, 75% had distant disease.
RP Analysis of RFS
Fifteen pretreatment variables (age, race, marital status, SES rank, obesity, diabetes, hypertension, heart disease, stroke, cholesterol level at the time of breast cancer diagnosis, tumor size, number of positive lymph nodes, tumor differentiation, and ER and PR status) were considered for the RP analysis. The regression tree for RFS is shown in Figure 1. The most significant split was by the number of positive nodes, with the best cutoff being fewer than four versus at least four positive nodes (log-rank, 76.61; P < .0001). The competitor at this step was tumor size (≤ 2 cm v > 2 cm), and the corresponding log-rank was 50.39. Thus, number of positive nodes was 1.5-fold stronger than the second best splitter.
The subgroup with at least four positive nodes was next split by PR status (log-rank, 7.39; P = .04), and the competitor at this step was differentiation (well or moderately v poorly differentiated; corresponding log-rank, 6.3). Thus, PR status was 1.2-fold stronger than the second best splitter. Furthermore, PR status was 2.2-fold stronger than ER status (log-rank, 3.33), and 2.6-fold stronger than tumor size (log-rank, 2.87) at this step. Patients with positive PR status had significantly better outcome than the PR-negative patients (estimated 25th percentile RFS, 31 months and 16 months, respectively). Neither of these subgroups had any other significant split, and they formed terminal subgroups VII and VIII in the tree. Of note, PR-negative patients with at least four positive nodes (ie, terminal subgroup VIII) had the worst prognosis among all subgroups.
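The "fold stronger" figures quoted for the first two splits are ratios of log-rank statistics, and the arithmetic can be checked with a short sketch. The statistics are those reported in the text. The dictionary layout and names are illustrative only.

```python
# Log-rank statistics reported in the text: (best splitter, competitor).
reported = {
    "root: nodes vs tumor size": (76.61, 50.39),
    ">=4 nodes: PR vs differentiation": (7.39, 6.3),
    ">=4 nodes: PR vs ER": (7.39, 3.33),
    ">=4 nodes: PR vs tumor size": (7.39, 2.87),
}

# Relative strength = statistic of the chosen splitter / statistic of the other.
ratios = {label: best / other for label, (best, other) in reported.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}-fold")
```

Rounded to one decimal place, the ratios reproduce the 1.5-, 1.2-, 2.2-, and 2.6-fold figures given above.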
On the opposite side of the tree, the subgroup with fewer than four positive nodes was next split by tumor size (best cutoff
The subgroup with fewer than four positive lymph nodes, tumors
The patient subgroup with fewer than four positive nodes and tumor size > 2 cm was then split by race (log-rank, 6.12; P < .0001). The competitor at this step was age (< 50 v
There were 153, 93, 180, 77, 53, 206, 57, and 69 patients in the terminal subgroups I to VIII, respectively. The subgroups with similar RFS were amalgamated (II, III, and IV and V, VI, and VII). These, together with subgroups I and VIII, resulted in four distinct prognostic groups for RFS, designated A to D. Table 3 presents details of this amalgamation. Kaplan-Meier survival curves for the four groups are presented in Figure 2. The 5-year RFS rates for patients in groups A, B, C, and D were 97%, 78%, 58%, and 27%, respectively (P = .0001).
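The amalgamation step can be summarized as a lookup from terminal subgroups to prognostic groups. The sketch below uses the subgroup sizes and 5-year RFS rates reported in the text. The assignment of letters to the merged subgroups (I to A, II-IV to B, V-VII to C, VIII to D) is inferred from the reported ordering of prognosis rather than copied from Table 3, and the variable names are illustrative.

```python
# Patients per terminal subgroup, as reported in the text.
terminal_sizes = {
    "I": 153, "II": 93, "III": 180, "IV": 77,
    "V": 53, "VI": 206, "VII": 57, "VIII": 69,
}

# Assumed letter assignment, inferred from the ordering of prognosis.
amalgamation = {
    "A": ["I"],
    "B": ["II", "III", "IV"],
    "C": ["V", "VI", "VII"],
    "D": ["VIII"],
}

# Reported 5-year recurrence-free survival for each prognostic group.
five_year_rfs = {"A": 0.97, "B": 0.78, "C": 0.58, "D": 0.27}

# Size of each prognostic group after merging its terminal subgroups.
group_sizes = {
    group: sum(terminal_sizes[s] for s in subgroups)
    for group, subgroups in amalgamation.items()
}

for group, size in group_sizes.items():
    print(f"group {group}: n = {size}, 5-year RFS = {five_year_rfs[group]:.0%}")
```

Under this assignment group D contains the 69 patients of terminal subgroup VIII, matching the group D size quoted in the discussion.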
Proportional Hazards Analysis of RFS
Based on univariate log-rank analyses, eight of the 15 pretreatment variables were found to be significantly predictive of RFS: age, race, SES rank, tumor size, number of positive lymph nodes, tumor differentiation, ER status, and PR status. Multivariate Cox analysis identified number of positive lymph nodes (P < .0001), tumor size (P = .005), and tumor differentiation (P = .01) as independent predictors of RFS (Table 4). None of the following variables had independent prognostic significance: age, race, marital status, SES rank, obesity, diabetes, hypertension, heart disease, stroke, high cholesterol level, ER status, and PR status.
Treatment Analysis
Table 5 presents the treatment distributions in the four groups. There were significant differences in the patterns of primary (P = .0001) and adjuvant therapies (P = .0001) in the four prognostic groups. RFS distributions according to adjuvant therapy in the four prognostic groups are shown in Table 6. For patients in groups A and B, there were no significant differences in RFS by adjuvant radiation. Patients in groups C and D who received adjuvant radiation therapy had significantly better outcome compared with patients who did not receive radiation therapy. However, within each prognostic group, a comparison of patients treated with lumpectomy plus adjuvant radiation versus patients treated with mastectomy (with or without radiation) showed no significant differences in RFS (group A: P = .23; group B: P = .97; group C: P = .21; group D: P = .90).
For patients in the low-risk group (group A), RFS did not vary by the type of adjuvant systemic treatment received (P = .38). However, RFS did differ significantly by systemic treatment in each of the intermediate- and high-risk groups (group B: P = .0002; group C: P = .0002; group D: P = .0001), with tamoxifen or a combination of tamoxifen and chemotherapy yielding the best outcome in each of the aforementioned groups.
The goal of this analysis was to increase our understanding of the relative influence of prognostic variables on outcome in breast cancer patients. While the results confirm the importance of lymph node status and tumor size in patients with stage I-III breast cancer, several additional observations have been made that were not possible with the Cox regression analysis. The significance of PR status in patients with at least four positive lymph nodes was potentially the most important finding of the RP analysis. Patients with PR-positive status had much better prognosis than those with PR-negative tumors (5-year RFS, 55% v 27%, respectively) in this nodal subgroup. Furthermore, tumor size was only operative in the subgroup of patients with fewer than four positive nodes. Tumor size was a strong predictor of RFS (log-rank, 29.73; P < .0001) in patients with fewer than four positive nodes but failed to predict RFS significantly (log-rank, 2.87; P = .09) in patients with at least four positive nodes. Tumor differentiation was also relatively more important and may be largely operative in the subgroup with fewer than four positive nodes. In this nodal subgroup, tumor differentiation emerged as a significant splitter at various levels of the tree (either as the best or the competitor splitter). In contrast, tumor differentiation was only marginally significant in predicting RFS (log-rank, 6.3; P = .05) in patients with at least four positive nodes. The significance of race in patients with fewer than four positive lymph nodes and tumors > 2 cm and the selective importance of marital status in white patients with fewer than four positive lymph nodes and tumors > 2 cm were other interesting RP refinements on the results of the Cox model. The influence of these sociodemographic variables relative to tumor pathology was a unique finding of the RP analysis and may be a reflection of the patient population at the cancer center. 
Racial disparities in breast cancer survival have been widely reported in the literature.2,22-25 African Americans present a 24% increased risk of dying of breast cancer, compared with whites.1 Poorer survival among African Americans is often attributed to a more advanced stage of disease at diagnosis. However, residual disparities remain even after adjusting for stage. The current study lends further evidence toward a racial disparity; specifically, in the patient subgroup with fewer than four positive lymph nodes and tumors > 2 cm, African Americans had significantly poorer RFS than did whites. Although race is highly correlated with SES in our population, race is an independent predictor of breast cancer survival.26 Several studies27-31 have found a significant association of marital status with breast cancer survival. Although psychosocial factors, such as stress and attitude toward life, have been shown to affect quality of life in cancer survivors, the only psychosocial factor with a documented effect on survival is external social support.32,33 Thus, support given to women with breast cancer has not only a positive effect on their reactions to the illness but is also associated with better survival. In the present analysis, marital status may well have been acting as a proxy to external social support in affecting survival.
Recursive partitioning analyses found four prognostic groups, defined only by the number of positive nodes, PR status, tumor size, differentiation, race, and marital status. Use of the tree model could thus add an important facet to managing the individual patient. For example, Figure 1 indicates that a patient with fewer than four positive lymph nodes and PR-negative tumor
Both proportional hazards regression and RP enable assessment of prognostic factors, but the two approaches differ substantively. A detailed comparison of the two is described elsewhere,19 but one important difference is worth emphasizing. Regression favors global effects, that is, factors that have uniform effect on survival across the entire patient population. RP, however, can uncover factors that may act differently for different subgroups of patients. This is an important point, given that in an individual patient, the discriminating power of one prognostic factor may be significantly enhanced or overshadowed by the presence or absence of other factors. The four prognostic groups created through RP and amalgamation could potentially be valuable for patient management and future clinical-trial design. For example, in group A, the 5-year RFS rate of 97% indicates that highly effective treatment had been initiated and that some patients were perhaps overtreated. Conversely, for the 69 patients with poor prognosis (Group D), the 5-year RFS rate of 27% represents a disappointing result and deserves much greater focus of research into better systemic therapies. There are several important caveats to be applied to these suggestions. First, validation of the prognostic groups must be obtained using an independent data set, because shortcomings of RP include the instability of terminal subgroups and the potential for overfitting. These are common problems with exploratory methods. 
The RP methodology we used addresses the overfitting problem by using tests that adjust for the fact that one has searched for optimal cut points. One way of addressing the need for validation is to use 10% of the population as an "internal" data set for independent verification of the terminal subgroups and amalgamation. However, we chose not to do this, because for our population, the number of events in the validation set would be too small. Thus, it would be important to identify populations similar to ours to see if the prognostic value of the groupings identified in this study is reproduced and if these groupings are indeed more predictive than standard stratification factors. A second caveat is that the prognostic groups are partly conditioned by patients' choice of treatment. Questions may arise as to whether the prognostic effects of race and marital status observed in the RP analyses could be partly attributed to differences in treatment choices in the racial and marital-status groups. In our study, there were no significant differences in treatments received between African Americans and whites or between married women and others, controlling for stage, age, and PR status. Treatment recommended based on subtleties in clinical judgment could also have altered the outcome and therefore conditioned the groupings. As a third caveat, the observed treatment differences could potentially have been influenced by unmeasured factors such as structural barriers (eg, insurance coverage) and factors that influence patient freedom of choice and/or decision making (eg, attitudes and beliefs about specific treatments, ability to navigate the medical system, and personal preferences and biases). Fourth, we did not have adequate power to detect treatment differences in group A patients and the effect of adjuvant radiation in group B patients. Fifth, it will be critical to add prognostic factors such as HER-2/neu and proliferative indices to future RP analyses. 
The RP method will be useful to determine if these factors, which may be more costly or time-consuming, define different prognostic groups with an improved distinction among RFS potentials, compared with subgroups derived in this investigation. Finally, the advent of microarray analysis techniques to identify important prognostic subgroups may render all previous approaches moot. This seems unlikely, and it is more realistic to imagine that microarray results would be complementary to clinically defined subgroups.
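The first caveat, that a 10% internal validation set would contain too few events, can be made concrete with a rough calculation. The sketch assumes that a random 10% holdout would show the same recurrence proportion as the full cohort (23% of 1,055 patients). That assumption and the variable names are illustrative and are not a calculation reported by the authors.

```python
cohort_size = 1055          # analysis cohort reported in the text
recurrence_fraction = 0.23  # proportion of patients with recurrence
holdout_fraction = 0.10     # size of the hypothetical internal validation set

# Expected patients and recurrences in the holdout, assuming the holdout
# mirrors the full cohort (an assumption made for illustration).
holdout_patients = cohort_size * holdout_fraction
expected_events = holdout_patients * recurrence_fraction

print(f"holdout patients: about {holdout_patients:.0f}")
print(f"expected recurrences: about {expected_events:.0f}")
```

Roughly two dozen recurrences would have to be spread across four prognostic groups, which is consistent with the authors' decision not to hold out an internal validation set.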
The following authors or their immediate family members have indicated a financial interest. No conflict exists for drugs or devices used in a study if they are not being evaluated as part of the investigation. Received more than $2,000 a year from a company for either of the last 2 years: Eun Young Song, Ford Motor Company; William Hryniuk, Ford Motor Company.
Supported by grants from the National Science Foundation (DMS 9973410; M.B.) and the Ford Motor Company Foundation (W.H. and E.Y.S.). Authors' disclosures of potential conflicts of interest are found at the end of this article.
1. American Cancer Society: Breast Cancer Facts and Figures. Atlanta, GA, American Cancer Society, 2002, pp 9-10
2. Ries LAG, Eisner MP, Kosary CL, et al (eds): SEER Cancer Statistics Review, 1973-1998. Bethesda, MD, National Cancer Institute, 2001
3. Garfinkel L, Boring CC, Heath CW Jr: Changing trends: An overview of breast cancer incidence and mortality. Cancer 74:222-227, 1994
4. Olivotto IA, Bajdik CD, Plenderleith IH, et al: Adjuvant systemic therapy and survival after breast cancer. N Engl J Med 330:805-810, 1994
5. Quinn M, Allen E: Changes in incidence of and mortality from breast cancer in England and Wales since introduction of screening. BMJ 311:1391-1395, 1995
6. Zhang H, Singer B: Recursive partitioning in the health sciences. New York, NY, Springer-Verlag, 1999
7. Albain KS, Crowley JJ, LeBlanc M, et al: Determinants of improved outcome in small-cell lung cancer: An analysis of the 2,580-patient Southwest Oncology Group data base. J Clin Oncol 8:1563-1574, 1990
8. Erlichman C, Warde P, Gadalla T, et al: RECPAM analysis of prognostic factors in patients with stage III breast cancer. Breast Cancer Res Treat 16:31-242, 1990
9. Albain KS, Green S, LeBlanc M, et al: Proportional hazards and recursive partitioning and amalgamation analyses of the Southwest Oncology Group node-positive adjuvant CMFVP breast cancer data base: A pilot study. Breast Cancer Res Treat 22:273-284, 1992
10. Curran WJ, Scott CB, Horton J, et al: Recursive partitioning analysis of prognostic factors in three Radiation Therapy Oncology Group malignant glioma trials. J Natl Cancer Inst 85:704-710, 1993
11. Katz A, Buchholz TA, Thames H, et al: Recursive partitioning analysis of locoregional recurrence patterns following mastectomy: Implications for adjuvant irradiation. Int J Radiat Oncol Biol Phys 50:397-403, 2001
12. Fritz A, Ries L: The SEER Program Code Manual (ed 3). Bethesda, MD, National Institutes of Health, 1998
13. World Health Organization: Obesity: Preventing and managing the global epidemic: Report of a WHO consultation presented at the World Health Organization, June 3-5, 1997. Geneva, Switzerland, WHO, 1998
14. Beahrs OH, Myers MH: Manual for staging of cancer (ed 2). Philadelphia, PA, JB Lippincott, 1983
15. Segal M: Regression trees for censored data. Biometrics 44:35-47, 1988
16. LeBlanc M, Crowley J: Relative risk trees for censored survival data. Biometrics 48:411-425, 1992
17. LeBlanc M, Crowley J: Survival trees by goodness of split. J Am Stat Assoc 88:457-467, 1993
18. LeBlanc M: Tree-based methods for prognostic stratification, in Crowley J (ed): Handbook of Statistics in Clinical Oncology. New York, NY, Marcel Dekker, 2001
19. Ciampi A, Lawless JF, McKinney SM, et al: Regression and recursive partitioning strategies in the analysis of medical survival data. J Clin Epidemiol 41:737-748, 1988
20. Ciampi A, Negassa A, Lou Z: Tree-structured prediction for censored survival data and the Cox model. J Clin Epidemiol 48:675-689, 1995
21. Allison P: Survival analysis using the SAS system: A practical guide. Cary, NC, SAS Institute, 1995
22. Dignam JJ: Differences in breast cancer prognosis among African-American and Caucasian women. CA Cancer J Clin 50:50-64, 2000
23. Russell A, Langlois T, Johnson G, et al: Increasing gap in breast cancer mortality between black and white women. WMJ 98:37-39, 1999
24. Chu KC, Tarone RE, Brawley OW: Breast cancer trends of black women compared with white women. Arch Fam Med 8:521-528, 1999
25. Chen VW, Correa P, Kurman RJ, et al: Histological characteristics of breast carcinoma in blacks and whites. Cancer Epidemiol Biomarkers Prev 3:127-135, 1994
26. Simon MS, Severson RK: Racial differences in breast cancer survival: The interaction of socioeconomic status and tumor biology. Am J Obstet Gynecol 176:S233-S239, 1997
27. Boffetta P, Merletti F, Winkelmann R, et al: Survival of breast cancer patients from Piedmont, Italy. Cancer Causes Control 4:209-215, 1993
28. Neale AV: Racial and marital status influences on 10 year survival from breast cancer. J Clin Epidemiol 47:475-483, 1994
29. Gajalakshmi CK, Shanta V, Swaminathan R, et al: A population-based survival study on female breast cancer in Madras, India. Br J Cancer 75:771-775, 1997
30. Meng L, Maskarinec G, Lee J: Ethnicity and conditional breast cancer survival in Hawaii. J Clin Epidemiol 50:1289-1296, 1997
31. Bradley CJ, Given CW, Roberts C: Race, socioeconomic status, and breast cancer treatment and survival. J Natl Cancer Inst 94:490-496, 2002
32. Lindop E, Cannon S: Evaluating the self-assessed support needs of women with breast cancer. J Adv Nurs 34:760-771, 2001
33. Lee CO: Quality of life and breast cancer survivors: Psychosocial and treatment issues. Cancer Pract 5:309-316, 1997
Submitted November 27, 2002; accepted April 7, 2004.
Copyright © 2004 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X