Journal of Clinical Oncology, Vol 21, Issue 13 (July), 2003: 2574-2582
© 2003 American Society for Clinical Oncology

Interobserver and Intraobserver Variability in Measurement of Non–Small-Cell Carcinoma Lung Lesions: Implications for Assessment of Tumor Response

Jeremy J. Erasmus, Gregory W. Gladish, Lyle Broemeling, Bradley S. Sabloff, Mylene T. Truong, Roy S. Herbst, Reginald F. Munden

From the Departments of Diagnostic Radiology, Biostatistics, and Thoracic Head and Neck Medical Oncology, University of Texas M.D. Anderson Cancer Center, Houston, TX.

Address reprint requests to Jeremy J. Erasmus, MD, M.D. Anderson Cancer Center, Department of Diagnostic Radiology, 1515 Holcombe Blvd., Box 57, Houston, TX 77030; email: jerasmus@di.mdacc.tmc.edu.


    ABSTRACT
 
Purpose: Response of solid malignancies to therapy is usually determined by serial measurements of tumor size. The purpose of our study was to assess the consistency of measurements performed by readers evaluating lung tumors.

Materials and Methods: The study group was composed of 33 patients with lung tumors more than 1.5 cm. Bidimensional (BD) and unidimensional (UD) measurements were performed on computed tomography (CT) scans according to the World Health Organization (WHO) criteria and the Response Evaluation Criteria in Solid Tumors (RECIST), respectively. Measurements were performed independently by five thoracic radiologists using printed film and were repeated after 5 to 7 days. Inter- and intraobserver measurement variations were estimated through statistical modeling.

Results: There were 40 tumors with average sizes ranging from 1.8 to 8.0 cm (mean, 4.1 cm). Analysis of variance showed a significant difference (P < .05) among readers and among the measured nodules for UD and BD measurements. Interobserver misclassification rates were higher than intraobserver misclassification rates using either progressive disease or response criteria. The probability of misclassifying a tumor with the WHO criteria or RECIST was greatest with interobserver measurements when criteria for progression (43% BD, 30% UD) were used and lowest with intraobserver measurements when criteria for response (2.5% BD, 3.0% UD) were used. In addition, interobserver misclassification rates were higher than intraobserver misclassification rates for both regular and irregular tumors.

Conclusion: Measurements of lung tumor size on CT scans are often inconsistent and can lead to an incorrect interpretation of tumor response. Consistency can be improved if the same reader performs serial measurements for any one patient.


    INTRODUCTION
 
AN ESSENTIAL component of evaluating the results of cancer treatment in patients enrolled on clinical trials is the reporting of the response rate. Because small differences in the response rate can affect the outcome of phase I and II clinical trials, it is important that the criteria used to make this determination are meaningful and consistent. Uniform criteria for reporting response, recurrence, disease-free interval, and toxicity were proposed in 1979 after a meeting on the Standardization of Reporting Results of Cancer Treatment and were subsequently widely accepted.1,2 These criteria, known as the World Health Organization (WHO) criteria in reporting the results of cancer treatment, are based largely on tumor measurements in two dimensions (the longest perpendicular diameters in the axial plane). In 1994, the WHO criteria were reviewed and revised guidelines known as Response Evaluation Criteria in Solid Tumors (RECIST) were proposed, with treatment response determined by using a single measurement of the largest tumor diameter in the axial plane.3

Although the WHO and RECIST recommendations are largely based on assessment of serial measurements of tumor size before and after treatment, they do not provide specific instructions on the methodology of performing these measurements. The general assumption has been that these measurements can be performed by different readers and are accurate and reproducible. The purpose of our study was to determine the consistency of measurements among readers evaluating lung tumors on computed tomography (CT) scans in patients with non–small-cell lung cancer.


    MATERIALS AND METHODS
 
Tumor Measurements
Sixty patients with non–small-cell lung cancer participating in a randomized, double-blind, phase III comparative trial of a novel targeted therapy in combination with paclitaxel and carboplatin formed the study group (M.D. Anderson Cancer Center, Houston, TX). All patients had a thoracic CT scan performed at enrollment and all patients with measurable lung tumors were considered eligible. Because slice collimation was 7 to 7.5 mm in most of the CTs used in the study, patients with lung tumors less than 1.5 cm in maximal diameter were excluded from the study group, per the recommendations of RECIST. From August 2000 to May 2001, 33 patients met these criteria: 16 men and 17 women, ranging in age from 43 to 78 years (mean age, 62 years).

All CT scans were performed on a HiSpeed or Lightspeed Advantage scanner (General Electric Medical Systems, Milwaukee, WI) after intravenous administration of contrast. CT scans were performed with collimation of 7 to 7.5 mm and pitch ratio of 1:1.5 in all patients except one (the scan for this patient had a 5-mm collimation). CT images were printed using conventional parameters in lung (window width, 1,500 HU; window level, -600 HU) and mediastinal (window width, 500 HU; window level, 55 HU) windows.

CT images (lung and mediastinal windows) were made available to five radiologists with thoracic fellowship training and more than 4 years of posttraining experience. They were instructed to measure tumors on preselected images. Measurements were performed on the baseline CT examination according to the criteria of the WHO (bidimensional measurement, product of the longest diameter and longest perpendicular diameter) and RECIST (unidimensional measurement, longest diameter in the axial plane). Each observer used printed film and performed measurements independently with calipers or rulers; the same preselected images were used for tumor measurements 5 to 7 days later. After completion of the measurement phase of the study, tumors were classified by consensus according to contour (ovoid or spherical, lobular, irregular) and edge characteristics (well-defined, predominantly well-defined with focal regions of irregularity, or irregular or spiculated edges) and divided into two groups: regular (those with ovoid, spherical, or lobular contours, or well-defined or predominantly well-defined edges) and irregular (those with irregular contours and irregular or spiculated edges).

Statistical Analysis
Inter- and intraobserver measurement variations were estimated through statistical modeling. The various sources of variation for the tumor measurements were modeled with the analysis of variance. The model has three factors: tumor (n = 40), readers (n = 5), and replication (n = 2). The total variance of an observation was partitioned into four sources: the three defined above plus an error term, which accounts for the variability in the tumor measurements not explained by the three factors. Thus for each response, either uni- or bidimensional measurements, the model tests for differences in the average tumor size among readers, the average tumor size between the two replications, and the average tumor size among nodules.
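
The model described above can be written compactly as follows. This is a sketch of a main-effects analysis of variance consistent with the description; the symbols (y, mu, tau, rho, gamma, epsilon) are our notation and do not appear in the original.

```latex
% y_{ijk}: measurement of tumor i by reader j at replication k (notation assumed)
y_{ijk} = \mu + \tau_i + \rho_j + \gamma_k + \varepsilon_{ijk},
\qquad i = 1,\dots,40,\quad j = 1,\dots,5,\quad k = 1,2

% Partition of the total sum of squares into the four sources
SS_{\text{total}} = SS_{\text{tumor}} + SS_{\text{reader}}
                  + SS_{\text{replication}} + SS_{\text{error}}

% F statistic for any one factor
F_{\text{factor}} =
  \frac{SS_{\text{factor}} / df_{\text{factor}}}{SS_{\text{error}} / df_{\text{error}}}
```

Here tau, rho, and gamma stand for the tumor, reader, and replication effects, and epsilon is the error term that absorbs the variability not explained by the three factors.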

Using this model, the various statistical hypotheses were tested with the F test from the analysis of variance. All computations were performed with the SPSS statistical package (SPSS Interactive Graphics, Version 10.00, SPSS Inc, Chicago, IL).
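
As an illustration of the F tests described above, the following is a minimal pure-Python sketch of a main-effects analysis of variance for a balanced design with one measurement per tumor, reader, and replication. It is not the SPSS procedure used in the study; the function name `main_effects_anova`, the data layout, and the synthetic data are assumptions made for illustration only.

```python
from itertools import product


def main_effects_anova(data):
    """Main-effects ANOVA for a balanced design with one value per cell.

    data maps (tumor, reader, replication) -> measurement.
    Returns (table, r_squared), where table maps each source of variation
    to (sum_of_squares, degrees_of_freedom, F); F is None for the error row.
    """
    names = ("tumor", "reader", "replication")
    n = len(data)
    grand = sum(data.values()) / n
    ss_total = sum((y - grand) ** 2 for y in data.values())

    effects = {}
    ss_model = 0.0
    df_model = 0
    for pos, name in enumerate(names):
        # Group the observations by the level of this factor.
        groups = {}
        for key, y in data.items():
            groups.setdefault(key[pos], []).append(y)
        ss = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
        effects[name] = (ss, len(groups) - 1)
        ss_model += ss
        df_model += len(groups) - 1

    # Whatever the three factors do not explain is the error term.
    ss_error = ss_total - ss_model
    df_error = n - 1 - df_model
    ms_error = ss_error / df_error

    table = {name: (ss, df, (ss / df) / ms_error) for name, (ss, df) in effects.items()}
    table["error"] = (ss_error, df_error, None)
    return table, ss_model / ss_total


# Synthetic, deterministic example: 4 tumors x 3 readers x 2 replications.
# The values are made up and are not study data.
data = {}
for t, r, k in product(range(4), range(3), range(2)):
    noise = 0.05 * ((3 * t + 3 * r + k) % 4 - 1.5)
    data[(t, r, k)] = 2.0 + t + 0.1 * r + noise

table, r_squared = main_effects_anova(data)
```

With the study's 40 tumors, five readers, and two replications, the same bookkeeping gives 400 observations and, under this main-effects sketch, 400 - 1 - 39 - 4 - 1 = 355 error degrees of freedom. A P value would additionally require the F distribution, which is omitted here.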

The impact of measurement variability was assessed by estimating the misclassification rates between pairs of tumor measurements. For intraobserver measurements, the procedure was as follows. For each reader and each tumor, the difference between the smallest and largest measurement was computed. All measurement differences were assessed relative to the smaller measurement using RECIST and WHO criteria for progressive disease (RECIST > 20% and WHO > 25%) and relative to the larger measurement using criteria for response (RECIST > 30% and WHO > 50%). A misclassification was recorded in each group if the relative change exceeded these criteria. Thus, each reader had misclassification rates for progressive disease and response for both uni- and bidimensional tumor measurements. The five rates in each group were averaged to give overall intraobserver misclassification rates for progressive disease and response.
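
The intraobserver procedure above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names, the data layout (one list of first and second measurements per reader), and the sample values are assumptions. The thresholds are the ones quoted in the text.

```python
# Thresholds quoted in the text, expressed as fractions.
PD_THRESHOLD = {"RECIST": 0.20, "WHO": 0.25}        # progressive disease
RESPONSE_THRESHOLD = {"RECIST": 0.30, "WHO": 0.50}  # response


def relative_changes(a, b):
    """Return (growth, shrinkage) for two positive measurements of one tumor.

    growth is the difference relative to the smaller value (progression view);
    shrinkage is the difference relative to the larger value (response view).
    """
    small, large = min(a, b), max(a, b)
    diff = large - small
    return diff / small, diff / large


def misclassification_rates(pairs, criterion):
    """Fraction of tumors whose repeat measurements exceed each threshold."""
    pd_hits = 0
    resp_hits = 0
    for a, b in pairs:
        growth, shrinkage = relative_changes(a, b)
        pd_hits += growth > PD_THRESHOLD[criterion]
        resp_hits += shrinkage > RESPONSE_THRESHOLD[criterion]
    return pd_hits / len(pairs), resp_hits / len(pairs)


def intraobserver_rates(readings_by_reader, criterion):
    """Average the per-reader rates to get overall intraobserver rates."""
    rates = [misclassification_rates(p, criterion) for p in readings_by_reader.values()]
    pd = sum(r[0] for r in rates) / len(rates)
    resp = sum(r[1] for r in rates) / len(rates)
    return pd, resp


# Hypothetical unidimensional data (cm): two readers, four tumors each.
example = {
    "reader_A": [(4.0, 4.1), (2.0, 2.5), (6.0, 6.2), (3.0, 5.0)],
    "reader_B": [(4.0, 4.0), (2.0, 2.1), (6.0, 6.1), (3.0, 3.2)],
}
pd_rate, resp_rate = intraobserver_rates(example, "RECIST")
```

In this made-up example, reader A crosses the progression threshold for two of four tumors and the response threshold for one, while reader B crosses neither, so the averaged rates are 0.25 for progressive disease and 0.125 for response.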

In a similar fashion, the interobserver misclassification rates were estimated. Only the first replication was used for this estimate. For each pair of observers, the difference between measurements of each tumor was computed. All measurement differences were assessed relative to the smaller measurement using RECIST and WHO criteria for progressive disease (RECIST > 20% and WHO > 25%) and relative to the larger measurement using criteria for response (RECIST > 30% and WHO > 50%). A misclassification was recorded in each group if the relative change exceeded these criteria. Thus, each pair of observers had misclassification rates for progressive disease and response for both uni- and bidimensional tumor measurements. The 10 rates in each group were averaged to give overall interobserver misclassification rates for progressive disease and response.
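
The interobserver calculation differs mainly in that the comparison runs over every pair of readers; five readers give 10 pairs. The sketch below illustrates this under stated assumptions: the names `pair_rates` and `interobserver_rates`, the dictionary layout, and the bidimensional sample values are hypothetical and are not study data.

```python
from itertools import combinations

# Thresholds quoted in the text, expressed as fractions.
PD_THRESHOLD = {"RECIST": 0.20, "WHO": 0.25}
RESPONSE_THRESHOLD = {"RECIST": 0.30, "WHO": 0.50}


def pair_rates(first, second, criterion):
    """Misclassification rates for one pair of readers over the same tumors."""
    pd_hits = 0
    resp_hits = 0
    for a, b in zip(first, second):
        small, large = min(a, b), max(a, b)
        pd_hits += (large - small) / small > PD_THRESHOLD[criterion]
        resp_hits += (large - small) / large > RESPONSE_THRESHOLD[criterion]
    return pd_hits / len(first), resp_hits / len(first)


def interobserver_rates(first_reads, criterion):
    """Average the rates over every unordered pair of readers.

    first_reads maps reader -> list of first-replication measurements,
    with tumors in the same order for every reader.
    """
    pairs = list(combinations(sorted(first_reads), 2))
    rates = [pair_rates(first_reads[r], first_reads[s], criterion) for r, s in pairs]
    pd = sum(r[0] for r in rates) / len(rates)
    resp = sum(r[1] for r in rates) / len(rates)
    return len(pairs), pd, resp


# Hypothetical bidimensional products (cm2): five readers, two tumors.
first_reads = {
    "R1": [10.0, 4.0],
    "R2": [10.0, 4.0],
    "R3": [10.0, 4.0],
    "R4": [10.0, 4.0],
    "R5": [13.0, 9.0],  # one reader who measures consistently larger
}
n_pairs, pd_rate, resp_rate = interobserver_rates(first_reads, "WHO")
```

In this example a single outlying reader affects four of the 10 pairs, which is one way a modest disagreement between readers can translate into a sizable averaged misclassification rate.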

Because the RECIST and WHO criteria sum the measurements of all tumors in each patient to determine progression or response, we repeated the intra- and interobserver misclassification rate calculations on a per-patient basis using the sum of measurements for each patient. These relative changes were compared to the RECIST and WHO criteria for progressive disease and response. This procedure also accounts for any correlation in measurement variability among tumors within a single patient.
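
The per-patient variant can be sketched by summing each reader's tumor measurements within a patient before applying the same thresholds. The helper names, the tumor-to-patient mapping, and the values below are assumptions chosen for illustration.

```python
from collections import defaultdict


def patient_sums(measurements, tumor_to_patient):
    """Sum tumor measurements within each patient."""
    totals = defaultdict(float)
    for tumor_id, value in measurements.items():
        totals[tumor_to_patient[tumor_id]] += value
    return dict(totals)


def flagged(first, second, threshold):
    """Keys whose two values differ by more than threshold, relative to the smaller."""
    hits = set()
    for key in first:
        small, large = sorted((first[key], second[key]))
        if (large - small) / small > threshold:
            hits.add(key)
    return hits


# Hypothetical unidimensional data (cm): patient P1 has two tumors, P2 has one.
tumor_to_patient = {"T1": "P1", "T2": "P1", "T3": "P2"}
read_1 = {"T1": 3.0, "T2": 2.0, "T3": 4.0}
read_2 = {"T1": 3.9, "T2": 1.5, "T3": 4.2}

RECIST_PD = 0.20  # progressive disease threshold quoted in the text

tumor_hits = flagged(read_1, read_2, RECIST_PD)
patient_hits = flagged(
    patient_sums(read_1, tumor_to_patient),
    patient_sums(read_2, tumor_to_patient),
    RECIST_PD,
)
```

In this made-up case tumors T1 and T2 each cross the 20% threshold individually, but the errors run in opposite directions, so the summed measurement for patient P1 (5.0 cm versus 5.4 cm) does not. When errors within a patient run in the same direction, the sum would be flagged as well.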


    RESULTS
 
There were 40 lung tumors in the 33 patients who formed the study group. Twenty-seven patients had one measured tumor each, four patients had two measured tumors each, and two patients had three measured tumors each. Tumors had an average measured size among the five readers of 1.8 to 8.0 cm (mean, 4.1 cm). Considering all measurements by all readers, unidimensional measurements ranged in size from 1.0 to 9.0 cm (mean, 4.1 cm) and bidimensional measurements ranged from 0.8 to 60.8 cm2 (mean, 14.5 cm2). Twenty-three of the tumors were regular and 17 were irregular. Three tumors had edges that were well defined and contours that were ovoid or spherical (n = 2) and irregular (n = 1). Twenty-one tumors had edges that were predominantly well defined and contours that were ovoid or spherical (n = 14) and lobular (n = 7). Sixteen tumors had edges that were irregular or spiculated and contours that were ovoid or spherical (n = 3), lobular (n = 1), and irregular (n = 12).

Tables 1 and 2 list the analysis of variance for the uni- and bidimensional measurements, respectively. For the analysis of variance, the dependent variable was the tumor size and the independent variables were reader, nodule, and time (replication). This model has a good fit for tumor size, with an R² value of 0.895 for the unidimensional and 0.942 for the bidimensional model. For the unidimensional model, there was a significant difference (P < .05) among the five readers in the average nodule size and among the 40 nodules. With regard to replications, the difference was not significant. For the bidimensional analysis, there was a significant difference among readers, among nodules, and among replications (Figs 1 and 2).


 
Table 1. Analysis of Variance of Unidimensional Tumor Measurements*
 

 
Table 2. Analysis of Variance of Bidimensional Tumor Measurements*
 


 
Fig 1. Non-small-cell lung cancer manifesting as lobular, well-marginated mass. Computed tomography shows well-defined margin. Interobserver measurement variability was small (≤ 7.1% variability in maximum diameter, 14.3% in product size). Intraobserver measurement variability was also small (≤ 4.8% variability in maximum diameter, 8.6% in product size).

 


 
Fig 2. Non-small-cell lung cancer manifesting as poorly marginated mass. (A, B) Contiguous computed tomography images show irregular, spiculated mass. Interobserver measurement variability was large (≤ 140% variability in maximum diameter, 140% in product size). Intraobserver measurement variability was smaller (≤ 37% variability in maximum diameter, 20% in product size).

 
Table 3 lists the sample size, mean, median, range, and SD of the 40 tumor sizes and provides information about the tumor size variability, interobserver variability, and intraobserver variability. Table 4 lists the sample size, mean, median, range, and SD of the 80 tumor sizes and provides information about the interobserver variation. Table 5 lists the sample size, mean, median, range, and SD of the 200 tumor sizes and provides information about the among-replication (intraobserver) variation of the study. A comparison of Tables 4 and 5 demonstrates less difference in median tumor size among replications compared with among readers, indicating less intraobserver than interobserver variability.


 
Table 3. Tumor Measurements for All Observers, Replications, and Methods (N = 40)
 

 
Table 4. Tumor Measurements for All Observers and Methods (N = 80)
 

 
Table 5. Tumor Measurements for All Methods and Replications (N = 200)
 
Tables 6 and 7, which list intraobserver and interobserver misclassification rates using RECIST and WHO criteria for progressive disease, demonstrate the potential impact of measurement variability. Intraobserver relative measurement change varied from 0.0% to 80% for unidimensional measurements and 0.0% to 230% for bidimensional measurements. This resulted in an average of 3.8 misclassifications/reader (9.5% of tumors) for unidimensional measurements and 8.2 misclassifications/reader (21% of tumors) for bidimensional measurements. Interobserver relative measurement change varied from 0.0% to 194% for unidimensional measurements and 0.0% to 493% for bidimensional measurements. This resulted in an average of 11.9 misclassifications/reader pair (30% of tumors) for unidimensional measurements and 17 misclassifications/reader pair (43% of tumors) for bidimensional measurements.
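
The percentages in the preceding paragraph follow from dividing each average misclassification count by the 40 tumors. The short check below reproduces that arithmetic; the variable and function names are ours, and only the numbers quoted in the text are used.

```python
N_TUMORS = 40  # tumors measured in the study

# (average misclassifications, percentage of tumors) as quoted in the text
reported = {
    "intraobserver unidimensional": (3.8, 9.5),
    "intraobserver bidimensional": (8.2, 21.0),
    "interobserver unidimensional": (11.9, 30.0),
    "interobserver bidimensional": (17.0, 43.0),
}


def rate_percent(average_count, n_units):
    """Convert an average misclassification count into a percentage."""
    return 100.0 * average_count / n_units


computed = {
    label: rate_percent(count, N_TUMORS) for label, (count, _) in reported.items()
}

for label, (count, quoted) in reported.items():
    print(f"{label}: {count}/{N_TUMORS} = {computed[label]:.2f}% (quoted {quoted}%)")
```

Each computed value agrees with the quoted percentage to within rounding, keeping in mind that the average counts are themselves rounded.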


 
Table 6. Intraobserver Misclassifications by Tumor Using Criteria for Progressive Disease (RECIST > 20%, WHO > 25%)
 

 
Table 7. Interobserver Misclassifications by Tumor Using Criteria for Progressive Disease (RECIST > 20%, WHO > 25%)
 
Tables 8 and 9, which list intraobserver and interobserver misclassification rates using RECIST and WHO criteria for response, demonstrate less potential influence of measurement variability than when criteria for progressive disease are used. Intraobserver relative measurement change varied from 0.0% to 44% for unidimensional measurements and 0.0% to 70% for bidimensional measurements. This resulted in an average of 1.2 misclassifications/reader (3.0% of tumors) for unidimensional measurements and 1.0 misclassifications/reader (2.5% of tumors) for bidimensional measurements. Interobserver relative measurement change varied from 0.0% to 66% for unidimensional measurements and 0.0% to 83% for bidimensional measurements. This resulted in an average of 5.5 misclassifications/reader pair (14% of tumors) for unidimensional measurements and 3.3 misclassifications/reader pair (8.3% of tumors) for bidimensional measurements.


 
Table 8. Intraobserver Misclassifications by Tumor Using Criteria for Response (RECIST > 30%, WHO > 50%)
 

 
Table 9. Interobserver Misclassifications by Tumor Using Criteria for Response (RECIST > 30%, WHO > 50%)
 
The intraobserver and interobserver misclassification rates as calculated on a per-patient basis using RECIST and WHO criteria for progressive disease and response are listed in Tables 10 to 13. When progressive disease criteria were used, intraobserver average misclassifications were 3.0 patients/reader (9.1% of patients) for unidimensional measurements and 7.2 patients/reader (22% of patients) for bidimensional measurements. Interobserver average misclassifications were 10.1 patients/reader pair (31% of patients) for unidimensional measurements and 14.3 patients/reader pair (43% of patients) for bidimensional measurements. When response criteria were used, intraobserver average misclassifications were 1.2 patients/reader (3.6% of patients) for unidimensional measurements and 1.0 patients/reader (3.0% of patients) for bidimensional measurements. Interobserver average misclassifications were 4.9 patients/reader pair (15% of patients) for unidimensional measurements and 3.3 patients/reader pair (10% of patients) for bidimensional measurements. The lower number of patients compared to the number of tumors resulted in slightly lower numbers of misclassifications but slightly higher misclassification rates because of the smaller denominator. However, there were no meaningful differences in misclassification rates compared to intra- and interobserver measurements for the individual tumors.


 
Table 10. Intraobserver Misclassifications by Case Using RECIST and WHO Criteria for Progressive Disease (RECIST > 20%, WHO > 25%)
 

 
Table 13. Interobserver Misclassifications by Case Using RECIST and WHO Criteria for Response (RECIST > 30%, WHO > 50%)
 
To demonstrate the effect of tumor morphologic characteristics (contour and edge) on measurements, the analysis of variance was repeated on the regular and irregular tumor categories independently. For the regular and irregular tumors, the analysis of variance was repeated using the three factors of reader, nodule (ie, which nodule is being measured), and replication. For both types of tumors, the F test from the analysis of variance indicated significant (P < .05) differences among readers and among nodules. For the difference among replications, the analysis of variance test was not significant, with P = .394 for the regular and P = .190 for the irregular nodules.

Similarly, the misclassification calculations were repeated for the regular (Fig 1) and irregular tumor (Fig 2) categories separately. Intraobserver misclassifications of regular tumors (n = 23) occurred in 6.1% of tumors (average, 1.4 misclassifications/reader) for unidimensional measurements and 17% of tumors (average, 4.0 misclassifications/reader) for bidimensional measurements. For irregular tumors (n = 17), intraobserver misclassifications occurred in 14% of tumors (average, 2.4 misclassifications/reader) for unidimensional measurements and 25% of tumors (average, 4.2 misclassifications/reader) for bidimensional measurements. Interobserver misclassifications of regular tumors occurred in 19% of tumors (average, 4.3 misclassifications/reader) for unidimensional measurements and 35% of tumors (average, 8.0 misclassifications/reader) for bidimensional measurements. For irregular tumors, interobserver misclassifications occurred in 45% of tumors (average, 7.6 misclassifications/reader) for unidimensional measurements and 53% of tumors (average, 9.0 misclassifications/reader) for bidimensional measurements. There were higher rates of misclassifications of irregular tumors, but intraobserver misclassifications remained lower than interobserver misclassifications, regardless of tumor shape.


    DISCUSSION
 
Improvements and advances in cancer therapy are dependent on continual investigation of anticancer agents in clinical trials. Evaluation of response to the different treatment regimens within these trials and meaningful comparison of these results to other trials require consistent criteria for determining response. The antitumor effect of a treatment in patients with solid tumors can be determined clinically, biochemically, by surgical pathologic restaging, or by image-based serial measurements of tumor size.3–5 Tumor measurements before and after treatment are usually considered preferable even though errors in measurement can affect the interpretation of response to anticancer chemotherapy.6,7 Because these measurements can be used to warrant additional testing of an agent or regimen, extrapolate clinical benefit, allow an assumption of a palliative effect, or determine whether therapy should be continued, the reproducibility of these measurements is a critical component of many cancer trials.

To prevent an inappropriate designation of tumor response, tumor measurements should be standardized and accurate, with low intra- and interobserver variability. Although the guidelines concerning the specifications for radiologic imaging are well defined (posteroanterior radiographs in full inspiration with constant film-to-tube distance, CT scans performed with intravenous contrast unless contraindicated, and contiguous slices with collimation of 7 to 7.5 mm to measure lesions 1.5 cm or greater), the methodology of measuring lesions is poorly defined.3,8 For instance, the new RECIST guidelines state that measurements should be performed with electronic calipers or a ruler, and that the same method of assessment and the same technique should be used to characterize each identified and reported lesion at baseline and during follow-up.3 In this regard, lung or soft tissue windows can be used to measure lung lesions, provided the same window settings are used throughout the study. How irregular or spiculated lesions should be measured is not addressed. If mediastinal windows are used, the individual spicules are generally less apparent and the measurement incorporates mainly the solid component of the tumor. However, visualization of the spicules is optimized with lung window settings. Should these measurements include the entire length of the spicule or should an estimation be made about the solid component of the tumor?

We elected to perform measurements using calipers or rulers on film to duplicate common clinical practice in which many studies are reviewed without the benefit of a picture archiving and communication system workstation. Although no statistical difference has been reported between measurements obtained with hand-held calipers on film and electronic calipers on a picture archiving and communication system workstation, the use of an automated computer technique that delineates tumor margins using a tissue segmentation technique may allow a more accurate and reproducible method to measure tumors.9 It has also been suggested that measurement of changes in volume of a tumor may allow an earlier and more accurate assessment of tumor response.10–12 These techniques have not been validated in large trials. However, in a study of 22 patients with non–small-cell lung cancer, there was substantial agreement between unidimensional, bidimensional, and volumetric assessment when tumor size was evaluated.13 CT measurement of tumor volume requires that the tumor margin be outlined on each axial image. A potential inherent limitation of this technique when poorly defined lesions are evaluated is that the accuracy of the tumor delineation depends on either subjective determination of tumor margins or computer software that uses tissue segmentation techniques.

Although interobserver agreement in size determination of small pulmonary tumors has been reported to be good when spiral CT is used to assess well-defined tumors, the results of our study show that there can be considerable variability in measurements of lung tumors performed by different readers, especially if the lesion is irregular.14,15 In our study, which used WHO and RECIST guidelines, progressive disease (on the basis of growth percentage change of > 25% for bidimensional measurements and > 20% for unidimensional measurements) was erroneously determined to have occurred in 43% (bidimensional) and 30% (unidimensional) of lesions when tumors were measured by different observers and in 21% (bidimensional) and 9.5% (unidimensional) of lesions when tumors were measured by one observer. As expected, the larger percentage change used to determine response (decrease in size of > 30% for unidimensional measurements and > 50% for bidimensional measurements) resulted in lower overall rates of misclassifications than when the criteria for progressive disease were used. Overall, however, the range of tumor misclassification that occurs when all tumors are assumed to have responded or progressed according to RECIST and WHO criteria provides an indication of what could be encountered in clinical practice and the potential effect measurement variability could have on patient management and clinical trials.

It is important to note that because survival and not response was the primary end point for the patients participating in this comparative trial at our institution, the outcome of this trial was not adversely affected by intra- and interobserver measurement variability. However, because many clinical trials use percentage change in size or cross-sectional area of lung lesions in the determination of complete response, partial response, stable disease, and progressive disease, marked interobserver differences in measurements could erroneously affect the outcome. Accordingly, to improve the assessment of tumor response at our institution, the same reader performs all serial measurements for any one patient enrolled onto a clinical cancer trial. RECIST recommendations do not specifically address this issue and do not prohibit multiple readers from performing tumor measurements during a trial.3 At present, RECIST guidelines recommend that when the response rate is the primary end point of a trial, an expert independent of the study should review all responses when the study is completed.16–18 Although this review by a third party can be useful in confirming tumor responses claimed by investigators, we believe that an amendment of the WHO and RECIST guidelines mandating that all serial measurements for any one patient be performed by the same reader for the duration of the trial would improve assessment of tumor response.

The assessment of objective response has been further complicated by the development of gene therapy as well as treatment protocols that target tumor biology, including tumor cell proliferation and invasion, angiogenesis, and metastasis. The antitumor effect in many of these regimens is cytostatic and, unlike anticancer cytotoxic agents, may not cause regression in tumor size. Because of this inherent limitation of assessing response together with the inaccuracy of serial measurements, anatomic measurements are not optimal in the investigation of cytostatic agents in cancer trials. Accurate determination of response may require functional and molecular techniques that assess metabolism, growth kinetics, angiogenesis growth factors, tumor cell markers, and in vivo genetic alterations and gene expression.19–21 In this regard, positron emission tomography (PET) is a clinically available method that is sensitive for studying molecular interactions in vivo. Fluorine-18 (F18)–fluorodeoxyglucose uptake, which is increased in most malignant tumors, and oxygen-15–labeled water can be measured using PET and may allow an early and sensitive assessment of the antitumor effect of anticancer chemotherapy (metabolism and blood flow, respectively).22–24 Accordingly, the European Organization for Research and Treatment of Cancer PET Study Group has recently published guidelines for the use of F18-fluorodeoxyglucose to determine response to therapy in anticancer studies.22 Measurement of tumor response using PET and carbon-11–thymidine is also being developed and may allow an early assessment of the effectiveness of anticancer chemotherapy, although large-scale trials are needed to validate molecular imaging in the assessment of response.25

In summary, serial measurements of tumor size before and after treatment are commonly used as an objective assessment of response to therapy in patients with solid tumors. However, measurements of lung tumor size on CT scans are often inconsistent and can lead to an incorrect interpretation of tumor response. Interobserver variability of measurements is greater than intraobserver variability; measurement differences are greatest when the edge of the lesion is irregular or spiculated and differences are smallest when the edge is well defined. Because these differences could lead to an incorrect interpretation of tumor growth or response, the methodology for measuring lesions needs to be improved. Clarification is needed about how irregular tumors should be measured. Furthermore, to limit interobserver variability, the same reader should perform measurements for any one patient for the duration of the clinical cancer trial.


 
Table 11. Interobserver Misclassifications by Case Using RECIST and WHO Criteria for Progressive Disease (RECIST > 20%, WHO > 25%)
 

 
Table 12. Intraobserver Misclassifications by Case Using RECIST and WHO Criteria for Response (RECIST > 30%, WHO > 50%)
 

    REFERENCES
 
1. WHO: WHO Handbook for Reporting Results of Cancer Treatment. Geneva, Switzerland, World Health Organization, WHO offset publication 48, 1979

2. Miller AB, Hoogstraten B, Staquet M, et al: Reporting results of cancer treatment. Cancer 47:207–214, 1981

3. Therasse P, Arbuck SG, Eisenhauer EA, et al: New guidelines to evaluate the response to treatment in solid tumors: European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 92:205–216, 2000

4. Pujol JL, Demoly P, Daures JP, et al: Chest tumor response measurement during lung cancer chemotherapy: Comparison between computed tomography and standard roentgenography. Am Rev Respir Dis 145:1149–1154, 1992

5. Langendijk HA, Lamers RJ, ten Velde GP, et al: Is the chest radiograph a reliable tool in the assessment of tumor response after radiotherapy in nonsmall cell lung carcinoma? Int J Radiat Oncol Biol Phys 41:1037–1045, 1998

6. Warr D, McKinney S, Tannock I: Influence of measurement error on assessment of response to anticancer chemotherapy: Proposal for new criteria of tumor response. J Clin Oncol 2:1040–1046, 1984

7. Lavin PT, Flowerdew G: Studies in variation associated with the measurement of solid tumors. Cancer 46:1286–1290, 1980

8. Saini S: Radiologic measurement of tumor size in clinical trials: Past, present, and future. AJR Am J Roentgenol 176:333–334, 2001

9. Schwartz LH, Ginsberg MS, DeCorato D, et al: Evaluation of tumor measurements in oncology: Use of film-based and electronic techniques. J Clin Oncol 18:2179–2184, 2000

10. Yankelevitz DF, Reeves AP, Kostis WJ, et al: Small pulmonary nodules: Volumetrically determined growth rates based on CT evaluation. Radiology 217:251–256, 2000

11. Hopper KD, Kasales CJ, Eggli KD, et al: The impact of 2D versus 3D quantitation of tumor bulk determination on current methods of assessing response to treatment. J Comput Assist Tomogr 20:930–937, 1996

12. Nawaratne S, Fabiny R, Brien JE, et al: Accuracy of volume measurement using helical CT. J Comput Assist Tomogr 21:481–486, 1997

13. Werner-Wasik M, Xiao Y, Pequignot E, et al: Assessment of lung cancer response after nonoperative therapy: Tumor diameter, bidimensional product, and volume—A serial CT scan-based study. Int J Radiat Oncol Biol Phys 51:56–61, 2001

14. Wormanns D, Diederich S, Lentschig MG, et al: Spiral CT of pulmonary nodules: Interobserver variation in assessment of lesion size. Eur Radiol 10:710–713, 2000

15. Hopper KD, Kasales CJ, Van Slyke MA, et al: Analysis of interobserver and intraobserver variability in CT tumor measurements. AJR Am J Roentgenol 167:851–854, 1996

16. Gwyther SJ: Response assessment using radiological methods. Crit Rev Oncol Hematol 30:45–62, 1999

17. Gwyther S, Bolis G, Gore M, et al: Experience with independent radiological review during a topotecan trial in ovarian cancer. Ann Oncol 8:463–468, 1997

18. Thiesse P, Ollivier L, Di Stefano-Louineau D, et al: Response rate accuracy in oncology trials: Reasons for interobserver variability—Groupe Francais d’Immunotherapie of the Federation Nationale des Centres de Lutte Contre le Cancer. J Clin Oncol 15:3507–3514, 1997

19. Smith TA: FDG uptake, tumour characteristics and response to therapy: A review. Nucl Med Commun 19:97–105, 1998

20. Voss SD, Kruskal JB: Gene therapy: A primer for radiologists. Radiographics 18:1343–1372, 1998

21. Weissleder R: Molecular imaging: Exploring the next frontier. Radiology 212:609–614, 1999

22. Young H, Baum R, Cremerius U, et al: Measurement of clinical and subclinical tumour response using [18F]-fluorodeoxyglucose and positron emission tomography: Review and 1999 EORTC recommendations—European Organization for Research and Treatment of Cancer (EORTC) PET Study Group. Eur J Cancer 35:1773–1782, 1999

23. Weber WA, Schwaiger M, Avril N: Quantitative assessment of tumor metabolism using FDG-PET imaging. Nucl Med Biol 27:683–687, 2000

24. Herbst RS, Mullani NA, Davis DW, et al: Development of biologic markers of response and assessment of antiangiogenic activity in a clinical trial of human recombinant endostatin. J Clin Oncol 20:3804–3814, 2002

25. Shields AF, Mankoff DA, Link JM, et al: Carbon-11-thymidine and FDG to measure therapy response. J Nucl Med 39:1757–1762, 1998

Submitted January 23, 2003; accepted April 13, 2003.




Copyright © 2003 by the American Society of Clinical Oncology, Online ISSN: 1527-7755. Print ISSN: 0732-183X