4 March 2010 Rating scales for observer performance studies
We compared the performance of radiologists reading a set of screening mammograms with and without CADe, as measured by the BI-RADS assessment scale, to that measured by a 9-point rating scale. Eight MQSA radiologists read 300 screening mammograms, of which 66 contained at least one cancer and 234 were normal based on two-year follow-up. First without and then with CADe, the radiologists gave their BI-RADS assessment for each case and, for each suspicious lesion in the image, reported their confidence that the lesion needed to be worked up on a 9-point scale (1 = no evidence for recall; 5 = equivocal; 9 = overwhelming evidence for recall). The radiologists were instructed to read the cases as they would clinically. We used MRMC ROC analysis with PROPROC curve fitting to analyze the data, once for the BI-RADS data and once for the data collected on the 9-point scale. Because the radiologists were reading screening mammograms and were instructed to read in their normal clinical manner, not all of them used the full BI-RADS scale: two used only BI-RADS 0, 1, and 2; three used the full scale; and three used the full scale but employed categories 3, 4, and 5 sparingly. According to the literature, this mimics what occurs clinically. The BI-RADS and 9-point rating scales gave similar results in terms of AUC. However, the 95% CIs of the AUC estimates were substantially smaller for the 9-point scale.
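As a rough illustration of how the number of rating categories enters an AUC estimate, the sketch below computes a nonparametric (Mann-Whitney) AUC from ordinal ratings and then recomputes it after collapsing the scale to two categories. This is not the MRMC/PROPROC analysis described in the abstract: the estimator is swapped for a simpler one, the ratings are made up, and the names `empirical_auc` and `collapse` are hypothetical helpers introduced only for this example.

```python
from typing import Sequence


def empirical_auc(cancer_ratings: Sequence[int], normal_ratings: Sequence[int]) -> float:
    """Mann-Whitney estimate of AUC from ordinal ratings.

    Each (cancer, normal) pair scores 1 if the cancer case is rated higher,
    0.5 if the two ratings are tied, and 0 otherwise.
    """
    if not cancer_ratings or not normal_ratings:
        raise ValueError("need at least one cancer and one normal rating")
    score = 0.0
    for c in cancer_ratings:
        for n in normal_ratings:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cancer_ratings) * len(normal_ratings))


def collapse(rating: int, cutpoints: Sequence[int]) -> int:
    """Map a fine rating to a coarse category.

    The coarse category is the number of cutpoints the rating meets or exceeds.
    """
    return sum(rating >= cut for cut in cutpoints)


# Made-up 9-point ratings, for illustration only (not data from the study).
cancer = [9, 7, 6, 5, 3]
normal = [1, 2, 2, 4, 6]

auc_fine = empirical_auc(cancer, normal)

# Collapse to two categories with a single, arbitrary cutpoint at 5.
cutpoints = [5]
auc_coarse = empirical_auc(
    [collapse(r, cutpoints) for r in cancer],
    [collapse(r, cutpoints) for r in normal],
)

print(f"9-point scale AUC: {auc_fine:.2f}")
print(f"collapsed scale AUC: {auc_coarse:.2f}")
```

With these invented ratings, collapsing the scale turns many ordered pairs into ties, so the estimate rests on fewer distinct operating points. The sketch says nothing about confidence intervals, which in the study came from the MRMC analysis.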
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Robert M. Nishikawa, Yulei Jiang, and Charles E. Metz, "Rating scales for observer performance studies," Proc. SPIE 7627, Medical Imaging 2010: Image Perception, Observer Performance, and Technology Assessment, 762703 (4 March 2010).
