Detection of low-contrast liver metastases varies between radiologists. Training may improve performance for lower-performing readers and reduce inter-radiologist variability. We recruited 31 radiologists (15 trainees, eight non-abdominal staff, and eight abdominal staff) to participate in four separate reading sessions: pre-test, search training, classification training, and post-test. In the pre-test, each radiologist interpreted 40 liver CT exams containing 91 metastases while under eye-tracker observation, circumscribing suspected hepatic metastases and rating confidence. In search training, radiologists interpreted a separate set of 30 liver CT exams while receiving eye-tracker feedback and after being coached to increase use of coronal reformations, interpretation time, and use of liver windows. In classification training, radiologists interpreted up to 100 liver CT image patches, most containing benign or malignant lesions, and compared their annotations to ground truth. The post-test was identical to the pre-test. Between pre- and post-test, sensitivity increased by 2.8% (p = 0.01), but AUC did not change significantly. Using the eye tracker, missed metastases were classified as search errors (<2 seconds gaze time) or classification errors (>2 seconds gaze time). Out of 2775 possible detections, search errors decreased (10.8% to 8.1%; p < 0.01) but classification errors were unchanged (5.7% vs 5.7%). When stratified by difficulty, easier metastases showed larger reductions in search errors: for metastases with average sensitivity of 0-50%, 50-90%, and 90-100%, reductions in search errors were 16%, 35%, and 58%, respectively. The training program studied here may improve radiologist performance by reducing search errors but not classification errors.
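The search-versus-classification distinction above reduces to a simple dwell-time rule on missed metastases (under versus over 2 seconds of cumulative gaze time). A minimal sketch of that rule is shown below; the data structures and function names are illustrative assumptions, not the study's actual analysis code.

```python
# Hypothetical sketch: classify missed metastases as search vs. classification
# errors from eye-tracker dwell times, using the 2-second threshold described above.
from dataclasses import dataclass

GAZE_THRESHOLD_S = 2.0  # cumulative gaze time separating search from classification errors


@dataclass
class MissedMetastasis:
    lesion_id: str
    gaze_time_s: float  # total fixation time within the lesion region


def classify_error(miss: MissedMetastasis) -> str:
    """Search error: lesion barely fixated; classification error: fixated but dismissed."""
    return "search" if miss.gaze_time_s < GAZE_THRESHOLD_S else "classification"


def error_rates(misses: list[MissedMetastasis], n_possible_detections: int) -> dict:
    """Fraction of all possible detections falling into each error type."""
    counts = {"search": 0, "classification": 0}
    for m in misses:
        counts[classify_error(m)] += 1
    return {k: v / n_possible_detections for k, v in counts.items()}
```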
Purpose: Radiologists exhibit wide inter-reader variability in diagnostic performance. This work aimed to compare different feature sets for predicting whether a radiologist would detect a specific liver metastasis in contrast-enhanced computed tomography (CT) images and to evaluate possible improvements from individualizing models to specific radiologists. Approach: Abdominal CT images from 102 patients, including 124 liver metastases in 51 patients, were reconstructed at five different kernels/doses using projection-domain noise insertion to yield 510 image sets. Ten abdominal radiologists marked suspected metastases in all image sets. Potentially salient features predicting metastasis detection were identified in three ways: (i) logistic regression based on human annotations (semantic), (ii) random forests based on radiologic features (radiomic), and (iii) inductive derivation using convolutional neural networks (CNN). For all three approaches, generalized models were trained using metastases that were detected by at least two radiologists. Conversely, individualized models were trained using each radiologist's markings to predict reader-specific metastasis detection. Results: In fivefold cross-validation, both individualized and generalized CNN models achieved higher areas under the receiver operating characteristic curve (AUCs) than semantic and radiomic models in predicting reader-specific metastasis detection (p < 0.001). The individualized CNN, with a mean (SD) AUC of 0.85 (0.04), outperformed the generalized one [AUC = 0.78 (0.06), p = 0.004]. The individualized semantic [AUC = 0.70 (0.05)] and radiomic models [AUC = 0.68 (0.06)] outperformed the respective generalized versions [semantic AUC = 0.66 (0.03), p = 0.009; radiomic AUC = 0.64 (0.06), p = 0.03]. Conclusions: Individualized models slightly outperformed generalized models for all three feature sets. Inductive CNNs predicted metastasis detection better than semantic or radiomic features. Generalized models have implementation advantages when individualized data are unavailable.
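To make the generalized-versus-individualized setup concrete, the sketch below trains a random forest on radiomic-style feature vectors twice: once against consensus labels (metastases detected by at least two radiologists) and once against a single reader's own detections, scoring each with cross-validated AUC. The feature arrays and labels are random placeholders, and the pipeline is an assumption for illustration, not the study's actual implementation.

```python
# Hedged sketch of generalized vs. individualized detection-prediction models
# using radiomic-style features and a random forest; data arrays are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_lesions, n_features = 124, 20               # counts loosely mirror the abstract
X = rng.normal(size=(n_lesions, n_features))  # placeholder radiomic features per metastasis

# Placeholder labels: 1 = detected, 0 = missed
detected_by_reader = rng.integers(0, 2, size=n_lesions)       # one specific radiologist
detected_by_two_or_more = rng.integers(0, 2, size=n_lesions)  # consensus (>= 2 readers)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)

auc_generalized = cross_val_score(model, X, detected_by_two_or_more, cv=cv, scoring="roc_auc").mean()
auc_individual = cross_val_score(model, X, detected_by_reader, cv=cv, scoring="roc_auc").mean()
print(f"generalized AUC ~ {auc_generalized:.2f}, individualized AUC ~ {auc_individual:.2f}")
```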
Eye-tracking techniques can be used to understand the visual search process in diagnostic radiology. However, most prior eye-tracking studies in CT involved only single cross-sectional images or video playback of the reconstructed volume and applied strong constraints on reader-image interactivity, creating a disconnect between the experimental setup and clinical reality. To overcome this limitation, we developed an eye-tracking system that integrates eye-tracking hardware with in-house-built image viewing software. This system enabled recording of radiologists' real-time eye movements and interactions with the displayed images in clinically relevant tasks. In this work, the system implementation was demonstrated, and the spatial accuracy of the eye-tracking data was evaluated using digital phantom images and a patient CT angiography (CTA) exam. The measured offset between targets and gaze points was comparable to that of many prior eye-tracking systems (median offset: ~0.8° visual angle for the phantom; ~0.7–1.3° for the patient CTA). Further, the eye-tracking system was used to record radiologists' visual search in a liver lesion detection task with contrast-enhanced abdominal CT. From the measured data, several variables were found to correlate with radiologists' sensitivity; for example, readers with longer interpretation times had higher mean sensitivity than the others (88 ± 3% vs 78 ± 10%; p < 0.001). In summary, the proposed eye-tracking system has the potential to provide high-quality data for characterizing radiologists' visual-search process in clinical CT tasks.
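Accuracy figures such as the ~0.8° median offset come from converting the on-screen distance between a known target and the recorded gaze point into visual angle using the display geometry. A minimal sketch of that conversion follows; the pixel pitch and viewing distance are example values, not the system's actual calibration.

```python
# Hedged sketch: convert target-to-gaze offsets in pixels into visual angle (degrees).
import math


def offset_visual_angle_deg(target_px, gaze_px, pixel_pitch_mm=0.25, viewing_distance_mm=650.0):
    """Angular offset between a target and a gaze sample.

    pixel_pitch_mm and viewing_distance_mm are assumed example values;
    an actual system would use its own display and calibration geometry.
    """
    dx = (gaze_px[0] - target_px[0]) * pixel_pitch_mm
    dy = (gaze_px[1] - target_px[1]) * pixel_pitch_mm
    offset_mm = math.hypot(dx, dy)
    return math.degrees(math.atan2(offset_mm, viewing_distance_mm))


# Example: a 20-pixel offset at these settings is roughly 0.4 degrees of visual angle.
print(round(offset_visual_angle_deg((960, 540), (976, 552)), 2))
```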
There is substantial variability in the performance of radiologist readers. We hypothesized that certain readers may have idiosyncratic weaknesses for certain types of lesions and that unsupervised learning techniques might identify these patterns. After IRB approval, 25 radiologist readers (9 abdominal subspecialists and 16 non-specialists or trainees) read 40 portal-phase liver CT exams, marking all metastases and providing a confidence rating on a scale of 1 to 100. We formed a matrix of reader confidence ratings, with rows corresponding to readers, columns corresponding to metastases, and each entry giving the confidence rating that a reader assigned to that metastasis; lesions that were not marked were assigned zero confidence. A clustergram was used to permute the rows and columns of this matrix so that similar readers and similar metastases were grouped together, and the resulting clustergram was interpreted manually. We found a cluster of lesions with atypical presentation that were missed by several readers, including subspecialists, and a separate cluster of small, subtle lesions for which subspecialists were more confident of their diagnosis than trainees. These and other observations from unsupervised learning could inform targeted training and education of future radiologists.
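The reader-by-metastasis confidence matrix and clustergram described above can be outlined with standard tools. The sketch below builds the matrix with zero confidence for unmarked lesions and applies hierarchical clustering to both rows (readers) and columns (metastases); the placeholder ratings and the choice of seaborn's clustermap are assumptions for illustration rather than the study's actual analysis.

```python
# Hedged sketch: build a reader x metastasis confidence matrix and cluster it.
import numpy as np
import pandas as pd
import seaborn as sns

n_readers, n_metastases = 25, 91
rng = np.random.default_rng(1)

# Placeholder confidence ratings (0-100); zero means the lesion was not marked.
ratings = rng.integers(0, 101, size=(n_readers, n_metastases)).astype(float)
ratings[rng.random(ratings.shape) < 0.3] = 0.0  # simulate unmarked lesions

matrix = pd.DataFrame(
    ratings,
    index=[f"reader_{i}" for i in range(n_readers)],
    columns=[f"met_{j}" for j in range(n_metastases)],
)

# Hierarchical clustering of rows and columns groups similar readers and similar
# metastases together, mirroring the clustergram interpretation described above.
grid = sns.clustermap(matrix, method="average", metric="euclidean", cmap="viridis")
grid.savefig("reader_metastasis_clustergram.png")
```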