This paper investigates whether two publicly available Artificial Intelligence (AI) models can detect retrospectively identified missed cancers within a double reader breast screening program and determines whether challenging mammographic cases are reflected in the performance of AI models. Transfer learning was conducted on the Globally-aware Multiple Instance Classifier (GMIC) and Global-Local Activation Maps (GLAM) models using an Australian mammographic dataset. Mammograms were enhanced to improve poor contrast using the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. The sensitivity of the two AI models with pre-trained and transfer learning modes was evaluated on four mammographic case groups: ‘missed’ cancers, ‘prior-visible’ cancers, ‘prior-invisible’ cancers and ‘current’ cancers from the archives of a double reader breast screening program. The GMIC model outperformed the GLAM model with pre-trained and transfer learning modes in terms of sensitivity for all four cancer groups. The performance of the GMIC and GLAM models was best in ‘prior-visible’ cancers, followed by ‘prior-invisible’ cancers, ‘current’ cancers and ‘missed’ cancers. The performance of the GMIC and GLAM models on the ‘missed’ cancer cases was 84.2% and 81.5%, respectively, while for the ‘prior-visible’ cancer cases, the performance was 92.7% and 89.2%, respectively. After transfer learning, both the GMIC and GLAM models demonstrated statistically significant improvement (>9.4%) in terms of sensitivity for all cancer groups. The AI models with transfer learning showed significant improvement in malignancy detection in challenging mammographic cases. The study also supports the potential of the AI models to identify missed cancers within a double reader breast screening program.
This study aims to investigate how fluctuations in the time intervals between self-assessment test sets influence the performance of radiologists and radiology trainees. The data were collected from 54 radiologists and 92 trainees who completed 260 and 550 readings, respectively, of 9 mammogram test sets between 2019 and 2023. Readers’ performances were evaluated via case sensitivity, lesion sensitivity, specificity, ROC AUC and JAFROC. There was a significant positive correlation between the intervals between test sets and radiologists’ improvement in specificity and JAFROC (P<0.05). For separations between test sets exceeding 90 days, radiologists’ performance improved for sensitivity (5.2%), lesion sensitivity (6.6%), ROC (3.1%) and JAFROC (6.3%), with specificity remaining consistent. For trainees who completed test sets within a single day, a significant positive correlation was recorded between the time intervals between test sets and their improvement in ROC AUC (P=0.008) and JAFROC (P=0.02). However, for trainees who needed more than 1 day to complete a test set, this correlation was reversed in sensitivity (P=0.009) and ROC AUC (P=0.02). The most notable progress of trainees was found in sensitivity (6.15%), lesion sensitivity (11.6%), ROC AUC (3.5%) and JAFROC (4.35%), with specificity remaining unchanged, when the test sets were completed 31-90 days apart.
KEYWORDS: Digital breast tomosynthesis, Mammography, Education and training, Cancer, Breast cancer, Breast imaging, Diagnostics, Breast, Cancer detection, Radiology
The final stage in the medical imaging diagnostic system is the radiologist’s interpretation of the images, though research on the factors influencing performance in digital breast tomosynthesis (DBT) is inconclusive. This study seeks to understand the performance of radiologists in reading DBT images and the parameters impacting observer performance in three different countries. The study used a DBT mammogram test set to compare the performance of radiologists from Australia, China and Iran in reading thirty-five DBT cases. A range of performance metrics including specificity, sensitivity, lesion sensitivity, ROC AUC and JAFROC FOM were generated for each radiologist upon the conclusion of the test set. The radiologists also provided demographic information relating to their experience in reading digital mammograms and DBT. Each country had a greater percentage of radiologists who had completed a breast imaging fellowship compared to those who had not. Australia had a greater percentage of radiologists who had completed training in DBT reading (Australia=88.2%), while China and Iran had a smaller percentage of radiologists that have not completed training in DBT reading (China=37%, Iran=40%). Significant differences were identified between the three countries in specificity (p=.001), lesion sensitivity (p=.016), ROC (p<.001) and JAFROC (p<.001). Australia had the highest mean value for all performance metrics, while China had the lowest mean value for all performance metrics. Australian radiologists showed a moderate positive correlation between lesion sensitivity and the number of years reading DBT images (r=.513, p=.042). Iranian radiologists who read more than 20 DBT cases per week obtained significantly higher lesion sensitivity (73.3% vs. 51.8%; p=.032) than those who read fewer than 20 DBT cases per week.
KEYWORDS: Mammography, Diagnostics, Breast density, Education and training, Breast cancer, Breast, Cancer, Cancer detection, Tissues, Statistical analysis
Previous research has revealed that Vietnamese radiologists had lower diagnostic efficacy in interpreting mammograms than radiologists from Western countries. This study investigated the improvement in the diagnostic performance of Vietnamese doctors in breast cancer detection via the VIETRAD (VIEtnam: Transformation of Radiological Detection) program. Data from 33 participants who completed three training sessions containing normal and cancer mammographic cases from Australia and Vietnam were assessed in terms of sensitivity, specificity, ROC and JAFROC. Results show that Vietnamese doctors improved their diagnostic accuracy in identifying normal and cancer cases on mammograms across different levels of breast density.
KEYWORDS: Mammography, Current controlled current source, Artificial intelligence, Education and training, Breast density, Cancer, Cancer detection, Breast cancer
This preliminary study investigates the magnitude of concordance, affecting factors and restrictions when radiologists make annotations on mammographic images. Annotated data is key to the development of artificial intelligence (AI) tools, and errors from annotations can reduce the accuracy of these tools. Two highly experienced radiologists (>20 years’ experience) provided annotations as rectangular regions of interest to mark the location of lesions when they read 856 mammographic images with known cancer signs. Mammographic images were resized to the same resolution of 1664 × 768 pixels using bilinear interpolation. We calculated Lin’s concordance correlation coefficient (CCC) between the x-axis and y-axis coordinates of the 4 corners of the overlapped annotations. The two overlapped annotations in different views (cranio-caudal (CC) and medio-lateral oblique (MLO)) were evaluated for agreement between radiologists. The values of Lin’s CCC were classified into four interpretation levels: ‘almost perfect’, ‘substantial’, ‘moderate’ and ‘poor’ according to McBride's guide (2015). The results demonstrated ‘almost perfect’, ‘substantial’, ‘moderate’ and ‘poor’ concordance in 50.1%, 29.8%, 9.5% and 10.6% of the total overlapped annotations in the MLO view, and in 93.1%, 5.6%, 0.3% and 1.0% of the total overlapped annotations in the CC view, respectively. Overall, the radiologists demonstrated stronger concordance when annotating the CC view compared to the MLO view. Breast density (BD) also affected the concordance of the radiologists’ annotations, with the strength of concordance decreasing between breast density classifications, from higher concordance at 0-50% BD to lower concordance at 50-100% BD. Our annotation investigation has implications for AI, where delineation of lesions is often the starting point for training data.
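As an illustration of the agreement measure named above, the following is a minimal sketch of Lin's CCC for two paired lists of coordinates. It assumes the standard definition with population (1/n) variances; the function name and the sample values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Assumes the standard definition with population (1/n) moments:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov_xy / denominator


# Hypothetical x-coordinates of one annotation corner marked by two readers.
reader_a = [100.0, 220.0, 310.0, 415.0]
reader_b = [104.0, 215.0, 318.0, 410.0]
print(round(lins_ccc(reader_a, reader_b), 4))
```

Unlike Pearson correlation, the squared difference of the means in the denominator penalizes a systematic offset between readers, so a constant shift lowers the coefficient even when the two sets of coordinates are perfectly linearly related.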
KEYWORDS: Digital breast tomosynthesis, Cancer, Cancer detection, Breast cancer, Education and training, Mammography, Breast, Diagnostics, Architectural distortion, Breast density
Introduction: Breast cancer is the most common cancer among women in China and early detection is key to reducing mortality. This study aimed to understand diagnostic performances of Chinese radiologists between FFDM (full-field digital mammography) and DBT (digital breast tomosynthesis) images in terms of lesion features and reader characteristics.
Methods: Thirty-two Chinese radiologists read two mammogram test sets to identify cancer cases and to detect lesions. The first set was of FFDM images (60 cases, 21 cancers) and the second was of DBT images (35 cases, 15 cancers). The accuracy of radiologists in cancer case detection and lesion detection in each test set was analysed. Comparisons of the diagnostic performances of radiologists with different working experience were also undertaken. Results were compared using the Wilcoxon Signed Rank and Mann-Whitney U tests.
Results: Chinese radiologists recorded higher diagnostic accuracy with FFDM than DBT for detecting certain lesion types (calcifications, architectural distortion, mixed types) and lesions ≤ 10 mm. There was no significant difference in the accuracy for cancer case detection between FFDM and DBT. Radiologists who had more than eight years working experience, read more than 60 cases per week or had no DBT training had significantly higher lesion accuracy with FFDM than DBT.
Conclusion: Chinese radiologists had higher lesion accuracy with FFDM in certain lesion types and sizes than DBT. This may be related to the lack of appropriate DBT training for radiologists in China.
KEYWORDS: Digital breast tomosynthesis, Breast density, Mammography, Breast, Cancer, Education and training, Diagnostics, Breast cancer, Cancer detection, Radiology
Purpose: This study aims to investigate the diagnostic performances of Australian and Shanghai-based Chinese radiologists in reading full-field digital mammogram (FFDM) and digital breast tomosynthesis (DBT) with different levels of breast density. Approach: Eighty-two Australian radiologists interpreted a 60-case FFDM set, and 29 radiologists also reported a 35-case DBT set. Sixty Shanghai radiologists read the same FFDM set, and 32 radiologists read the DBT set. The diagnostic performances of Australian and Shanghai radiologists were assessed using truth data (cancer cases were biopsy proven) and compared overall in specificity, case sensitivity, lesion sensitivity, receiver operating characteristics (ROC) area under the curve, and jack-knife free-response receiver operating characteristics (JAFROC) figure of merit, and they were stratified by case characteristics using the Mann–Whitney U test. The Spearman rank test was used to explore the association between radiologists’ performances and their work experience in mammogram interpretation. Results: There were significantly higher performances of Australian radiologists compared with Shanghai radiologists in low breast density for case sensitivity, lesion sensitivity, ROC, and JAFROC in the FFDM set (P < 0.0001); in high breast density, Shanghai radiologists’ performances in lesion sensitivity and JAFROC were also lower than Australian radiologists (P < 0.0001). In the DBT test set, Australian radiologists performed better than Shanghai radiologists in cancer detection in both low and high breast density. The work experience of Australian radiologists was positively linked to their diagnostic performances, whereas this association was not statistically significant in Shanghai radiologists. Conclusion: There were significant variations in reading performances between Australian and Shanghai radiologists in FFDM and DBT across different levels of breast density, lesion types, and lesion sizes. An effective training initiative tailored to suit local readers is essential to enhancing the diagnostic accuracy of Shanghai radiologists.
Objectives: To study the effect on radiology trainees’ observer performance of the availability of prior screening mammograms as part of seven unique education test sets. Methods: Australian radiology trainees (n=150) completed 469 readings of seven educational test sets (each set with 60 cases, 40 normal and 20 cancer cases). The percentage of cases with a prior screening mammogram was 68.7%. Mammographic density (MD) evaluated via BIRADS was spread across the test sets, with 40.5% of cases having 25-50% glandular tissue (BIRADS “B”), 37.4% having 50-75% or “C”, 12.6% having >75% MD and 9.5% having the lowest MD rating “A”. Trainees were asked to score the cases on a scale of 1 (normal), 2 (benign), 3 (equivocal findings), 4 (suspicious finding) and 5 (highly suggestive of malignancy). The Mann-Whitney U test was used to compare the specificity and sensitivity of radiology trainees between cases with and without prior images. Results: Radiology trainees had significantly higher sensitivity across all MD levels when prior images were not available (A-B, P=0.006; C-D, P=0.027). Among trainees who read fewer than 20 cases per week, specificity was also significantly higher for high MD (C-D) cases without prior images than for those with priors available (P=0.008). Conclusions: In a simulated environment, radiology trainees achieved better results in cases without prior images, especially those who read fewer than 20 cases per week. The utility of prior case inclusion when providing education and training in reading screening mammograms needs to be revisited, especially for women with high MD.
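To make the two metrics compared above concrete, here is a minimal sketch of how case sensitivity and specificity could be derived from 1-5 ratings. The abstract does not state which ratings counted as a positive call, so the cut-off of 3 or higher is an assumption, and the function name and sample data are hypothetical.

```python
def sensitivity_specificity(scores, has_cancer, positive_threshold=3):
    """Return (sensitivity, specificity) for one reader.

    scores: ratings on the 1-5 scale, one per case.
    has_cancer: booleans giving the ground truth for each case.
    positive_threshold: ASSUMED cut-off; ratings at or above it count
    as a positive (recall) decision. The source does not specify it.
    """
    if len(scores) != len(has_cancer):
        raise ValueError("scores and has_cancer must have the same length")
    tp = fn = tn = fp = 0
    for score, cancer in zip(scores, has_cancer):
        called_positive = score >= positive_threshold
        if cancer:
            tp += called_positive
            fn += not called_positive
        else:
            fp += called_positive
            tn += not called_positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer case and one normal case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reading of six cases: three cancers followed by three normals.
ratings = [5, 4, 2, 1, 3, 2]
truth = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(ratings, truth)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Computing the pair separately for cases with and without prior images gives the per-reader values that a test such as Mann-Whitney U would then compare.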
Current literature has described the usefulness of DBT in addition to FFDM because of the increase in cancer detection and decrease in recall rates. The primary limitation of using FFDM plus DBT for screening is the rise in radiation dose, which approximately doubles if both modalities are used. Consequently, synthesized two-dimensional views can be reconstructed from DBT slices with the aim of replacing FFDM. Although many studies have explored the value of DBT in addition to FFDM, little attention has been given to the value that synthesized views might offer radiologists as a supplementary view for DBT. The aim of this study is to investigate the diagnostic accuracy of radiology trainees with DBT only compared with DBT plus the synthesized view (C-View). Twenty radiology trainees were asked to report a set of 35 two-projection DBT images of left and right breasts (15 were cancer cases). Another group of 8 trainees read the same DBT set with the addition of the C-View. Participants searched for the presence of lesions within the cases using the Tabar RANZCR system, where 2 represented a benign lesion and 3-5 represented the suspicion of a malignancy, with a higher value indicating a higher probability of malignancy. The readers’ performances were evaluated via specificity, sensitivity, lesion sensitivity, ROC and JAFROC and compared between the two reading modes. The results demonstrated that the diagnostic metrics of participants reading DBT only were not significantly different from those of the group reading DBT plus the synthesized view (P>0.05). This finding implies that viewing DBT only could be equivalent to DBT plus C-View for radiology trainees.
Several previous studies have investigated the performance of radiologists in western countries when reading 3D mammographic cases; however, the diagnostic efficacy of this modality in China is understudied. This study aimed to improve the understanding of reading performance of 3D mammography among Chinese radiologists and compare their performances with Australian radiologists. One test set consisting of 35 3D mammography cases was used to assess reading performance. Twelve Chinese and twelve Australian radiologists read the test set independently and provided a score of 1-5 for each perceived cancer lesion. Case sensitivity, specificity, lesion sensitivity and Area Under the receiver operating characteristic Curve (AUC) were used to assess performance, and radiologists’ characteristics were collected. Performance metrics and characteristics were compared using Mann-Whitney U tests and Fisher’s Exact tests. Higher specificity (0.65 vs 0.38, p=0.0003), lesion sensitivity (0.70 vs 0.40, p=0.0172) and AUC (0.81 vs 0.57, p=0.0001) were found in Australian radiologists compared to their Chinese counterparts. There was no difference in case sensitivity (0.82 vs 0.75, p=0.31). Higher values for number of years reading 3D mammography (p=0.0194), cases read per week (p=0.0122) and number of hours per week reading 2D mammography (p=0.0094) were shown among the Australian group. In conclusion, Australian radiologists had higher reading performance when reading a 3D mammography test set compared to Chinese radiologists. Training and education programs in 3D mammography may effectively address this discrepancy.
This study explored the possibility of using the gist signal (radiologists’ first impression about a case) for improving the performance of two recently developed deep learning-based breast cancer detection tools. We investigated whether, by combining the cancer class probability from the networks with the gist signal, higher performance in identifying malignant cases can be achieved. In total, we recruited 53 radiologists, who provided an abnormality score on a scale from 0 to 100 to unilateral mammograms following a 500-millisecond presentation of the image. Twenty cancer cases, 40 benign cases, and 20 normal cases were included. Two state-of-the-art deep learning-based tools (M1 and M2) for breast cancer detection were adopted. The abnormality scores from the networks and the gist responses for each observer were fed into a support vector machine (SVM). The SVM was personalized for each radiologist and its performance was evaluated using leave-one-out cross-validation. We also considered the average reader, whose gist responses were the mean abnormality scores given by all 53 readers to each image. The mean and range of AUCs in the gist experiment were 0.643 and 0.492-0.794, respectively. The AUC values for M1 and M2 were 0.789 (0.632-0.892) and 0.814 (0.673-0.897), respectively. For the average reader, the AUCs for gist, gist+M1, and gist+M2 were 0.760 (0.617-0.862), 0.847 (0.754-0.928), and 0.897 (0.789-0.946), respectively. For 45 readers, the performance of at least one of the models improved after aggregating its output with the gist signal. The results showed that the gist signal has the potential to improve the performance of the adopted deep learning-based tools.
This study measured the correlation between the magnitude of the abnormality gist signal and case difficulty based on standard presentation and reporting mechanisms for 80 cases. Half of the cases contained biopsy-proven cancer while the remainder were normal and confirmed to be cancer-free for at least two years of follow-up. In the gist experiment, seventeen breast radiologists and physicians gave an abnormality score on a scale from 0 (confident normal) to 100 (confident abnormal) to unilateral CC mammograms following a very brief, 500-millisecond presentation of the image. Independently, each mammogram was assessed by a separate sample of at least 40 radiologists using standard presentation and reporting mechanisms, with these readers asked to locate any cancers present. All readers reported at least 1000 cases annually. For each case and each category, the percentage of correct reports served as an objective measure of case difficulty (a lower rate of correct reports indicates a more difficult case). For each of the 17 readers, the association between the abnormality scores from the gist study and detection rates from the earlier reports was examined using Spearman correlation. None of the coefficients were significantly different from zero at the 0.05 significance level. For the normal cases, the correlation coefficient between abnormality scores and detection rates for the 17 readers ranged from -0.262 to 0.258, and for cancer cases from -0.180 to 0.309. The results suggest that the gist signal may indicate the presence of cancer, using mechanisms other than those employed in usual reporting, and might be exploited to improve breast cancer detection.
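The per-reader association described above relies on Spearman correlation, which can be sketched as the Pearson correlation of average ranks. The version below is a minimal illustration that handles ties by averaging ranks; the function names and the sample data are hypothetical and do not reproduce the study's values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: one reader's gist scores and per-case detection rates (%).
gist_scores = [12, 55, 40, 78, 30]
detection_rates = [90.0, 60.0, 85.0, 70.0, 65.0]
print(round(spearman_rho(gist_scores, detection_rates), 3))
```

A coefficient near zero, as reported for all 17 readers, means the ordering of cases by gist score carries little information about their ordering by detection rate.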
Can radiologists distinguish the prior mammograms, showing no overt signs of cancer, of women who were later diagnosed with breast cancer from the prior mammograms of women reported as normal and subsequently confirmed to be cancer-free? Twenty-three radiologists and breast physicians viewed 200 craniocaudal mammograms for a half-second and rated whether the woman would be recalled on a scale of 0 (clearly normal) to 100 (clearly abnormal). The dataset included five categories of mammograms, with each category containing 40 cases. The categories were Cancer (current cancer-containing mammograms), Prior-Vis (prior mammograms with visible cancer signs), Contra (current ‘normal’ mammograms contralateral to the cancer), Prior-Invis (priors without visible cancer signs), and Normal (priors of normal cases). For each radiologist, four paired analyses were performed to evaluate whether the radiologists could distinguish mammograms in each category from the normal mammograms: Cancer vs Normal, Prior-Vis vs Normal, Contra vs Normal, and Prior-Invis vs Normal. The Area Under the Receiver Operating Characteristic curve (AUC) was calculated for each paired grouping and each radiologist. The Wilcoxon Signed Rank test showed that the AUC values were above chance for all comparisons: Cancer (z=4.20, P<0.001); Prior-Vis (z=4.11, P<0.001); Contra (z=4.17, P<0.001); Prior-Invis (z=3.71, P<0.001). The results suggest that radiologists can distinguish patients who were diagnosed with cancer from individuals without breast cancer at an above-chance level based on a half-second glimpse of a mammogram, even before the lesion becomes visible (Prior-Invis). Apparently, something about the breast parenchyma can look abnormal before the appearance of a localized lesion.
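For a single reader and a single pairing (for example Prior-Invis vs Normal), the AUC can be sketched with the rank-based (Mann-Whitney) estimator: the proportion of abnormal/normal case pairs in which the abnormal case received the higher rating, with ties counted as one half. The abstract does not say which estimator was used, so this choice is an assumption, and the function name and ratings below are hypothetical.

```python
def pairwise_auc(abnormal_scores, normal_scores):
    """Empirical AUC: P(abnormal score > normal score) + 0.5 * P(tie).

    ASSUMPTION: the non-parametric (Mann-Whitney) estimator; the source
    does not state how its AUC values were computed.
    """
    if not abnormal_scores or not normal_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


# Hypothetical 0-100 ratings from one reader after a half-second view.
prior_invis = [80, 60, 55]
normal = [50, 60, 10]
print(round(pairwise_auc(prior_invis, normal), 4))
```

An AUC of 0.5 corresponds to chance. Collecting one such value per reader for each pairing gives the sample that a one-sample test against 0.5, such as the Wilcoxon Signed Rank test, would evaluate.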
This study aims to investigate patterns of breast density among women in Vietnam and their association with demographic, reproductive and lifestyle features. Mammographic densities of 1,651 women were collected from the two largest breast cancer screening and treatment centers in Ha Noi and Ho Chi Minh city. Putative factors associated with breast density were obtained from self-administered questionnaires which covered demographic, reproductive and lifestyle elements and were completed by women who attended mammography examinations. Results show that a large proportion of Vietnamese women (78.4%) had a high breast density. With multivariable logistic regression, high breast density was significantly associated with being less than 55 years old (OR=3.0), having a BMI less than 23 (OR=2.2), being pre-menopausal (OR=2.9), having fewer than three children (OR=1.7), and being less than 32 years old when having the last child (OR=1.8). Participants who consumed more than two vegetable servings per day also had an increased risk of higher density (OR=2.6). The findings suggest some unique features regarding mammographic density amongst Vietnamese compared with westernized women.
KEYWORDS: Mammography, Breast, Cancer, Breast cancer, Diagnostics, Digital mammography, Medical imaging, Digital imaging, Data analysis, Health sciences
This study aims to investigate the effectiveness of the single cranio-caudal (CC) mammogram in comparison with traditional two-projection mammography for breast cancer detection. Sixteen radiologists were invited to report 60 two-projection (MLO and CC) mammograms of the left and right breasts, of which 20 cases contained cancer. Participants searched for the presence of breast lesion(s) on each view and provided a confidence score. Sensitivity, lesion sensitivity and specificity were compared between the CC projection and the two-projection approach among different groups of readers. Results showed that expert readers needed only a single CC mammogram in their reading while non-expert readers required two-projection mammography.
Satisfaction of search (SOS) is a well-known phenomenon in radiology, in which the detection of one abnormality facilitates the neglect of other abnormalities. Over the years SOS has been thoroughly studied, primarily in chest and in trauma, and it has been found to be an elusive effect, appearing in some settings but not in others. Unfortunately, very little is known about SOS in mammography. In this study we will explore SOS in breast cancer detection by considering a case set of digital mammograms as interpreted by breast radiologists. However, the primary goal of the study will be to challenge the core of the paradigm; for decades, many have associated SOS with incomplete search, but as Kundel put it eloquently when addressing the SPIE Medical Imaging in 2004 [1], “observers do not stop viewing when one abnormality has been found on an image with multiple abnormalities”. What else could cause SOS then? According to our previous work, the first “perceived” abnormality reported by a radiologist has an influential role in the report of any other “perceived” abnormalities on the case, which supports the idea that perhaps SOS is caused by a perceptual suppression of the recognition of different abnormalities. In other words, once the radiologist has made a first report (regardless of whether that first report is a TP or FP), detection and hence reporting of other abnormalities present in the case are greatly dependent on whether these associated abnormalities “fit the profile” of what has been already found.
Our research aims to assess the value of the single cranio-caudal view mammogram in the detection of breast cancer. 129 radiologists were asked to report 60 two-view mammograms of the left and right breasts and 55 radiologists assessed a set of 55 single cranio-caudal views. Participants were asked to search for the presence of any breast lesions and provide confidence scores for their decisions. Results showed that two-view mammograms were more effective in detecting malignant nodules than single cranio-caudal view in terms of sensitivity, localized-sensitivity, ROC and JAFROC. The single cranio-caudal view had a higher specificity as compared to two-view mammography.