Estimating the model uncertainty of artificial intelligence (AI)-based breast cancer detection algorithms could help guide the reading strategy in breast cancer screening. For example, the recall decision could be made solely by AI when its certainty is high, while cases with low certainty would be read by radiologists. This study evaluates two metrics for estimating the model uncertainty of a lesion characterization network: 1) the variance of a set of outputs generated with stochastic layer depth, and 2) the entropy of the average output. To test these approaches, 367 mammography exams with cancer (333 screen-detected and 34 interval) and 367 cancer-negative exams from the Dutch Breast Cancer Screening Program were included. Using a commercial lesion detection algorithm operating at high sensitivity, 6,477 suspicious regions were included (14.1% labeled malignant). By varying the uncertainty threshold, a specified proportion of the predictions was classified as certain and the remainder as uncertain. Double reading by radiologists had a sensitivity of 90.9% (95% CI 89.0% – 92.7%) and a specificity of 93.8% (95% CI 93.2% – 96.2%) for all regions. At equal specificity, the network had a sensitivity of 92.1% (95% CI 89.9% – 94.0%) for all regions. The sensitivity of the network was higher for regions with low uncertainty under both approaches; for the 50% most certain regions, the sensitivity was 96.9% (95% CI 94.7% – 98.4%) and 97.1% (95% CI 94.9% – 98.8%) at equal specificity to radiologists. In conclusion, the uncertainty of AI-based lesion classification of breast regions can be estimated by applying stochastic layer depth during prediction.
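The two uncertainty metrics and the threshold-based split can be illustrated with a minimal sketch. It assumes that each region receives a set of malignancy probabilities from repeated stochastic forward passes, and that the entropy is the binary entropy of the mean probability; the function names, the array layout, and the quantile-based threshold are hypothetical choices for illustration, not the study's implementation.

```python
import numpy as np


def uncertainty_metrics(probs):
    """Compute two per-region uncertainty estimates.

    probs: array-like of shape (T, N) holding malignancy probabilities
    from T stochastic forward passes for N regions (assumed layout).
    Returns (mean probability, variance, entropy of the mean), each of shape (N,).
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)
    # Metric 1: variance of the set of outputs across stochastic passes.
    variance = probs.var(axis=0)
    # Metric 2: binary entropy (in bits) of the average output.
    # Clipping avoids log(0) for probabilities of exactly 0 or 1.
    p = np.clip(mean_p, 1e-12, 1.0 - 1e-12)
    entropy = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return mean_p, variance, entropy


def most_certain_mask(uncertainty, proportion):
    """Mark the given proportion of regions with the lowest uncertainty as certain."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    threshold = np.quantile(uncertainty, proportion)
    return uncertainty <= threshold
```

Under these assumptions, calling `most_certain_mask(entropy, 0.5)` would select roughly the 50% most certain regions, on which sensitivity could then be computed separately from the remaining, uncertain regions.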