Communicating reliable uncertainty estimates is crucial to increasing trust in deep learning applications for medical image analysis. Importantly, reliable uncertainty estimates should remain stable under naturally occurring domain shifts. In this study, we evaluate the relationship between epistemic uncertainty and segmentation quality under domain shift within two clinical contexts: optic disc segmentation in retinal photographs and brain tumor segmentation from multi-modal brain MRI. Specifically, we assess the behavior of two epistemic uncertainty metrics derived from (i) a single UNet’s sigmoid predictions, (ii) deep ensembles, and (iii) Monte Carlo dropout UNets, each trained with both soft Dice and weighted cross-entropy loss. Domain shifts were modeled by excluding a group with a known characteristic (glaucoma for optic disc segmentation and low-grade glioma for brain tumor segmentation) from model development and using the excluded data as additional, domain-shifted test data. While the performance of all models dropped slightly on the domain-shifted test data compared to the in-domain test set, there was no change in the Pearson correlation coefficient between the uncertainty metrics and the Dice scores of the segmentations. However, we did observe differences between the segmentation tasks in the performance of two quality assessment applications based on epistemic uncertainty. We introduce a new metric, the empirical strength distribution, to better describe the strength of the relationship between segmentation performance and epistemic uncertainty on a dataset level. We found that failures of the studied quality assessment applications were largely caused by shifts in the empirical strength distributions between training, in-domain, and domain-shifted test datasets.
In conclusion, quality assessment tools based on the strong relationship between epistemic uncertainty and segmentation quality can be stable under small domain shifts. Developers should thoroughly evaluate the strength relationships for all available data and, if possible, under domain shift to ensure the validity of these uncertainty estimates on unseen data.
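To make the evaluated relationship concrete, the following is a minimal sketch of how a per-image uncertainty score could be correlated with per-image Dice scores. It is not the authors' implementation: the abstract does not specify the two uncertainty metrics, so the mean binary entropy of the averaged foreground probability is used here purely as a stand-in, and the function names `dice_score`, `mean_entropy`, and `pearson_r` are hypothetical.

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def mean_entropy(member_probs, eps=1e-7):
    """Stand-in uncertainty score (an assumption, not the paper's metric).

    member_probs has shape (M, H, W): foreground probabilities from M
    ensemble members or M Monte Carlo dropout passes. M = 1 corresponds
    to a single network's sigmoid output.
    """
    p = np.clip(np.asarray(member_probs, dtype=float).mean(axis=0), eps, 1.0 - eps)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return float(entropy.mean())


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(np.asarray(x, dtype=float),
                             np.asarray(y, dtype=float))[0, 1])
```

Under these assumptions, one would compute `dice_score` and `mean_entropy` for every image in the in-domain test set and again for the domain-shifted test set, then compare `pearson_r(uncertainties, dice_scores)` between the two sets. A strongly negative coefficient on both sets would correspond to the stable relationship described above.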