Purpose: Generalizability is an important problem in deep neural networks, especially given the variability of data acquisition in clinical magnetic resonance imaging (MRI). The recently proposed spatially localized atlas network tiles (SLANT) method can effectively segment whole-brain, non-contrast T1w MRI into 132 volumetric labels. Transfer learning (TL) is a commonly used domain adaptation tool for updating neural network weights for local factors, yet it risks degrading performance on the original validation/test cohorts.
Approach: We explore TL using unlabeled clinical data to address these concerns in the context of adapting SLANT to scanning protocol variations. We optimize whole-brain segmentation on heterogeneous clinical data by leveraging 480 unlabeled pairs of clinically acquired T1w MRI with and without intravenous contrast. We use labels generated on the pre-contrast image to train on the post-contrast image in a five-fold cross-validation framework. We further validate on a withheld test set of 29 paired scans from a different acquisition domain.
Results: Using TL, we improve reproducibility across imaging pairs, as measured by the reproducibility Dice coefficient (rDSC) between the pre- and post-contrast images. We show an increase over the original SLANT algorithm (rDSC 0.82 versus 0.72) and the FreeSurfer v6.0.1 segmentation pipeline (rDSC = 0.53). We demonstrate the impact of this work by decreasing the root-mean-squared error of volumetric estimates of the hippocampus between paired images of the same subject by 67%.
Conclusion: This work demonstrates a pipeline that uses unlabeled clinical data to adapt algorithms optimized for research data so that they generalize to heterogeneous clinical acquisitions.
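The two reproducibility measures reported above can be illustrated with a minimal Python sketch. It assumes the pre- and post-contrast label maps are already aligned on the same voxel grid and that rDSC is the unweighted mean of per-label Dice scores; the abstract does not specify the averaging, and the function names and toy arrays below are hypothetical.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice overlap of one label between two label maps of identical shape."""
    a = seg_a == label
    b = seg_b == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label absent from both maps
    return 2.0 * np.logical_and(a, b).sum() / denom

def reproducibility_dice(seg_pre, seg_post, labels):
    """Assumed rDSC: unweighted mean of per-label Dice between paired scans."""
    scores = [dice(seg_pre, seg_post, lab) for lab in labels]
    return float(np.nanmean(scores))

def volume_rmse(vol_pre, vol_post):
    """Root-mean-squared error between paired volume estimates."""
    vol_pre = np.asarray(vol_pre, dtype=float)
    vol_post = np.asarray(vol_post, dtype=float)
    return float(np.sqrt(np.mean((vol_pre - vol_post) ** 2)))

# Toy 2D "segmentations" standing in for paired pre/post-contrast label maps.
pre = np.array([[1, 1, 0],
                [2, 2, 0]])
post = np.array([[1, 0, 0],
                 [2, 2, 2]])

rdsc = reproducibility_dice(pre, post, labels=[1, 2])
rmse = volume_rmse([3.0, 4.0], [3.0, 2.0])
print(f"rDSC = {rdsc:.3f}, volume RMSE = {rmse:.3f}")
```

A relative reduction in RMSE, such as the 67% quoted for the hippocampus, would then be computed as one minus the ratio of the RMSE after adaptation to the RMSE before it.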
Generalizability is an important problem in deep neural networks, especially in the context of the variability of data acquisition in clinical magnetic resonance imaging (MRI). Recently, the Spatially Localized Atlas Network Tiles (SLANT) approach has been shown to effectively segment whole-brain non-contrast T1w MRI with 132 volumetric labels. Enhancing the generalizability of SLANT would enable broader application of volumetric assessment in multi-site studies. Transfer learning (TL) is commonly used to update neural network weights for local factors; yet, it is widely recognized to risk degrading performance on the original validation/test cohorts. Here, we explore TL by data augmentation to address these concerns in the context of adapting SLANT to anatomical variation (e.g., adults versus children) and scanning protocol (e.g., non-contrast research T1w MRI versus contrast-enhanced clinical T1w MRI). We consider two datasets. The first comprises 30 T1w MRI scans of young children with manually corrected volumetric labels, with the accuracy of automated segmentation defined relative to the manually provided truth. The second comprises 36 paired datasets of pre- and post-contrast clinically acquired T1w MRI, with the accuracy of the post-contrast segmentations assessed relative to the pre-contrast automated assessment. For both studies, we augment the original TL step of SLANT with either only the new data or both the original and new data. Over baseline SLANT, both approaches yielded significantly improved performance (pediatric: 0.89 vs. 0.82 DSC, p<0.001; contrast: 0.80 vs. 0.76 DSC, p<0.001). The performance on the original test set decreased with the new-data-only transfer learning approach, so data augmentation was superior to strict transfer learning.
Whole brain segmentation on structural magnetic resonance imaging (MRI) is essential for understanding neuroanatomical-functional relationships. Traditionally, multi-atlas segmentation has been regarded as the standard method for whole brain segmentation. In the past few years, deep convolutional neural network (DCNN) segmentation methods have demonstrated their advantages in both accuracy and computational efficiency. Recently, we proposed the spatially localized atlas network tiles (SLANT) method, which is able to segment a 3D MRI brain scan into 132 anatomical regions. Commonly, DCNN segmentation methods yield inferior performance under external validation, especially when the testing patterns were not present in the training cohorts. Recently, we obtained a clinically acquired, multi-sequence brain MRI cohort of 1480 de-identified scans from 395 patients, acquired using seven different MRI protocols. Moreover, each subject has at least two scans from different MRI protocols. Herein, we assess the SLANT method's intra- and inter-protocol reproducibility. SLANT achieved a coefficient of variation (CV) of less than 0.05 for intra-protocol experiments and less than 0.15 for inter-protocol experiments. The results show that the SLANT method achieved high intra- and inter-protocol reproducibility.
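As an illustration of the reproducibility measure above, the following Python sketch computes a coefficient of variation for one region's volume across repeated scans of the same subject. It is a sketch under stated assumptions: the abstract does not say whether the sample or population standard deviation was used (the sample form, `ddof=1`, is assumed here), and the helper names, voxel size, and volume values are hypothetical.

```python
import numpy as np

def label_volume(seg, label, voxel_volume_mm3):
    """Volume of one label: voxel count times the volume of a single voxel."""
    return float(np.count_nonzero(seg == label) * voxel_volume_mm3)

def coefficient_of_variation(volumes):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    v = np.asarray(volumes, dtype=float)
    return float(np.std(v, ddof=1) / np.mean(v))

# Hypothetical volumes (mm^3) of one region from three scans of one subject.
repeat_volumes = [100.0, 110.0, 90.0]
cv = coefficient_of_variation(repeat_volumes)
print(f"CV = {cv:.2f}")

# Toy label map showing how a volume would be derived from a segmentation.
toy_seg = np.zeros((4, 4, 4), dtype=int)
toy_seg[1:3, 1:3, 1:3] = 7          # 8 voxels carry the hypothetical label 7
toy_volume = label_volume(toy_seg, 7, voxel_volume_mm3=1.5)
```

Under this definition, intra-protocol CV would pool scans acquired with the same protocol, and inter-protocol CV would pool scans of one subject across protocols.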
Known for its distinct role in memory, the hippocampus is one of the most studied regions of the brain. Recent advances in magnetic resonance imaging have allowed for high-contrast, reproducible imaging of the hippocampus. Typically, a trained rater takes 45 minutes to manually trace the hippocampus and delineate the anterior from the posterior segment at millimeter resolution. As a result, there has been a significant desire for automated and robust segmentation of the hippocampus. In this work, we use a population of 195 atlases based on T1-weighted MR images with the left and right hippocampus delineated into the head and body. We initialize the multi-atlas segmentation to a region directly around each lateralized hippocampus to both speed up registration and improve its accuracy. This initialization allows for the incorporation of nearly 200 atlases, an accomplishment that would typically involve hundreds of hours of computation per target image. The proposed segmentation results in a Dice similarity coefficient over 0.9 for the full hippocampus. This result outperforms a multi-atlas segmentation using the BrainCOLOR atlases (Dice 0.85) and FreeSurfer (Dice 0.75). Furthermore, the head and body delineation resulted in a Dice coefficient over 0.87 for both structures. The head and body volume measurements also show high reproducibility on the Kirby 21 reproducibility population (R² greater than 0.95, p < 0.05 for all structures). This work represents the first result in an ongoing effort to develop a robust tool for measurement of the hippocampus and other temporal lobe structures.
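The region-based initialization described above can be sketched as cropping each image to a padded bounding box around a rough hippocampus mask before registration. This is only an illustration of the general idea: the abstract does not state how the region is obtained, so the mask source, the margin, and the function names below are assumptions.

```python
import numpy as np

def roi_bounds(mask, margin):
    """Slices for the bounding box of a binary mask, padded by `margin` voxels
    and clipped to the image extent."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("mask is empty; no region to crop")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    return tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))

def crop(image, bounds):
    """Restrict an image to the region of interest."""
    return image[bounds]

# Toy volume with a small block standing in for a rough hippocampus mask.
image = np.random.default_rng(0).random((10, 10, 10))
rough_mask = np.zeros((10, 10, 10), dtype=bool)
rough_mask[4:6, 4:6, 4:6] = True

bounds = roi_bounds(rough_mask, margin=2)
roi = crop(image, bounds)
print(roi.shape)
```

Registering only the cropped region reduces the number of voxels each pairwise registration must process, which is one plausible reason a large atlas population becomes computationally tractable.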