The performance and evaluation of segmentation algorithms benefit from large, fully annotated data sets, but the heavy workload of manual contouring makes such data sets unrealistic in clinical and research practice. In this work, we propose a method for automatically creating pseudo ground truth (p-GT) segmentations of anatomical objects from sparse manually annotated slices, and we use these p-GT segmentations to evaluate actual segmentations. Sparse slices are selected at even spacing over the entire slice range of the target object: one slice is manually annotated, the next t slices are skipped, and this pattern is repeated from one end of the object to the other. Two strategies for creating p-GT are investigated: shape-based interpolation (SI) and an object-specific 2D U-net based deep learning (DL) approach. The optimal t for a given object is taken to be the largest t at which the created p-GT is not statistically significantly different from the actual ground truth, given the natural imprecision of the latter due to variability in manual specification. Experiments are conducted on ~300 computed tomography (CT) studies involving two objects (cervical esophagus and mandible) and two segmentation evaluation metrics (Dice coefficient and average symmetric boundary distance). Results show that the DL strategy overwhelmingly outperforms the SI strategy: ~95% and ~66-83% of the manual workload can be eliminated via the DL and SI strategies, respectively, without sacrificing evaluation accuracy compared with actual ground truth data. Furthermore, p-GT created with the optimal t yields accurate metric values when used to evaluate actual segmentations.
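
The slice-selection pattern and the Dice coefficient described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`select_sparse_slices`, `workload_reduction`, `dice_coefficient`) are hypothetical, slices are assumed to be indexed by consecutive integers, and the handling of the final slice of the object (which is annotated here only if the pattern happens to land on it) is an assumption.

```python
import numpy as np


def select_sparse_slices(first, last, t):
    """Indices of slices to annotate: annotate one, skip the next t, repeat.

    Assumption: slices are indexed first..last inclusive, and the pattern
    starts at the first slice of the object.
    """
    if t < 0 or last < first:
        raise ValueError("need t >= 0 and last >= first")
    return list(range(first, last + 1, t + 1))


def workload_reduction(first, last, t):
    """Fraction of slices that do not need manual annotation."""
    total = last - first + 1
    annotated = len(select_sparse_slices(first, last, t))
    return 1.0 - annotated / total


def dice_coefficient(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom
```

For an object spanning many slices, the reduction computed this way approaches t / (t + 1), which is why larger acceptable values of t translate directly into a smaller manual workload.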