Segmenting ultrasound videos is challenging because ultrasound images are inherently ambiguous, yet it is valuable for intraoperative tumor detection and the protection of key organs. Previous ultrasound segmentation methods, such as UNet and Swin-UNetR, focus on single-image segmentation, while video segmentation methods designed for natural images, such as VisTR, require a large amount of GPU memory, which makes them difficult to train and deploy with limited computational resources. In this paper, we propose an ultrasound video segmentation framework called Ultra-TransUNet, which exploits both temporal and spatial context to segment ultrasound videos. We evaluate our method in phantom, animal, and cadaver labs on two types of clinically relevant targets: tumors in phantom studies and the ureter in animal and cadaver labs. Averaged over the three datasets, our method improves Dice and sensitivity over baseline methods by about 2.6% and 14.1%, respectively. We hope our algorithm can provide surgeons with real-time support for locating key structures with ultrasound during surgery, thereby protecting patients and improving surgical outcomes.
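
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Dice and sensitivity are commonly computed for a single binary segmentation mask. It assumes the standard definitions (Dice = 2|P ∩ T| / (|P| + |T|), sensitivity = TP / (TP + FN)); the function names and the handling of empty masks are illustrative assumptions and are not taken from the paper's implementation.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def sensitivity(pred, target):
    """Fraction of ground-truth foreground pixels that were predicted (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    positives = target.sum()
    if positives == 0:
        # Assumed convention: sensitivity is undefined without foreground pixels.
        return float("nan")
    true_positives = np.logical_and(pred, target).sum()
    return float(true_positives / positives)


# Toy 2x3 frame: one of the two ground-truth pixels is recovered.
pred_mask = [[1, 1, 0], [0, 0, 0]]
true_mask = [[1, 0, 0], [1, 0, 0]]
print(dice_score(pred_mask, true_mask))   # 0.5
print(sensitivity(pred_mask, true_mask))  # 0.5
```

For a video, these per-frame values would typically be averaged over frames and then over datasets; the exact aggregation used for the reported numbers is not specified here.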