Purpose: A recent imaging method (Long-Film) for capturing long-length images of the spine was enabled on the O-arm™ system. This work uses a custom, multi-perspective, region-based convolutional neural network (R-CNN) to label vertebrae in Long-Film images and evaluates approaches for incorporating long-range contextual information to take advantage of the extended field of view and improve labeling accuracy. Methods: Two methods for incorporating contextual information were evaluated: (1) a recurrent network module with long short-term memory (LSTM) added after R-CNN classification; and (2) a post-processing, sequence-sorting step based on the label confidence scores. The models were trained and validated on 11,805 Long-Film images simulated from projections of 370 CT images and tested on 50 Long-Film images of 14 cadaveric specimens. Results: The multi-perspective R-CNN with the LSTM module achieved a vertebral level identification rate of 91.7%, compared to 72.4% without the LSTM, demonstrating the benefit of incorporating contextual information. While sequence sorting achieved 89.4% labeling accuracy, it failed to handle detection errors and provided no additional improvement when applied after the LSTM module. Conclusions: The proposed LSTM module significantly improved labeling accuracy over the base model by incorporating contextual information effectively and by allowing end-to-end training. Compared to sequence sorting, it handled false positives and false negatives in vertebrae detection more flexibly. The proposed model could provide a valuable check for target localization and forms the basis for automatic measurement of spinal curvature changes in interventional settings.
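The abstract does not spell out how the sequence-sorting step operates, so the following is only a minimal sketch of one plausible reading: given per-label confidence scores for detections ordered from superior to inferior, pick the run of consecutive vertebral labels with the highest total confidence. The label set `LABELS`, the function names, and the score layout are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical label set, ordered superior to inferior (C1..C7, T1..T12, L1..L5).
LABELS = ([f"C{i}" for i in range(1, 8)]
          + [f"T{i}" for i in range(1, 13)]
          + [f"L{i}" for i in range(1, 6)])


def independent_labels(scores):
    """Label each detection on its own by taking its highest-scoring label."""
    scores = np.asarray(scores, dtype=float)
    return [LABELS[j] for j in np.argmax(scores, axis=1)]


def sort_sequence(scores):
    """Relabel detections with the best-scoring run of consecutive labels.

    scores: (n_detections, n_labels) confidences, rows ordered superior
    to inferior. Assumes every vertebra in the run was detected exactly
    once (no missed or spurious detections).
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    if k != len(LABELS) or n > k:
        raise ValueError("unexpected score matrix shape")
    rows = np.arange(n)
    # Total confidence of assigning labels start, start+1, ... to rows 0, 1, ...
    totals = [scores[rows, start + rows].sum() for start in range(k - n + 1)]
    best = int(np.argmax(totals))
    return [LABELS[best + i] for i in range(n)]


# Toy example: four detections spanning T10..L1, where the third
# detection is individually mistaken for T11.
t10 = LABELS.index("T10")
scores = np.zeros((4, len(LABELS)))
scores[0, t10] = 0.9
scores[1, t10 + 1] = 0.8
scores[2, t10 + 1] = 0.5   # wrong label scores highest in isolation
scores[2, t10 + 2] = 0.4
scores[3, t10 + 3] = 0.7

print(independent_labels(scores))
print(sort_sequence(scores))
```

Because this sketch assumes exactly one detection per vertebra, a missed or spurious detection shifts every later label, which is consistent with the reported weakness of sequence sorting in the presence of detection errors.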