Medical image classification, which aims to categorize images according to the underlying lesion conditions, has been widely used in computer-aided diagnosis. Previously, most models were obtained via transfer learning, where the backbone model is designed for and trained on generic image datasets, resulting in a lack of model interpretability. While adding lesion location information introduces domain-specific knowledge during transfer learning and thus helps mitigate this problem, it may introduce more complicated ones. Many existing models are rather complex, containing multiple disjoint CNN streams. In addition, they are mainly geared towards a specific task and lack adaptability across different tasks. In this paper, we present a simple and generic approach, named the Spotlight Scheme, to leverage knowledge of lesion locations in image classification. In particular, in addition to the whole-image classification stream, we add a spotlighted image stream in which the non-suspicious regions are blacked out. We then introduce a hybrid two-stage intermediate fusion module, consisting of shallow tutoring and deep ensemble, to enhance image classification performance. The shallow tutoring module allows the whole-image classification stream to focus on the lesion area with the help of the spotlight stream. This module can be inserted into any backbone architecture multiple times and thus permeates the entire feature extraction procedure. At a later stage, a deep ensemble network aggregates the two streams and learns a joint representation. Experimental results show state-of-the-art or competitive performance on three medical tasks: retinopathy of prematurity, glaucoma, and colorectal polyps. In addition, we demonstrate the robustness of our scheme by showing that it consistently achieves promising results with different backbone architectures and model configurations.
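As a rough illustration of how the spotlight stream's input could be built, the sketch below blacks out every pixel outside the lesion regions, leaving the suspicious areas unchanged. It assumes lesion locations are given as axis-aligned bounding boxes in a hypothetical `(y0, x0, y1, x1)` format with exclusive upper bounds; the annotation format is not specified here, and `spotlight_image` is an illustrative name, not the authors' implementation. The shallow tutoring and deep ensemble modules are not sketched.

```python
import numpy as np


def spotlight_image(image: np.ndarray, boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Return a copy of `image` with everything outside the lesion boxes set to 0.

    image: H x W (grayscale) or H x W x C (color) array.
    boxes: hypothetical lesion boxes as (y0, x0, y1, x1), upper bounds exclusive.
    """
    # Boolean mask over the spatial dimensions: True inside any lesion box.
    mask = np.zeros(image.shape[:2], dtype=bool)
    for y0, x0, y1, x1 in boxes:
        mask[y0:y1, x0:x1] = True
    # For color images, add a channel axis so the mask broadcasts over C.
    if image.ndim == 3:
        mask = mask[..., None]
    # Keep pixels inside the boxes, black out (zero) the non-suspicious regions.
    return np.where(mask, image, 0).astype(image.dtype)


# Usage: the whole image feeds one stream, the spotlighted copy the other.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
spot = spotlight_image(img, [(1, 1, 3, 3)])
```

Under this assumption the whole image and its spotlighted copy have the same shape, so both can be fed to backbones of identical architecture, which is what lets a fusion module pair their intermediate feature maps.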