As deep learning accelerates progress in computer vision, there has been growing interest in applying deep learning models to predict the presence of cancer in mammography images. However, unlike conventional object recognition, where one can leverage very large and diverse datasets such as ImageNet, datasets for identifying cancer in mammography images are typically small and potentially non-representative because medical data and labels are costly to acquire. This makes training and assessing such models challenging and raises concerns about reliability and generalizability. In this work, we propose using the jigsaw task¹ as a self-supervised method to pre-train models when unlabeled data is available. We show that models pre-trained with this task outperform randomly initialized models on the malignancy prediction task, even when they are fine-tuned on only half or a quarter of the training set. In particular, when using only a quarter of the labeled data, a model trained from randomly initialized weights achieves an area under the receiver operating characteristic curve (AUC) of 0.944. In contrast, the model pre-trained with the jigsaw task achieves an AUC of 0.958 when fine-tuned on the same quarter of the training set, outperforming even the model trained on all of the labeled data from randomly initialized weights (0.954 AUC). Furthermore, we propose using performance on the jigsaw task as a measure of confidence in our model’s predictions, which gives the model the option to abstain from making a prediction when it is not confident. We tested multiple strategies for filtering out samples on which the jigsaw model performs poorly and measured the AUC on the remaining pool of samples.
We show that the best filtering strategy improves malignancy prediction performance from an AUC of 0.890 on a completely unfiltered, off-site test set from a different country to an AUC of 0.913 on the filtered set.
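The filter-then-evaluate procedure can be illustrated with a minimal sketch. It assumes that each sample carries a scalar jigsaw-derived confidence value and that the filter keeps a fixed fraction of the most confident samples; the abstract does not specify the filtering strategies, so the function names, the `keep_fraction` parameter, and the toy values below are hypothetical.

```python
def auc(labels, scores):
    """AUC computed as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half (Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def filter_by_confidence(labels, scores, confidence, keep_fraction):
    """Keep the most confident fraction of samples and abstain on the rest.

    `confidence` is a hypothetical per-sample score derived from jigsaw-task
    performance; higher means the jigsaw model did better on that sample.
    """
    n_keep = max(1, round(keep_fraction * len(labels)))
    order = sorted(range(len(labels)), key=lambda i: confidence[i], reverse=True)
    kept = sorted(order[:n_keep])  # restore the original sample order
    return [labels[i] for i in kept], [scores[i] for i in kept]


# Toy values for illustration only; these are not data from the study.
labels = [1, 0, 1, 0, 1, 0]                      # 1 = malignant
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]          # malignancy predictions
confidence = [0.95, 0.9, 0.3, 0.2, 0.85, 0.8]    # jigsaw-derived confidence

auc_all = auc(labels, scores)
kept_labels, kept_scores = filter_by_confidence(labels, scores, confidence, 2 / 3)
auc_kept = auc(kept_labels, kept_scores)
print(f"unfiltered AUC: {auc_all:.3f}, filtered AUC: {auc_kept:.3f}")
```

In this toy setup the two misranked samples also have the lowest confidence, so abstaining on them raises the AUC; in practice the gain depends on how well jigsaw performance tracks prediction errors.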