Medical imaging-based machine learning (ML) for in vivo lesion diagnosis consists of two basic components, or modules: (i) feature extraction from non-invasively acquired medical images and (ii) feature learning and classification to predict the malignancy of lesions detected or localized in those images. This study investigates the individual contribution of each module to the diagnosis of lesions detected by low-dose computed tomography screening, namely pulmonary nodules and colorectal polyps. Three feature extraction methods were investigated. The first uses the gray-level co-occurrence matrix to extract the well-cited Haralick image texture features (HFs). The second uses a convolutional neural network (CNN) to extract deep learning (DL) image characteristic features (DFs). The third uses the interactions between tissues and X-ray energy to extract tissue-energy specific characteristic features (TFs). All three categories of features were classified by the well-cited Random Forest (RF) classifier and compared against a DL-CNN baseline, which reads the images, extracts DFs, and classifies them in an end-to-end manner. Diagnostic performance, i.e., prediction of lesion malignancy, was measured by the well-established area under the receiver operating characteristic curve (AUC). Three lesion image datasets were used, with the pathology reports of the lesion tissue serving as the machine learning labels. Across the three datasets, the experiments produced AUC values of 0.724 to 0.878 for the HFs, 0.652 to 0.965 for the DFs, and 0.985 to 0.996 for the TFs, compared with 0.694 to 0.964 for the DL-CNN baseline. The similar AUC ranges of the RF-classified DFs and the end-to-end DL-CNN indicate that the RF classifier performed comparably to the DL-CNN classification module. The consistently higher AUC of the TFs indicates that extracting tissue-energy specific characteristic features is more important than extracting image characteristic features.
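As a rough illustration of the first extraction route and of the evaluation metric, the following minimal sketch computes a symmetric, normalized gray-level co-occurrence matrix for a single pixel offset, derives three of the Haralick descriptors from it (contrast, angular second moment, and inverse difference moment), and computes the AUC as the fraction of correctly ranked malignant/benign pairs. It is not the study's implementation: the function names, the single offset, the choice of three descriptors, and the toy inputs are all assumptions made for illustration, and the Random Forest step is omitted.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    `image` is a 2-D list of integer gray levels in [0, levels).
    Assumes at least one pixel pair exists for the chosen offset.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Three Haralick descriptors computed from a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs: a 2x2 checkerboard patch and four scored lesions.
patch = [[0, 1], [1, 0]]
features = haralick_subset(glcm(patch, levels=2))
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

In practice the descriptors would be averaged over several offsets and directions and passed, one feature vector per lesion, to a classifier whose predicted malignancy scores are then fed to the AUC computation.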