Purpose: Medical imaging-based machine learning (ML) for computer-aided diagnosis of in vivo lesions consists of two basic components or modules of (i) feature extraction from non-invasively acquired medical images and (ii) feature classification for prediction of malignancy of lesions detected or localized in the medical images. This study investigates their individual performances for diagnosis of low-dose computed tomography (CT) screening-detected lesions of pulmonary nodules and colorectal polyps.
Approach: Three feature extraction methods were investigated. One uses the mathematical descriptor of gray-level co-occurrence image texture measure to extract the Haralick image texture features (HFs). One uses the convolutional neural network (CNN) architecture to extract deep learning (DL) image abstractive features (DFs). The third one uses the interactions between lesion tissues and X-ray energy of CT to extract tissue-energy specific characteristic features (TFs). All the above three categories of extracted features were classified by the random forest (RF) classifier with comparison to the DL-CNN method, which reads the images, extracts the DFs, and classifies the DFs in an end-to-end manner. The ML diagnosis of lesions or prediction of lesion malignancy was measured by the area under the receiver operating characteristic curve (AUC). Three lesion image datasets were used. The lesions’ tissue pathological reports were used as the learning labels.
Results: Experiments on the three datasets produced AUC values of 0.724 to 0.878 for the HFs, 0.652 to 0.965 for the DFs, and 0.985 to 0.996 for the TFs, compared to the DL-CNN of 0.694 to 0.964. These experimental outcomes indicate that the RF classifier performed comparably to the DL-CNN classification module and the extraction of tissue-energy specific characteristic features dramatically improved AUC value.
Conclusions: The feature extraction module is more important than the feature classification module. Extraction of tissue-energy specific characteristic features is more important than extraction of image abstractive and characteristic features.
Medical imaging-based machine learning (ML) for in vivo lesion diagnosis consists of two basic components, or modules, of (i) feature extraction from non-invasively acquired medical images and (ii) feature learning and classification for prediction of malignancy of lesions detected or localized in the medical images. This study investigates their individual performances for diagnosis of low-dose computed tomography screening-detected lesions of pulmonary nodules and colorectal polyps. Three feature extraction methods were investigated. One uses the gray-level co-occurrence image texture measure to extract the well-cited Haralick image texture features (HFs). One uses the convolutional neural network (CNN) to extract deep learning (DL) image characteristic features (DFs). The third one uses the interactions between tissues and X-ray energy to extract tissue-energy specific characteristic features (TFs). All three categories of features were classified by the well-cited Random Forest (RF) classifier with comparison to the baseline of DL-CNN, which reads the images, extracts DFs and classifies DFs for the prediction in an end-to-end manner. The lesion diagnosis or prediction of lesion malignancy was measured by the well-established area under the receiver operating characteristic curve (AUC). Three lesion image datasets were used. The lesions’ tissue pathological reports were used as machine learning labels. The experiments on the three datasets produced AUC values of 0.724 to 0.878 for the HFs, 0.652 to 0.965 for the DFs, and 0.985 to 0.996 for the TFs, compared to the baseline of DL-CNN with AUC from 0.694 to 0.964. These experimental outcomes indicate that the RF classifier performed comparably to the DL-CNN classification module. The extraction of tissue-energy specific characteristic features is more important than the extraction of the image characteristic features.
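Since every comparison above is reported as an AUC, a small illustration of how that number can be computed may help. The following is a minimal sketch, not the authors' evaluation code: it uses the rank-based (Mann-Whitney) definition of AUC, and the function name `auc_score` and the toy labels and scores are assumptions made up for this example.

```python
def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen malignant case (label 1)
    receives a higher score than a randomly chosen benign case (label 0).
    Ties between a positive and a negative count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not data from the study):
# 1 = malignant per pathology report, 0 = benign.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auc = auc_score(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In this toy case, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of cases; a sort-based variant would be preferable for large datasets.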
Recent advancement of spectral computed tomography (SpCT) technologies, by either multi-energy spectral data acquisition with an energy-integrating detector or single-energy spectral data acquisition with a photon-counting detector, has enabled the reconstruction of virtual monochromatic images (VMIs) at any energy value within and outside the energy spectral ranges of current CTs’ X-ray tubes. This makes it possible not only to visualize the tissue contrast variation characteristics along the X-ray energy dimension, but also to quantify those characteristics by machine learning (ML) for prediction of lesion malignancy or computer-aided diagnosis (CADx). This study explored the energy spectral information of SpCT, i.e., the contrast variation characteristics along the X-ray energy dimension, for ML-CADx of the lesion type of colorectal polyps. In particular, the tissue contrast variation patterns along the X-ray energy dimension in the VMIs, called energy spectral features, are investigated. A figure of merit (FOM) for the task of ML-CADx is proposed, which ranks the series of VMIs along the X-ray energy dimension by inputting each VMI into a single-channel deep learning (DL) pipeline and generating a corresponding AUC score (area under the receiver operating characteristic curve). The FOM then selects increasing numbers of the most highly ranked VMIs as the inputs to a multi-channel DL pipeline and generates the corresponding AUC scores until all VMIs are selected. It is hypothesized that the AUC scores from the multi-channel DL pipeline will increase to a highest score and then drop along the ranking order, because all VMIs have the same anatomic structure and, therefore, strong data redundancy. The FOM reaches the highest AUC score by minimizing the redundancy.
We tested the hypothesis by comparing the proposed FOM-rank ML-CADx with the widely used Karhunen-Loève (KL) transform-based ranking method where the principal components are ordered automatically by the KL transform. The lesion data include the CT images of colorectal polyps and the pathological reports after they were resected. The proposed FOM-rank method outperformed the KL-based ranking method with an optimal gain of 4.7%, showing its effectiveness in prediction of lesion malignancy.
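The rank-then-accumulate procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function `fom_rank_select` is a hypothetical name, the two scoring callables stand in for the single-channel and multi-channel DL pipelines (which are not reproduced here), and all numbers are made-up placeholders rather than results from the study.

```python
def fom_rank_select(channels, single_auc, multi_auc):
    """Rank channels (e.g. VMI energy levels) by their single-channel AUC,
    then score the top-k subsets for k = 1..N and keep the best subset."""
    ranked = sorted(channels, key=single_auc, reverse=True)
    curve = [(k, multi_auc(ranked[:k])) for k in range(1, len(ranked) + 1)]
    best_k, best_auc = max(curve, key=lambda item: item[1])
    return ranked, curve, ranked[:best_k], best_auc


# Placeholder single-channel scores keyed by VMI energy in keV.
# These are invented for the example, not values from the paper.
TOY_SINGLE = {40: 0.80, 70: 0.75, 100: 0.70, 140: 0.65}


def toy_single_auc(energy):
    return TOY_SINGLE[energy]


def toy_multi_auc(subset):
    """Stand-in for the multi-channel pipeline: a small gain per added
    channel minus a growing redundancy penalty, so the score rises, then falls."""
    extra = len(subset) - 1
    return max(TOY_SINGLE[e] for e in subset) + 0.03 * extra - 0.02 * extra ** 2


ranked, curve, selected, best_auc = fom_rank_select(
    list(TOY_SINGLE), toy_single_auc, toy_multi_auc
)
print("ranking:", ranked)
print("selected VMIs:", selected)
```

With these placeholder scores, the multi-channel score peaks at the top two channels and declines as more redundant channels are added, which mirrors the rise-then-drop behavior the hypothesis predicts.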
Computer-aided diagnosis (CADx) of polyps is essential for advancing computed tomography colonography (CTC) with diagnostic capability. In this paper, we present a study comparing the performance of deep learning and the Random Forest (RF) classifier for polyp differentiation in CTC. First, we conducted feature extraction via an extended Haralick model (eHM) to build a total of 30 texture features. The gray-level co-occurrence matrix (GLCM) is generated to encode 3D CT image information into a 2D matrix as input to the convolutional neural network (CNN). Then, we split the polyp classification into two state-of-the-art frameworks: the eHM texture features/RF and the GLCM texture matrices/CNN. We evaluated their performances by the area under the receiver operating characteristic curve using 1,278 pathology-confirmed polyps. Results demonstrated that, with balanced data, both the CNN model and the RF classifier can learn or analyze features effectively and achieve high performance. The RF classifier in general outperformed the CNN model with a gain of 6.4% (balanced datasets) and 5.4% (unbalanced datasets), showing its effectiveness in feature extraction and analysis for polyp differentiation. However, the performance of the CNN improved with the addition of new data, with a gain of 3.6% (balanced datasets) and 3.4% (unbalanced datasets), whereas the RF classifier showed no gain when we enlarged the datasets. This demonstrated that the CNN model has the potential to improve classification performance when dealing with larger datasets. This study provided valuable information on how to design experiments to improve CADx of polyps.
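To make the GLCM step concrete, here is a minimal sketch of a co-occurrence matrix and three classic Haralick-style measures. It is a simplification, not the extended Haralick model used in the study: it assumes a single 2D slice with integer gray levels and one pixel offset, whereas the study works with 3D CT volumes. The names `glcm` and `haralick_subset` are hypothetical.

```python
def glcm(image, dx=1, dy=0, levels=None, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).
    `image` is a list of rows of integer gray levels in [0, levels)."""
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the pair in reverse order
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Contrast, energy (angular second moment), and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Small 4x4 test image with four gray levels (illustrative only).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_glcm = glcm(toy_image)            # horizontal neighbor, symmetric
toy_features = haralick_subset(toy_glcm)
print(toy_features)
```

In the two frameworks compared above, scalar measures of this kind would play the role of the texture features fed to the RF classifier, while the matrix itself corresponds to the 2D input given to the CNN.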