Open Access
29 November 2017

Automatic classification of retinal three-dimensional optical coherence tomography images using principal component analysis network with composite kernels
Leyuan Fang, Chong Wang, Shutao Li, Jun Yan, Xiangdong Chen, Hossein Rabbani
Abstract
We present an automatic method, termed the principal component analysis network with composite kernel (PCANet-CK), for the classification of three-dimensional (3-D) retinal optical coherence tomography (OCT) images. Specifically, the proposed PCANet-CK method first utilizes the PCANet to automatically learn features from each B-scan of the 3-D retinal OCT images. Then, multiple kernels are separately constructed from a set of the most important features of the B-scans and fused together, which can jointly exploit the correlations among features of the 3-D OCT images. Finally, the fused (composite) kernel is incorporated into an extreme learning machine for the OCT image classification. We tested our proposed algorithm on two real 3-D spectral domain OCT (SD-OCT) datasets (normal subjects and subjects with macular edema and age-related macular degeneration), which demonstrated its effectiveness.

1.

Introduction

The macula is an oval-shaped pigmented area near the center of the retina that is mainly responsible for central vision. Damage to the macula, such as macular edema (ME) and age-related macular degeneration (AMD), directly results in the loss of central vision.1-3 Clinical diagnosis of ME and AMD relies on the localization of macular structural abnormalities (also called lesions), whose types and numbers are important diagnostic criteria for ophthalmologists. For example, compared with the normal macula (NM) [see an example in Fig. 1(a)], edema and exudates are often related to diabetic retinopathy (one type of ME)4 [see an example in Fig. 1(b)], while drusen are typical lesions often found in AMD eyes5 [see an example in Fig. 1(c)]. Therefore, it is essential to investigate macular lesions for the clinical diagnosis and treatment of ophthalmic diseases.

Fig. 1

Examples of (a) OCT image with NM: clear subretinal boundaries; (b) OCT image with ME: swollen retina, edema, and exudates around the macula; and (c) OCT image with AMD: RPE layer interrupted by drusen of various sizes.


Optical coherence tomography (OCT) can provide in vivo three-dimensional (3-D) cross-sectional imaging of human tissue at micrometer resolutions;6-8 it has been widely used for a variety of medical imaging applications.9-13 The high resolution of OCT enables the visualization of multiple retinal cell layers and volumetric quantitative evaluation of retinal structures.14-16 By employing near-infrared light to image the eye at micron resolution, subtle but valuable pathological structures can be observed, from which many macular and ocular diseases can be identified in their early stages.17-20 In clinical diagnosis, ophthalmologists need to manually identify various macular lesions in each cross section of the OCT volume and then determine the type of disease. Such manual analysis is time-consuming, requires expert graders, and often yields subjective results. Consequently, it is urgent to develop effective computer-assisted OCT image analysis techniques.

During the past decades, a multitude of classification methods has been developed for the automatic analysis of OCT images.21-28 In general, these OCT image classification approaches mainly consist of two key components: feature extraction21-27 and classifier design.22-26,28 The feature extraction step first extracts a set of representative features to describe the original OCT images, and the classifier then determines the type of disease by mapping the extracted features to a category. For the feature extraction of OCT images, Sugmk et al.21 first segmented the retinal pigment epithelium (RPE) layer and then computed binary features from the layer for the identification of AMD and diabetic macular edema (DME). Liu et al.22 computed multiscale local binary pattern (LBP) features to encode the texture and shape information of the OCT images. Srinivasan et al.23 utilized multiscale histogram of oriented gradients (HOG) features, which are useful for the detection of AMD, DME, and NM, to describe each OCT B-scan. Hassan et al.24 used five features (three thickness profiles and two cyst fluids) based on structure tensors to detect ME and central serous retinopathy. For each specific kind of lesion in the OCT image, the aforementioned works21-24 design corresponding features and can provide promising classification results. However, these features (e.g., LBP, HOG, and structure tensor) are designed based on fixed mathematical models, which means that features designed for one type of lesion may be suboptimal for representing other kinds of lesions. Since clinically acquired OCT images usually contain very complex pathological structures, a more desirable strategy is to learn features from the original OCT images. Sun et al.25 used sparse coding and a multiscale dictionary learning method based on the scale-invariant feature transform descriptor to extract representative features from the input images for the detection of AMD, DME, and NM. Venhuizen et al.26 introduced an unsupervised feature learning29 approach to distinguish AMD from normal volumes, in which small descriptive image patches are selectively extracted to create patch occurrence histogram features. Very recently, in Ref. 27, a typical deep learning model, the convolutional neural network (CNN),30-32 was utilized to extract very high-level features from each OCT B-scan, and it delivered very satisfactory results. However, a CNN has many network layers and usually contains millions of parameters, and thus requires a very large amount of training data and high computational cost to train.

On the other hand, numerous researchers have attempted to design various classifiers [e.g., support vector machines (SVMs),22-25 random forests,26 and sequential minimal optimization (SMO)28] for the OCT image classification task. In Refs. 22-25, the extracted features were used as feature vectors for SVM classifiers, which identify the presence of NM and each of the pathologies. In Ref. 26, a supervised random forest classifier was trained on the aforementioned feature vectors for category discrimination. Wang et al.28 systematically evaluated these classifiers and demonstrated that the SMO tended to achieve the best performance. However, all the classifiers in these works are designed for the classification of a single B-scan only, without considering the correlations among B-scans of the 3-D OCT images.

To address the above issues, we propose a feature learning-based classification algorithm called the principal component analysis network with composite kernel (PCANet-CK) for the automatic diagnosis of AMD, ME, and NM in OCT images. First, a PCANet model33 is used to automatically extract multiple-level features from each B-scan of the 3-D OCT images. Compared with the CNN, the PCANet has a simple network structure and far fewer parameters, so it can be trained much more efficiently. Second, a set of the most important features of each volume is carefully selected to construct multiple kernels, which are then combined into a composite kernel. Finally, the composite kernel34 is fed into an extreme learning machine (ELM)35,36 classifier for the classification of the 3-D OCT images. Such a composite kernel-based classification strategy can jointly exploit the correlations among features of the 3-D OCT images while reducing the computational cost of classifying them.

The rest of this paper is organized as follows. Section 2 briefly reviews the PCANet feature extraction method and the ELM classifier. Section 3 introduces the proposed PCANet-CK model. Experimental results on two clinically acquired datasets and related discussions are detailed in Sec. 4. Section 5 concludes this paper and suggests some future work.

2.

Review

2.1.

PCANet Feature Extraction

As a modified version of the typical deep CNN, the PCANet method33 adopts a series of PCA convolution filters to extract features from an input image. A commonly used PCANet model consists of two PCA filter convolution stages and one output stage. Figure 2 shows the typical architecture of the PCANet method, which is described in the following.

Fig. 2

Illustration of the PCANet architecture with two convolution stages and one output stage.


In the first convolution stage, given $H$ training images $\mathbf{X}_h \in \mathbb{R}^{M \times N}$, $h \in \{1, 2, \ldots, H\}$, we convolve each image with $L_1$ PCA filters in a patch-based manner to obtain $L_1 \times H$ feature maps $\mathbf{X}_h^{1,l}$, $l \in \{1, 2, \ldots, L_1\}$. Specifically, we first extract the patches $\mathbf{x}_h^i$ (of size $n_1 \times n_2$) from each image $\mathbf{X}_h$ and remove the mean from each patch. The mean-removed patches of each image $\mathbf{X}_h$ are then vectorized and combined into a matrix $\mathbf{P}_h = [\bar{\mathbf{p}}_h^1, \ldots, \bar{\mathbf{p}}_h^i, \ldots, \bar{\mathbf{p}}_h^{M^* N^*}]$, where $\bar{\mathbf{p}}_h^i \in \mathbb{R}^{n_1 n_2 \times 1}$ is the vector of the $i$'th patch, $M^* = M - \lceil n_1/2 \rceil$, $N^* = N - \lceil n_2/2 \rceil$, and $\lceil n \rceil$ denotes the smallest integer greater than or equal to $n$. The matrices of all the training images are then assembled into one matrix:

Eq. (1)

$$\mathbf{P} = [\mathbf{P}_1, \mathbf{P}_2, \ldots, \mathbf{P}_H].$$

After that, we compute the eigenvectors of $\mathbf{P}\mathbf{P}^T$ and select the first $L_1$ principal eigenvectors as the $L_1$ PCA filters $\mathbf{W}_l^1$, $l = 1, 2, \ldots, L_1$. Finally, the $L_1$ filters are separately applied to each training image $\mathbf{X}_h$:

Eq. (2)

$$\mathbf{X}_h^{1,l} = \mathbf{X}_h * \mathbf{W}_l^1, \quad l = 1, 2, \ldots, L_1,$$

where the operator $*$ denotes two-dimensional convolution. Thus, for a given input image $\mathbf{X}_h$, $L_1$ feature maps $\mathbf{X}_h^{1,l}$ are obtained.
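
For concreteness, the filter learning of this stage can be sketched in a few lines of NumPy (a minimal sketch, not the authors' code: dense, fully overlapping patches are assumed, the function names are ours, and the filter size and number follow Sec. 4.2):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.signal import convolve2d

def learn_pca_filters(images, n1=11, n2=11, L=8):
    """Learn L PCA convolution filters from a set of 2-D images (one PCANet stage).
    Simplification: dense, fully overlapping patches; each patch is mean-removed
    before forming P P^T, as described in Sec. 2.1."""
    cols = []
    for img in images:
        patches = sliding_window_view(img, (n1, n2)).reshape(-1, n1 * n2)
        patches = patches - patches.mean(axis=1, keepdims=True)   # remove patch mean
        cols.append(patches.T)                                    # (n1*n2) x num_patches
    P = np.concatenate(cols, axis=1)                              # Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(P @ P.T)                    # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:L]                         # leading L eigenvectors
    return [eigvecs[:, k].reshape(n1, n2) for k in order]

def convolve_stage(images, filters):
    """Eq. (2): convolve every image with every learned filter (size-preserving)."""
    return [[convolve2d(img, w, mode="same") for w in filters] for img in images]
```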

For each feature map $\mathbf{X}_h^{1,l}$ from the first stage, the second stage convolves this map with $L_2$ PCA filters to obtain $L_2$ feature maps $\mathbf{X}_h^{2,l}$, $l \in \{1, 2, \ldots, L_2\}$. Similar to the first stage, we first extract the mean-removed patches from all $L_1 \times H$ maps $\mathbf{X}_h^{1,l}$ and use the vectorized patches to construct a matrix $\mathbf{U}$. Then, we compute the eigenvectors of $\mathbf{U}\mathbf{U}^T$ and select the first $L_2$ principal eigenvectors as the $L_2$ PCA filters $\mathbf{W}_l^2$, $l = 1, 2, \ldots, L_2$. As in Eq. (2), by applying the $L_2$ PCA filters to each feature map $\mathbf{X}_h^{1,l}$ from the first stage, we obtain $L_1 \times L_2 \times H$ feature maps. Thus, for an input image $\mathbf{X}_h$, the first stage produces $L_1$ feature maps and the second stage produces $L_1 \times L_2$ feature maps.
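
Under the same assumptions, the second stage simply reuses the sketch above, now taking the first-stage feature maps as its input images:

```python
# Second stage (hypothetical usage of the sketch above):
# stage1_maps = convolve_stage(images, learn_pca_filters(images, 11, 11, L=8))
# flat1 = [m for maps in stage1_maps for m in maps]        # all L1*H first-stage maps
# filters2 = learn_pca_filters(flat1, 11, 11, L=8)         # L2 second-stage filters
# stage2_maps = convolve_stage(flat1, filters2)            # L1*L2 maps per input image
```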

In the output stage, we first binarize the $L_1 \times L_2$ feature maps of the second stage using a Heaviside step function and apply hashing encoding to convert them into $L_1$ integer-valued images $\mathbf{Z}_h^l$, $l \in \{1, 2, \ldots, L_1\}$.33 Then, each integer-valued image $\mathbf{Z}_h^l$ is partitioned into $B$ blocks (of size $n_{B1} \times n_{B2}$), in each of which a local histogram (overlapping or nonoverlapping) is computed. After that, all $B$ histograms are concatenated into a vector $\mathrm{Bhist}(\mathbf{Z}_h^l)$. Finally, the feature vector of the input image $\mathbf{X}_h$ is extracted as

Eq. (3)

$$\mathbf{f}_h = [\mathrm{Bhist}(\mathbf{Z}_h^1), \mathrm{Bhist}(\mathbf{Z}_h^2), \ldots, \mathrm{Bhist}(\mathbf{Z}_h^{L_1})]^T.$$

The extracted feature vector can be fed into a classifier (e.g., SVM or ELM) for the classification.
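
A simplified sketch of this output stage (assuming nonoverlapping blocks and our own choice of bit ordering for the hashing; the helper name is hypothetical) is:

```python
import numpy as np

def pcanet_output_stage(stage2_maps, L1=8, L2=8, block=(11, 11)):
    """stage2_maps: the L1*L2 second-stage maps of one image, ordered so that
    maps [l*L2:(l+1)*L2] stem from the l-th first-stage map.
    Returns the feature vector f_h of Eq. (3). Simplification: nonoverlapping blocks."""
    features, n_bins = [], 2 ** L2
    for l in range(L1):
        group = stage2_maps[l * L2:(l + 1) * L2]
        # Heaviside binarization followed by hashing into one integer-valued image Z.
        Z = np.zeros_like(group[0], dtype=np.int64)
        for k, fmap in enumerate(group):
            Z += (fmap > 0).astype(np.int64) << (L2 - 1 - k)
        # Block-wise histograms of Z, concatenated over all blocks.
        H, W = Z.shape
        bh, bw = block
        for r in range(0, H - bh + 1, bh):
            for c in range(0, W - bw + 1, bw):
                hist, _ = np.histogram(Z[r:r + bh, c:c + bw], bins=n_bins, range=(0, n_bins))
                features.append(hist)
    return np.concatenate(features)
```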

2.2.

ELM Classifier

ELM is a very efficient supervised learning model whose objective is to find a decision rule for classification.35,36 To be specific, let $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^H$ be the training set, where $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_H\} \subset \mathbb{R}^{m \times 1}$ denote the $H$ input training samples and $\mathbf{y}_i = [0, \ldots, 1, \ldots, 0]^T \in \mathbb{R}^{G \times 1}$ represents the corresponding class label, a vector with a one for the true class and zeros elsewhere. $G$ denotes the number of classes. In general, the ELM aims to simultaneously minimize the training error and the norm of the output weights in the objective function

Eq. (4)

$$\min_{\boldsymbol{\beta}, \boldsymbol{\xi}} \left\{ \frac{1}{2}\|\boldsymbol{\beta}\|^2 + \frac{C}{2}\sum_{i=1}^{H}\|\boldsymbol{\xi}_i\|^2 \right\} \quad \text{s.t.} \;\; \boldsymbol{\beta}^T \phi(\mathbf{x}_i) = \mathbf{y}_i - \boldsymbol{\xi}_i, \;\; i = 1, 2, \ldots, H,$$

where $\phi(\cdot)$ is a feature mapping function determining the hidden layer output, $\boldsymbol{\beta} \in \mathbb{R}^{|\phi(\cdot)| \times G}$ are the output weights, $\boldsymbol{\xi} \in \mathbb{R}^{G \times H}$ is the training error matrix, and $|\phi(\cdot)|$ is the number of elements of the vector $\phi(\cdot)$. $C$ is a regularization parameter balancing the norm of the output weights and the training errors. Based on the Lagrange multiplier method and the Karush–Kuhn–Tucker optimality conditions,35 the solution of Eq. (4) can be obtained analytically as

Eq. (5)

$$\hat{\boldsymbol{\beta}} = \boldsymbol{\Phi}^T \left( \frac{\mathbf{I}}{C} + \boldsymbol{\Phi}\boldsymbol{\Phi}^T \right)^{-1} \mathbf{Y}^T,$$

where $\boldsymbol{\Phi} = [\phi(\mathbf{x}_1), \phi(\mathbf{x}_2), \ldots, \phi(\mathbf{x}_H)]^T \in \mathbb{R}^{H \times |\phi(\cdot)|}$, $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_H] \in \mathbb{R}^{G \times H}$, and $\mathbf{I}$ is an identity matrix.
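
As a concrete illustration (a minimal NumPy sketch, assuming the hidden-layer output matrix has already been computed; not the authors' implementation), Eq. (5) can be evaluated directly:

```python
import numpy as np

def elm_output_weights(Phi, Y, C=5.0):
    """Eq. (5): beta_hat = Phi^T (I/C + Phi Phi^T)^(-1) Y^T.
    Phi: H x |phi(.)| hidden-layer output matrix (one training sample per row).
    Y:   G x H matrix of one-hot class labels (one column per sample)."""
    H = Phi.shape[0]
    A = np.eye(H) / C + Phi @ Phi.T          # H x H regularized Gram matrix
    return Phi.T @ np.linalg.solve(A, Y.T)   # |phi(.)| x G output weights
```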

As described in Refs. 34 and 37, a kernel can further transform the features into a higher-dimensional feature space and thus improve their discriminative capacity. Since the mapping $\phi(\cdot)$ in ELM learning appears only through inner products, a kernel function $K$ can be defined by

Eq. (6)

$$K(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle.$$
Then, we can establish a linear ELM through the kernel function, without considering the mapping $\phi(\cdot)$ explicitly. The most widely used kernel is the linear kernel, which is calculated as follows:

Eq. (7)

$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j.$$

Finally, by incorporating Eq. (6) into Eq. (5), the decision rule of the ELM for any test sample $\mathbf{x}$ is determined by

Eq. (8)

$$f(\mathbf{x}) = \hat{\boldsymbol{\beta}}^T \phi(\mathbf{x}).$$
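
When the kernel of Eq. (6) replaces the explicit mapping, training and prediction can be carried out entirely with kernel matrices, since $\hat{\boldsymbol{\beta}} = \boldsymbol{\Phi}^T \boldsymbol{\alpha}$ with $\boldsymbol{\alpha} = (\mathbf{I}/C + \mathbf{K})^{-1}\mathbf{Y}^T$. The following sketch assumes the linear kernel of Eq. (7); the helper names are ours, not the authors' code:

```python
import numpy as np

def linear_kernel(A, B):
    """Eq. (7) applied row-wise: A is n x m, B is p x m; returns the n x p kernel."""
    return A @ B.T

def kernel_elm_train(K_train, Y, C=5.0):
    """K_train: H x H kernel matrix among training samples; Y: G x H one-hot labels.
    Returns the H x G coefficient matrix alpha (so that beta = Phi^T alpha)."""
    H = K_train.shape[0]
    return np.linalg.solve(np.eye(H) / C + K_train, Y.T)

def kernel_elm_predict(k_test, alpha):
    """k_test: kernel values between a test sample and the H training samples.
    Since f(x) = beta^T phi(x) = k_test^T alpha, the label is the argmax score."""
    return int(np.argmax(k_test @ alpha))
```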

3.

Proposed PCANet-CK Method

In this paper, we propose a PCANet-CK method, which combines PCANet with composite kernel models for the classification of 3-D OCT images. The PCANet-CK method mainly consists of two parts: (1) apply the PCANet on each B-scan to automatically extract the features and (2) use the composite kernel to exploit the correlations among features of B-scans in each volume for the classification. The outline of the proposed PCANet-CK algorithm is shown in Fig. 3.

Fig. 3

Outline of the proposed PCANet-CK algorithm.


3.1.

Feature Extraction with PCANet

Since OCT images have large variations in intensity range, we first perform intensity normalization on each B-scan of the training and testing OCT volumes, which linearly rescales the intensity values to [0, 1]. Then, as in Ref. 23, for the training and testing OCT images, we adopt the BM3D algorithm38 to remove the noise in each B-scan and apply flattening to reduce the variations among B-scans in the imaged retinas' angles of inclination, positions, and natural curvatures. After these preprocessing steps, the PCANet feature extraction consists of a training phase and a testing phase. Specifically, in the training phase, we input $T$ 3-D OCT volumes $\mathbf{V}_t^{\mathrm{train}} \in \mathbb{R}^{M \times N \times H_t}$, $t = 1, 2, \ldots, T$, into the PCANet training model. Each volume contains $H_t$ B-scans $\mathbf{X}_h^{\mathrm{train}} \in \mathbb{R}^{M \times N}$, $h \in \{1, 2, \ldots, H_t\}$. As described in Sec. 2.1, the PCANet takes all B-scans of all training OCT volumes together as input. Then, we decompose the B-scans into patches and use the patches to train the $L_1$ PCA filters $\mathbf{W}_l^1$, $l = 1, 2, \ldots, L_1$, and the $L_2$ PCA filters $\mathbf{W}_l^2$, $l = 1, 2, \ldots, L_2$, in the first and second stages, respectively. Finally, after the binary quantization and block-wise histogram computation in the output stage, one feature vector $\mathbf{f}_h^{\mathrm{train}}$ is obtained for each B-scan $\mathbf{X}_h^{\mathrm{train}}$. For each volume $\mathbf{V}_t^{\mathrm{train}}$, the extracted features are assembled into a matrix $\mathbf{F}_t^{\mathrm{train}} = [\mathbf{f}_1^{\mathrm{train}}, \mathbf{f}_2^{\mathrm{train}}, \ldots, \mathbf{f}_{H_t}^{\mathrm{train}}]$.
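
A simplified outline of this per-volume pipeline is given below (the function names are hypothetical, and the BM3D denoising and retinal flattening steps are treated as externally supplied black boxes):

```python
import numpy as np

def preprocess_bscan(bscan, denoise, flatten):
    """Intensity normalization to [0, 1], then denoising and flattening.
    `denoise` (e.g., a BM3D wrapper) and `flatten` are assumed to be supplied."""
    lo, hi = float(bscan.min()), float(bscan.max())
    norm = (bscan - lo) / (hi - lo + 1e-12)
    return flatten(denoise(norm))

def extract_volume_features(volume, pcanet, denoise, flatten):
    """volume: array of shape (H_t, M, N) holding the B-scans of one OCT volume.
    `pcanet` is assumed to expose extract(bscan) -> feature vector f_h (Sec. 2.1).
    Returns the feature matrix F_t with one column per B-scan."""
    feats = [pcanet.extract(preprocess_bscan(b, denoise, flatten)) for b in volume]
    return np.stack(feats, axis=1)
```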

In the testing phase, given an input 3-D OCT volume $\mathbf{V}_t^{\mathrm{test}} \in \mathbb{R}^{M \times N \times H_t}$, we apply the PCA filters $\mathbf{W}_l^1$, $l = 1, 2, \ldots, L_1$, and $\mathbf{W}_l^2$, $l = 1, 2, \ldots, L_2$, learned in the training phase to each B-scan of the volume. Then, after the binary quantization and block-wise histogram computation, we obtain one feature matrix $\mathbf{F}_t^{\mathrm{test}} = [\mathbf{f}_1^{\mathrm{test}}, \mathbf{f}_2^{\mathrm{test}}, \ldots, \mathbf{f}_{H_t}^{\mathrm{test}}]$.

Figure 4 shows two examples of the extracted feature maps from different layers on the AMD and ME OCT images, respectively. As can be observed, for the input AMD image, the drusen regions will have high responses in the different feature maps of different layers [see the zoomed rectangle regions in Fig. 4(a)], while the edema is also very prominent in feature maps extracted from the ME image [see the zoomed rectangle regions in Fig. 4(b)]. Therefore, the learned PCA filters tend to capture meaningful pathology structure information, and the extracted feature maps can be used to achieve an effective classification, even without quantifying the size of the lesions.

Fig. 4

Two examples of feature maps extracted by PCANet from OCT images on subjects (a) AMD and (b) ME.


3.2.

Classification with Composite Kernel

As described above, given an OCT volume, the PCANet extracts one feature vector for each B-scan in both the training and testing phases. Since nearby B-scans of a 3-D OCT volume are very similar, high correlations should also exist among their extracted features, which can be exploited to enhance the classification.39 One possible way to exploit the correlations among features within one 3-D OCT volume is to fuse them together. As described in Ref. 40, mapping the features into a higher-dimensional feature space with kernels and then combining them into a composite kernel can better utilize the correlations and differences among different features. In addition, the parameters of the different features are integrated into the composite kernel function and jointly optimized during the training process,37 which also effectively combines the information of different features for classification. Specifically, for each feature $\mathbf{f}_h$ of a 3-D OCT volume, we use the linear kernel function in Eq. (7) to compute the corresponding kernel $\mathbf{K}_h$. As introduced in Refs. 41 and 42, directly stacking these kernels or linearly combining them can exploit the correlations among them. Since stacking the kernels would greatly increase the dimension and computational cost, we create the composite kernel as a linearly weighted combination of the kernels

Eq. (9)

$$\mathbf{K}_{\mathrm{comp}} = \sum_{h=1}^{H} \mu_h \mathbf{K}_h,$$

where $\mu_h$ is the weight for the kernel $\mathbf{K}_h$. Note that the kernel weights are the same for all the training and testing OCT volumes. In the training phase, for the $T$ training volumes, we create $T$ composite kernels $\mathbf{K}_{\mathrm{comp}}^{\mathrm{train},t}$, $t = 1, 2, \ldots, T$. Then, we replace the kernel in Eq. (6) with these composite kernels to train an ELM decision rule. In the testing phase, for each input OCT volume, we create one composite kernel $\mathbf{K}_{\mathrm{comp}}^{\mathrm{test}}$ and apply the decision rule to this kernel to assign a class label to the volume.
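
Tying Eq. (9) to the kernel ELM of Sec. 2.2, a brief usage sketch (reusing the hypothetical kernel_elm_train and kernel_elm_predict helpers from that section, with the kernel weights of Sec. 4.2) is:

```python
def fuse_kernels(kernels, weights=(0.85, 0.10, 0.05)):
    """Eq. (9): K_comp = sum_h mu_h * K_h, for a list of kernel matrices K_h."""
    return sum(mu * K for mu, K in zip(weights, kernels))

# Training (hypothetical usage): train_kernels[h] is the h-th kernel among all
# T training volumes (T x T), so
#   alpha = kernel_elm_train(fuse_kernels(train_kernels), Y, C=5.0)
# Testing: test_kernels[h] holds the h-th kernel between one test volume and the
# T training volumes (length-T vector), so
#   label = kernel_elm_predict(fuse_kernels(test_kernels), alpha)
```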

Note that, since the PCANet extracts many feature vectors (e.g., more than 30 in our test) from each volume, a corresponding number of weights $\mu_h$, $h = 1, 2, \ldots, H$, in Eq. (9) would have to be set, and searching for optimal values of so many weights would be very hard. In addition, different OCT volumes may consist of different numbers $H_t$ of B-scans, so the number of kernel weights would also vary across volumes. To address this issue, before constructing the multiple kernels, we employ the PCA transform43 to reduce the number of features and create the same number of kernel weights for each volume. Specifically, the PCA transform maps the feature matrix $\mathbf{F}_t = [\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_{H_t}]$ to a new set of principal components (PCs), $\mathbf{F}_t^{\mathrm{PCA}} = [\mathbf{f}_{\mathrm{PC1}}, \mathbf{f}_{\mathrm{PC2}}, \mathbf{f}_{\mathrm{PC3}}]$, using the SVD technique on the covariance matrix of $\mathbf{F}_t$, where the leading eigenvectors serve as the optimal projection axes. In this paper, we retain only the first three PCs, which account for most of the information in the feature matrix $\mathbf{F}_t$. After this feature selection strategy, only three kernels (see Fig. 3) are generated, which are then fused into a composite kernel.
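
The reduction to three PCs and the construction of the per-PC kernels can be sketched as follows (one plausible reading of the SVD step; the function names are ours):

```python
import numpy as np

def volume_pcs(F, n_pc=3):
    """F: feature matrix of one volume, one column per B-scan (D x H_t).
    One plausible reading of Sec. 3.2: SVD of the mean-centered matrix, keeping the
    first n_pc left singular vectors (scaled by their singular values) as the
    volume-level features [f_PC1, f_PC2, f_PC3], each of dimension D."""
    Fc = F - F.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(Fc, full_matrices=False)
    return U[:, :n_pc] * S[:n_pc]                       # D x n_pc

def pc_kernels(pcs_a, pcs_b, n_pc=3):
    """Linear kernels K_1..K_n_pc of Eq. (7) between two sets of volumes,
    one kernel per principal component; pcs_*: shape (num_volumes, D, n_pc)."""
    return [pcs_a[:, :, h] @ pcs_b[:, :, h].T for h in range(n_pc)]
```

The resulting list of three kernels is then fused with the weights of Eq. (9) and fed into the kernel ELM, as sketched above.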

4.

Experimental Results

4.1.

Datasets

To evaluate the effectiveness of the proposed PCANet-CK method, we tested it on two real OCT datasets acquired at Duke University and at the First Affiliated Hospital of Hunan University of Chinese Medicine (HUCM), respectively. The Duke dataset was acquired with a spectral domain (SD)-OCT imaging system (Heidelberg Engineering Inc., Heidelberg, Germany) by Duke University, Harvard University, and the University of Michigan and is publicly available (Ref. 44). It consists of 45 SD-OCT volumes from 45 patients (15 AMD, 15 DME, and 15 normal), and each volume contains a varying number of B-scans (from 31 to 97). The original axial and lateral resolutions of the B-scans are 3.87 μm/pixel and 6 to 12 μm/pixel, respectively. More details of the scanning protocols for this dataset can be found in Ref. 23. Three examples from NM, ME, and AMD subjects of the Duke dataset are shown in Fig. 5(a).

Fig. 5

Examples of SD-OCT images from (a) Duke dataset and (b) HUCM dataset.


The HUCM dataset was captured with an SD-OCT imaging system (Heidelberg Engineering, Heidelberg, Germany) at the First Affiliated Hospital of HUCM. It is composed of 54 SD-OCT volumes (18 AMD, 18 DME, and 18 normal) from 48 patients. All volumes are of size 768×496×31 voxels, covering 8.8×8.8×2.0 mm³. Figure 5(b) shows three B-scans from NM, ME, and AMD subjects of the HUCM dataset, respectively. Note that only small pathological structures (e.g., drusen) exist in some B-scans of the HUCM dataset, which are very challenging to recognize. The studies that produced the two datasets were approved by the local Investigational Review Board and performed in accordance with the tenets set forth in the Declaration of Helsinki. Written informed consent was obtained before enrolling patients in EUGENDA.

4.2.

Experimental Setting

In the proposed PCANet method, the main parameters include the PCA filter size, the block size for local histograms, and the filter numbers $L_1$ and $L_2$. In our experiments, we set the PCA filter size $n_1 = n_2 = 11$, the block size $n_{B1} = n_{B2} = 11$, and the block overlap ratio to 0.1. The effect of the PCA filter size and block size is analyzed in Sec. 4.4. As described in Ref. 33, $L_1$ and $L_2$ were both set to 8. In addition, for the composite kernel creation, the kernel weights $\mu_1$, $\mu_2$, and $\mu_3$ for the three kernels were set to 0.85, 0.10, and 0.05, respectively. The regularization parameter $C$ in Eq. (5) for the ELM classifier training was set to 5.

In our experiments, we utilized cross validation to evaluate the classification performance of the proposed PCANet-CK method. The cross validations were repeated with different random seeds to avoid dataset-splitting bias. The accuracy, specificity, and sensitivity were adopted to evaluate the classification performance; for a binary classification problem they are defined as

Eq. (10)

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

Eq. (11)

$$\mathrm{sensitivity} = \frac{TP}{TP + FN},$$

Eq. (12)

$$\mathrm{specificity} = \frac{TN}{TN + FP},$$

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.

In our three-class classification problem, the sensitivity of a class label is also its prediction accuracy, and the specificity is defined analogously for each class label, with the negative samples being the samples not in the considered class. The overall sensitivity (Ov-Se), overall specificity (Ov-Sp), and overall accuracy (Ov-Acc) are therefore averaged over the three class labels. If the number of samples in each class is equal, the Ov-Se also equals the ratio of the number of correctly predicted samples to the total number of samples. In addition, the mean and standard deviation values of the above metrics are calculated.
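
For reference, the overall metrics can be computed from a 3 x 3 confusion matrix in a one-vs-rest fashion, e.g. (a small illustrative sketch, not the authors' code):

```python
import numpy as np

def overall_metrics(conf):
    """conf: 3 x 3 confusion matrix, rows = true class, columns = predicted class.
    Returns (Ov-Se, Ov-Sp, Ov-Acc) averaged over the three one-vs-rest problems."""
    n = conf.sum()
    se, sp, acc = [], [], []
    for c in range(conf.shape[0]):
        tp = conf[c, c]
        fn = conf[c, :].sum() - tp
        fp = conf[:, c].sum() - tp
        tn = n - tp - fn - fp
        se.append(tp / (tp + fn))        # Eq. (11), one-vs-rest
        sp.append(tn / (tn + fp))        # Eq. (12)
        acc.append((tp + tn) / n)        # Eq. (10)
    return float(np.mean(se)), float(np.mean(sp)), float(np.mean(acc))
```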

4.3.

Results Comparisons

The performance of the proposed PCANet-CK method was first compared with the well-known OCT classification method HOG-SVM,23 which utilizes the HOG descriptor to extract feature vectors and trains three binary SVMs45 for the classification. A leave-three-out cross-validation strategy was applied to the two test datasets: in each fold, one volume from each class (three volumes in total) was chosen as the test set, and the remaining 42 (Duke dataset) or 51 (HUCM dataset) volumes were used as the training set. Different volumes were chosen as test volumes in different folds, so 15 (Duke) or 18 (HUCM) folds covered all the volumes in the two datasets. The 15-fold (Duke dataset) and 18-fold (HUCM dataset) cross validations were repeated 10 times with different random seeds.
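
The leave-three-out splitting can be expressed compactly; the sketch below (our own illustration, not the authors' code) simply indexes the volumes of each class:

```python
def leave_three_out_folds(n_per_class, classes=("AMD", "ME", "NM")):
    """Yield (test, train) index lists: each fold holds out one volume per class,
    so n_per_class folds cover every volume exactly once (Sec. 4.3)."""
    all_ids = [(c, i) for c in classes for i in range(n_per_class)]
    for k in range(n_per_class):
        test = [(c, k) for c in classes]
        train = [v for v in all_ids if v not in test]
        yield test, train

# Example: 15 folds of 3 test / 42 training volumes for the Duke dataset,
# 18 folds of 3 test / 51 training volumes for the HUCM dataset.
duke_folds = list(leave_three_out_folds(15))
hucm_folds = list(leave_three_out_folds(18))
```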

The quantitative results on the Duke and HUCM datasets are tabulated in Tables 1 and 2, respectively. As can be observed, in the two test datasets, the proposed PCANet-CK method consistently performs better than the HOG-SVM method in terms of all quantitative metrics. Specifically, in the Duke dataset, the PCANet-CK can accurately classify all the test volumes of the three classes, whereas the Ov-Acc for the HOG-SVM method is about 96.6%. In the HUCM dataset, the Ov-Acc for the PCANet-CK method is 96.9%, whereas the Ov-Acc for the HOG-SVM method is about 90.7%. Moreover, in the classes of AMD and NM on HUCM dataset, the gain of the mean sensitivity of the proposed method over the HOG-SVM method is more than 9%, which demonstrates the effectiveness of the PCANet feature extraction and the composite kernel for exploiting the 3-D information.

Table 1

Classification results (%) on Duke dataset.

Methods      Classes   Sensitivity   Specificity   Accuracy      Ov-Se        Ov-Sp        Ov-Acc
HOG-SVM23    AMD       100.0±0.0     96.7±0.0      97.8±0.0      94.9±1.5     97.4±0.8     96.6±1.0
             ME        97.3±3.4      97.0±1.1      97.1±1.5
             NM        87.3±2.1      98.7±1.7      94.9±1.5
PCANet-CK    AMD       100.0±0.0     100.0±0.0     100.0±0.0     100.0±0.0    100.0±0.0    100.0±0.0
             ME        100.0±0.0     100.0±0.0     100.0±0.0
             NM        100.0±0.0     100.0±0.0     100.0±0.0

Table 2

Classification results (%) on HUCM dataset.

Methods      Classes   Sensitivity   Specificity   Accuracy      Ov-Se        Ov-Sp        Ov-Acc
HOG-SVM23    AMD       82.8±1.8      93.3±1.9      89.8±1.6      86.1±1.6     93.1±0.8     90.7±1.1
             ME        86.1±2.9      99.7±0.9      95.1±1.3
             NM        89.4±1.8      86.1±0.0      87.2±0.6
PCANet-CK    AMD       92.2±4.7      96.9±0.9      95.4±1.6      95.4±1.6     97.7±0.8     96.9±1.1
             ME        94.4±0.0      98.3±1.4      97.0±1.0
             NM        99.4±1.8      97.8±1.8      98.3±1.1

Very recently, a deep CNN method was also tested on the Duke dataset.27 This method first takes a pretrained CNN model (GoogLeNet) and then fine-tunes it on the Duke dataset to extract features for the identification of AMD, DME, and NM. In the experimental setting of Ref. 27, cross validation was also utilized by dividing the whole dataset into 15 folds, with each fold containing three volumes (one from each class). However, different from the above validation, each experiment used eight folds (24 volumes) for training and seven folds (21 volumes) for testing, and the folds were chosen sequentially rather than randomly. Here, the proposed PCANet-CK and HOG-SVM methods were tested under the same experimental setting as in Ref. 27. Since only the mean sensitivity was reported in Ref. 27, this value of the deep CNN method was compared with those of the PCANet-CK and HOG-SVM methods, as reported in Table 3. As can be seen, the proposed PCANet-CK method performs much better than the HOG-SVM and deep CNN methods for classifying the images of AMD and ME subjects. In addition, for the classification of images from NM subjects, the PCANet-CK delivers better performance than the HOG-SVM, while being very close to the deep CNN. These results show the superiority of the feature learning strategy used in the PCANet-CK and deep CNN methods over the hand-crafted HOG feature extraction strategy adopted in the HOG-SVM method.

Table 3

Classification performances (sensitivities in %) of different methods on Duke dataset.

Class   HOG-SVM23   Deep CNN27   PCANet-CK
AMD     89          89           94
ME      83          86           94
NM      90          99           98
Note: The best results among different methods are labeled in bold.

In our experiments, the proposed PCANet-CK method was implemented on a desktop computer with an Intel(R) Core i7-6700K CPU and 64 GB of RAM under MATLAB R2016b. Testing one volume takes about 7.2 s on the Duke dataset and 3.1 s on the HUCM dataset. Note that, since the training phase of the PCANet-CK method can be performed offline, it does not need to be considered in the testing phase. In addition, our code is not optimized for speed; the processing time is expected to be reduced significantly by more efficient coding coupled with a general-purpose graphics processing unit.

4.4.

Effect of the PCA Filter Size and Block Size

In this section, the effect of the PCA filter size and the block size on the proposed PCANet model is analyzed on the Duke and HUCM datasets. The leave-three-out cross validation and the Ov-Se were also used here. For the evaluation of the PCA filter size, we varied the PCA filter size ($n_1 = n_2$) in the first two convolution stages from 3 to 19 and kept the other parameters (e.g., block size, overlap ratio, and kernel weights) the same as in Sec. 4.2. The results with different PCA filter sizes are shown in Fig. 6(a). As can be observed, the performance of the proposed PCANet-CK method generally improves as the PCA filter size increases from 3 to 11; as the filter size increases further, the performance becomes stable or even degrades. Since a larger PCA filter also incurs a higher computational cost, the PCA filter size is set to 11. For the evaluation of the block size, we varied the block size ($n_{B1} = n_{B2}$) in the output stage from 3 to 19. Figure 6(b) shows the classification results with different block sizes on the two test datasets. As can be seen, our PCANet-CK method is comparatively stable and achieves its best performance with block sizes of 3×3, 5×5, and 11×11 on the two datasets. In this paper, we set the block size to 11×11.

Fig. 6

Overall sensitivities of the PCANet-CK method on Duke and HUCM datasets for different (a) PCA filter sizes in the first two stages and (b) block sizes in the output stage.


4.5.

Comparisons of the Composite Kernel with Single Kernel

We also conducted additional experiments using single-kernel ELM classifiers to validate the superiority of the composite kernel classifier. Specifically, after obtaining the three PCs for one volume, we use them to separately create three different kernels. However, instead of fusing these kernels together, only one of them is used to train the ELM classifier and test one volume; the methods using the first, second, and third kernels are denoted PCANet-K1, PCANet-K2, and PCANet-K3, respectively. The results for the Duke and HUCM datasets are tabulated in Tables 4 and 5. As can be observed, on both the Duke and HUCM datasets, the composite kernel method consistently performs better than the single kernel methods in terms of all quantitative metrics, which demonstrates the effectiveness of the composite kernel for exploiting the correlations among adjacent B-scans within one OCT volume. The results also show that the composite kernel ELM can utilize the complementary information of the different single kernels to further enhance the classification.

Table 4

Classification results (%) on Duke dataset using single kernel.

Methods      Classes   Sensitivity   Specificity   Accuracy      Ov-Se        Ov-Sp        Ov-Acc
PCANet-K1    AMD       100.0±0.0     99.0±1.6      99.3±1.1      98.2±1.4     99.1±0.7     98.8±0.9
             ME        98.0±3.2      98.3±1.8      98.2±1.4
             NM        96.7±3.5      100.0±0.0     98.9±1.2
PCANet-K2    AMD       84.0±3.4      92.3±2.3      89.6±2.1      83.8±2.1     91.9±1.1     89.2±1.4
             ME        68.0±2.8      97.3±2.1      87.6±2.2
             NM        99.3±2.1      86.0±2.1      90.4±1.5
PCANet-K3    AMD       52.0±6.9      69.3±7.3      63.6±5.6      43.3±4.2     71.7±2.1     62.2±2.8
             ME        46.7±8.3      67.0±7.6      60.2±5.8
             NM        31.3±9.5      78.7±5.7      62.9±4.7
PCANet-CK    AMD       100.0±0.0     100.0±0.0     100.0±0.0     100.0±0.0    100.0±0.0    100.0±0.0
             ME        100.0±0.0     100.0±0.0     100.0±0.0
             NM        100.0±0.0     100.0±0.0     100.0±0.0

Table 5

Classification results (%) on HUCM dataset using single kernel.

Methods      Classes   Sensitivity   Specificity   Accuracy      Ov-Se        Ov-Sp        Ov-Acc
PCANet-K1    AMD       89.4±4.1      95.0±1.2      93.2±1.3      93.2±1.3     96.6±0.6     95.4±0.8
             ME        93.3±2.3      98.6±1.5      96.9±1.3
             NM        96.7±2.8      96.1±1.9      96.3±1.2
PCANet-K2    AMD       68.9±2.9      86.9±3.9      80.9±2.5      72.4±2.5     86.2±1.3     81.6±1.7
             ME        62.8±7.8      87.8±1.9      79.4±3.0
             NM        85.6±2.9      83.9±3.7      84.4±2.3
PCANet-K3    AMD       36.7±6.9      61.1±7.8      53.0±7.9      34.3±5.9     67.1±2.9     56.2±4.0
             ME        33.3±9.4      70.0±5.4      57.8±6.0
             NM        32.8±8.4      70.3±6.0      57.8±5.4
PCANet-CK    AMD       92.2±4.7      96.9±0.9      95.4±1.6      95.4±1.6     97.7±0.8     96.9±1.1
             ME        94.4±0.0      98.3±1.4      97.0±1.0
             NM        99.4±1.8      97.8±1.8      98.3±1.1

5.

Conclusion and Future Works

In this paper, we present a fully automatic method named the PCANet-CK to identify AMD, ME, and NM using the 3-D retinal SD-OCT images. Instead of adopting hand-crafted features, the proposed PCANet-CK method can automatically learn features from the input OCT images without layer segmentation. In addition, the PCANet-CK utilizes a composite kernel to exploit the strong correlations among features of the 3-D OCT images for classification. Experimental results on two clinically acquired OCT datasets demonstrate the effectiveness of the proposed PCANet-CK method.

In this paper, an automatic retinal OCT image classification algorithm, which achieves high classification accuracy in identifying AMD, ME, and NM, has been developed. The algorithm may be considered an effective computer-aided diagnosis tool for improving clinical OCT-based ophthalmic disease diagnosis and supporting remote clinical applications.

Note that each kind of disease (e.g., AMD) has large variations in its corresponding lesions (e.g., drusen of different sizes and shapes, acquired from patients in different countries, and affected by different illumination and noise). To better represent and classify the diseases, we need to collect more training OCT datasets and then learn a more general model. This is one of our ongoing works and is expected to further improve the classification performance.

Since OCT images are 3-D volumetric data and the same pathological structures usually appear in several adjacent cross-sectional slices, one alternative for feature extraction is to adopt a 3-D PCANet model with 3-D PCA filters, which can be expected to better capture the main variation of all the 3-D cubes. In addition, our future work will extend the algorithm to other retinal diseases, such as macular hole, macular telangiectasia, and central serous chorioretinopathy.

Disclosures

The authors have no relevant financial interests in this article and no potential conflicts of interest to disclose.

Acknowledgments

This work was supported by the National Natural Science Foundation of China for Distinguished Young Scholars under Grant No. 61325007, the National Natural Science Foundation of China under Grant Nos. 61771192 and 61471167, the National Natural Science Foundation for Young Scientists of China under Grant No. 61501180, and the China Postdoctoral Science Foundation under Grant No. 2017T100597.

References

1. D. A. Quillen, “Common causes of vision loss in elderly patients,” Am. Fam. Physician, 60(1), 99–108 (1999).

2. F. E. Hirai et al., “Clinically significant macular edema and survival in type 1 and type 2 diabetes,” Am. J. Ophthalmol., 145(4), 700–706 (2008). http://dx.doi.org/10.1016/j.ajo.2007.11.019

3. R. Klein et al., “The five-year incidence and progression of age-related maculopathy: the Beaver Dam eye study,” Ophthalmology, 104(1), 7–21 (1997). http://dx.doi.org/10.1016/S0161-6420(97)30368-6

4. D. A. Antonetti et al., “Diabetic retinopathy,” Diabetes, 55(9), 2401–2411 (2006). http://dx.doi.org/10.2337/db05-1635

5. L. Fang et al., “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express, 8(5), 2732–2744 (2017). http://dx.doi.org/10.1364/BOE.8.002732

6. D. Huang et al., “Optical coherence tomography,” Science, 254(5035), 1178–1181 (1991). http://dx.doi.org/10.1126/science.1957169

7. C. K. Hitzenberger et al., “Three-dimensional imaging of the human retina by high-speed optical coherence tomography,” Opt. Express, 11(21), 2753–2761 (2003). http://dx.doi.org/10.1364/OE.11.002753

8. A. M. Zysk et al., “Optical coherence tomography: a review of clinical development from bench to bedside,” J. Biomed. Opt., 12(5), 051403 (2007). http://dx.doi.org/10.1117/1.2793736

9. M. Wojtkowski et al., “In vivo human retinal imaging by Fourier domain optical coherence tomography,” J. Biomed. Opt., 7(3), 457–463 (2002). http://dx.doi.org/10.1117/1.1482379

10. S. S. Gao et al., “Quantification of choroidal neovascularization vessel length using optical coherence tomography angiography,” J. Biomed. Opt., 21(7), 076010 (2016). http://dx.doi.org/10.1117/1.JBO.21.7.076010

11. G. J. Tearney, I. K. Jang, and B. E. Bouma, “Optical coherence tomography for imaging the vulnerable plaque,” J. Biomed. Opt., 11(2), 021002 (2006). http://dx.doi.org/10.1117/1.2192697

12. M. Mogensen et al., “OCT imaging of skin cancer and other dermatological diseases,” J. Biophotonics, 2(6–7), 442–451 (2009). http://dx.doi.org/10.1002/jbio.v2:6/7

13. D. C. Adler et al., “Three-dimensional endomicroscopy of the human colon using optical coherence tomography,” Opt. Express, 17(2), 784–796 (2009). http://dx.doi.org/10.1364/OE.17.000784

14. W. Drexler and J. G. Fujimoto, “State-of-the-art retinal optical coherence tomography,” Prog. Retinal Eye Res., 27(1), 45–88 (2008). http://dx.doi.org/10.1016/j.preteyeres.2007.07.005

15. V. J. Srinivasan et al., “High-definition and 3-dimensional imaging of macular pathologies with high-speed ultrahigh-resolution optical coherence tomography,” Ophthalmology, 113(11), 2054–2065 (2006). http://dx.doi.org/10.1016/j.ophtha.2006.05.046

16. R. J. Zawadzki et al., “Adaptation of a support vector machine algorithm for segmentation and visualization of retinal structures in volumetric optical coherence tomography data sets,” J. Biomed. Opt., 12(4), 041206 (2007). http://dx.doi.org/10.1117/1.2772658

17. C. A. Puliafito et al., “Imaging of macular diseases with optical coherence tomography,” Ophthalmology, 102(2), 217–229 (1995). http://dx.doi.org/10.1016/S0161-6420(95)31032-9

18. L. Fang et al., “Segmentation based sparse reconstruction of optical coherence tomography images,” IEEE Trans. Med. Imaging, 36(2), 407–421 (2017). http://dx.doi.org/10.1109/TMI.2016.2611503

19. M. R. Hee et al., “Quantitative assessment of macular edema with optical coherence tomography,” Arch. Ophthalmol., 113(8), 1019–1029 (1995). http://dx.doi.org/10.1001/archopht.1995.01100080071031

20. B. Baumann et al., “Segmentation and quantification of retinal lesions in age-related macular degeneration using polarization-sensitive optical coherence tomography,” J. Biomed. Opt., 15(6), 061704 (2010). http://dx.doi.org/10.1117/1.3499420

21. J. Sugmk, S. Kiattisin, and A. Leelasantitham, “Automated classification between age-related macular degeneration and diabetic macular edema in OCT image using image segmentation,” in 7th Biomedical Engineering Int. Conf. (BMEiCON ’14), 1–4 (2014).

22. Y. Y. Liu et al., “Automated macular pathology diagnosis in retinal OCT images using multi-scale spatial pyramid and local binary patterns in texture and shape encoding,” Med. Image Anal., 15(5), 748–759 (2011). http://dx.doi.org/10.1016/j.media.2011.06.005

23. P. P. Srinivasan et al., “Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images,” Biomed. Opt. Express, 5(10), 3568–3577 (2014). http://dx.doi.org/10.1364/BOE.5.003568

24. B. Hassan et al., “Structure tensor based automated detection of macular edema and central serous retinopathy using optical coherence tomography images,” J. Opt. Soc. Am. A, 33(4), 455–463 (2016). http://dx.doi.org/10.1364/JOSAA.33.000455

25. Y. K. Sun, S. Li, and Z. Y. Sun, “Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning,” J. Biomed. Opt., 22(1), 016012 (2017).

26. F. G. Venhuizen et al., “Automated age-related macular degeneration classification in OCT using unsupervised feature learning,” Proc. SPIE, 9414, 94141I (2015). http://dx.doi.org/10.1117/12.2081521

27. S. Karri, D. Chakraborty, and J. Chatterjee, “Transfer learning based classification of optical coherence tomography images with diabetic macular edema and dry age-related macular degeneration,” Biomed. Opt. Express, 8(2), 579–592 (2017). http://dx.doi.org/10.1364/BOE.8.000579

28. Y. Wang et al., “Machine learning based detection of age-related macular degeneration (AMD) and diabetic macular edema (DME) from optical coherence tomography (OCT) images,” Biomed. Opt. Express, 7(12), 4928–4940 (2016). http://dx.doi.org/10.1364/BOE.7.004928

29. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., 35(8), 1798–1828 (2013). http://dx.doi.org/10.1109/TPAMI.2013.50

30. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, 521(7553), 436–444 (2015). http://dx.doi.org/10.1038/nature14539

31. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25 (NIPS ’12), 1097–1105 (2012).

32. P. Prentasic et al., “Segmentation of the foveal microvasculature using deep learning networks,” J. Biomed. Opt., 21(7), 075008 (2016). http://dx.doi.org/10.1117/1.JBO.21.7.075008

33. T. H. Chan et al., “PCANet: a simple deep learning baseline for image classification?,” IEEE Trans. Image Process., 24(12), 5017–5032 (2015). http://dx.doi.org/10.1109/TIP.2015.2475625

34. L. Fang et al., “Spectral-spatial classification of hyperspectral images with a superpixel-based discriminative sparse model,” IEEE Trans. Geosci. Remote Sens., 53(8), 4186–4201 (2015). http://dx.doi.org/10.1109/TGRS.2015.2392755

35. G. B. Huang et al., “Extreme learning machine for regression and multiclass classification,” IEEE Trans. Syst., Man, Cybern. B, 42(2), 513–529 (2012). http://dx.doi.org/10.1109/TSMCB.2011.2168604

36. G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, 70(1), 489–501 (2006). http://dx.doi.org/10.1016/j.neucom.2005.12.126

37. L. Fang, H. Zhuo, and S. Li, “Super-resolution of hyperspectral image via superpixel-based sparse representation,” Neurocomputing, 273, 171–177 (2018). http://dx.doi.org/10.1016/j.neucom.2017.08.019

38. K. Dabov et al., “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., 16(8), 2080–2095 (2007). http://dx.doi.org/10.1109/TIP.2007.901238

39. L. Fang et al., “Hyperspectral image classification via multiple-feature-based adaptive sparse representation,” IEEE Trans. Instrum. Meas., 66(7), 1646–1657 (2017). http://dx.doi.org/10.1109/TIM.2017.2664480

40. K. R. Muller et al., “An introduction to kernel-based learning algorithms,” IEEE Trans. Neural Networks, 12(2), 181–201 (2001). http://dx.doi.org/10.1109/72.914517

41. X. Liu et al., “Multiple kernel extreme learning machine,” Neurocomputing, 149, 253–264 (2015). http://dx.doi.org/10.1016/j.neucom.2013.09.072

42. L. Fang et al., “Classification of hyperspectral images by exploiting spectral-spatial information of superpixel via multiple kernels,” IEEE Trans. Geosci. Remote Sens., 53(12), 6663–6674 (2015). http://dx.doi.org/10.1109/TGRS.2015.2445767

43. S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemom. Intell. Lab. Syst., 2(1), 37–52 (1987). http://dx.doi.org/10.1016/0169-7439(87)80084-9

45. C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., 20(3), 273–297 (1995). http://dx.doi.org/10.1007/BF00994018

Biography

Leyuan Fang received his PhD from the College of Electrical and Information Engineering, Hunan University, Changsha, China, in 2015. From August 2016 to 2017, he was a postdoc researcher at the Department of Biomedical Engineering, Duke University, Durham, USA. Since January 2017, he has been an associate professor at the College of Electrical and Information Engineering, Hunan University. His research interests include sparse representation and deep learning in medical image processing.

Chong Wang received his BS degree from Southern Medical University, Guangzhou, China, in 2017. He is currently working toward the master degree at the Laboratory of Vision and Image Processing, Hunan University, Changsha, China. His research interests focus on deep learning for medical image processing.

Shutao Li received his BS, MS, and PhD degrees from Hunan University, Changsha, China, in 1995, 1997, and 2001, respectively. From 2002 to 2003, he was a postdoctoral fellow at the Royal Holloway College, University of London, London, U.K. He is currently a Cheung Kong Scholars Professor at the College of Electrical and Information Engineering, Hunan University. His current research interests include compressive sensing, sparse representation, image processing, and pattern recognition.

Jun Yan received his MS degree from Hunan University, Changsha, China, in 2017. His research interests focus on deep learning for optical coherence tomography image processing.

Xiangdong Chen received his MD degree from Hunan University of Chinese Medicine, Changsha, China, in 2017. He is currently a full professor at Hunan University of Chinese Medicine, Changsha, China. His current research interests include analysis and diagnosis of age related macular and glaucoma diseases.

Hossein Rabbani received his MS and PhD degrees in biomedical engineering (bioelectrics) from Amirkabir University of Technology (Tehran Polytechnic). He is currently an associate professor at the Biomedical Engineering Department and also at Medical Image & Signal Processing Research Center of Isfahan University of Medical Sciences. His current research interests include medical image analysis and modeling, statistical (multidimensional) signal processing, sparse transforms, and image/video restoration.

© 2017 Society of Photo-Optical Instrumentation Engineers (SPIE) 1083-3668/2017/$25.00 © 2017 SPIE
Leyuan Fang, Chong Wang, Shutao Li, Jun Yan, Xiangdong Chen, and Hossein Rabbani "Automatic classification of retinal three-dimensional optical coherence tomography images using principal component analysis network with composite kernels," Journal of Biomedical Optics 22(11), 116011 (29 November 2017). https://doi.org/10.1117/1.JBO.22.11.116011
Received: 10 June 2017; Accepted: 8 November 2017; Published: 29 November 2017
KEYWORDS: Optical coherence tomography; Principal component analysis; Composites; 3D image processing; Image classification; Feature extraction; 3D modeling
