Convolutional neural networks (CNNs) are important deep neural networks for analyzing visual imagery. However, most CNN-based methods suffer from over-smoothing at boundaries, which is unfavorable for hyperspectral image classification. To address this problem, a spectral-spatial multiscale residual network (SSMRN) that fuses two separately learned deep spectral and deep spatial features is proposed to significantly reduce over-smoothing and effectively learn the features of objects. In the implementation of the SSMRN, a multiscale residual convolutional neural network is proposed as the spatial feature extractor, and a band grouping-based bi-directional gated recurrent unit is utilized as the spectral feature extractor. Considering that the importance of spectral and spatial features may vary with the spatial resolution of images, we combine both features using two weighting factors whose initial values differ and which are adaptively adjusted during network training. To evaluate the effectiveness of the SSMRN, extensive experiments are conducted on public benchmark data sets. The proposed method retains the detailed boundaries of different objects and yields competitive results compared with several state-of-the-art methods.
1. Introduction
With the rapid development of remote sensing imaging spectroscopy technology, hyperspectral images (HSIs) have become increasingly important in Earth observation due to their rich spectral and spatial information. Classification is an important technique for HSI data exploitation. HSI classification (HSIC) is the task of assigning a proper land-cover label to each pixel,1 which is challenging because of the large dimensionality, spectral heterogeneity, and complex spatial distribution of the objects.2 To alleviate these problems, traditional HSIC methods involve two steps: (1) feature selection and extraction.3 This step relies on feature engineering skills and domain expertise to design human-engineered features. (2) Classifier training. A classifier is an algorithm that automatically categorizes data into one or more of a set of classes. Traditional HSIC approaches use such handcrafted features to train the classifier, but these features may be unreliable for real data. It is therefore difficult to balance robustness and discriminability, as the set of optimal features varies considerably between different data sets.4 Deep neural networks (DNNs) can automatically learn features from data in a hierarchical manner, constructing a model with growing semantic layers until a suitable representation is achieved.5 To overcome the issue of high intraclass variability and high interclass similarity in HSI, stacked autoencoders6–8 and deep belief networks9,10 were introduced as accurate unsupervised methods to extract layerwise trained deep features. However, their standard fully connected (FC) architecture imposes a feature flattening process before classification, leading to the loss of spatial-contextual information.11 In contrast, convolutional neural networks (CNNs) can automatically extract spectral-spatial features from the raw input data.
Recurrent neural networks (RNNs) process the spectral information of HSI data as a time sequence, treating the spectral bands as time steps. There are three basic RNN models: (1) the vanilla RNN, (2) long short-term memory (LSTM), and (3) the gated recurrent unit (GRU). A large number of CNN- or RNN-based methods have been proposed for end-to-end modeling; they can handle HSI data in the spectral and spatial domains individually, or in a coupled fashion.12 For instance, Yang et al.13 designed a two-branch CNN to learn spectral and spatial features jointly. Zhong et al.14 proposed an end-to-end three-dimensional (3D) residual CNN architecture for spectral-spatial feature learning and classification. Motivated by the attention mechanism of the human visual system, a residual spectral-spatial attention network (RSSAN)15 was proposed for HSI classification. To reduce computation, fully convolutional networks were proposed for HSIC.16 To correctly discover the contextual relations among pixels, the graph convolutional network, originally designed for arbitrarily structured non-Euclidean data, was adopted for HSIC.17 The morphological operations, i.e., erosion and dilation, are powerful nonlinear feature transformations. Inspired by these, an end-to-end morphological CNN (MorphCNN)11 was introduced for HSIC by concatenating the outputs of spectral and spatial morphological blocks extracted in a dual-path fashion. To represent high-level semantic features well, a spectral-spatial feature tokenization transformer (SSFTT) method18 was proposed to capture spectral-spatial and high-level semantic features.
Keeping in view the sequential property of HSI in determining class labels, an RNN-based HSIC framework with a novel activation function (parametric rectified tanh) and GRU was proposed.19 The work20 proposed a spectral-spatial LSTM-based network that learns spectral and spatial features of HSI with two separate LSTMs followed by Softmax layers, while a decision fusion strategy produces the joint spectral-spatial classification results. Several works have proposed joint CNN-RNN architectures for HSIC. The spatial-spectral unified network (SSUN) combined a band grouping-based LSTM model in the spectral dimension with a 2D CNN for spatial features and integrated spectral feature extraction (FE), spatial FE, and classifier training into a unified neural network.2 In the spectral-spatial attention network (SSAN),21 an RNN with attention learns inner spectral correlations within a continuous spectrum, while a CNN with attention focuses on saliency features and spatial relevance between neighboring pixels. The work22 integrated a CNN with a bidirectional convolutional LSTM (CLSTM), in which a 3D CNN captures low-level spectral-spatial features and the CLSTM recurrently analyzes this low-level spectral-spatial information. CNNs are commonly applied to analyze visual imagery.23 Most of the above methods are based on the CNN backbone and its variants. However, most CNN-based methods suffer from over-smoothing at boundaries, which is unfavorable for HSIC. DNNs are also prone to overfitting24 and sensitive to perturbations,25 and deep learning methods usually require a large number of training samples.26,27 To significantly reduce the over-smoothing effect and effectively learn the features of objects, a multi-task learning spectral-spatial multiscale residual network (SSMRN) is proposed for end-to-end HSIC. The contributions can be summarized as follows:
The rest of the paper is organized as follows. Sec. 2 introduces preliminary knowledge of CNNs, residual networks, and RNNs. The proposed architecture and design methodology are introduced in Sec. 3. Experimental data sets and results are given in Sec. 4. The impact of the SSMRN architecture on classification results is analyzed in Sec. 5. Finally, Sec. 6 concludes the paper with a summary of the proposed method and the scope of future work.
2. Preliminary
In this section, we recall background information on CNNs, residual networks, and RNNs.
2.1. Convolutional Neural Network
A CNN28 is a class of DNNs most commonly applied to analyzing visual imagery. Three main types of layers are used to build CNN architectures: convolutional layers, pooling layers, and FC layers. Compared with multilayer perceptron neural networks, CNNs are easier to train because of their parameter sharing scheme and local connectivity. While CNN-based methods have achieved large improvements in HSIC, they usually suffer from severe over-smoothing at edge boundaries. There are two major reasons: (1) the scales of the supervised information and the spatial features do not match; the supervised information of HSIC is pixel-level, while the spatial features are extracted from the neighborhood of the current pixel. (2) The parameter sharing scheme means the spatial features are extracted for the patch rather than for the current pixel. Both reasons lead to an insufficient influence of the current pixel on the classification. Attention mechanisms can counteract the effects of parameter sharing15,21 but increase the amount of computation. A smaller patch size also decreases the possibility of over-smoothing2 but results in insufficient extraction of spatial information and lower classification accuracy (CA).29 Another approach is to utilize superpixel segmentation,17 but then the segmentation algorithm affects the classification results.
2.2. Residual Networks
A residual network is an effective extension of CNNs that has empirically been shown to increase performance in ImageNet classification. It does this by utilizing skip connections to jump over some layers. As shown in Fig. 1, the typical residual block is implemented with double-layer skips that contain nonlinearities. The skip connections add the outputs from previous layers to the outputs of stacked layers. One motivation for skipping over layers is to avoid the problem of vanishing gradients by reusing activations from a previous layer until the adjacent layer learns its weights.30 Skipping effectively simplifies the network, using fewer layers in the initial training stages. The residual block is easy to understand and optimize, can be stacked to any depth, and can be embedded in any existing CNN.
2.3. Recurrent Neural Network
RNNs operate over sequences of input, output, or both at the same time, which makes them applicable to challenging tasks involving sequential data such as speech recognition and language modeling. LSTM and GRU31 were introduced to learn long-term dependencies and alleviate the vanishing/exploding gradient problem. The two architectures have no fundamental differences but use different functions to compute the hidden state. LSTM is strictly stronger than GRU, as it can easily perform unbounded counting, whereas GRU has fewer parameters and has been shown to perform better on certain smaller and less frequent data sets. Bi-directional RNNs (Bi-RNNs) utilize a finite sequence to predict or label each element of the sequence based on the element's past and future contexts, as shown in Fig. 2. A Bi-RNN concatenates the outputs of two RNNs, one processing the sequence from left to right and the other from right to left. Hyperspectral data usually have hundreds of bands.
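As a concrete illustration of the double-layer skip connection recalled in Sec. 2.2, a residual block might be sketched in tf.keras as follows. The filter count, kernel size, and patch shape here are illustrative assumptions, not the exact configuration used by any of the cited networks.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64, kernel_size=3):
    """y = ReLU(F(x) + x): two stacked conv layers skipped by an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)  # no activation before the add
    y = layers.Add()([y, shortcut])                              # the skip connection
    return layers.Activation("relu")(y)

# Shapes are assumed: e.g., a 9x9 spatial patch with 64 feature maps.
inp = layers.Input(shape=(9, 9, 64))
out = residual_block(inp)
model = tf.keras.Model(inp, out)
print(model.output_shape)  # (None, 9, 9, 64): the block preserves the input shape
```

Because the identity shortcut requires the two branches to have matching shapes, `padding="same"` and a filter count equal to the input channels are used; this is what lets such blocks be stacked to any depth.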
So, pixel classification in HSI can be treated as a many-to-one task: given the sequence of bands of a pixel, predict the class of that pixel. A natural idea is to treat each band as a time step. However, a long RNN input sequence can lead to overfitting and consumes considerable computing and storage resources. In addition, the large number of spectral channels and the limited training samples restrict the performance of HSIC.26
3. Proposed Framework
The deep networks used for HSIC are divided into spectral-feature networks, spatial-feature networks, and spectral-spatial-feature networks. To effectively learn the features of objects, we utilize spectral-spatial-feature networks to extract joint deep spectral-spatial features for HSIC. Joint deep spectral-spatial features are mainly obtained in the following three ways:32 (1) mapping low-level spectral-spatial features to high-level spectral-spatial features via deep networks; (2) directly extracting deep features from the original data or several principal components of the original data; and (3) fusing two separate deep spectral features and deep spatial features. Considering that the importance of spectral and spatial features may vary with the spatial resolution of images, we adopt the fusion of two separate deep features so that the influence of each feature on the classification results can be conveniently adjusted. Three components play crucial roles in our methodology: a multiscale residual CNN (MRCNN)-based spatial feature learner, a bi-directional GRU (Bi-GRU)-based spectral feature learner, and a multi-task learning model that combines both features with two weighting factors.
3.1. Multiscale Residual CNN for Spatial Classification
The proposed MRCNN architecture is shown in Fig. 3. Let $\mathcal{X} \in \mathbb{R}^{H \times W \times B}$ be the original HSI data, where $H$, $W$, and $B$ are the row number, column number, and band number, respectively.
First, to suppress noise and reduce computational costs, principal component analysis is applied to the original HSI data, and only the first $p$ principal components are reserved. Denote the dimension-reduced data by $\mathcal{X}_p \in \mathbb{R}^{H \times W \times p}$. Around each pixel, a neighbor region of size $k \times k \times p$ is extracted as the input of the spatial branch. Considering the complex environment of the HSI, where different objects tend to have different scales, we propose to extract both shallow and deep features by applying a convolution layer with rectified linear unit (ReLU) activation and two residual blocks in the classification. A local max pooling layer is adopted in the residual blocks. We add a flatten layer and an FC layer with the same number of neurons after each scale output. Then, these FC layers are merged into a new FC layer. Let $F_i = \sigma(W_i f_i + b_i)$, $i = 1, 2, 3$, denote the $i$'th FC layer, where $\sigma$ is the activation function, $f_i$ is the flattened feature vector of the $i$'th flatten layer, and $W_i$ and $b_i$ are the corresponding weight matrix and bias term, respectively. The fourth FC layer can be calculated as $F_4 = \sigma(W_4 [F_1, F_2, F_3] + b_4)$. In this way, features in different layers are taken into consideration during the classification stage, and the network possesses the multiscale property. The cross-entropy loss function of the MRCNN can be expressed as $L_{spa} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log \hat{y}_{ij}$, where $y$ and $\hat{y}$ denote the true and predicted labels, respectively, $N$ is the number of training samples, and $C$ is the number of classes.
3.2. Bi-GRU for Spectral Classification
GRU has fewer parameters than LSTM for modeling various sequential problems, and Bi-GRU feeds the sequential vectors into the architecture one by one to learn continuous features in both the forward and backward directions. So, we utilize Bi-GRU for spectral classification. The complete spectral classification framework is shown in Fig. 4. To reduce computation, a suitable grouping strategy2 is used in this paper. For each pixel in the HSI, let $\mathbf{r} = (r_1, r_2, \dots, r_B)$ be the spectral vector, where $r_i$ is the reflectance of the $i$'th band and $B$ is the number of bands.
Let $T$ be the number of time steps (i.e., the number of groups). The transformed sequence can be denoted by $X = (x_1, x_2, \dots, x_T)$, where $x_t$ is the sub-sequence at the $t$'th time step. Specifically, the grouping strategy is $x_t = (r_{(t-1)s+1}, \dots, r_{ts})$ with $s = \lfloor B / T \rfloor$, where $s$ is the sequence length of each time step and $\lfloor \cdot \rfloor$ rounds numbers down. After grouping, the spectral vector $\mathbf{r}$ is transformed into the sequence $X$. The input to our model is the sequence $X$, and the bi-directional hidden vector is calculated as follows. Forward hidden state: $\overrightarrow{h}_t = \tanh(\overrightarrow{W} x_t + \overrightarrow{U} \overrightarrow{h}_{t-1})$. Backward hidden state: $\overleftarrow{h}_t = \tanh(\overleftarrow{W} x_t + \overleftarrow{U} \overleftarrow{h}_{t+1})$. Here the coefficient matrices $\overrightarrow{W}$ and $\overleftarrow{W}$ act on the input at the present step, $\overrightarrow{U}$ acts on the hidden state at the previous step, $\overleftarrow{U}$ acts on the hidden state at the succeeding step, and $\tanh$ is the hyperbolic tangent. The memory of the input, as the output of this encoder, is $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$, where $[\cdot;\cdot]$ denotes concatenation of the forward and backward hidden states. The grouping strategy uses the original HSI spectral vector as the feature of the new sequence, and the RNN uses the parameter sharing scheme, so a one-dimensional convolutional residual block is added to reassign the weights of the features based on the channel attention mechanism. The predicted label of a pixel is then computed as $\hat{y}_{spe} = g(\mathrm{Conv1D}(h))$, where $\mathrm{Conv1D}$ is a one-dimensional convolutional layer with stride one and $g(\cdot)$ indicates a series of operations as shown in Fig. 4, including a ReLU activation, a flatten function, an FC layer, and a Softmax activation function.
3.3. SSMRN
The proposed SSMRN framework is shown in Fig. 5. It starts with two branches that learn the spatial and spectral features, respectively; these two branches are then concatenated into a joint layer, with $w_{spe}$ and $w_{spa}$ as the corresponding weighting factors. To better train the whole network, two auxiliary tasks are added to the framework.2 The proposed SSMRN is thus a triple-task framework, including one main task (classification based on spectral-spatial information) and two auxiliary tasks (classification based on spectral information alone and on spatial information alone).
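A minimal sketch of the band grouping and the Bi-GRU spectral encoder described above, written with tf.keras. The band count, batch size, and the consecutive-band grouping are assumptions for illustration; the 64-unit hidden size and 3 time steps follow the configuration given later in Sec. 4.2.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

B, T = 200, 3                 # number of bands (assumed) and time steps
s = B // T                    # sequence length per time step: floor(B / T)

# Group a batch of spectral vectors into T sub-sequences of length s,
# discarding the B - T*s leftover bands at the end.
spectra = np.random.rand(32, B).astype("float32")
grouped = spectra[:, :T * s].reshape(-1, T, s)

inp = layers.Input(shape=(T, s))
# Bi-GRU: forward and backward hidden states (64 units each) are
# concatenated, giving a 128-dimensional encoder output.
h = layers.Bidirectional(layers.GRU(64), merge_mode="concat")(inp)
model = tf.keras.Model(inp, h)
print(model(grouped).shape)   # (32, 128)
```

Grouping shortens the RNN input sequence from B steps to T steps, which is how the strategy reduces computation relative to feeding one band per time step.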
The complete cross-entropy loss function of the SSMRN is defined as $L = L_{joint} + L_{spe} + L_{spa}$, where $L_{joint}$ is the main loss function, $L_{spe}$ and $L_{spa}$ are the two auxiliary loss functions, $\hat{y}_{joint}$, $\hat{y}_{spe}$, and $\hat{y}_{spa}$ are the corresponding predicted labels, $y$ is the true label, $N$ is the number of training samples, and $C$ is the number of classes. The whole network is trained in an end-to-end manner, where all parameters are optimized by the batch stochastic gradient descent algorithm at the same time. In this way, the complete loss function balances the convergence of both the whole network and the subnetworks.
4. Experiment
In this section, we introduce the three public data sets used in our experiments and the configuration of the proposed SSMRN. In addition, the classification performance of the proposed method and other comparative methods is presented.
4.1. Experimental Data
Three publicly available hyperspectral data sets are utilized to evaluate the performance of the proposed method: Indian Pines (IP) from the airborne visible/infrared imaging spectrometer (AVIRIS) sensor, Pavia University (PU) from the reflective optics system imaging spectrometer (ROSIS) sensor, and Salinas (SA) from the AVIRIS sensor. The data set details are shown in Table 1.
Table 1. Summary of the HSI data sets used for experimental evaluation.
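The triple-task training objective described above can be sketched as follows. Summing the three cross-entropy terms without extra coefficients, and the toy labels used to exercise the function, are assumptions for illustration.

```python
import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()

def ssmrn_loss(y_true, y_joint, y_spe, y_spa):
    """Triple-task loss: the main spectral-spatial term plus the two
    auxiliary single-branch terms, each a categorical cross entropy."""
    return cce(y_true, y_joint) + cce(y_true, y_spe) + cce(y_true, y_spa)

# Toy check with C = 3 classes and N = 2 samples: when all three heads
# predict the true one-hot label exactly, the total loss is (numerically) zero.
y_true = tf.constant([[1., 0., 0.], [0., 1., 0.]])
loss = ssmrn_loss(y_true, y_true, y_true, y_true)
print(float(loss))  # ~0.0
```

Because all three terms share the same true label, gradients from the auxiliary heads also flow into the spectral and spatial branches, which is what lets the complete loss balance the convergence of the subnetworks.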
4.2. Experimental Setting
4.2.1. Evaluation indicators
To quantitatively analyze the effectiveness of the proposed method and the methods used for comparison, three quantitative evaluation indexes are introduced: class-specific CA, overall classification accuracy (OA), and the Kappa coefficient (Kappa). A larger value of each indicator represents a better classification result.
4.2.2. Configuration
All experiments are implemented on an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20 GHz with 64 GB of RAM and an NVIDIA RTX 2080 graphics card, using TensorFlow 2.3.1 and Keras 2.4.3 with Python 3.7.6. We use the Adam optimizer to train the networks with a learning rate of 0.001. The gradient of each weight is individually clipped so that its norm is no higher than 1. The training epochs are set to 1500 with a batch size of 1048.
4.2.3. Parameter setting
All experiments in this paper are randomly repeated 30 times. In each repetition, we first randomly generate the training set from the whole data set with the same number of samples per labeled class. The remaining samples make up the test set. Details are given in Tables 2–4.
Table 2. Number of training and test samples used in the IP data set.
Table 3. Number of training and test samples used in the PU data set.
Table 4. Number of training and test samples used in the SA data set.
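The training configuration of Sec. 4.2.2 maps directly onto a Keras optimizer; a sketch is below. In Keras, `clipnorm` clips each weight's gradient tensor individually, matching the per-weight norm clipping described above. The commented `compile`/`fit` calls are placeholders, since the model and data are defined elsewhere.

```python
import tensorflow as tf

# Adam with a learning rate of 0.001 and per-gradient norm clipping at 1,
# as stated in Sec. 4.2.2.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)

EPOCHS, BATCH_SIZE = 1500, 1048   # training schedule from the paper

# model.compile(optimizer=optimizer, loss="categorical_crossentropy")
# model.fit(x_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE)
```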
For the proposed MRCNN, the input is a patch, where 4 is the number of reserved principal components. All convolutional layers have 64 filters. The kernel size of the first left convolutional layer is , and the other kernel sizes are . The size of the max pooling layers is . The three FC layers after each scale output each have 64 units. For the proposed Bi-GRU, the number of time steps is 3. The hidden size of the GRU is 64, so the one-dimensional convolutional layers have 128 filters because of the Bi-GRU. For the proposed SSMRN, the input is the same as that of the Bi-GRU and MRCNN. The number of neurons of the FC layer in each of the spectral and spatial branches is 192, so the number of neurons in the joint FC layer is 384. In our study, we adopt the fusion of two separate deep spectral features and deep spatial features. Since the importance of spectral and spatial features may vary with spatial resolution, we weight these two parts and must specify the initial values of these hyperparameters. The principle is that the higher the spatial resolution and the smaller the influence of the mixed-pixel effect, the greater the initial spectral weight should be. We suppose the sum of the two weights is 1 and that the weights of the two parts are close to each other. Owing to the proposed strategy, the weights of the spectral and spatial parts can be adjusted adaptively during training. The initial values of the weighting factors $w_{spe}$ and $w_{spa}$ are given in Table 5.
Table 5. Initial values of the weighting factors.
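The adaptive weighted fusion of the two 192-unit branch outputs can be sketched as a small custom layer with trainable scalar weighting factors. The initial values of 0.5/0.5 are an assumption (the paper initializes them per data set, summing to 1, per Table 5); the class name `WeightedFusion` is ours, not from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

class WeightedFusion(layers.Layer):
    """Scale the spectral and spatial branch features by trainable scalar
    weighting factors, then concatenate into the joint feature vector."""
    def __init__(self, w_spe=0.5, w_spa=0.5, **kwargs):
        super().__init__(**kwargs)
        # Trainable scalars: adjusted by backpropagation during training.
        self.w_spe = tf.Variable(w_spe, trainable=True, dtype=tf.float32)
        self.w_spa = tf.Variable(w_spa, trainable=True, dtype=tf.float32)

    def call(self, spectral, spatial):
        return tf.concat([self.w_spe * spectral, self.w_spa * spatial], axis=-1)

fusion = WeightedFusion()
spe = tf.ones((8, 192))   # 192-unit FC output of the spectral branch
spa = tf.ones((8, 192))   # 192-unit FC output of the spatial branch
joint = fusion(spe, spa)
print(joint.shape)        # (8, 384): input to the joint FC layer
```

Making the two factors `tf.Variable`s is what allows the network to adapt the spectral/spatial balance away from its resolution-dependent initialization during end-to-end training.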
4.2.4. Ablation study
In this section, we compare the SSMRN with the SSMRN without auxiliary tasks. As shown in Table 6, the SSMRN surpasses the SSMRN without auxiliary tasks, especially for the small-sample IP data set. These results demonstrate that multi-task learning helps the network exploit the HSI data for feature learning.
Table 6. OA (%) of SSMRN with two modules.
4.3. Classification Results
To demonstrate the superiority and effectiveness of the proposed SSMRN model, it is compared with the proposed Bi-GRU and MRCNN, and with advanced spectral-spatial DNN methods, namely SSUN,2 SSAN,21 RSSAN,15 MorphCNN,11 and SSFTT.18 Bi-GRU is the spectral FE branch of the SSMRN; MRCNN is its spatial FE branch. SSUN, SSAN, RSSAN, MorphCNN, SSFTT, and SSMRN are all based on the CNN backbone and its variants, integrating spatial and spectral features. RSSAN and SSFTT directly extract joint deep spectral-spatial features via CNNs. SSUN, SSAN, MorphCNN, SSFTT, and SSMRN obtain deep spectral features and deep spatial features via two deep networks and fuse the two kinds of features to generate joint deep spectral-spatial features. The difference is that the SSMRN considers the weight relationship between the spectral and spatial branches depending on the spatial resolution of the images, and embeds multi-task learning at the same time. For SSUN, SSAN, and MorphCNN, the input is a patch, where 4 is the number of reserved principal components. Limited by our computer configuration, we could not run RSSAN with the original input size from the corresponding reference, so the input of RSSAN is a patch, where 8 is the number of reserved principal components instead of the number of spectral bands. Following its reference, the input of SSFTT is a patch. For SSUN, SSAN, RSSAN, MorphCNN, and SSFTT, all network settings are as described in their corresponding references. For a fair comparison, the training and test sample sets of all methods are randomly selected, as shown in Tables 2–4. Quantitative evaluation: Tables 7–9 report the CA, OA, and Kappa of all the mentioned methods for the IP, PU, and SA data sets, respectively. All algorithms are executed 30 times, and the average results with standard deviations are reported to reduce random selection effects.
The optimal results are denoted in bold. The evaluation data clearly show that the proposed SSMRN method performs best: it obtains the highest OA and Kappa, and most of the highest class-specific accuracies, with a few classes having slightly lower precision than MRCNN, SSUN, and SSFTT. Particularly on the IP data set, the results of the SSMRN are higher than those of the other methods, which shows that the SSMRN can effectively learn the features of objects, especially with a small number of samples. The CA, OA, and Kappa of Bi-GRU are lower than those of the other methods, particularly in Table 7, because Bi-GRU only uses spectral features, and the IP data set has lower spatial resolution and a stronger mixed-pixel effect. MRCNN's results are second only to those of the SSMRN, which shows that good results can be obtained using spatial features and a proper deep network structure. Especially on the SA data set, the results of the MRCNN and SSMRN models are almost identical. The likely reason is that the ground objects of interest in the image are homogeneous, regular, and large, so the pixel-level supervised information can be well regarded as patch-level supervised information and the scales of the supervised information and the spatial features match. The structures of SSUN and SSAN are similar to that of the SSMRN, following the strategy of fusing two separate deep spectral features and deep spatial features. The reason why the results of SSUN and SSAN are not as good as those of the SSMRN may be that the network depth of their spectral and spatial FE is insufficient. The structures of RSSAN, MorphCNN, and SSFTT follow the strategy of directly extracting deep features from the original data or several principal components of the original data. RSSAN and SSFTT are powerful methods.
The main limitation of RSSAN and SSFTT is that a certain number of samples is required, which may result in poor performance with small samples, such as on the IP data set. The classification accuracy of MorphCNN is low and unstable in Tables 7 and 8, likely because, compared with the objects in PU and SA, the morphological features contained in the patch are not obvious.
Table 7. Classification results of different methods for the IP data set. Bold indicates the best result.
Table 8. Classification results of different methods for the PU data set. Bold indicates the best result.
Table 9. Classification results of different methods for the SA data set. Bold indicates the best result.
As shown in Tables 7–9, Bi-GRU, SSUN, and SSFTT generally cost less time than MRCNN and the other spectral-spatial feature methods. The reasons may be the grouping strategy of Bi-GRU, the grouping strategy and shallower network of SSUN, and the transformer encoder module of SSFTT. The runtime of MorphCNN is the longest because its network structure is more complex and deeper than the other networks. Tables 10–12 show the OA of SSUN, SSAN, RSSAN, MorphCNN, SSFTT, and SSMRN with different numbers of training samples. To assess the stability and robustness of the proposed method under different training samples, 5, 10, 15, and 30 labeled samples of each class are randomly selected as training data for IP, and 30, 50, 100, and 200 for PU and SA. With changing sample size, the results of MorphCNN fluctuate sharply, which further indicates that the morphological features are unstable. As the number of samples increases, the results of SSUN, SSAN, RSSAN, SSFTT, and SSMRN improve, and the SSMRN significantly outperforms the other methods under all training sample conditions. Even with a small number of samples, our method still performs well. In addition, when the number of samples of each class is 100 on PU and SA, the accuracy of the SSMRN can reach 99.5%, exceeding the OAs of all the other methods. This shows that the SSMRN can effectively learn the features of objects under different training sample conditions.
Table 10. OA (%) of different methods under different training sample numbers of each class on the IP data set. Bold indicates the best result.
Table 11. OA (%) of different methods under different training sample numbers of each class on the PU data set. Bold indicates the best result.
Table 12. OA (%) of different methods under different training sample numbers of each class on the SA data set. Bold indicates the best result.
Qualitative evaluation: the classification maps of the different methods are shown in Figs. 6–8. By visual comparison, the classification map obtained by the SSMRN is the cleanest and the closest to the ground-truth map. Due to the lack of spatial features, the classification maps of Bi-GRU suffer from salt-and-pepper noise and misclassification inside objects. Compared with spectral FE methods, spatial FE methods make full use of the continuity of ground objects and yield cleaner classification maps. The main problem of MRCNN lies in the over-smoothing phenomenon; RSSAN, MorphCNN, and SSFTT exhibit over-smoothing too, since they directly extract joint deep spectral-spatial features from the original data or several principal components of the original data, so their spectral features come from the patch scale. Meanwhile, SSMRN, SSUN, and SSAN better retain the detailed boundaries of different objects and acquire smoother, more homogeneous results, especially within the white dashed box. The most likely reason is that they have separate spatial and spectral FE branches, and their spectral features come from the pixel scale. However, SSUN and SSAN do not consider the weight relationship between the two branches as a function of the spatial resolution of the images. The proposed SSMRN takes the weight between spectral and spatial features into consideration and can further reduce over-smoothing.
5. Discussion
The experimental results on the three public data sets indicate that the SSMRN has more competitive performance in terms of the three measurements (CA, OA, and Kappa) and classification maps than all the compared methods. This is due to:
6.ConclusionTo significantly reduce the over-smoothing effect and effectively learn the features of objects, a multi-task learning SSMRN has been proposed to extract spectral-spatial features. The experimental results of the three public data sets demonstrate that the method not only mitigates the over-smoothing phenomenon, but also has a better performance compared with the other methods in terms of CA, OA, and Kappa. Our method significantly outperforms other methods under different training sample conditions. Although we utilize the proposed band Bi-GRU and MRCNN as the spectral and spatial feature extractors in the implementation of the proposed SSMRN, other deep networks can also be introduced into our model, especially for spectral extractors. It deserves to be investigated in future work. AcknowledgmentsThis work was supported in part by the Major Science and Technology Program of Henan Province (Grant Nos. 222102320341, 212102311149, and 212102310432), by the key scientific research projects of colleges and universities in Henan Province (Grant No. 22B420004), by the Fundamental Research Funds for the Universities of Henan Province (Grant No. NSFRF210401), Doctoral Foundation of Henan Polytechnic University (Grant No. B2017-09, B2017-14, and B2015-22), by the National Natural Science Foundation of China (Grant No. 41801318). We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted. ReferencesH. Sun et al.,
“Spectral–spatial attention network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 58(5), 3232–3245 (2020). https://doi.org/10.1109/TGRS.2019.2951160
2. Y. Xu et al., “Spectral–spatial unified networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 56(10), 5893–5909 (2018). https://doi.org/10.1109/TGRS.2018.2827407
3. L. Zhang et al., “Simultaneous spectral-spatial feature selection and extraction for hyperspectral images,” IEEE Trans. Cybern. 48(1), 16–28 (2018). https://doi.org/10.1109/TCYB.2016.2605044
4. M. Ahmad et al., “Hyperspectral image classification—traditional to deep models: a survey for future prospects,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 15, 968–999 (2022). https://doi.org/10.1109/JSTARS.2021.3133021
5. W. Liu et al., “A survey of deep neural network architectures and their applications,” Neurocomputing 234, 11–26 (2017). https://doi.org/10.1016/j.neucom.2016.12.038
6. P. Zhou et al., “Learning compact and discriminative stacked autoencoder for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 57(7), 4823–4833 (2019). https://doi.org/10.1109/TGRS.2019.2893180
7. M. Ahmad et al., “Multi-layer extreme learning machine-based autoencoder for hyperspectral image classification,” in VISIGRAPP (4: VISAPP), 75–82 (2019). https://doi.org/10.5220/0007258000750082
8. B. Liu et al., “Spatial–spectral jointed stacked auto-encoder-based deep learning for oil slick extraction from hyperspectral images,” J. Indian Soc. Remote Sens. 47(12), 1989–1997 (2019). https://doi.org/10.1007/s12524-019-01045-y
9. B. Ayhan and C. Kwan, “Application of deep belief network to land cover classification using hyperspectral images,” Lect. Notes Comput. Sci. 10261, 269–276 (2017). https://doi.org/10.1007/978-3-319-59072-1_32
10. G. E. Hinton, S. Osindero and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput. 18(7), 1527–1554 (2006). https://doi.org/10.1162/neco.2006.18.7.1527
11. S. K. Roy et al., “Morphological convolutional neural networks for hyperspectral image classification,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 14, 8689–8702 (2021). https://doi.org/10.1109/JSTARS.2021.3088228
12. L. Zhang, L. Zhang and B. Du, “Deep learning for remote sensing data: a technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag. 4(2), 22–40 (2016). https://doi.org/10.1109/MGRS.2016.2540798
13. J. Yang, Y.-Q. Zhao and J. C.-W. Chan, “Learning and transferring deep joint spectral–spatial features for hyperspectral classification,” IEEE Trans. Geosci. Remote Sens. 55(8), 4729–4742 (2017). https://doi.org/10.1109/TGRS.2017.2698503
14. Z. Zhong et al., “Spectral–spatial residual network for hyperspectral image classification: a 3-D deep learning framework,” IEEE Trans. Geosci. Remote Sens. 56(2), 847–858 (2018). https://doi.org/10.1109/TGRS.2017.2755542
15. M. Zhu et al., “Residual spectral–spatial attention network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 59(1), 449–462 (2021). https://doi.org/10.1109/TGRS.2020.2994057
16. Y. Xu, B. Du and L. Zhang, “Beyond the patchwise classification: spectral-spatial fully convolutional networks for hyperspectral image classification,” IEEE Trans. Big Data 6(3), 492–506 (2020). https://doi.org/10.1109/TBDATA.2019.2923243
17. S. Wan et al., “Hyperspectral image classification with context-aware dynamic graph convolutional network,” IEEE Trans. Geosci. Remote Sens. 59(1), 597–612 (2021). https://doi.org/10.1109/TGRS.2020.2994205
18. L. Sun et al., “Spectral–spatial feature tokenization transformer for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 60, 1–14 (2022). https://doi.org/10.1109/TGRS.2022.3144158
19. R. Hang et al., “Cascaded recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 57(8), 5384–5394 (2019). https://doi.org/10.1109/TGRS.2019.2899129
20. F. Zhou et al., “Hyperspectral image classification using spectral-spatial LSTMs,” Neurocomputing 328, 39–47 (2019). https://doi.org/10.1016/j.neucom.2018.02.105
21. X. Mei et al., “Spectral-spatial attention networks for hyperspectral image classification,” Remote Sens. 11(8), 963 (2019). https://doi.org/10.3390/rs11080963
22. M. Seydgar et al., “3-D convolution-recurrent networks for spectral-spatial classification of hyperspectral images,” Remote Sens. 11(7), 883 (2019). https://doi.org/10.3390/rs11070883
23. M. V. Valueva et al., “Application of the residue number system to reduce hardware costs of the convolutional neural network implementation,” Math. Comput. Simul. 177, 232–243 (2020). https://doi.org/10.1016/j.matcom.2020.04.031
24. C. Zhang et al., “A study on overfitting in deep reinforcement learning,” (2018).
25. Y. Xu, B. Du and L. Zhang, “Assessing the threat of adversarial examples on deep neural networks for remote sensing scene classification: attacks and defenses,” IEEE Trans. Geosci. Remote Sens. 59(2), 1604–1617 (2021). https://doi.org/10.1109/TGRS.2020.2999962
26. C. Cheng et al., “Hyperspectral image classification via spectral-spatial random patches network,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 14, 4753–4764 (2021). https://doi.org/10.1109/JSTARS.2021.3075771
27. Y. Xu et al., “Hyperspectral image classification via a random patches network,” ISPRS J. Photogramm. Remote Sens. 142, 344–357 (2018). https://doi.org/10.1016/j.isprsjprs.2018.05.014
28. H.-C. Shin et al., “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016). https://doi.org/10.1109/TMI.2016.2528162
29. L. Ma et al., “Deep learning in remote sensing applications: a meta-analysis and review,” ISPRS J. Photogramm. Remote Sens. 152, 166–177 (2019). https://doi.org/10.1016/j.isprsjprs.2019.04.015
30. K. He et al., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. and Pattern Recognit., 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
31. L. Mou, P. Ghamisi and X. X. Zhu, “Deep recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens. 55(7), 3639–3655 (2017). https://doi.org/10.1109/TGRS.2016.2636241
32. S. Li et al., “Deep learning for hyperspectral image classification: an overview,” IEEE Trans. Geosci. Remote Sens. 57(9), 6690–6709 (2019). https://doi.org/10.1109/TGRS.2019.2907932
Biography

Shi He is an assistant professor at Henan Polytechnic University. He received his BS degree from China Agricultural University in 2009 and his PhD in cartography and geographic information systems from Beijing Normal University in 2016. His current research interests include optical remote sensing, machine learning, and image classification.

Huazhu Xue received his PhD in cartography and geographic information systems in the area of quantitative remote sensing from the School of Geography, Beijing Normal University, Beijing, China, in 2012. Since 2012, he has been an associate professor at the School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China. His research interests include vegetation parameter inversion, satellite image processing, and GIS applications.

Jiehai Cheng is an associate professor at Henan Polytechnic University. He received his PhD in cartography and geographic information systems from Beijing Normal University in 2013. His current research interests include high-resolution remote sensing, deep learning, image classification, and GIS intelligent analysis.

Lei Wang received his PhD in cartography and geographical information engineering from China University of Mining and Technology, Beijing, in 2016. Since 2016, he has been an associate professor at Henan Polytechnic University, Jiaozuo. His current research interests include global GIS modeling, discrete global grids, and the application of remote sensing.

Yaping Wang is an associate professor at Henan Polytechnic University. She received her PhD in cartography and geographic information engineering from China University of Mining and Technology, Beijing, in 2014. Her current research interests include image classification, remote sensing of water resources, and GIS applications.