The complexity of clouds, particularly their texture detail at high resolutions, has not been well explored by most existing cloud detection networks. We introduce the high-resolution cloud detection network (HR-cloud-Net), which adopts a hierarchical high-resolution integration approach. HR-cloud-Net combines a high-resolution representation module, a layer-wise cascaded feature fusion module, and a multiresolution pyramid pooling module to effectively capture complex cloud features. This architecture preserves detailed cloud texture information while facilitating feature exchange across different resolutions, thereby enhancing overall cloud detection performance. Additionally, an approach is introduced wherein a student view, trained on noisy augmented images, is supervised by a teacher view processing normal images. This setup enables the student to learn from the cleaner supervision provided by the teacher, leading to improved performance. Extensive evaluations on three optical satellite image cloud detection datasets validate the superior performance of HR-cloud-Net compared with existing methods.
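For illustration, the following is a minimal sketch of the noisy-student/clean-teacher supervision described above. `CloudNet` is a toy stand-in backbone, not the actual HR-cloud-Net architecture, and the EMA teacher update is an assumption common to such setups rather than something stated in the abstract.

```python
# Hedged sketch: teacher sees clean images, student sees noisy augmentations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CloudNet(nn.Module):
    """Toy stand-in for the cloud segmentation backbone (not HR-cloud-Net)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),  # 1-channel cloud-mask logits
        )
    def forward(self, x):
        return self.body(x)

student, teacher = CloudNet(), CloudNet()
teacher.load_state_dict(student.state_dict())   # start from identical weights
for p in teacher.parameters():
    p.requires_grad_(False)                     # teacher gets no gradients

def training_step(clean, noisy, mask, opt, ema=0.99):
    with torch.no_grad():
        t_logits = teacher(clean)               # teacher view: normal image
    s_logits = student(noisy)                   # student view: noisy image
    # Supervised loss plus consistency with the cleaner teacher prediction.
    loss = F.binary_cross_entropy_with_logits(s_logits, mask) \
         + F.mse_loss(torch.sigmoid(s_logits), torch.sigmoid(t_logits))
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                       # assumed EMA teacher update
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema).add_(sp, alpha=1 - ema)
    return loss.item()
```

A typical usage would create `opt = torch.optim.Adam(student.parameters(), lr=1e-3)` and call `training_step` per batch.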
KEYWORDS: RGB color model, Image classification, Education and training, Transformers, Data modeling, Agriculture, Feature extraction, Deep learning, Machine learning, Matrices
Seed phenomics is the comprehensive assessment of complex seed traits, and seed classification is an indispensable step. Plant seed recognition is of great significance for agricultural production, the ecological environment, and biodiversity. However, traditional manual plant seed classification methods are expensive, time consuming, and laborious, so an automated alternative is urgently needed. Artificial intelligence is making a large impact on many fields through its perception, reasoning, and learning capabilities, and a long-standing challenge in pratacultural research, the rapid auto-identification of plant seeds, may be better resolved by integrating computer vision. Because no public seed dataset was available for model training, we established a dataset called LZUPSD, which includes images of 88 different species of seeds. We explored fine-grained seed classification using convolutional neural networks and also applied a transformer to the task. The best method achieves an accuracy above 95%, identifying plant seeds automatically with high speed, low cost, and high accuracy, yielding a more efficient plant seed recognition method. We have also established a platform where users can upload pictures to obtain seed information. In addition, our dataset will be released publicly in the next phase to share with interested researchers.
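As a rough sketch of how such a fine-grained classifier could be set up, the snippet below fine-tunes a pretrained CNN on an 88-class seed dataset. The `LZUPSD/train` directory layout and the ResNet-18 backbone are assumptions for illustration; the paper's exact models and preprocessing may differ.

```python
# Hedged transfer-learning sketch for 88-way fine-grained seed classification.
import torch.nn as nn
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumes an ImageFolder layout: one subdirectory per seed species (88 classes).
train_set = datasets.ImageFolder("LZUPSD/train", transform=tf)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 88)  # replace head for 88 species
```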
Images captured by digital camera sensors in low-light conditions suffer from poor quality. Existing low-light image enhancement (LLE) methods often yield unnatural results due to over-enhancement, artifacts, severe noise, etc. Prior studies adopt either visual dual-pathway mechanisms or Retinex prior optimization for image enhancement. However, the former generates artifacts because it directly stretches contrast in a structural layer that mixes high- and low-frequency information, while the latter over-enhances because empirical prior terms are added to the objective function. Thus, a unified three-pathway framework is proposed to address these deficiencies in LLE. Specifically, the proposed framework is composed of a detail pathway, a reflection pathway, and an illuminance pathway. First, the three information-processing pathways are obtained through different image decomposition strategies. Second, an indirect noise suppression strategy is developed in the computational flow of the detail and reflection pathways to address the noise-amplification problem of image enhancement. Third, naturalness-preserving enhancement is conducted in the reflection and illuminance pathways. Finally, the outputs of the different pathways are weighted and fused to enhance the low-light image. Qualitative and quantitative experimental results on two test datasets show that the proposed framework outperforms state-of-the-art methods.
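The following is a rough, self-contained sketch of the three-pathway decompose-enhance-fuse idea, using simple stand-ins (Gaussian smoothing, a Retinex-style division, a gamma curve); the paper's actual decomposition strategies, noise suppression, and fusion weights are not reproduced here.

```python
# Hedged sketch of a three-pathway low-light enhancement pipeline.
import cv2
import numpy as np

def enhance(img_bgr):
    img = img_bgr.astype(np.float32) / 255.0
    # Illuminance pathway: smoothed max-channel estimate, gamma-brightened.
    illum = cv2.GaussianBlur(img.max(axis=2), (0, 0), 15) + 1e-6
    illum_en = np.power(illum, 0.5)
    # Reflection pathway: Retinex-style division, lightly pre-blurred so the
    # division does not amplify sensor noise (an indirect suppression proxy).
    reflect = cv2.GaussianBlur(img, (0, 0), 1) / illum[..., None]
    # Detail pathway: high-frequency residual, attenuated for the same reason.
    detail = 0.5 * (img - cv2.GaussianBlur(img, (0, 0), 3))
    # Weighted fusion of pathway outputs (weights are assumptions).
    out = reflect * illum_en[..., None] + detail
    return np.clip(out * 255, 0, 255).astype(np.uint8)
```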
Unsupervised domain adaptation (UDA) person re-identification (re-ID) aims to transfer useful knowledge learned on labeled source-domain datasets to unlabeled target-domain datasets. Most successful UDA re-ID methods alternate between clustering to generate pseudo labels for feature learning and fine-tuning on the target domain. However, the interaction between the two steps is offline, so noisy pseudo labels can greatly hinder the classification and retrieval ability of the whole model. To purify these noisy pseudo labels, a framework called Unsupervised Confident Co-promoting (UCC) is proposed in this paper. Specifically, two peer teacher-student co-training models are adopted simultaneously to refine online the noisy pseudo labels produced by the offline clustering algorithm and to supervise each other during iterations. More significantly, we introduce a confidence strategy that greatly improves the reliability of the generated pseudo labels under multi-network collaboratively guided learning. Combining the two allows our final method to substantially improve noisy pseudo labels for the re-ID task, achieve a large performance boost, and generalize to other deep learning domains. Moreover, our method significantly improves common evaluation metrics on the four most common re-ID benchmarks compared with state-of-the-art (SOTA) methods, and some results are even comparable to supervised learning methods.
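One way to picture the confidence strategy is the filter sketched below: keep only pseudo labels on which both peer networks agree with high softmax confidence. The agreement-plus-threshold rule and the threshold value are assumptions for illustration, not the paper's exact mechanism.

```python
# Hedged sketch: confidence-filtered pseudo labels from two peer networks.
import numpy as np

def confident_pseudo_labels(p1, p2, tau=0.8):
    """p1, p2: (N, C) softmax outputs of the two peer networks."""
    y1, y2 = p1.argmax(1), p2.argmax(1)
    conf = np.minimum(p1.max(1), p2.max(1))   # worst-case confidence
    keep = (y1 == y2) & (conf >= tau)         # agreement + confidence filter
    return y1[keep], keep                     # refined labels + selection mask
```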
KEYWORDS: Data centers, Hyperspectral imaging, Image classification, Data modeling, Neural networks, Machine learning, Convolution, 3D modeling, Process modeling, Performance modeling
Graph convolutional networks (GCNs) have achieved great success in hyperspectral image (HSI) classification. However, some difficulties remain, such as the lack of enough labeled samples during training and the large imbalance between the numbers of samples in different land-cover categories, which lead to poor classification performance. To alleviate these problems, we propose a Mutual Learning Graph Convolutional Network (MLGCN) with an imbalance loss, which trains two GCNs simultaneously so that they can learn from each other through a mutual learning strategy during training. Experiments on three real datasets show that the proposed MLGCN achieves state-of-the-art results in terms of three classification evaluation metrics: overall accuracy (OA), average accuracy (AA), and the kappa coefficient (κ).
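A minimal sketch of a mutual-learning objective in the style of deep mutual learning is shown below: each network is trained with cross-entropy plus a KL term pulling it toward its peer's predictions. The imbalance loss itself is not reproduced here, and the exact loss weighting is an assumption.

```python
# Hedged sketch of one network's mutual-learning loss.
import torch.nn.functional as F

def mutual_loss(logits_a, logits_b, labels):
    ce = F.cross_entropy(logits_a, labels)
    kl = F.kl_div(F.log_softmax(logits_a, dim=1),
                  F.softmax(logits_b, dim=1).detach(),  # peer as soft target
                  reduction="batchmean")
    return ce + kl  # loss for network A; swap the arguments for network B
```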
Hyperspectral image (HSI) classification methods based on convolutional neural networks are becoming increasingly popular. Many proposed methods extract spatial and spectral features simultaneously, and the interaction between the two types of features leads to unsatisfactory classification results. Moreover, most existing CNN-based methods consider only a single scale, which can cause important information to be neglected. To address these two issues, we propose an Attention-Based Multi-Scale Network (AMSN) for HSI classification. First, the proposed network is based on a 3D CNN; through a channel branch and a spatial branch, AMSN captures more distinctive spectral and spatial features using different convolution kernels, respectively. Second, local and global features are extracted by a dense network and then concatenated to make full use of multi-scale features. Third, attention blocks are applied after each branch to obtain the most distinctive features. Experimental results on three HSI datasets demonstrate that the proposed framework achieves better classification performance than several state-of-the-art methods.
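The toy module below illustrates the multi-scale-plus-attention pattern: parallel 3D convolutions with different kernel sizes, concatenated and re-weighted by a squeeze-and-excitation-style channel attention. Kernel sizes, channel counts, and the SE-style attention are assumptions, not AMSN's actual configuration.

```python
# Hedged sketch of one multi-scale 3D-convolution block with channel attention.
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.k3 = nn.Conv3d(c_in, c_out, 3, padding=1)   # small receptive field
        self.k5 = nn.Conv3d(c_in, c_out, 5, padding=2)   # larger receptive field
        self.attn = nn.Sequential(                       # SE-style attention
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(2 * c_out, 2 * c_out, 1),
            nn.Sigmoid(),
        )
    def forward(self, x):
        y = torch.cat([self.k3(x), self.k5(x)], dim=1)   # multi-scale features
        return y * self.attn(y)                          # re-weight channels
```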
Network pruning has achieved great success in compressing and accelerating neural networks on resource-limited devices. Previous pruning algorithms apply filter pruning or channel pruning with a single global or local pruning rate. Works that consider only a global pruning rate ignore the individual characteristics of each layer, while works that consider only local pruning rates can fragment the connections between layers. In this paper, we propose a novel method named global and local pruning under knowledge distillation (GLKD), which combines filter pruning and channel pruning and is trained with a mixture of global and local pruning rates. GLKD accelerates the inference of ResNet-110 by 56.2% with a 0.17% accuracy increase on the CIFAR-100 dataset, offering a strong trade-off between accuracy and compression. Additionally, experiments with GLKD on ImageNet with ResNet-56 and ResNet-110 are conducted to prove its effectiveness on the compressed model. Moreover, knowledge distillation is adopted in the pruning step of GLKD and improves the accuracy of the pruned network.
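For intuition, here is a simplified sketch of (a) ranking filters by L1 norm under a blended global/local pruning rate and (b) the standard distillation loss used when fine-tuning a pruned student against the unpruned teacher. The blending coefficient, ranking criterion, and temperature are assumptions; GLKD's actual selection procedure may differ.

```python
# Hedged sketch: mixed-rate filter selection plus a knowledge-distillation loss.
import torch
import torch.nn.functional as F

def filters_to_prune(conv_weight, global_rate, local_rate, alpha=0.5):
    """conv_weight: (out_ch, in_ch, k, k). Returns indices of filters to drop."""
    rate = alpha * global_rate + (1 - alpha) * local_rate  # mixed pruning rate
    scores = conv_weight.abs().sum(dim=(1, 2, 3))          # L1 norm per filter
    n_drop = int(rate * scores.numel())
    return torch.argsort(scores)[:n_drop]                  # weakest filters first

def kd_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T          # soften both heads
    return lam * soft + (1 - lam) * F.cross_entropy(student_logits, labels)
```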
We propose a new dual-level convolutional neural network model based on Inception modules and residual connections. First, an Inception module has filters with different kernel sizes, so its output feature maps contain receptive fields of different scales: feature maps with wide receptive fields capture global information, while those with small receptive fields retain local information, and the resulting multiscale features are more comprehensive. Second, the residual connections simplify training and help avoid overfitting. Third, the proposed network adopts two levels, i.e., a low level and a high level, and uses a feature fusion operation to take full advantage of the complementary and correlated information of the two levels. Fourth, we combine the spatial and spectral features of the hyperspectral image (HSI): the pixels to be classified, together with their neighborhood information, serve as the input of the neural network to realize spectral–spatial HSI classification. Experimental results show that our model performs better than other state-of-the-art methods.
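The block below sketches the first two ingredients, mixed kernel sizes producing multiscale receptive fields, wrapped in a residual connection. Channel splits are arbitrary illustrations (and assume the channel count is divisible by four), not the paper's configuration.

```python
# Hedged sketch of an Inception-style block with a residual connection.
import torch
import torch.nn as nn

class InceptionResBlock(nn.Module):
    def __init__(self, c):                  # assumes c is divisible by 4
        super().__init__()
        self.b1 = nn.Conv2d(c, c // 2, 1)                # narrow, local field
        self.b3 = nn.Conv2d(c, c // 4, 3, padding=1)     # medium field
        self.b5 = nn.Conv2d(c, c // 4, 5, padding=2)     # wide field, global cues
    def forward(self, x):
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
        return torch.relu(y + x)            # residual connection eases training
```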
Most existing feature learning methods optimize inflexible handcrafted features, and the affinity matrix is constructed by shallow linear embedding methods. Different from these conventional methods, we pretrain a generative neural network by stacking convolutional autoencoders to learn the latent data representation and then construct an affinity graph from it as a prior. Based on the pretrained model and the constructed graph, we add a self-expressive layer to complete the generative model and then fine-tune it with a new loss function comprising a reconstruction loss and a deliberately designed locality-preserving loss. The locality-preserving loss, built on the constructed affinity graph, serves as a prior that preserves the local structure during fine-tuning, which in turn effectively improves the quality of the feature representation. Furthermore, the self-expressive layer between the encoder and the decoder is based on the assumption that each latent feature is a linear combination of the other latent features, so the weighted combination coefficients of the self-expressive layer are used to construct a new, refined affinity graph representing the data structure. We conduct experiments on four datasets to demonstrate that the representation ability of our proposed model is superior to that of state-of-the-art methods.
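A compact sketch of the self-expressive idea follows: latent codes are reconstructed as linear combinations of one another through a learnable coefficient matrix C, whose entries can later serve as the refined affinity graph. The zeroed diagonal and the loss weighting are standard choices assumed here, not necessarily the paper's.

```python
# Hedged sketch of a self-expressive layer and its training loss.
import torch
import torch.nn as nn

class SelfExpressive(nn.Module):
    def __init__(self, n_samples):
        super().__init__()
        self.C = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))
    def forward(self, z):                              # z: (n_samples, dim)
        C = self.C - torch.diag(torch.diag(self.C))    # forbid trivial z_i -> z_i
        return C @ z, C                                # reconstructions + graph

def self_expr_loss(z, z_hat, C, lam=1.0):
    # Reconstruction in latent space plus a regularizer on the coefficients.
    return ((z - z_hat) ** 2).sum() + lam * (C ** 2).sum()
```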
An iterative joint bilateral filter is used to obtain a natural weight map, and images from different modalities are merged by a weighted-sum rule in the spatial domain. Saliency maps are first determined from the gradients of the raw image pair. By comparing the pairwise saliency values, a coarse weight map is obtained that determines which pixel is preferred. Since such a coarse weight map produced by pairwise comparison is not subjectively natural, i.e., it is inconsistent with the human visual system, it is refined with an iterative joint bilateral filter, after which the weight map becomes natural. Using the refined weight map to obtain the fused image, we seamlessly and effectively merge images from different modalities. Experiments on several pairs of multimodal images verify the effectiveness and superiority of the proposed fusion algorithm compared with state-of-the-art methods.
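The snippet below sketches the refinement step with OpenCV's joint bilateral filter (requires `opencv-contrib-python`); the iteration count and filter sigmas are assumptions for illustration.

```python
# Hedged sketch: iteratively refine a coarse weight map with a joint bilateral
# filter so weight-map edges align with real image structures.
import cv2
import numpy as np

def refine_weight_map(guide_gray, coarse_w, iters=4):
    guide = guide_gray.astype(np.float32)
    w = coarse_w.astype(np.float32)
    for _ in range(iters):
        w = cv2.ximgproc.jointBilateralFilter(guide, w, d=9,
                                              sigmaColor=25, sigmaSpace=7)
    return np.clip(w, 0, 1)

# Weighted-sum fusion in the spatial domain: fused = w*img_a + (1 - w)*img_b
```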
A spatial-domain multifocus image fusion method is proposed using a structure-preserving filter. In particular, a recent recursive filter (RF) is introduced as the structure-preserving filter in the proposed spatial-domain method. Moreover, a focused-region detection method based on an average low-pass filter is presented to determine the initial weight maps. A fused image is then generated from the final weight maps, which are obtained by refining the initial weight maps with the RF and preserve the structures of the source images well. Experimental results show that the proposed method is superior to state-of-the-art multifocus fusion methods in terms of both subjective and objective evaluation.
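As an illustrative end-to-end sketch, the code below detects focused regions via the residual left after an average (box) low-pass filter and refines the weight map with OpenCV's domain-transform filter in its recursive-filtering mode (requires `opencv-contrib-python`), standing in for the paper's RF; all parameters are assumptions.

```python
# Hedged sketch of the multifocus weight-map pipeline.
import cv2
import numpy as np

def multifocus_fuse(a, b, ksize=31):
    ga = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    gb = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    # Focused-region detection: high-frequency energy remaining after an
    # average low-pass filter indicates in-focus pixels.
    fa = np.abs(ga - cv2.blur(ga, (ksize, ksize)))
    fb = np.abs(gb - cv2.blur(gb, (ksize, ksize)))
    w0 = (fa > fb).astype(np.float32)                  # initial weight map
    # Structure-preserving refinement (recursive-filtering variant).
    w = cv2.ximgproc.dtFilter(ga, w0, 60, 0.3, cv2.ximgproc.DTF_RF)
    w = np.clip(w, 0, 1)[..., None]
    return np.clip(w * a + (1 - w) * b, 0, 255).astype(np.uint8)
```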
We present a framework for hyperspectral image classification based on an adaptive scalable kernel (ASK), which removes insignificant details while defending crucial features. The proposed method consists of three steps. First, spectral feature extraction based on an interval gradient and a fast morphological filter is used to reduce the high dimensionality. Second, a powerful spatial structure extraction method based on adaptive-scale kernels is adopted to enhance the performance of structure-preserving filtering; depending on patch-based statistics, this model distinguishes small-scale texture from large-scale structure and finds an optimal per-pixel smoothing scale. Third, the obtained spectral structure feature maps are classified with the large-margin distribution machine. Experimental results show that the proposed ASK-based spatial structure extraction method achieves state-of-the-art performance in terms of classification accuracy and computational efficiency.
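Very roughly, the per-pixel scale selection in step two can be pictured as below: patches whose local statistics look texture-like receive a large smoothing scale, while structural patches keep a small one. The variance statistic, window size, and threshold are all illustrative assumptions, not the ASK formulation.

```python
# Hedged sketch: choose a per-pixel smoothing scale from patch statistics.
import cv2
import numpy as np

def per_pixel_scale(gray, scales=(1, 4), var_thresh=50.0):
    g = gray.astype(np.float32)
    mean = cv2.blur(g, (7, 7))
    var = cv2.blur(g * g, (7, 7)) - mean ** 2     # local patch variance
    scale = np.full(g.shape, scales[0], np.float32)
    scale[var < var_thresh] = scales[1]           # smooth texture-like patches hard
    return scale                                  # feed to a scale-aware filter
```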
Image fusion aims to exploit complementary information in multimodal images to create a single composite image with extended information content. An image fusion framework is proposed for different types of multimodal images using fast filtering in the spatial domain. First, the image gradient magnitude is used to detect contrast and image sharpness. Second, a fast morphological closing operation is performed on the gradient magnitude to bridge gaps and fill holes. Third, the weight map is obtained from the multimodal image gradient magnitudes and filtered by a fast structure-preserving filter. Finally, the fused image is composed using a weighted-sum rule. Experimental results on several groups of images show that the proposed fast fusion method performs better than state-of-the-art methods, running up to four times faster than the fastest baseline algorithm.
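The four steps map quite directly onto the sketch below, where a guided filter (requires `opencv-contrib-python`) stands in for the fast structure-preserving filter; kernel sizes and filter parameters are assumptions.

```python
# Hedged sketch: gradient magnitude -> morphological closing -> filtered
# weight map -> weighted-sum fusion.
import cv2
import numpy as np

def fast_fuse(a, b):
    def closed_grad_mag(img):
        g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
        gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
        m = cv2.magnitude(gx, gy)
        k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
        return cv2.morphologyEx(m, cv2.MORPH_CLOSE, k)  # bridge gaps, fill holes
    ma, mb = closed_grad_mag(a), closed_grad_mag(b)
    w0 = (ma > mb).astype(np.float32)                   # sharper pixel wins
    guide = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    w = cv2.ximgproc.guidedFilter(guide, w0, radius=8, eps=1e-3)[..., None]
    return np.clip(w * a + (1 - w) * b, 0, 255).astype(np.uint8)
```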
Image intensity is determined by both the albedo component and the shading component. The albedo component describes the physical nature of different objects on the Earth's surface, and land-cover classes differ from each other because of their intrinsic physical materials. We therefore recover the intrinsic albedo feature of the hyperspectral image to exploit the spatial semantic information. We then use a support vector machine (SVM), which maximizes the minimum margin to achieve good generalization performance, to classify the recovered intrinsic-albedo hyperspectral image. Experimental results show that the SVM with the intrinsic albedo feature achieves better classification performance than state-of-the-art methods in terms of visual quality and three quantitative metrics.
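For the classification stage, a minimal per-pixel SVM sketch with scikit-learn is shown below; the albedo recovery itself is represented only by its output cube, since the intrinsic decomposition is not reproduced here, and the SVM hyperparameters are assumptions.

```python
# Hedged sketch: per-pixel SVM classification of a recovered albedo cube.
import numpy as np
from sklearn.svm import SVC

def classify(albedo_cube, train_mask, train_labels):
    """albedo_cube: (H, W, B) recovered albedo feature; train_mask: bool (H, W)."""
    H, W, B = albedo_cube.shape
    X = albedo_cube.reshape(-1, B)                 # one B-dim sample per pixel
    clf = SVC(kernel="rbf", C=100, gamma="scale")  # hyperparameters assumed
    clf.fit(X[train_mask.ravel()], train_labels)
    return clf.predict(X).reshape(H, W)            # full classification map
```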
The hyperchaotic sequence and the DNA sequence are utilized jointly for image encryption. A four-dimensional hyperchaotic system is used to generate a pseudorandom sequence, and the main idea is to apply the hyperchaotic sequence to almost every step of the encryption. All intensity values of an input image are converted to a serial binary digit stream, and the bitstream is scrambled globally by the hyperchaotic sequence. DNA algebraic operations and complementation are then performed between the hyperchaotic sequence and the DNA sequence to obtain robust encryption. Experimental results demonstrate that the encryption algorithm matches the performance of state-of-the-art methods in terms of quality, security, and robustness against noise and cropping attacks.
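To give the flavor of the chaotic part, the sketch below integrates a four-dimensional hyperchaotic Lorenz-type system and uses the trajectory to globally permute the image's bitstream. The specific system, key handling, and the DNA operations are assumptions or omitted, and the example is practical only for small images.

```python
# Hedged sketch: 4-D hyperchaotic key stream driving a global bit permutation.
import numpy as np
from scipy.integrate import solve_ivp

def hyperchaotic_sequence(n, key=(1.1, 2.2, 3.3, 4.4)):
    def f(_, s, a=10.0, b=8 / 3, c=28.0, r=-1.0):   # Lorenz-type 4-D system
        x, y, z, w = s
        return [a * (y - x) + w, c * x - y - x * z, x * y - b * z, -y * z + r * w]
    t = np.linspace(0, 0.005 * n, n)
    sol = solve_ivp(f, (0, t[-1]), key, t_eval=t, rtol=1e-8)
    return sol.y[0]                        # one chaotic coordinate as key stream

def scramble_bits(img_u8):
    bits = np.unpackbits(img_u8.ravel())             # serial binary digit stream
    perm = np.argsort(hyperchaotic_sequence(bits.size))  # chaos-driven permutation
    return bits[perm], perm                          # permuted bits + inverse key
```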
Support vector machine (SVM) classifiers are widely applied to hyperspectral image (HSI) classification and provide significant advantages in terms of accuracy, simplicity, and robustness. The SVM is a well-known learning algorithm that maximizes the minimum margin. However, recent theoretical results have pointed out that maximizing the minimum margin yields lower generalization performance than optimizing the margin distribution, and have proved that the margin distribution is more important. In this paper, a large margin distribution machine (LDM) is applied to HSI classification, and optimizing the margin distribution achieves better generalization than the SVM. Since the raw HSI feature space is not the most effective representation, we adopt factor analysis to learn an effective HSI feature, and the learned features are further processed by a structure-preserving filter to fully exploit the spatial structure information of the HSI; the spatial structure information is thus integrated into the feature learning process to obtain a better HSI feature. We then propose a multiclass LDM to classify the filtered HSI feature. Experimental results show that the proposed LDM with feature learning matches the classification performance of state-of-the-art methods in terms of visual quality and three quantitative evaluations, indicating that the LDM has high generalization performance.
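For reference, the LDM augments the SVM-style regularizer with the first- and second-order statistics of the margins. With margins γ_i, margin mean γ̄, and margin variance γ̂, the soft-constraint objective takes roughly the following form (a sketch written from the standard LDM formulation, so treat the constants as indicative):

```latex
\min_{\mathbf{w},\,\xi_i}\ \tfrac{1}{2}\|\mathbf{w}\|^{2}
  + \lambda_{1}\,\hat{\gamma} - \lambda_{2}\,\bar{\gamma}
  + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.}\quad \gamma_i \ge 1 - \xi_i,\ \ \xi_i \ge 0,
\qquad \gamma_i = y_i\,\mathbf{w}^{\top}\phi(\mathbf{x}_i).
```

Setting λ₁ = λ₂ = 0 recovers the familiar soft-margin SVM, which is exactly why optimizing the margin distribution generalizes at least as well.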
Based on the different strengths of synaptic connections between actual neurons, this paper proposes a heterogeneous pulse coupled neural network (HPCNN) algorithm for image quantization. HPCNNs are developed from traditional pulse coupled neural network (PCNN) models and use different parameters for different image regions, allowing pixels of different gray levels to be classified broadly into two categories: background regions and object regions. An HPCNN also conforms to human visual characteristics. The parameters of the HPCNN model are calculated automatically according to these categories, so the quantized results are near-optimal and more suitable for human observation. Experimental results on natural images from a standard image library show the validity and efficiency of the proposed quantization method.
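A compact sketch of the heterogeneous idea follows: a standard PCNN iteration in which the linking strength β differs between background and object regions, with the firing epoch of each neuron serving as its quantized level. The parameter values and the simple mean-threshold region split are assumptions, not the paper's automatic parameter calculation.

```python
# Hedged sketch of a PCNN iteration with region-dependent linking strength.
import numpy as np
from scipy.ndimage import convolve

def hpcnn_quantize(S, steps=20, beta_bg=0.1, beta_obj=0.4):
    S = S.astype(np.float32) / 255.0
    beta = np.where(S < S.mean(), beta_bg, beta_obj)  # heterogeneous linking
    W = np.array([[0.5, 1, 0.5], [1, 0, 1], [0.5, 1, 0.5]], np.float32)
    Y = np.zeros_like(S); theta = np.ones_like(S); fire_time = np.zeros_like(S)
    for n in range(1, steps + 1):
        L = convolve(Y, W)                    # linking input from neighbors
        U = S * (1 + beta * L)                # internal activity
        Y = (U > theta).astype(np.float32)    # pulse output
        theta = 0.8 * theta + 20.0 * Y        # threshold decay and reset
        fire_time[(Y > 0) & (fire_time == 0)] = n
    return fire_time                          # firing epoch ~ quantized level
```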
We address the problem of fusing multifocus images based on phase congruency (PC), which provides a sharpness feature of a natural image. The focus measure (FM) is identified as strong PC near a distinctive image feature, evaluated with complex Gabor wavelets; this PC-based FM is more robust against noise than other FMs. The fused image is obtained by a new fusion rule (FR), with the focused region selected by the FR from one of the input images. Experimental results show that the proposed fusion scheme matches the fusion performance of state-of-the-art methods in terms of visual quality and quantitative evaluations.
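The selection-style fusion rule can be sketched as below. The `sharpness_proxy` here is only a crude stand-in, the local energy of a single complex Gabor response, for a true multi-scale phase-congruency measure; the real FM and FR are not reproduced.

```python
# Hedged sketch: per-pixel selection fusion driven by a Gabor-energy proxy
# for phase congruency.
import cv2
import numpy as np

def sharpness_proxy(gray):
    g = gray.astype(np.float32)
    k_even = cv2.getGaborKernel((15, 15), 3, 0, 8, 0.5, 0)           # cosine phase
    k_odd = cv2.getGaborKernel((15, 15), 3, 0, 8, 0.5, np.pi / 2)    # sine phase
    e = cv2.filter2D(g, -1, k_even)
    o = cv2.filter2D(g, -1, k_odd)
    return cv2.blur(np.sqrt(e * e + o * o), (9, 9))   # smoothed local energy

def pc_fuse(a_gray, b_gray):
    mask = sharpness_proxy(a_gray) >= sharpness_proxy(b_gray)
    return np.where(mask, a_gray, b_gray)   # pick the focused input per pixel
```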