Owing to its high spectral sampling, hyperspectral imagery (HSI) is often highly correlated across bands and contains considerable redundancy. Motivated by the recent success of sparsity-preserving dimensionality reduction (DR) techniques in both the computer vision and remote sensing image analysis communities, a novel supervised nonparametric sparse discriminant analysis (NSDA) algorithm is presented for HSI classification. The objective function of NSDA preserves the within-class sparse reconstructive relationship to characterize within-class compactness while simultaneously maximizing the nonparametric between-class scatter to enhance the discriminative ability of the features in the projected space. Essentially, it seeks the optimal projection matrix that identifies the underlying discriminative manifold structure of a multiclass dataset. Experimental results on one visualization dataset and three recorded HSI datasets demonstrate that NSDA outperforms several state-of-the-art feature extraction methods for HSI classification.
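The NSDA criterion described above can be sketched in a common trace-ratio form (the symbols below are our notation for illustration, not taken from the paper, and the paper's exact formulation may differ):

```latex
% Hypothetical sketch of the NSDA criterion (our notation):
% P   - projection matrix to be learned
% S_b - nonparametric between-class scatter matrix
% S_w - within-class scatter built from sparse reconstruction residuals
P^{*} = \arg\max_{P}
        \frac{\operatorname{tr}\!\left(P^{\top} S_{b}\, P\right)}
             {\operatorname{tr}\!\left(P^{\top} S_{w}\, P\right)}
```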
Detection of anomalous targets of various sizes in hyperspectral data has received considerable attention in reconnaissance and surveillance applications. Many anomaly detectors have been proposed in the literature. However, current methods are sensitive to anomalies falling within the processing window range and often make restrictive assumptions about the distribution of the background data. Motivated by the fact that anomalous pixels are often distinctive from their local background, in this letter we propose a novel hyperspectral anomaly detection framework for real-time remote sensing applications. The proposed framework consists of four major components: sparse feature learning, pyramid grid window selection, joint spatial-spectral collaborative coding, and multi-level divergence fusion. It exploits the collaborative representation difference in the feature space to locate potential anomalies and is fully unsupervised, requiring no prior assumptions. Experimental results on airborne recorded hyperspectral data demonstrate that the proposed method adapts to anomalies over a large range of sizes and is well suited for parallel processing.
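The collaborative-representation residual at the core of such detectors can be illustrated with a minimal sketch (assuming ridge-regularized collaborative coding against a local background window; the function and parameter names are ours, not the paper's):

```python
import numpy as np

def crd_score(y, X, lam=0.01):
    """Collaborative-representation anomaly score for a test pixel y
    (bands,) given background pixels X (bands x n_bg): the reconstruction
    residual after ridge-regularized least squares."""
    n = X.shape[1]
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    return np.linalg.norm(y - X @ alpha)

rng = np.random.default_rng(0)
bands = 20
background = rng.normal(0.0, 0.05, (bands, 50)) + 1.0  # near-flat spectra
anomaly = np.zeros(bands)
anomaly[5:10] = 3.0                                    # distinctive spectrum
normal_pixel = rng.normal(0.0, 0.05, bands) + 1.0

s_anom = crd_score(anomaly, background)
s_norm = crd_score(normal_pixel, background)
print(s_anom, s_norm)
```

A pixel that the local background cannot reconstruct yields a large residual and is flagged as a potential anomaly.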
This paper extends ground-level visual attributes to high resolution remote sensing imagery to demonstrate the usefulness of visual attributes for remote sensing tasks such as image classification. Visual attributes have been introduced as semantic properties that transcend categories. We train predictors on the largest ground-level attribute dataset, SUN, for the 102 visual attributes that are well defined in SUN. We first form an attribute-based representation of the remote sensing imagery from the outputs of the trained attribute predictors. We then evaluate the classification performance of the attribute-based representation against traditional features. Extensive experiments on the ground-level baseline dataset Scene 15 and the remote sensing dataset UCMLU show that ground-level visual attributes outperform traditional low-level features in the classification problem, and that the combination of ground-level visual attributes and low-level features obtains the best classification rate. Moreover, we demonstrate that the attribute-based representation is much more semantically powerful than low-level features.
In this paper, we introduce and study a novel unsupervised domain adaptation (DA) algorithm, called latent subspace sparse representation based domain adaptation, built on the observation that source and target data lie in different but related low-dimensional subspaces. The key idea is that each point in a union of subspaces can be reconstructed as a combination of other points in the dataset. We propose to project the source and target data onto a common latent generalized subspace, which is a union of the subspaces of the source and target domains, and to learn the sparse representation in this latent generalized subspace. By employing minimum reconstruction error and maximum mean discrepancy (MMD) constraints, the structure of the source and target domains is preserved and the discrepancy between them is reduced, and both properties are reflected in the sparse representation. We then use the sparse representation to build a weighted graph that reflects the relationships among points from the different domains (source-source, source-target, and target-target) to predict the labels of the target domain. We also propose an efficient optimization method for the algorithm. Our method does not need to be combined with any classifier and therefore requires no separate train-then-test procedure. Various experiments show that the proposed method performs better than competitive state-of-the-art subspace-based domain adaptation methods.
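The MMD constraint mentioned above can be illustrated with a minimal linear-kernel sketch (the function name and synthetic data are ours; the paper's exact kernel and formulation are not specified here):

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Empirical maximum mean discrepancy with a linear kernel:
    squared distance between the domain means in feature space."""
    return np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, (100, 5))
target_near = rng.normal(0.1, 1.0, (100, 5))  # small domain shift
target_far = rng.normal(2.0, 1.0, (100, 5))   # large domain shift

print(mmd_linear(source, target_near), mmd_linear(source, target_far))
```

Minimizing such a term over a learned projection pulls the projected source and target distributions toward each other.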
Semantic classification of very high resolution (VHR) remote sensing images is of great importance for land use and land cover investigation. A large number of approaches exploiting different kinds of low-level features have been proposed in the literature, but their conclusions are often inconsistent, and a systematic assessment of various low-level features for VHR remote sensing image classification is needed. In this work, we first perform an extensive evaluation of eight features, including HOG, dense SIFT, SSIM, GIST, Geo color, LBP, Texton, and Tiny images, for the classification of three publicly available datasets. Second, we propose to transfer ground-level scene attributes to remote sensing images. Third, we combine low-level features with mid-level visual attributes to further improve classification performance. Experimental results demonstrate that i) dense SIFT and HOG features are more robust than the other features for VHR scene image description; ii) visual attributes compete with a combination of low-level features; and iii) multiple feature combination achieves the best performance under different settings.
This paper presents a robust pedestrian detection algorithm that works on infrared imagery. Our algorithm is
applicable to images captured from surveillance infrastructure as well as moving platforms. Firstly, we introduce a
local binary pattern (LBP) texture feature for infrared pedestrian representation. Secondly, motivated by the recent
success of multiple cues pedestrian detection in visual imagery, we combine both shape and binary pattern texture
features for effective infrared pedestrian description, providing a level of robustness to variations in pedestrian
shape and appearance in infrared images. Finally, a support vector machine (SVM) classifier is utilized to classify
sub-windows into pedestrians or background. Experimental results demonstrate the robustness and effectiveness of
our method.
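As a rough illustration of the LBP texture feature used for pedestrian representation, here is a minimal 8-neighbor LBP histogram in NumPy (a generic LBP descriptor, not necessarily the exact variant used in the paper):

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbor local binary pattern: each interior pixel gets an
    8-bit code from thresholding its neighbors against it; the normalized
    256-bin code histogram serves as a texture descriptor."""
    c = img[1:-1, 1:-1]
    neighbors = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                 img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                 img[2:, :-2], img[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbors):
        codes |= (n >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

patch = np.random.default_rng(0).integers(0, 256, (32, 32))
desc = lbp_histogram(patch)
```

In a detector like the one described, such texture histograms would be concatenated with shape features and fed to the SVM classifier.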
We present a hybrid generative-discriminative learning method for human action recognition from video sequences. Our model combines a bag-of-words component with supervised latent topic models. A video sequence is represented as a collection of spatiotemporal words by extracting space-time interest points and describing these points using both shape and motion cues. The supervised latent Dirichlet allocation (sLDA) topic model, which employs discriminative learning using labeled data under a generative framework, is introduced to discover the latent topic structure that is most relevant to action categorization. The proposed algorithm retains most of the desirable properties of generative learning while increasing the classification performance through a discriminative setting. It has also been extended to exploit both labeled data and unlabeled data to learn human actions under a unified framework. We test our algorithm on three challenging data sets: the KTH human motion data set, the Weizmann human action data set, and a ballet data set. Our results are either comparable to or significantly better than previously published results on these data sets and reflect the promise of hybrid generative-discriminative learning approaches.
An unsupervised learning algorithm based on topic models is presented for lane detection in video sequences observed by uncalibrated moving cameras. Our contributions are twofold. First, we introduce the maximally stable extremal region (MSER) detector for lane-marking feature extraction and derive a novel affine-invariant shape descriptor to describe region shapes, along with a modified scale-invariant feature transform descriptor to capture feature appearance characteristics. MSER features are more stable than edge points or line pairs and hence provide robustness to lane-marking variations in scale, lighting, viewpoint, and shadows. Second, we propose a novel location-enhanced probabilistic latent semantic analysis (pLSA) topic model for simultaneous lane recognition and localization. The proposed model overcomes the limitation of the pLSA model for effective topic localization. Experimental results on traffic sequences in various scenarios demonstrate the effectiveness and robustness of the proposed method.
We present a novel unsupervised learning algorithm for discovering objects and their location in videos from moving cameras. The videos can switch between different shots, and contain cluttered background, occlusion, camera motion, and multiple independently moving objects. We exploit both appearance consistency and spatial configuration consistency of local patches across frames for object recognition and localization. The contributions of this paper are twofold. First, we propose a combined approach for simultaneous spatial context and temporal context generation. Local video patches are extracted and described using the generated spatial-temporal context words. Second, a dynamic topic model, based on the representation of a bag of spatial-temporal context words, is introduced to learn object category models in video sequences. The proposed model can categorize and localize multiple objects in a single video. Objects leaving or entering the scene at multiple times can also be handled efficiently in the dynamic framework. Experimental results on the CamVid data set and the VISAT™ data set demonstrate the effectiveness and robustness of the proposed method.
A scale-invariant feature transform (SIFT)-based particle filter algorithm is presented for joint detection and tracking of independently moving objects in stereo sequences observed by uncalibrated moving cameras. The major steps include feature detection and matching, moving object detection based on multiview geometric constraints, and tracking based on a particle filter. Our first contribution is a novel closed-loop mapping (CLM) multiview matching scheme for stereo matching and motion tracking. CLM outperforms several state-of-the-art SIFT matching methods in terms of the density and reliability of feature correspondences. Our second contribution is a multiview epipolar constraint derived from the relative camera positions in pairs of consecutive stereo views for independent motion detection. The multiview epipolar constraint can detect moving objects followed by moving cameras in the same direction, a configuration in which the standard epipolar constraint fails. Our third contribution is a variable-dimensional particle filter for joint detection and tracking of independently moving objects. Multiple moving objects entering or leaving the field of view are handled effectively within the proposed framework. Experimental results on real-world stereo sequences demonstrate the effectiveness and robustness of our method.
Local invariant features such as scale invariant feature transform (SIFT) have received considerable attention in recent
years. Despite its tremendous success in computer vision applications, SIFT matching alone is not sufficient for remote
sensing image registration because of low detection repeatability and nonlinear intensity changes. In this paper, we
introduce a remote sensing image registration algorithm that combines local affine frames (LAF) together with SIFT
matching. Firstly, distinctive SIFT keypoints and maximally stable extremal regions (MSER) are detected independently
in the reference image and the sensed image. A contrast-reversal-invariant SIFT descriptor is constructed to describe
texture patches around SIFT keypoints, and a shape descriptor defined in the LAF is constructed to describe the MSER contours.
Nearest neighbor distance ratio matching with confidence measurement is then adopted to match both descriptors.
Tentative correspondences are ranked according to their confidence measurements. Finally, random sample consensus
(RANSAC) is performed on the top-ranked matched features to obtain a global set of transform parameters. Experimental
results demonstrate the robustness and accuracy of the proposed method.
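The nearest neighbor distance ratio matching with confidence ranking described above can be sketched as follows (the confidence measure 1 - d1/d2 is our illustrative choice; the paper's exact measure may differ):

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor distance-ratio matching (Lowe's ratio test).
    Returns (index_a, index_b, confidence) tuples, ranked by a
    confidence of 1 - d1/d2, where d1 and d2 are the distances to the
    first and second nearest neighbors."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:  # accept only unambiguous matches
            matches.append((i, j1, 1.0 - dists[j1] / dists[j2]))
    return sorted(matches, key=lambda m: -m[2])

rng = np.random.default_rng(1)
ref = rng.normal(size=(10, 128))                        # reference descriptors
sensed = ref + rng.normal(scale=0.05, size=(10, 128))   # noisy counterparts
matches = ratio_match(ref, sensed)
```

Running RANSAC on only the top-ranked matches, as the abstract describes, reduces the outlier fraction seen by the model estimator.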