View-based indexing schemes for 3D object retrieval are gaining popularity since they provide good retrieval results.
These schemes are consistent with the theory that humans recognize objects based on their 2D appearances. The view-based
techniques also allow users to search with various queries such as binary images, range images and even 2D
sketches.
The previous view-based techniques use classical 2D shape descriptors such as Fourier invariants, Zernike moments,
Scale Invariant Feature Transform-based local features and 2D Digital Fourier Transform coefficients. These methods
describe each object independently of the others. In this work, we explore data-driven subspace models, such as Principal
Component Analysis, Independent Component Analysis and Nonnegative Matrix Factorization to describe the shape
information of the views. We treat the depth images obtained from various points of the view sphere as 2D intensity
images and train a subspace to extract the inherent structure of the views within a database. We also show the benefit of
categorizing shapes according to their eigenvalue spread. Both the shape categorization and data-driven feature set
conjectures are tested on the PSB database and compared with competing view-based 3D shape retrieval algorithms.
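A minimal sketch of the data-driven subspace idea described above, using PCA as the representative model; the depth-image views here are random placeholders, and the use of scikit-learn's PCA is an illustrative choice rather than the paper's implementation.

```python
# Illustrative sketch: learn a PCA subspace from depth-image views of database
# objects and retrieve by comparing projection coefficients. Assumes each view
# is already rendered as a fixed-size depth image (here 64x64) and flattened.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_objects, n_views, side = 50, 20, 64
views = rng.random((n_objects * n_views, side * side))   # placeholder depth images
labels = np.repeat(np.arange(n_objects), n_views)

pca = PCA(n_components=40)                # data-driven subspace (ICA/NMF would be analogous)
db_features = pca.fit_transform(views)    # one feature vector per stored view

def retrieve(query_view, k=5):
    """Rank database objects by the distance of their closest view to the query."""
    q = pca.transform(query_view.reshape(1, -1))
    dists = np.linalg.norm(db_features - q, axis=1)
    best_per_object = [dists[labels == o].min() for o in range(n_objects)]
    return np.argsort(best_per_object)[:k]

print(retrieve(views[3]))                  # the true object (0) should rank first
```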
In this work, we present a pose-invariant shape matching methodology for complete 3D object models. Our approach is based on first
describing the objects with shape descriptors and then minimizing the distance between descriptors over an appropriate set of
geometric transformations. Our chosen shape description methodology is the density-based framework (DBF), which is
experimentally shown to be very effective in 3D object retrieval [1]. In our earlier work, we showed that density-based descriptors
exhibit a permutation property that greatly reduces the equivocation of the eigenvalue-based axis labeling and moments-based polarity
assignment in a computationally very efficient manner. In the present work, we show that this interesting permutation property is a
consequence of the symmetry properties of regular polyhedra. Furthermore, we extend the invariance scheme to arbitrary 3D rotations
by a discretization of the infinite space of 3D rotations, followed by a nearest-neighbor-based approximation procedure to
generate the necessary permutations.
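The rotation-invariance step can be illustrated with a small sketch that minimizes the descriptor distance over a finite permutation set; the descriptor values and the permutation set below are placeholders standing in for the polyhedral-symmetry permutations discussed above.

```python
# Sketch of matching two descriptors by minimizing their distance over a finite
# set of index permutations (standing in for the rotation-induced permutations).
import numpy as np
from itertools import permutations

def permutation_invariant_distance(desc_a, desc_b, perms):
    """Smallest Euclidean distance between desc_a and any permuted desc_b."""
    return min(np.linalg.norm(desc_a - desc_b[list(p)]) for p in perms)

rng = np.random.default_rng(1)
d = rng.random(6)
perm_set = list(permutations(range(6)))[:24]   # placeholder permutation set
print(permutation_invariant_distance(d, d[::-1].copy(), perm_set))
```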
We provide a survey of hand biometric techniques in the
literature and incorporate several novel results of hand-based personal
identification and verification. We compare several feature
sets in the shape-only and shape-plus-texture categories, emphasizing
the relevance of a proper hand normalization scheme in the
success of any biometric scheme. The preference for the left or the
right hand, or for ambidextrous access control, is explored. Since the
business case of a biometric device partly hinges on the longevity of
its features and the generalization ability of its database, we have
tested our scheme with time-lapse data as well as with subjects that
were unseen during the training stage. Our experiments were conducted
on a hand database that is an order of magnitude larger than
any existing one in the literature.
This paper explores the morphosyntactic tools for text watermarking and develops a syntax-based natural language
watermarking scheme. Turkish, an agglutinative language, provides good ground for syntax-based natural language
watermarking with its relatively free word order and rich repertoire of morphosyntactic structures. The
unmarked text is first transformed into a syntactic tree diagram in which the syntactic hierarchies and the functional
dependencies are coded. The watermarking software then operates on the sentences in syntax tree format and executes
binary changes under the control of WordNet to avoid semantic losses. The key-controlled randomization of the morphosyntactic
tool order and the insertion of void watermarks provide a certain level of security. The embedding capacity is calculated
statistically, and the imperceptibility is measured using edit hit counts.
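As a loose illustration of the embedding idea (not the Turkish morphosyntactic tools themselves), the toy sketch below carries each bit by applying or withholding a meaning-preserving transformation, with the tool order shuffled under a secret key.

```python
# Toy sketch: each watermark bit is carried by applying or withholding a
# meaning-preserving transformation, with the order of candidate transformations
# shuffled under a secret key. The transformations are trivial placeholders.
import random

def embed(sentences, bits, key, transforms):
    rng = random.Random(key)
    order = list(range(len(transforms)))
    rng.shuffle(order)                       # key-controlled tool order
    marked = []
    for sent, bit in zip(sentences, bits):
        t = transforms[order[len(marked) % len(transforms)]]
        marked.append(t(sent) if bit == 1 else sent)   # bit = 1 -> apply transformation
    return marked

transforms = [lambda s: "Indeed, " + s, lambda s: s.replace("very ", "")]
print(embed(["it is very cold today.", "the meeting was postponed."],
            [1, 0], key=42, transforms=transforms))
```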
In this paper, we focus on the blind source cell-phone identification problem. It is known that various artifacts in the image processing pipeline, such as pixel defects or unevenness of the responses in the CCD sensor, dark current noise, and the proprietary interpolation algorithms involved in the color filter array (CFA), leave telltale footprints. These artifacts, although often imperceptible, are statistically stable and can be considered a signature of the camera type or even of the individual device. For this purpose, we explore a set of forensic features, such as binary similarity measures, image quality measures and higher-order wavelet statistics, in conjunction with an SVM classifier to identify the originating cell-phone type. We provide identification results among cell-phone cameras of nine different brands. In addition to our initial results, we applied a set of geometrical operations to the original images in order to investigate how robust the proposed method is under these manipulations.
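A sketch of the classification stage under the assumption that the forensic features have already been extracted per image; the feature matrix, labels and SVM settings below are illustrative placeholders.

```python
# Sketch: forensic feature vectors (binary similarity measures, image quality
# measures, wavelet statistics) are assumed to be already extracted per image;
# an SVM then predicts the source cell-phone class.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((900, 60))                 # placeholder: 60 forensic features per image
y = rng.integers(0, 9, size=900)          # placeholder labels for nine camera classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X_tr, y_tr)
print("identification accuracy:", clf.score(X_te, y_te))
```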
Techniques and methodologies for validating the authenticity of digital images and testing for the presence of doctoring and manipulation operations on them have recently attracted attention. We review three categories of forensic features and discuss the design of classifiers between doctored and original images. The performance of classifiers with respect to selected controlled manipulations, as well as to uncontrolled manipulations, is analyzed. The tools for image manipulation detection are treated under feature fusion and decision fusion scenarios.
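As an illustration of the decision-fusion scenario, the following sketch combines the binary outputs of several per-category classifiers by majority vote; the classifiers and their votes are placeholders.

```python
# Minimal sketch of decision fusion: per-category classifiers vote on whether an
# image is doctored; the fused decision is the majority vote.
import numpy as np

def decision_fusion(votes):
    """votes: array of shape (n_classifiers, n_images) with 0 = original, 1 = doctored."""
    votes = np.asarray(votes)
    return (votes.sum(axis=0) > votes.shape[0] / 2).astype(int)

votes = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]   # three feature-category classifiers
print(decision_fusion(votes))                         # fused doctored/original decisions
```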
We propose and compare three different automatic landmarking methods for near-frontal faces. The face information is
provided as 480x640 gray-level images in addition to the corresponding 3D scene depth information. All three methods
follow a coarse-to-fine strategy and use the 3D information in an assisting role. The first method employs a combination of
principal component analysis (PCA) and independent component analysis (ICA) features to analyze the Gabor feature
set. The second method uses a subset of DCT coefficients for template-based matching. These two methods employ
SVM classifiers with polynomial kernel functions. The third method uses a mixture of factor analyzers to learn Gabor
filter outputs. We contrast the localization performance separately with 2D texture and 3D depth information. Although
the 3D depth information per se does not perform as well as texture images in landmark localization, the 3D information
still has a beneficial role in eliminating the background and the false alarms.
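A sketch of the template-matching idea behind the second method, assuming a trained SVM is available: a low-frequency DCT coefficient subset is extracted from each candidate patch and scored by the classifier. The patch size, coefficient subset and toy training data are assumptions made for illustration.

```python
# Sketch: a low-frequency DCT coefficient subset is extracted from a patch around
# each candidate location and scored by an SVM; the highest-scoring location is
# returned as the landmark. 3D depth could restrict the search region beforehand.
import numpy as np
from scipy.fft import dctn
from sklearn.svm import SVC

def dct_features(patch, keep=6):
    """Flatten the top-left (low-frequency) keep x keep block of the 2D DCT."""
    coeffs = dctn(patch, norm="ortho")
    return coeffs[:keep, :keep].ravel()

def scan(image, svm, patch=16, stride=4):
    """Return the location whose patch the SVM scores highest."""
    best, best_xy = -np.inf, None
    for y in range(0, image.shape[0] - patch, stride):
        for x in range(0, image.shape[1] - patch, stride):
            score = svm.decision_function([dct_features(image[y:y+patch, x:x+patch])])[0]
            if score > best:
                best, best_xy = score, (y, x)
    return best_xy

rng = np.random.default_rng(0)
svm = SVC(kernel="poly", degree=2).fit(rng.random((40, 36)), rng.integers(0, 2, 40))  # toy SVM
print(scan(rng.random((64, 64)), svm))
```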
In this paper, we investigate the recognition performance of various projection-based features applied to registered 3D scans of faces. Some features are data-driven, such as the ICA-based or NNMF-based features. Other features are obtained using DFT- or DCT-based schemes. We apply the feature extraction techniques to three different representations of registered faces, namely, 3D point clouds, 2D depth images and 3D voxel representations. We consider both global and local features. Global features are extracted from the whole face data, whereas local features are computed over blocks partitioned from the 2D depth images. The block-based local features are fused both at the feature level and at the decision level. The resulting feature vectors are matched using Linear Discriminant Analysis. Experiments using different combinations of representation types and feature vectors are conducted on the 3D-RMA dataset.
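A sketch of the block-based local-feature path with feature-level fusion, using DCT block features and scikit-learn's LDA; the depth images, block size and coefficient subset are illustrative placeholders rather than the paper's exact configuration.

```python
# Sketch: a depth image is partitioned into blocks, a small DCT feature is taken
# per block, the block features are concatenated (feature-level fusion), and LDA
# is used for matching. Data are random placeholders for registered depth images.
import numpy as np
from scipy.fft import dctn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def block_features(depth, block=16, keep=4):
    feats = []
    for y in range(0, depth.shape[0], block):
        for x in range(0, depth.shape[1], block):
            c = dctn(depth[y:y+block, x:x+block], norm="ortho")
            feats.append(c[:keep, :keep].ravel())
    return np.concatenate(feats)               # feature-level fusion of local blocks

rng = np.random.default_rng(0)
faces = rng.random((60, 64, 64))               # placeholder registered depth images
ids = np.repeat(np.arange(20), 3)              # 20 subjects, 3 scans each
X = np.stack([block_features(f) for f in faces])
lda = LinearDiscriminantAnalysis().fit(X, ids)
print("training-set accuracy:", lda.score(X, ids))
```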
A biometric scheme based on the silhouettes and/or textures of the hands is developed. The crucial part of the algorithm is the accurate registration of the deformable shape of the hands, since subjects are not constrained in pose or posture during acquisition. A host of shape and texture features are comparatively evaluated, such as independent component analysis (ICA) features, principal component analysis (PCA) features, angular radial transform (ART) features and distance transform (DT) based features. Even with a limited amount of training data, it is shown that this biometric
scheme can perform reliably for populations of up to several hundred.
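One of the compared shape features can be sketched as follows: a distance-transform map of the registered silhouette pooled into a fixed-length vector. The synthetic silhouette and the pooling grid are assumptions for illustration.

```python
# Sketch of a distance-transform (DT) feature: the DT of the registered hand
# silhouette is pooled onto a fixed grid to give a fixed-length feature vector.
import numpy as np
from scipy.ndimage import distance_transform_edt

def dt_features(silhouette, grid=16):
    """Distance transform inside the silhouette, pooled onto a grid x grid map."""
    dt = distance_transform_edt(silhouette)
    h, w = dt.shape
    pooled = dt[:h - h % grid, :w - w % grid].reshape(grid, h // grid, grid, w // grid)
    return pooled.mean(axis=(1, 3)).ravel()

sil = np.zeros((128, 128), dtype=bool)
sil[30:100, 40:90] = True                       # crude stand-in for a hand silhouette
print(dt_features(sil).shape)                   # (256,) feature vector
```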
We propose measures to evaluate the performance of video object segmentation and tracking methods quantitatively without
ground-truth segmentation maps. The proposed measures are based on
spatial differences of color and motion along the boundary of the
estimated video object plane and temporal differences between the
color histogram of the current object plane and its neighbors.
They can be used to localize (spatially and/or temporally) regions
where segmentation results are good or bad, or they can be combined to
yield a single numerical measure to indicate the goodness of the
boundary segmentation and tracking results over a sequence. The
validity of the proposed performance measures without ground
truth has been demonstrated by canonical correlation analysis of
the proposed measures with another set of measures computed with
ground truth on a set of sequences where ground-truth
information is available. Experimental results are presented to
evaluate the segmentation maps obtained from various sequences
using different segmentation and tracking algorithms.
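A sketch of the spatial color-difference measure, assuming an estimated object mask and an RGB frame; the band width and the synthetic data are illustrative, and the motion and temporal histogram measures would follow the same pattern.

```python
# Sketch: for the boundary of the estimated object mask, compare average colors
# just inside and just outside the boundary; a large difference suggests the
# boundary sits on a true object edge.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def boundary_color_contrast(frame, mask, width=2):
    inner = mask & ~binary_erosion(mask, iterations=width)     # thin band inside
    outer = binary_dilation(mask, iterations=width) & ~mask    # thin band outside
    if not inner.any() or not outer.any():
        return 0.0
    return float(np.linalg.norm(frame[inner].mean(axis=0) - frame[outer].mean(axis=0)))

frame = np.zeros((80, 80, 3)); frame[20:60, 20:60] = [0.9, 0.2, 0.1]
mask = np.zeros((80, 80), dtype=bool); mask[20:60, 20:60] = True
print(boundary_color_contrast(frame, mask))    # high contrast -> good boundary
```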
Classification of audio documents as bearing hidden information or not is a security issue addressed in the context of steganalysis. A cover audio object can be converted into a stego-audio object via steganographic methods. In this study we present a statistical method to detect the presence of hidden messages in audio signals. The basic idea is that the distributions of various statistical distance measures, calculated on cover audio signals and on stego-audio signals vis-à-vis their denoised versions, are statistically different. The design of the audio steganalyzer relies on the choice of these audio quality measures and the construction of a two-class classifier. Experimental results show that the proposed technique can be used to detect the presence of hidden messages in digital audio data.
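A sketch of the pipeline under the assumption that a simple Wiener filter stands in for the denoiser and that three generic distance measures stand in for the audio quality measures; the toy embedding used to create the stego signals is also an assumption.

```python
# Sketch: distance/quality measures between an audio signal and its denoised
# version serve as features for a two-class (cover vs. stego) classifier.
import numpy as np
from scipy.signal import wiener
from sklearn.svm import SVC

def quality_features(x):
    d = wiener(x, mysize=15)                          # stand-in denoiser
    e = x - d
    return np.array([
        np.mean(e ** 2),                                              # noise energy
        10 * np.log10(np.mean(x ** 2) / (np.mean(e ** 2) + 1e-12)),   # SNR-like measure
        np.max(np.abs(e)),                                            # peak deviation
    ])

rng = np.random.default_rng(0)
cover = [rng.standard_normal(4000) for _ in range(40)]
stego = [c + 0.05 * rng.choice([-1, 1], size=4000) for c in cover]    # toy embedding
X = np.array([quality_features(s) for s in cover + stego])
y = np.array([0] * 40 + [1] * 40)
print("training accuracy:", SVC().fit(X, y).score(X, y))
```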
Semi-fragile watermarking techniques aim to prevent tampering and fraudulent use of modified images. A semi-fragile watermark monitors the integrity of the content of the image but not its exact representation. Thus the watermark is designed so that if the content of the image has not been tampered with, and so long as the correct key is known and the image has sufficiently high quality, the integrity is proven. However, if some parts of the image are replaced by someone who does not possess the key, the watermark information will not be reliably detected, which can be taken as evidence of forgery. In this paper we compare the performance of nine semi-fragile watermarking algorithms in terms of their miss probability under forgery attack, and in terms of false alarm probability under mild, hence non-malicious, signal processing operations that preserve the content and quality of the image. We propose desiderata for semi-fragile watermarking algorithms and indicate the promising algorithms among existing ones.
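The two figures of merit used in the comparison can be sketched directly from detector decisions; the decision vectors below are placeholders.

```python
# Sketch of the two figures of merit: the miss probability (forged images not
# flagged) and the false-alarm probability (mildly processed, authentic images
# flagged), computed from binary detector decisions.
import numpy as np

def miss_and_false_alarm(decisions_on_forged, decisions_on_mild):
    """Decisions are 1 = 'tampered', 0 = 'authentic'."""
    p_miss = 1.0 - np.mean(decisions_on_forged)        # forgeries not flagged
    p_fa = np.mean(decisions_on_mild)                  # benign processing flagged
    return p_miss, p_fa

print(miss_and_false_alarm([1, 1, 0, 1], [0, 0, 1, 0]))   # (0.25, 0.25)
```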
The watermark signals are weakly inserted in images due to imperceptibility constraints, which makes them prone to errors in the extraction stage. Although error-correcting codes can potentially improve their performance, one must pay attention to the fact that the watermarking channel is in general very noisy. We have considered the trade-off of BCH codes and repetition codes in various concatenation modes. At the higher error rates that can be encountered in watermarking channels, such as those due to low-quality JPEG compression, codes like the BCH codes cease to be useful. Repetition coding seems to be the last resort at error rates of 25% and beyond. It has been observed that there is a zone of bit error rates where their concatenation turns out to be more useful. In fact, the concatenation of repetition and BCH codes, judiciously dimensioned given the available number of insertion sites and the payload size, achieves a higher reliability level.
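A sketch of the repetition-coding fallback at a 25% bit error rate; the BCH outer stage of the concatenated scheme is omitted here, and the payload size and repetition factor are illustrative.

```python
# Sketch: each payload bit is repeated r times across insertion sites and decoded
# by majority vote. In a concatenated scheme the repetition code would sit inside,
# with a BCH code outside; the BCH stage is not shown.
import numpy as np

def rep_encode(bits, r):
    return np.repeat(np.asarray(bits), r)

def rep_decode(chips, r):
    return (chips.reshape(-1, r).sum(axis=1) > r / 2).astype(int)

rng = np.random.default_rng(0)
payload = rng.integers(0, 2, 64)
chips = rep_encode(payload, r=9)
noisy = chips ^ (rng.random(chips.size) < 0.25)          # channel with 25% bit error rate
decoded = rep_decode(noisy, r=9)
print("residual bit errors:", int(np.sum(decoded != payload)))
```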
In this paper, we present techniques for steganalysis of images that have been potentially subjected to a watermarking algorithm. Our hypothesis is that a particular watermarking scheme leaves statistical evidence or structure that can be exploited for detection with the aid of proper selection of image features and multivariate regression analysis. We use some sophisticated image quality metrics as the feature set to distinguish between watermarked and unwatermarked images. To identify the specific quality measures that provide the best discriminative power, we use analysis of variance (ANOVA) techniques. Multivariate regression analysis is then used on the selected quality metrics to build the optimal classifier using images and their blurred versions. The idea behind blurring is that the distance between an unwatermarked image and its blurred version is less than the distance between a watermarked image and its blurred version. Simulation results with a specific feature set and a well-known and commercially available watermarking technique indicate that our approach is able to accurately distinguish between watermarked and unwatermarked images.
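A sketch of the selection-and-classification stages, with a few generic image-versus-blurred-version statistics standing in for the image quality metrics, ANOVA F-scores for feature selection, and logistic regression standing in for the multivariate regression classifier.

```python
# Sketch: quality-metric-like statistics are computed between each image and its
# blurred version, ANOVA F-scores select the most discriminative ones, and a
# regression-based classifier separates watermarked from unwatermarked images.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def quality_metrics(img):
    blurred = gaussian_filter(img, sigma=1.0)
    diff = img - blurred
    return np.array([np.mean(diff ** 2), np.mean(np.abs(diff)), np.max(np.abs(diff)),
                     np.var(diff), np.mean(diff ** 4)])

rng = np.random.default_rng(0)
plain = [rng.random((64, 64)) for _ in range(50)]
marked = [p + 0.02 * rng.standard_normal((64, 64)) for p in plain]   # toy watermark
X = np.array([quality_metrics(i) for i in plain + marked])
y = np.array([0] * 50 + [1] * 50)
clf = make_pipeline(SelectKBest(f_classif, k=3), LogisticRegression(max_iter=1000))
print("training accuracy:", clf.fit(X, y).score(X, y))
```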
The control of the integrity and authentication of medical images is becoming ever more important within Medical Information Systems (MIS). The intra- and interhospital exchange of images, such as in PACS (Picture Archiving and Communication Systems), and the ease of copying, manipulating and distributing images have brought security aspects to the fore. In this paper we focus on the role of watermarking for MIS security and address the problem of integrity control of medical images. We discuss alternative schemes to extract verification signatures and compare their tamper detection performance.
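One possible verification-signature scheme of the kind compared here can be sketched as follows: a hash of the most significant bit planes is embedded into the least significant bits. This is an illustrative scheme, not one of the specific alternatives evaluated.

```python
# Sketch: a digest of the image content (the seven most significant bit planes)
# is embedded into the least significant bits, so any later modification breaks
# the match between the stored digest and a recomputed one.
import hashlib
import numpy as np

def embed_signature(img):
    content = img & 0xFE                                    # ignore the LSB plane
    digest = np.frombuffer(hashlib.sha256(content.tobytes()).digest(), dtype=np.uint8)
    bits = np.unpackbits(digest)
    marked = content.copy().ravel()
    marked[:bits.size] |= bits                              # write digest into LSBs
    return marked.reshape(img.shape)

def verify(img):
    content = img & 0xFE
    digest = np.frombuffer(hashlib.sha256(content.tobytes()).digest(), dtype=np.uint8)
    bits = np.unpackbits(digest)
    return bool(np.all((img.ravel()[:bits.size] & 1) == bits))

img = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
marked = embed_signature(img)
print(verify(marked), verify(marked ^ np.uint8(4)))          # True, False (tampered)
```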
We present a technique that provides progressive transmission and near-lossless compression in one single framework. The proposed technique produces a bitstream that results in progressive reconstruction of the image, just as one can obtain with a reversible wavelet codec. In addition, the proposed scheme provides near-lossless reconstruction with respect to a given bound after each layer of the successively refinable bitstream is decoded. We formulate the image data compression problem as one of asking the optimal questions to determine, respectively, the value or the interval of the pixel, depending on whether one is interested in lossless or near-lossless compression. New prediction methods based on the nature of the data at a given pass are presented and links to the existing methods are explored. The trade-off between non-causal prediction and data precision is discussed within the context of successive refinement. Context selection for prediction in different passes is addressed. Finally, experimental results for both the lossless and near-lossless cases are presented, which are competitive with the state-of-the-art compression schemes.
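The near-lossless guarantee can be illustrated with a sketch that quantizes prediction residuals with a step of 2*delta+1; the simple left-neighbor predictor stands in for the pass-dependent predictors described above.

```python
# Sketch of the near-lossless principle: prediction residuals are quantized with a
# step of 2*delta + 1, which guarantees every reconstructed pixel differs from the
# original by at most delta.
import numpy as np

def near_lossless_row(row, delta):
    recon = np.empty_like(row, dtype=np.int32)
    prev = 0
    symbols = []
    for i, x in enumerate(row.astype(np.int32)):
        e = x - prev                                      # prediction residual
        q = int(np.round(e / (2 * delta + 1)))            # quantized symbol to be coded
        recon[i] = prev + q * (2 * delta + 1)
        symbols.append(q)
        prev = recon[i]                                   # predict from the reconstruction
    return np.array(symbols), recon

row = np.random.default_rng(0).integers(0, 256, 32)
symbols, recon = near_lossless_row(row, delta=2)
print("max reconstruction error:", int(np.max(np.abs(recon - row))))   # <= 2
```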
This paper presents a new distortion measure for multi-band image vector quantization. The distortion measure penalizes the deviation in the ratios of the components. We design a VQ coder for the proposed ratio distortion measure. We then give experimental results that demonstrate that the new VQ coder yields better component ratio preservation than conventional techniques. For sample images, the proposed scheme outperforms SPIHT, JPEG and conventional VQ in color ratio preservation.
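A sketch of a ratio-penalizing distortion of the kind described, with an illustrative weighting between the squared-error term and the component-ratio term; the exact form of the paper's measure may differ.

```python
# Sketch: besides the usual squared error, the distortion penalizes deviations in
# the ratios between the spectral components, so a codevector that preserves
# color ratios is preferred.
import numpy as np

def ratio_distortion(x, c, eps=1e-6, alpha=1.0):
    """x, c: multi-band pixel vectors; squared error plus a ratio-deviation penalty."""
    se = np.sum((x - c) ** 2)
    rx = x / (np.sum(x) + eps)                    # component ratios of the input
    rc = c / (np.sum(c) + eps)                    # component ratios of the codevector
    return se + alpha * np.sum((rx - rc) ** 2)

x = np.array([120.0, 60.0, 30.0])
print(ratio_distortion(x, np.array([100.0, 50.0, 25.0])),   # same ratios
      ratio_distortion(x, np.array([100.0, 100.0, 25.0])))  # ratios distorted
```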
This study presents the design of 2D nonseparable Perfect Reconstruction Filter Banks (PRFBs) for two different sampling lattices: quincunx and rectangular. In the quincunx case, the z-domain PR conditions are mapped into the Bernstein-x domain. The desired power spectrum of the 2D nonseparable filter is approximated using Bernstein polynomials. Since we introduce a mapping from the complex periodic domain to a real polynomial domain, the PRFB design in the Bernstein-x domain is much easier to handle. A parametric solution for the 2D nonseparable design technique is obtained with the desired regularity for quincunx sampling lattices. This technique allows us to design 2D wavelet transforms. For rectangular downsampling, the use of signed shuffling operations to obtain a PRFB from a low-pass filter enables the reduction of the PR conditions. This design technique leads to an efficient implementation structure, since all the filters in the bank have the same coefficients with sign and position changes. This structure overcomes the high-complexity problem that is the major shortcoming of 2D nonseparable filter banks. The designed filter banks are tested on 2D image models and real images in terms of compaction performance. It is shown that nonseparable designs can outperform separable ones in data compression applications.
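The Bernstein-domain approximation step can be sketched as follows, with an ideal halfband power spectrum as the target; the PR and regularity constraints of the full design are not enforced in this illustration.

```python
# Sketch: a desired lowpass power spectrum, expressed on x in [0, 1], is
# approximated by a Bernstein polynomial whose coefficients are samples of that
# spectrum. The target response and the degree are illustrative.
import numpy as np
from math import comb

def bernstein_approx(f, n, x):
    """Evaluate the degree-n Bernstein approximation of f at points x in [0, 1]."""
    x = np.asarray(x, dtype=float)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1))

desired = lambda x: 1.0 if x < 0.5 else 0.0        # ideal halfband power spectrum
x = np.linspace(0, 1, 9)
print(np.round(bernstein_approx(desired, n=12, x=x), 3))
```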
This paper presents a new design technique for obtaining optimum M-channel orthogonal subband coders, where M = 2^i. All filters that constitute the subband coder are FIR filters with linear phase. We carry out the design in the time domain, based on time-domain orthonormality constraints that the filters must satisfy. Once a suitable low-pass filter h0(n) is found, the remaining (M-1) filters of the coder are obtained through the use of shuffling operators on that filter. Since all resulting subband filters use the same numerical coefficient values (in different shift positions), this technique leads to a set of filters that can be implemented very efficiently. If, on the other hand, maximization of the coding gain is a more important consideration than efficient implementation, the set of impulse responses obtained through shuffling can be further processed to remove the correlation between the subbands. This process leads to a new set of orthonormal linear-phase filters that no longer share the same numerical coefficient values. In general, the coding gain performance of this new set is better than that of the initial design. This uncorrelated decomposition can be thought of as a counterpart of the Karhunen-Loeve transform in an M-channel filter bank.
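The shuffling idea can be illustrated in the simplest two-channel case, where the second filter reuses the low-pass coefficients with position reversal and alternating signs; the Haar pair below is only an illustration of the relation, not the M = 2^i construction itself.

```python
# Sketch of the shuffling relation in the two-channel case: the highpass filter
# reuses the lowpass coefficients with position reversal and alternating signs,
# so both filters share the same numerical values.
import numpy as np

def highpass_from_lowpass(h0):
    """h1(n) = (-1)^n * h0(N-1-n): same coefficients, shuffled positions and signs."""
    n = np.arange(len(h0))
    return ((-1) ** n) * h0[::-1]

h0 = np.array([1.0, 1.0]) / np.sqrt(2)                  # Haar lowpass filter
h1 = highpass_from_lowpass(h0)
print(h1, "cross-correlation:", round(float(np.dot(h0, h1)), 6))   # 0 (orthogonal)
```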
A system for the segmentation of head-and-shoulder scenes into semantic regions, to be applied in a model-based coding scheme for video telephony, is described. The system is conceptually divided into three levels of processing and uses successive semantic regions of interest to locate the speaker, the face and the eyes automatically. Once candidate regions have been obtained by the low-level segmentation modules, higher-level modules perform measurements on these regions and compare them with expected values to extract the specific region searched for. Fuzzy membership functions are used to allow deviations from the expected values. The system is able to locate the facial region and the eye regions satisfactorily.
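A sketch of the fuzzy scoring of candidate regions, with triangular membership functions around expected measurement values; the expected values and tolerances are illustrative.

```python
# Sketch: measurements on a candidate region (e.g., area ratio and aspect ratio
# for a face candidate) are scored with triangular fuzzy membership functions
# around expected values, and the scores are combined with a fuzzy AND.
def triangular(value, expected, tolerance):
    """1 at the expected value, falling linearly to 0 at +/- tolerance."""
    return max(0.0, 1.0 - abs(value - expected) / tolerance)

def face_candidate_score(area_ratio, aspect_ratio):
    mu_area = triangular(area_ratio, expected=0.15, tolerance=0.10)
    mu_aspect = triangular(aspect_ratio, expected=1.4, tolerance=0.6)
    return min(mu_area, mu_aspect)              # fuzzy AND of the two criteria

print(face_candidate_score(0.18, 1.3), face_candidate_score(0.45, 3.0))
```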