We propose a new method to compute the approximate nearest-neighbor field (ANNF) between image pairs using random kd-trees and patch-set sub-sampling. By exploiting image coherence we show that it is possible to reduce the number of patches on which the ANNF is computed while maintaining high overall accuracy in the final result. Information on the missing patches is then recovered by interpolation and propagation of good matches. The sub-sampling factor on the patch sets also allows the desired trade-off between accuracy and speed to be set, a flexibility that state-of-the-art methods lack. Tests conducted on a public database show that our algorithm achieves superior performance with respect to the PatchMatch (PM) and Coherency Sensitive Hashing (CSH) algorithms in comparable computational time.
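As an illustration of the general idea (not the authors' implementation), the following Python sketch builds a kd-tree over patches extracted on a sub-sampled grid of the target image and queries it approximately for the sub-sampled source patches; the patch size, sub-sampling step and helper names are assumptions.

```python
# Minimal sketch of kd-tree based ANNF with patch sub-sampling (NumPy/SciPy).
# Patch size, step and eps are illustrative; a randomized kd-tree forest is
# approximated here by an eps-approximate query on a single exact kd-tree.
import numpy as np
from scipy.spatial import cKDTree

def extract_patches(img, patch=8, step=2):
    """Collect vectorized patches on a sub-sampled grid and their positions."""
    h, w = img.shape
    coords, vecs = [], []
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            vecs.append(img[y:y + patch, x:x + patch].ravel())
            coords.append((y, x))
    return np.asarray(vecs, dtype=np.float32), np.asarray(coords)

def approximate_nnf(img_a, img_b, patch=8, step=2):
    """Sparse ANNF from img_a to img_b computed only on a sub-sampled patch set."""
    vecs_b, coords_b = extract_patches(img_b, patch, step)
    vecs_a, coords_a = extract_patches(img_a, patch, step)
    tree = cKDTree(vecs_b)                       # index on target patches
    _, idx = tree.query(vecs_a, k=1, eps=1.0)    # approximate nearest neighbours
    # Entries for skipped patches would then be filled by interpolation and
    # propagation of good matches, as described in the abstract.
    return coords_a, coords_b[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((64, 64)).astype(np.float32)
    b = rng.random((64, 64)).astype(np.float32)
    src, dst = approximate_nnf(a, b)
    print(src.shape, dst.shape)
```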
The recently standardized Scalable Video Coding (SVC) extension of H.264/AVC allows bitstream scalability with improved rate-distortion efficiency with respect to the classical simulcasting approach, at the cost of an increased computational complexity of the encoding process. One critical issue for the practical deployment of SVC is therefore complexity reduction, which is fundamental for its use in consumer applications. In this paper, we present a fully scalable fast motion estimation algorithm that achieves an excellent trade-off between complexity and coding performance.
In this paper a new fully scalable, wavelet-based video coding architecture is proposed, where motion-compensated temporally filtered subbands of spatially scaled versions of a video sequence can be used as a base layer for inter-scale predictions. These predictions take place between data at the same resolution level, without the need for interpolation. The prediction residuals are further transformed by spatial wavelet decompositions. The resulting multi-scale spatio-temporal wavelet subbands are coded by means of an embedded morphological dilation technique and context-based arithmetic coding. Dyadic spatio-temporal scalability and progressive SNR scalability are achieved. Multiple adaptation decoding can be easily implemented without needing to know a predefined set of operating points. The proposed coding system compensates for some of the typical drawbacks of current wavelet-based scalable video coding architectures and shows interesting visual results even when compared with the single-operating-point video coding standard AVC/H.264.
In the domain of video indexing, one of the research topics is the automatic extraction of information with the objective of automatically describing and organizing content. In a video stream, different kinds of information can be taken into account, but we can assume that most of the information is contained in the foreground objects, so that the number of objects, their shape, their contours and so on can constitute a good basis for the content description. This paper describes a new approach to extract foreground objects from MPEG-2 video streams, within the "rough indexing paradigm" we define. This paradigm allows us to reach this goal in near real time while maintaining a good level of detail.
This paper presents some ideas which extend the functionality and the application fields of spatially selective coding within a JPEG2000 framework. First, the image quality drop between the Regions of Interest (ROI) and the background (BG) is considered. In a conventional approach, the reconstructed image quality drops steeply along the ROI boundary; this effect may be perceived as objectionable in some use cases. A simple quality decay management is proposed here, which makes use of concentric ROIs with different scaling factors. This keeps the technique perfectly consistent with the JPEG2000 Part 2 ROI definition and description. Another issue considered is the extension of selective ROI coding to 3D Volume of Interest (VOI) coding. This extension is currently under consideration for Part 10 of JPEG2000, JP3D. An easy and effective 2D-to-3D extension of the VOI definition and description is proposed here: a VOI is defined by a set composition of ROI-generated solids, where the ROIs are defined along one or more volume cutting directions, and it is described by the corresponding set of ROI parameters. Moreover, the quality decay management can be applied to this extension. The proposed techniques could have a significant impact on the selective coding of medical images and volumes. Image quality is a very important but very critical factor in that field, which also constitutes the dominant market for 3D applications. Therefore, some experiments are presented on medical images and volumes in order to evaluate the benefits of the proposed approaches in terms of diagnostic quality improvement with respect to conventional ROI coding usage.
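A minimal sketch of the concentric-ROI quality decay idea follows, assuming nested circular ROIs with progressively smaller scaling factors toward the background; the radii and scaling values are invented for illustration, and such a per-pixel scaling map would then drive a JPEG2000-style ROI scaling mechanism.

```python
# Illustrative sketch: nested circular ROIs with decreasing scaling factors
# produce a gradual, rather than steep, quality drop from ROI to background.
import numpy as np

def concentric_roi_mask(shape, center, radii, scalings, background=0.0):
    """Return a per-pixel scaling map: the innermost ROI gets the largest value."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    dist = np.hypot(yy - center[0], xx - center[1])
    mask = np.full(shape, background, dtype=float)
    # Apply from outermost to innermost so inner rings overwrite outer ones.
    for r, s in sorted(zip(radii, scalings), reverse=True):
        mask[dist <= r] = s
    return mask

# Example: three concentric ROIs whose scaling factors decay toward the BG.
mask = concentric_roi_mask((256, 256), center=(128, 128),
                           radii=(30, 60, 90), scalings=(6.0, 4.0, 2.0))
```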
The problem of content characterization of sports videos is of great interest because sports video appeals to large audiences and its efficient distribution over various networks should contribute to the widespread usage of multimedia services. In this paper we analyze several techniques proposed in the literature for the content characterization of sports videos. We focus this analysis on the type of signal (audio, video, text captions, ...) from which the low-level features are extracted. First we consider the techniques based on visual information, then the methods based on audio information, and finally the algorithms based on audio-visual cues used in a multi-modal fashion. This analysis shows that each type of signal carries some peculiar information, and that a multi-modal approach can fully exploit the multimedia information associated with the sports video. Moreover, we observe that the characterization is performed either by considering what happens in a specific time segment, thus observing the features in a "static" way, or by trying to capture their "dynamic" evolution in time. The effectiveness of each approach depends mainly on the kind of sport it relates to and the type of highlights we are focusing on.
Interactivity is a main requirement for the 3D visualization of medical images in a variety of clinical applications. A good matching between segmentation and rendering techniques allows the design of easy-to-use interactive systems which assist physicians in dynamically creating and manipulating 'diagnostically relevant' images from volumetric data sets. In this work we consider the above problem within an original interactive visualization paradigm. With this paradigm we want to highlight the twofold clinical requirement of a) detecting and visualizing structures of diagnostic interest (SoDIs) and b) adding to the 3D scene some other structures to create a meaningful visual context. Since the opacity modulation of the different structures is a crucial point, we propose an opacity management which reflects the paradigm ideas and operates by means of a twofold indexed look-up table (2iLUT). The 2iLUT consists of a combination of attribute-based and object-based opacity management, and it is designed and tested here in order to combine the interaction-time benefits of an indexed opacity setting with the effective handling of the above classification and visualization clinical requirements.
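To make the twofold indexing concrete, here is a small hypothetical sketch of a 2iLUT-style table where opacity is looked up jointly by object label (from a segmentation) and by a voxel attribute bin; the number of objects, the bin count and the specific opacity curves are assumptions, not the paper's settings.

```python
# Hypothetical sketch of a twofold indexed look-up table (2iLUT): opacity is
# indexed by object label and by attribute (e.g., intensity) bin, combining
# object-based and attribute-based opacity management in one table.
import numpy as np

N_OBJECTS, N_BINS = 4, 256
lut = np.zeros((N_OBJECTS, N_BINS), dtype=np.float32)   # opacity in [0, 1]

# Structure of diagnostic interest (label 1) fully opaque, context structures
# (labels 2, 3) attribute-modulated or semi-transparent, background hidden.
lut[1, :] = 1.0
lut[2, :] = np.linspace(0.0, 0.4, N_BINS)
lut[3, :] = 0.15
lut[0, :] = 0.0

def voxel_opacity(labels, intensities, lut):
    """Vectorized lookup: opacity = lut[object label, intensity bin]."""
    bins = np.clip(intensities, 0, N_BINS - 1).astype(np.intp)
    return lut[labels.astype(np.intp), bins]
```

Changing a single row of the table re-tunes a whole structure's appearance, which is what makes an indexed setting attractive for interaction time.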
Current wavelet-based image coders obtain high performance thanks to the identification and exploitation of the statistical properties of natural images in the transformed domain. Zerotree-based algorithms, such as Embedded Zerotree Wavelets (EZW) and Set Partitioning In Hierarchical Trees (SPIHT), offer high rate-distortion (RD) coding performance and low computational complexity by exploiting statistical dependencies among insignificant coefficients on hierarchical subband structures. Another possible approach tries to predict the clusters of significant coefficients by means of some form of morphological dilation. An example of a morphology-based coder is the Significance-Linked Connected Component Analysis (SLCCA), which has shown performance comparable to the zerotree-based coders but is not embedded. A new embedded bit-plane coder is proposed here, based on morphological dilation of significant coefficients and context-based arithmetic coding. The algorithm is able to exploit both intra-band and inter-band statistical dependencies among significant wavelet coefficients. Moreover, the same approach is used for both two- and three-dimensional wavelet-based image compression. Finally, the algorithms are tested on some 2D images and on a medical volume, comparing the RD results to those obtained with state-of-the-art wavelet-based coders.
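The following simplified sketch illustrates only the dilation-driven significance scanning within a bit-plane (no context modelling or arithmetic coding, and no inter-band prediction), under the assumption that significant coefficients cluster around those already found significant.

```python
# Simplified sketch of bit-plane significance coding driven by morphological
# dilation: coefficients adjacent to already-significant ones are visited
# first, then the remaining ones are tested in a clean-up pass.
import numpy as np
from scipy.ndimage import binary_dilation

def significance_passes(coeffs, n_planes=8):
    mag = np.abs(coeffs)
    significant = np.zeros(coeffs.shape, dtype=bool)
    symbols = []                              # (plane, position, bit) decisions
    for plane in range(n_planes - 1, -1, -1):
        threshold = 1 << plane
        # Dilation pass: test only neighbours of the current significance map.
        candidates = binary_dilation(significant) & ~significant
        for pos in zip(*np.nonzero(candidates)):
            bit = mag[pos] >= threshold
            symbols.append((plane, pos, bool(bit)))
            significant[pos] |= bit
        # Clean-up pass: test the coefficients that are still insignificant.
        for pos in zip(*np.nonzero(~significant)):
            bit = mag[pos] >= threshold
            symbols.append((plane, pos, bool(bit)))
            significant[pos] |= bit
    return symbols
```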
The segmentation of video sequences into regions undergoing coherent motion is one of the most important processing steps in video analysis and coding. In this paper, we propose a reliability measure that indicates to what extent an affine motion model represents the motion of an image region. This reliability measure is then proposed as a criterion to coherently merge moving image regions in a Minimum Description Length (MDL) framework. To overcome the chicken-and-egg problem of region-based motion estimation and segmentation, the motion field estimation and the segmentation task are treated separately. After a global motion compensation, a local motion field estimation is carried out starting from a translational motion model. Concurrently, an algorithm based on a Markov Random Field model provides an initial static image partition. The motion estimation and segmentation problem is then formulated in view of the MDL principle. A merging stage based on a directed weighted graph gives the final spatio-temporal segmentation. Simulation results show the effectiveness of the proposed algorithm.
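As a rough illustration of how an affine-model reliability score can be derived (a simplified stand-in, not the paper's measure), the sketch below fits a 6-parameter affine model to a region's motion vectors by least squares and converts the fitting residual into a score; the helper name and the residual-to-score mapping are assumptions.

```python
# Fit an affine motion model u(x,y)=a1*x+a2*y+a3, v(x,y)=a4*x+a5*y+a6 to the
# region's motion vectors and use the residual as a simple reliability proxy.
import numpy as np

def affine_fit_reliability(points, vectors):
    """points: (N,2) pixel coords; vectors: (N,2) local motion vectors; N > 3."""
    x, y = points[:, 0], points[:, 1]
    A = np.stack([x, y, np.ones_like(x)], axis=1)          # design matrix
    params_u, res_u, *_ = np.linalg.lstsq(A, vectors[:, 0], rcond=None)
    params_v, res_v, *_ = np.linalg.lstsq(A, vectors[:, 1], rcond=None)
    residual = np.sqrt((res_u.sum() + res_v.sum()) / len(points))
    # Lower residual -> the affine model explains the region's motion better.
    return (params_u, params_v), 1.0 / (1.0 + residual)
```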
This work addresses the delicate problem of lossy compression of medical images. More specifically, a selective allocation of coding resources is introduced based on the concept of 'diagnostic interest', together with an interactive methodology based on a new measure of 'diagnostic quality'. The selective allocation of resources is made possible by an a priori selection of regions of specific interest for diagnostic purposes. The idea is to change the precision of representation, in a transformed domain, of regions of particular interest through a weighting procedure driven by an on-line user-defined quantization matrix. The overall compression method is multi-resolution, provides an embedded generation of the bit-stream and guarantees a good rate-distortion trade-off at various bit-rates, with spatially varying reconstruction quality. This work also analyzes the delicate issue of the professional usage of lossy compression in a PACS environment. The proposed compression methodology gives interesting insights in favor of using lossy compression in a controlled fashion by the expert radiologist. Most of the ideas presented in this work have been confirmed by extensive experimental simulations involving medical expertise.
The optical flow (OF) can be used to perform motion-based segmentation or 3D reconstruction. Many techniques have been developed to estimate the OF. Some approaches are based on global assumptions; others deal with local information. Although the OF has been studied for more than a decade, reducing the estimation error is still a difficult problem. Generally, algorithms to determine the OF are based on an equation which links the gradient components of the luminance signal so as to impose its invariance over time. Therefore, to determine the OF, it is usually necessary to calculate the gradient components in space and time. A new way to approximate this gradient information from a spatio-temporal wavelet decomposition is proposed here. In other words, assuming that the luminance information of the video sequence is represented in a multiresolution structure for compression or transmission purposes, we propose to estimate the luminance gradient components directly from the coefficients of the wavelet transform. Using a multiresolution formalism, we provide a way to estimate the motion field at different resolution levels. OF estimates obtained at low resolution can be projected to higher resolution levels so as to improve the robustness of the estimation to noise and to better locate the flow discontinuities, while remaining computationally efficient. Results are shown for both synthetic and real-world sequences, comparing the method with a non-multiresolution approach.
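A rough sketch of the underlying principle follows: Haar-like detail (difference) coefficients stand in for the spatial and temporal luminance gradients, and the OF constraint I_x u + I_y v + I_t = 0 is solved by least squares over a small window. This illustrates a single resolution level only, not the paper's multiresolution projection scheme, and the window size is an assumption.

```python
# Approximate the gradients with Haar-like differences, then solve the
# optical-flow constraint locally by least squares (single level only).
import numpy as np

def haar_details(f0, f1):
    """I_x, I_y from spatial Haar differences; I_t from the temporal difference."""
    ix = 0.5 * (np.diff(f0, axis=1, append=f0[:, -1:]) +
                np.diff(f1, axis=1, append=f1[:, -1:]))
    iy = 0.5 * (np.diff(f0, axis=0, append=f0[-1:, :]) +
                np.diff(f1, axis=0, append=f1[-1:, :]))
    it = f1 - f0
    return ix, iy, it

def block_flow(ix, iy, it, y, x, half=7):
    """Least-squares flow vector for the window centred at (y, x)."""
    sl = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([ix[sl].ravel(), iy[sl].ravel()], axis=1)
    b = -it[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```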
Indexing and retrieval of image sequences are fundamental steps in video editing and film analysis. Correlation-based matching methods are known to be very expensive when used with large amounts of data. As the size of the sequence database grows, traditional retrieval methods fail, and exhaustive search quickly breaks down as an efficient strategy. Moreover, traditional indexing with labels has many drawbacks since it requires human intervention. New advanced correlation filters are being proposed to decrease the computational load of the task. A new method for the retrieval of image sequences in large databases, based on a spatio-temporal wavelet decomposition, is proposed here. It will be shown how the use of a multiresolution approach can lead to good results in terms of computational efficiency and robustness to noise. We assume that the query sequence may not be contained in the database for different reasons: the presence of noise in the query, a different digitization process, or a query that is only similar to sequences in the database. As a consequence, we have developed a new efficient retrieval strategy that analyses the database in order to extract the sequences most similar to a given query. The wavelet transform has been chosen as the framework to implement the multiresolution formalism because of its good compression capabilities, especially for embedded schemes, and the good features it provides for signal analysis. This paper describes the principles of a multiresolution sequence matching strategy and outlines its performance through a series of experimental simulations.
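The following toy sketch conveys the coarse-to-fine flavour of such a multiresolution matching strategy: candidate temporal offsets are ranked at low resolution first and only the best ones are re-evaluated at full resolution. The plain spatial down-sampling used here is a stand-in for the wavelet approximation subbands, and the function names and parameters are assumptions.

```python
# Coarse-to-fine matching of a query clip against a database sequence.
import numpy as np

def downsample(frames, factor=4):
    return frames[:, ::factor, ::factor]

def match_offsets(db, query, keep=3):
    """Return the `keep` most promising temporal offsets of query inside db."""
    nq = len(query)
    q_lo, db_lo = downsample(query), downsample(db)
    scores = [np.mean((db_lo[t:t + nq] - q_lo) ** 2)
              for t in range(len(db) - nq + 1)]
    coarse = np.argsort(scores)[:keep]                  # best low-res offsets
    refined = [(int(t), float(np.mean((db[t:t + nq] - query) ** 2)))
               for t in coarse]
    return sorted(refined, key=lambda s: s[1])
```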
In this abstract, we present a novel technique to encode video sequences that performs a region-based motion compensation of each frame to be encoded so as to generate a predicted frame. The set of regions to be motion compensated for a given frame is obtained through a quadtree segmentation of the motion field estimated between a single reference frame (representing a typical projection of the scene) and the frame to be encoded. In this way, no DPCM loop in the temporal domain is introduced, avoiding the feedback of quantization errors. Under the assumption that the projection of the scene on the image plane remains nearly constant, only slight deformations of the reference frame occur from one frame to the next, so that very limited information needs to be coded: (1) the segmentation shape; (2) the motion information. Temporal correlation is used to predict both types of information so as to further reduce any remaining redundancy. As the segmentation may not be perfect, spatial correlation may still exist between neighboring regions; this is exploited in the strategy designed to encode the motion information. The motion and segmentation information are estimated by a two-stage process using the frame to be encoded and the reference frame: (1) a hierarchical top-down decomposition, followed by (2) a bottom-up merging strategy. This procedure can be nicely embedded in a quadtree representation, which ensures a computationally efficient yet rather robust segmentation strategy. We show how the proposed method can be used to encode QCIF video sequences with a reasonable quality at a rate of 10 frames/s using roughly 20 kbit/s. Different prediction schemes are compared, pointing out the advantage of the single reference frame for both prediction and compensation.
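A hedged sketch of the top-down stage alone is given below: a quadtree split of a dense motion field in which a block is divided whenever its motion vectors deviate too much from a single representative vector. The tolerance and block sizes are invented, and the bottom-up merging stage is not shown.

```python
# Quadtree split of a dense motion field (top-down decomposition only).
import numpy as np

def quadtree_split(mv, y, x, size, min_size=8, tol=0.5, leaves=None):
    """mv: (H, W, 2) motion field; returns a list of (y, x, size, mean_mv) leaves."""
    if leaves is None:
        leaves = []
    block = mv[y:y + size, x:x + size]
    mean = block.reshape(-1, 2).mean(axis=0)
    err = np.abs(block - mean).max()
    if err <= tol or size <= min_size:
        leaves.append((y, x, size, mean))        # block well described by one vector
    else:
        h = size // 2
        for dy in (0, h):
            for dx in (0, h):
                quadtree_split(mv, y + dy, x + dx, h, min_size, tol, leaves)
    return leaves
```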
An image coding technique based on symmetry extraction and Binary Space Partitioning (BSP) tree representation for still pictures is presented. Axes of symmetry, detected through a principal-axis-of-inertia approach and a coefficient-of-symmetry measure, are used to recursively divide an input image into a finite number of convex regions. This recursive partitioning results in the BSP tree representation of the image data. The recursive partition occurs whenever the current left/right node of the tree cannot be represented 'symmetrically' by its counterpart, i.e., the right/left node. The splitting process may also end whenever the region associated with a given node has homogeneous characteristics or its size falls below a certain threshold. Given a BSP tree partition for a given input image, and the 'seed' leaf nodes (i.e., those that cannot be generated by mirroring their counterparts), the remaining leaf nodes of the tree are reconstructed using a predictive scheme with respect to the 'seed' leaf nodes.
A novel coding technique which proposes the use of symmetry to reduce redundancy in images is presented. Axes of symmetry are extracted using the Principal Axes of Inertia theory, and the technique is extended to non-symmetric images by the introduction of a Coefficient of Symmetry. One part of the image is then linearly predicted with respect to the chosen axis. The method is implemented in a block-based fashion in order to adapt to local symmetries in the image data. An image representation and a coding strategy are illustrated, and results are presented on real static images.
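As a simplified illustration of the measurement step (not the paper's exact definition), the sketch below derives the principal axis of inertia from second-order moments of a block, rotates the block so that this axis is vertical, and takes the correlation with the mirrored block as a coefficient of symmetry.

```python
# Principal axis of inertia from image moments, then a symmetry coefficient
# computed as the normalized correlation between the axis-aligned block and
# its left-right mirror image.
import numpy as np
from scipy import ndimage

def symmetry_coefficient(block):
    yy, xx = np.mgrid[0:block.shape[0], 0:block.shape[1]]
    m = block.sum() + 1e-12
    cy, cx = (yy * block).sum() / m, (xx * block).sum() / m
    mu20 = ((xx - cx) ** 2 * block).sum()
    mu02 = ((yy - cy) ** 2 * block).sum()
    mu11 = ((xx - cx) * (yy - cy) * block).sum()
    angle = 0.5 * np.degrees(np.arctan2(2 * mu11, mu20 - mu02))
    aligned = ndimage.rotate(block, angle, reshape=False, order=1)
    mirrored = aligned[:, ::-1]
    a, b = aligned - aligned.mean(), mirrored - mirrored.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A value close to 1 indicates a block that is nearly symmetric about its principal axis, i.e., a good candidate for predicting one half from the other.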
This paper presents a method for the adaptive quantization of the motion field obtained from multiple reference frames. The motion estimation is obtained through a block-matching technique, under the assumptions of pure translational motion and uniform motion within a block. The usual assumption of constant intensity along the motion trajectory in the spatio-temporal path is relaxed. On one hand, this allows for illumination changes between the reference frames and the frame to be motion-compensated. On the other hand, it allows additional freedom to reduce the prediction error after motion compensation when the translational and rigid-body motion assumptions are violated. The matching function represents, for each block, the difference of the luminance signal in the frame to be encoded with respect to a linear combination of two displaced blocks of the same size in the reference frames. The adaptive quantization mechanism is based on evaluating, on a block basis, the local sensitivity of the displaced frame difference signal to a quantization of the motion field parameters. It is shown how this sensitivity depends only on the reference frame signals, which makes it possible to keep it below a desired threshold without sending additional information to the receiver. Simulations are carried out on standard CIF (240x360) source material provided to ISO/MPEG. Results are discussed to show the improvement with respect to the strategy suggested in the draft recommendation of ISO MPEG for interactive video at 1.5 Mbit/s.
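The following sketch illustrates only the matching function described above: the current block is compared with a linear combination of two equally sized displaced blocks taken from the two reference frames, with a weight that can absorb illumination changes. The block size, weight and SAD criterion are illustrative assumptions.

```python
# Matching error between the current block and a weighted combination of two
# displaced reference blocks (sum of absolute differences used as criterion).
import numpy as np

def displaced_block(ref, y, x, dy, dx, size):
    return ref[y + dy:y + dy + size, x + dx:x + dx + size]

def matching_error(cur, ref0, ref1, y, x, d0, d1, w, size=16):
    """SAD between cur block and w*block(ref0, d0) + (1 - w)*block(ref1, d1)."""
    target = cur[y:y + size, x:x + size]
    pred = (w * displaced_block(ref0, y, x, *d0, size) +
            (1.0 - w) * displaced_block(ref1, y, x, *d1, size))
    return float(np.abs(target - pred).sum())
```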
In this work, we present a Least-Square-Error (LSE), recursive method for generating piecewise-constant approximations of images. The method is developed using an optimization approach that minimizes a cost function. The cost function proposed here is based on segmenting the image, recursively, using Binary Space Partitionings (BSPs) of the image domain. We derive an LSE necessary condition for the optimum piecewise-constant approximation, and use this condition to develop an algorithm for generating the LSE, BSP-based approximation. The proposed algorithm provides a significant reduction in computational expense when compared with a brute-force method. As shown in the paper, the LSE algorithm generates efficient segmentations of simple as well as complex images. This shows the potential of the LSE approximation approach for image coding applications. Moreover, the BSP-based segmentation provides a very simple (yet flexible) description of the regions resulting from the partitioning. This makes the proposed approximation method useful for performing affine image transformations (e.g., rotation and scaling), which are common in computer graphics applications.
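A much simplified sketch of the recursive LSE idea is given below, restricted to axis-aligned cuts (a true BSP allows arbitrarily oriented partition lines): each region is split by the straight cut that minimizes the squared error of a two-piece constant fit, and the recursion stops on depth or region size. All parameters are illustrative.

```python
# Recursive piecewise-constant approximation with least-square-error cuts
# (axis-aligned simplification of a BSP partitioning).
import numpy as np

def sse(region):
    return float(((region - region.mean()) ** 2).sum()) if region.size else 0.0

def best_split(img):
    """Return (axis, index, error) of the least-square-error straight cut."""
    best = (None, None, sse(img))
    for axis in (0, 1):
        for k in range(1, img.shape[axis]):
            a, b = np.split(img, [k], axis=axis)
            e = sse(a) + sse(b)
            if e < best[2]:
                best = (axis, k, e)
    return best

def bsp_approximate(img, depth=4, min_size=4):
    """Piecewise-constant approximation of img via recursive binary splits."""
    if depth == 0 or min(img.shape) <= min_size:
        return np.full(img.shape, img.mean())
    axis, k, _ = best_split(img)
    if axis is None:                      # no cut lowers the error
        return np.full(img.shape, img.mean())
    a, b = np.split(img, [k], axis=axis)
    return np.concatenate([bsp_approximate(a, depth - 1, min_size),
                           bsp_approximate(b, depth - 1, min_size)], axis=axis)
```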
KEYWORDS: Quantization, Image processing, Visual communications, Computer programming, Distortion, Parallel processing, Signal processing, Televisions, Digital signal processing, Standards development
High Definition Television (HDTV) promises to offer wide-screen pictures of much better quality compared to today's television. However, without compression a digital HDTV channel may require up to one Gbit/s of transmission bandwidth. We suggest a parallel processing structure, using the proposed international standard for visual telephony (the CCITT Px64 kbit/s standard) as processing elements, to compress digital HDTV pictures. The basic idea is to partition an HDTV picture into smaller sub-pictures and then compress each sub-picture using a CCITT Px64 kbit/s coder, which is cost-effective, with today's technology, only for small-size pictures.
Since each sub-picture is processed by an independent coder, without coordination these coded sub-pictures may have unequal picture quality. To maintain a uniform-quality HDTV picture, the following two issues are studied: (1) the sub-channel control strategy (the bits allocated to each sub-picture), and (2) the quantization and buffer control strategy for each individual sub-picture coder. Algorithms to resolve these problems, together with their computer simulations, are presented.
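A toy sketch of the first issue (sub-channel control) follows: the frame is partitioned into a grid of sub-pictures and the total bit budget is allocated in proportion to each sub-picture's activity (variance is used here as a crude activity measure) so that independently coded tiles end up with roughly uniform quality. The grid size, budget and activity measure are assumptions for illustration.

```python
# Activity-proportional bit allocation across sub-pictures of a frame.
import numpy as np

def subpicture_bit_allocation(frame, total_bits, grid=(3, 4)):
    h, w = frame.shape
    gh, gw = h // grid[0], w // grid[1]
    tiles = [frame[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
             for r in range(grid[0]) for c in range(grid[1])]
    activity = np.array([t.var() + 1e-6 for t in tiles])   # crude activity proxy
    return (total_bits * activity / activity.sum()).astype(int)
```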
Representation of two and three-dimensional objects by tree structures has been used extensively in solid modeling, computer graphics, computer vision and image processing. (See for example [Mantyla] [Chen] [Hunter] [Rosenfeld] [Leonardi].) Quadtrees, which are used to represent objects in 2-D space, and octrees, which are the extension of quadtrees in 3-D space, have been studied thoroughly for applications in graphics and image processing.