In this paper, we propose a low-complexity prioritized bit-plane coding scheme to improve the rate-distortion performance of cyclical block coding in MPEG-21 scalable video coding. Specifically, we use a block priority assignment algorithm to transmit first the symbols and the blocks with potentially better rate-distortion performance. Different blocks are allowed to be coded unequally within a coding cycle. To avoid transmitting priority overhead, the encoder and the decoder refer to the same context to assign priority. Furthermore, to reduce complexity, the priority assignment is done by a look-up table and the coding of each block is controlled by a simple threshold comparison mechanism. Experimental results show that our prioritized bit-plane coding scheme can offer up to 0.5 dB PSNR improvement over the cyclical block coding described in the joint scalable verification model (JSVM).
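A minimal sketch of the prioritized coding cycle is given below; the context definition, the look-up-table values, and the threshold are illustrative assumptions rather than the JSVM procedure.

```python
import numpy as np

# Sketch of context-driven priority assignment and threshold-controlled coding.
# PRIORITY_LUT, the context definition, and the threshold are assumed values.
PRIORITY_LUT = np.array([0, 1, 2, 3, 3, 3, 3, 3])  # priority indexed by context

def block_context(block_coeffs, bitplane):
    """Context = number of coefficients already significant in higher bit-planes
    (integer coefficient magnitudes assumed)."""
    return min(int(np.count_nonzero(np.abs(block_coeffs) >> (bitplane + 1))), 7)

def code_bitplane(blocks, bitplane, threshold=2):
    """One coding cycle: only blocks whose LUT priority passes the threshold are coded."""
    coded_bits = []
    for idx, block in enumerate(blocks):
        priority = PRIORITY_LUT[block_context(block, bitplane)]
        if priority >= threshold:                      # simple threshold comparison
            bits = (np.abs(block) >> bitplane) & 1     # current bit-plane symbols
            coded_bits.append((idx, bits))
    return coded_bits
```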
Content-based image search has long been considered a difficult task. Making correct conjectures on the user intention (perception) based on the query images is a critical step in content-based search. One key concept in this paper is how we find the user-preferred image characteristics from the multiple positive samples provided by the user. The second key concept is how we generate a set of consistent "pseudo images" when the user does not provide a sufficient number of samples; the notion of image feature stability is thus introduced. The third key concept is how we use negative images as a pruning criterion. In realizing the preceding concepts, an image search scheme is developed using weighted low-level image features. Finally, quantitative simulation results are used to show the effectiveness of these concepts.
An image watermark parameter optimization procedure is proposed for selecting the most effective DCT coefficients for watermark embedding. Using this set of coefficients improves the watermark robustness and reliability against attack while maintaining the transparency of the embedded watermark. With the aid of prior knowledge of attacks, the visual masking effect and the attack distortion on each (DCT) transform coefficient are pre-calculated so that a maximum-strength watermark within the visual threshold can be inserted. There are two stages in the design phase. First, taking into account the combined effect of watermark embedding and attack, we select the robust coefficients that resist a specific type of attack while keeping the distortion below the visual threshold. Although the watermark detection reliability typically increases with the number of embedded coefficients, the less effective coefficients may degrade the overall detection performance. Thus, in the second stage, some initially selected coefficients are discarded by an iterative process to reduce the overall detection error probability. Since digital images are often compressed for efficient storage and transmission, we adopt JPEG compression as the attacking source. The simulation results show that the detection error probability is significantly reduced when the selected robust coefficients are in use. Watermarks embedded in these coefficients can also survive color reduction, Gaussian filtering, and frequency mode Laplacian removal (FMLR) attacks.
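A hedged sketch of the two-stage selection is given below; the robustness measure and the detection-SNR proxy used for pruning are simplified stand-ins for the quantities in the paper.

```python
import numpy as np

def stage1_select(visual_threshold, attack_distortion):
    """Stage 1: keep coefficients whose maximum in-threshold watermark strength
    exceeds the distortion the attack inflicts on them (assumed criterion)."""
    return np.where(visual_threshold > attack_distortion)[0]

def stage2_prune(selected, strength, attack_distortion, noise_var=1.0):
    """Stage 2: iteratively drop the least effective coefficient while an estimated
    detection SNR (a proxy for error probability) keeps improving."""
    selected = list(selected)
    def snr(idx):
        surviving = strength[idx] - attack_distortion[idx]
        return np.sum(surviving) / np.sqrt(len(idx) * noise_var)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        current = snr(selected)
        worst = min(selected, key=lambda i: strength[i] - attack_distortion[i])
        trial = [i for i in selected if i != worst]
        if snr(trial) > current:
            selected, improved = trial, True
    return np.array(selected)
```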
We design and implement a research-friendly software platform that aims at flexibility and abstraction for MPEG-7 application prototyping. We studied and analyzed the MPEG-7 standard, including a typical scenario of using MPEG-7. In order to fulfill the needs of researchers, additional requirements beyond the normative parts of MPEG-7 are included. By examining these requirements, we propose a research-friendly software platform. The architecture consists of a framework, utility units, and the descriptors. Because this system is implemented in Java, it also incorporates the features of the Java environment, and thus it is flexible for developing new components and prototyping applications. We demonstrate the flexibility of this testbed by constructing an example program that allows users to manipulate image-related descriptors.
KEYWORDS: Digital watermarking, Feature extraction, Digital imaging, Signal processing, Optical filters, Digital filtering, Linear filtering, Filtering (signal processing), Gaussian filters, Multimedia
A novel robust digital image watermarking scheme which combines image feature extraction and image normalization is proposed. The goal is to resist both geometrical and signal processing attacks. We adopt a feature extraction method called Mexican Hat wavelet scale interaction. The extracted feature points can survive various attacks such as common signal processing, JPEG compression, and geometric distortions. Thus, these feature points can be used as reference points for both watermark embedding and detection. The normalized image of a rotated image (object) is the same as the normalized version of the original image. As a result, the watermark detection task can be much simplified when it is done on the normalized image without referencing the original image. However, because image normalization is sensitive to image local variation, we apply image normalization to non-overlapped image disks separately. The center of each disk is an extracted feature point. Several copies of a 16-bit watermark sequence are embedded in the original image to improve the robustness of the watermark. Simulation results show that our scheme can survive low quality JPEG compression, color reduction, sharpening, Gaussian filtering, median filtering, the printing and scanning process, row or column removal, shearing, rotation, scaling, local warping, cropping, and linear transformation.
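As a hedged sketch of the feature-extraction step, the code below approximates the Mexican Hat wavelet with Laplacian-of-Gaussian responses at two scales and keeps local maxima of the scale-interaction response as candidate anchor points; the scales, the interaction weight, and the threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, maximum_filter

def feature_points(image, s1=2.0, s2=4.0, gamma=2.0, thresh=10.0, nms_size=15):
    """Candidate anchor points for embedding/detection from scale interaction."""
    r1 = np.abs(gaussian_laplace(image.astype(float), s1))   # response at scale 1
    r2 = np.abs(gaussian_laplace(image.astype(float), s2))   # response at scale 2
    interaction = np.abs(r1 - gamma * r2)                    # scale-interaction response
    local_max = (interaction == maximum_filter(interaction, size=nms_size))
    ys, xs = np.nonzero(local_max & (interaction > thresh))
    return list(zip(ys, xs))
```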
To deal with the correspondence problem in stereo imaging, a new approach is presented to find the disparity information on a newly defined dissimilarity map (DSMP). Based on an image formation model of stereo images and some statistical observations, two constraints and four assumptions are adopted. In addition, a few heuristic criteria are developed to define a unique solution. All these constraints, assumptions and criteria are applied to the DSMP to find the correspondence. First, the Epipolar Constraint, the Valid Pairing Constraint and the Lambertian Surface Assumption are applied to the DSMP to locate the Low Dissimilarity Zones (LDZs). Then, the Opaque Assumption and the Minimum Occlusion Assumption are applied to the LDZs to obtain the admissible LDZ sets. Finally, the Depth Smoothness Assumption and some other criteria are applied to the admissible LDZ sets to produce the final answer. The focus of this paper is to find the constraints and assumptions in the stereo correspondence problem and then properly convert these constraints and assumptions into executable procedures on the DSMP. In addition to its ability to estimate occlusion accurately, this approach works well even when the commonly used monotonic ordering assumption is violated. The simulation results show that occlusions can be properly handled and the disparity map can be calculated with a fairly high degree of accuracy.
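A minimal sketch of constructing the dissimilarity map for one epipolar scanline and locating low-dissimilarity candidates is given below; the paper's constraint machinery is reduced here to a simple validity mask and a threshold, both assumed.

```python
import numpy as np

def dsmp_scanline(left_row, right_row, max_disp=32):
    """DSMP[x, d] = dissimilarity of left pixel x paired with right pixel x - d."""
    w = len(left_row)
    dsmp = np.full((w, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        valid = np.arange(w) - d >= 0                 # simple validity mask (assumed form)
        dsmp[valid, d] = np.abs(left_row[valid].astype(float) -
                                right_row[np.arange(w)[valid] - d])
    return dsmp

def low_dissimilarity_zones(dsmp, thresh=8.0):
    """Candidate matches: (x, d) entries of the DSMP below a dissimilarity threshold."""
    return np.argwhere(dsmp < thresh)
```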
KEYWORDS: Video, Multiplexers, Asynchronous transfer mode, Stars, Acquisition tracking and pointing, Computer programming, Smoothing, Statistical multiplexing, Video processing, Video coding
The goal of this paper is to provide a feasible and flexible mechanism for variable bit rate (VBR) video transmission and to achieve high network utilization with statistical Quality of Service (QoS). In this paper, we employ a piece-wise constant rate smoothing algorithm to smooth the video coder outputs and propose a simple algorithm to determine the renegotiation schedule for the smoothed streams. In order to transmit video streams with renegotiation-based VBR service, we suggest a connection admission control (CAC) based on the Chernoff bound using a simple yet quite accurate 'binomial' traffic model. The experimental results show that our proposed method provides an easy and robust mechanism to support real-time video transmission in both homogeneous and heterogeneous connection environments.
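The admission test can be sketched as below under an assumed on/off ('binomial') per-stream model with peak rate r and activity p; the Chernoff bound on the overflow probability is numerically optimized over its free parameter, and the admission threshold is an assumed value.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_overflow_bound(n, p, r, capacity):
    """Upper bound on P(aggregate rate > capacity) for n independent on/off streams."""
    def neg_exponent(s):
        log_mgf = np.log(1.0 - p + p * np.exp(s * r))   # per-stream log moment generating fn
        return -(s * capacity - n * log_mgf)            # negate to maximize the exponent
    res = minimize_scalar(neg_exponent, bounds=(1e-6, 50.0), method='bounded')
    return float(np.exp(res.fun))                       # exp(-sup_s [s*C - n*Lambda(s)])

def admit(n_existing, p, r, capacity, epsilon=1e-6):
    """Admit a new connection if the overflow bound stays below epsilon (assumed target)."""
    return chernoff_overflow_bound(n_existing + 1, p, r, capacity) < epsilon
```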
In this paper, we propose a robust quantizer design for image coding. Because the bits representing the reconstruction levels are transmitted directly over the channel, the proposed quantizer can be viewed as a compound of a quantizer, a VLC coder, and a channel coder. A conventional combined source/channel design produces a source coder designed for a channel with a specific channel noise; our proposed quantizer is instead designed over a noise range. In comparison with the ordinary JPEG coder, simulation results show that our proposed scheme has much more graceful distortion behavior within the designed noise range.
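As a loose illustration of evaluating a quantizer over a noise range rather than a single channel condition, the sketch below computes the expected distortion of a scalar quantizer averaged over a range of bit-error rates; the natural-binary index assignment and the BER range are assumptions.

```python
import numpy as np

def expected_distortion(samples, levels, thresholds, ber_range=(0.001, 0.05), n_ber=10):
    """Average end-to-end MSE of a scalar quantizer over a range of channel BERs."""
    bits = int(np.log2(len(levels)))
    idx = np.digitize(samples, thresholds)           # encoder: cell index of each sample
    total = 0.0
    for ber in np.linspace(ber_range[0], ber_range[1], n_ber):
        d = 0.0
        for i in range(len(levels)):
            cell = samples[idx == i]
            if cell.size == 0:
                continue
            for j in range(len(levels)):
                flips = bin(i ^ j).count('1')        # bits flipped by the channel
                p = (ber ** flips) * ((1 - ber) ** (bits - flips))
                d += p * np.sum((cell - levels[j]) ** 2)
        total += d / len(samples)
    return total / n_ber
```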
Our goal in this study is to construct a 3D model from a pair of (stereo) images and then project this 3D model to image planes at new locations and orientations. We first compute the disparity map from a pair of stereo images. Although the disparity map may contain defects, we can still calculate the depth map of the entire scene by filling in the missing (occluded) pixels using bilinear interpolation. One step further, we synthesize new stereo images at different camera locations using the 3D information obtained from the given stereo pair. The disparity map used to generate depth information is one key step in constructing 3D scenes. Therefore, in this paper we investigate various types of occlusion to help analyze the disparity map errors and methods that can provide consistent disparity estimates. The edge-directed Modified Dynamic Programming scheme with Adaptive Window, which significantly improves the disparity map estimates, is thus proposed. Our preliminary simulations show quite promising results.
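A simplified sketch of the depth-from-disparity, hole-filling, and view-synthesis steps follows; the pinhole parameters, the per-row linear interpolation standing in for the bilinear filling, and the purely horizontal forward warp are assumptions.

```python
import numpy as np

def disparity_to_depth(disp, focal, baseline):
    """Standard pinhole stereo relation; invalid (zero) disparities are left at depth 0."""
    depth = np.zeros_like(disp, dtype=float)
    valid = disp > 0
    depth[valid] = focal * baseline / disp[valid]
    return depth, valid

def fill_occlusions(depth, valid):
    """Fill invalid (occluded) pixels by linear interpolation along each row."""
    filled = depth.copy()
    for y in range(depth.shape[0]):
        xs = np.nonzero(valid[y])[0]
        if len(xs) >= 2:
            holes = np.nonzero(~valid[y])[0]
            filled[y, holes] = np.interp(holes, xs, depth[y, xs])
    return filled

def synthesize_view(image, disp, alpha=0.5):
    """Forward-warp pixels horizontally by alpha * disparity to emulate a new viewpoint."""
    h, w = disp.shape
    out = np.zeros_like(image)
    xs = np.clip((np.arange(w)[None, :] - alpha * disp).astype(int), 0, w - 1)
    out[np.arange(h)[:, None], xs] = image
    return out
```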
Object scalability is a new trend in video coding research. This paper presents an attempt at designing an object-oriented coder based on the notion of irregular mesh and image warping. Three techniques are employed: motion compensation based on image warping, image segmentation and object tracking based on (mesh) nodal point adjustment, and nonrectangular DCT coding performed on the irregular mesh. The preliminary simulation results of the PSNR values and the subjective coded image quality indicate that this coding scheme is suitable for object scalable coding at low bit rates.
The purpose of this project is to develop a simplified DAVIC server on Sun workstations under the Unix and X Window environment. DAVIC 1.0 is a comprehensive set of standards that define various types of end-to-end multimedia communication systems. More precisely, we implement only the high-level server-client protocols and server service elements specified in DAVIC. This system can provide a browsing service, download a file or a portion of it, and play back an MPEG sequence with VCR-like control. Limited by time, manpower and tools, not all the DAVIC-specified elements are fully implemented. However, an implementation of a simple video server based on the DAVIC concept has been completed and demonstrated.
This paper presents a novel adaptive interpolation method for digital images. This new method can dramatically reduce the blurring and jaggedness artifacts on high-contrast edges, which are generally found in images interpolated using conventional methods. This high performance is achieved via two proposed operators: a fuzzy-inference based edge-preserving interpolator and an edge-shifted matching scheme. The former synthesizes the interpolated pixel to match the image local characteristics, so edge integrity can be retained. However, due to its small footprint, it does not work well on sharply curved edges that form very sharp angles against one of the coordinate axes. Therefore, the edge-shifted matching technique is developed to identify precisely the orientation of sharply curved edges. By combining these two techniques, the subjective quality of the interpolated images is significantly improved, particularly along the high-contrast edges. Both synthesized images (such as letters) and natural scenes have been tested with very promising results.
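A much-simplified sketch of edge-directed 2x interpolation is shown below: the diagonal with the smaller intensity difference is taken as the edge direction, standing in for the fuzzy-inference interpolator; the edge-shifted matching stage is not modeled.

```python
import numpy as np

def edge_directed_upscale2(img):
    """2x upscale of a grayscale image; new diagonal samples follow the smoother diagonal."""
    h, w = img.shape
    out = np.zeros((2 * h - 1, 2 * w - 1), dtype=float)
    out[::2, ::2] = img                                      # keep original samples
    for y in range(h - 1):
        for x in range(w - 1):
            d1 = abs(float(img[y, x]) - img[y + 1, x + 1])   # "\" diagonal difference
            d2 = abs(float(img[y, x + 1]) - img[y + 1, x])   # "/" diagonal difference
            if d1 <= d2:
                out[2*y + 1, 2*x + 1] = (float(img[y, x]) + img[y + 1, x + 1]) / 2
            else:
                out[2*y + 1, 2*x + 1] = (float(img[y, x + 1]) + img[y + 1, x]) / 2
    # remaining samples: simple averages of horizontal/vertical neighbours
    out[::2, 1::2] = (out[::2, :-2:2] + out[::2, 2::2]) / 2
    out[1::2, ::2] = (out[:-2:2, ::2] + out[2::2, ::2]) / 2
    return out
```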
This paper presents an evaluation of several block-matching motion estimation algorithms from a system-level VLSI design viewpoint. Because a straightforward block-matching algorithm (BMA) demands a very large amount of computing power, many fast algorithms have been developed. However, these fast algorithms are often designed merely to reduce arithmetic operations without considering their overall performance in VLSI implementation. In this paper, three criteria are used to compare various block-matching algorithms: (1) silicon area, (2) input/output requirement, and (3) image quality. Several well-known motion estimation algorithms are analyzed under the above criteria, and the advantages and disadvantages of these algorithms are discussed. Although our analysis is limited by the precision of our silicon area estimation model, it should provide valuable information in selecting a BMA for VLSI implementation.
Motion-compensated estimation is an effective means of reducing the interframe correlation for image sequence coding. Therefore, it is adopted by the international video coding standards CCITT H.261, ISO MPEG-1, and MPEG-2. This paper provides a comprehensive survey of the motion estimation techniques that are pertinent to video coding standards.
Three popular groups of motion estimation methods are presented: i) block matching methods, ii) differential (gradient) methods, and iii) Fourier methods. However, not all of them are suitable for the block-based motion compensation structure specified by the aforementioned standards. Our focus in this paper is to review those techniques that fit into the standards. In addition to the basic operations of these techniques, the issues discussed include their extensions, their performance limits, their relationships with each other, and other advantages and disadvantages of these methods.
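For concreteness, a minimal full-search block-matching sketch with a SAD criterion, representative of the first group of methods, is given below; the block size and search range are illustrative parameters.

```python
import numpy as np

def block_match(cur, ref, block=16, search=7):
    """Full-search block matching: one motion vector (dy, dx) per block, SAD criterion."""
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            tgt = cur[by:by + block, bx:bx + block].astype(int)
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(tgt - ref[y:y + block, x:x + block]).sum()
                        if best is None or sad < best:
                            best, best_mv = sad, (dy, dx)
            mvs[by // block, bx // block] = best_mv
    return mvs
```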
A new technique called the motion restoration method for estimating the global motion due to zoom and pan of the camera is proposed. It is composed of three steps: (1) block-matching motion estimation, (2) object assignment, and (3) global motion restoration. In this method, each image is first divided into a number of blocks. Step (1) may employ any suitable block-matching motion estimation algorithm to produce a set of motion vectors which capture the compound effect of zoom, pan, and object movement. Step (2) groups the blocks which share common global motion characteristics into one object. Step (3) then extracts the global motion parameters (zoom and pan) corresponding to each object from the compound motion vectors of its constituent blocks. The extraction of global motion parameters is accomplished via singular value decomposition. Experimental results show that this new technique is efficient in reducing the entropy of the block motion vectors for both zooming and panning motions and may also be used for image segmentation.
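Step (3) can be sketched as a least-squares fit (solved via SVD, as numpy's lstsq does internally) of an assumed model v = (zoom - 1) * position + pan to the block motion vectors of one object; the model form is an assumption.

```python
import numpy as np

def restore_global_motion(positions, vectors):
    """positions, vectors: (N, 2) arrays of block centers and their motion vectors."""
    n = len(positions)
    A = np.zeros((2 * n, 3))
    b = np.zeros(2 * n)
    A[0::2, 0], A[0::2, 1] = positions[:, 0], 1.0   # vx = a*px + tx
    A[1::2, 0], A[1::2, 2] = positions[:, 1], 1.0   # vy = a*py + ty
    b[0::2], b[1::2] = vectors[:, 0], vectors[:, 1]
    (a, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return 1.0 + a, np.array([tx, ty])              # zoom factor and pan vector
```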
The asynchronous-transfer-mode (ATM) transmission has been adopted by most computer networks and the broad-band integrated-services digital network. Many studies have been conducted to investigate the transmission of video services over ATM networks. Previous studies often concentrate on designing video coders or on designing network regulating policies that reduce the packet loss effect. However, the ultimate goal should be reducing the overall distortion of the reconstructed images at the receiver, and this distortion contains two components: (1) source coding error due to compression and (2) channel error due to network packet loss. In general, a high output rate at a source encoder leads to a smaller compression error; however, this high bit rate may also increase packet loss and thus increase the channel error. In this paper, a popular two-layer coding structure is considered and the optimal quantizer step size for the enhancement layer is studied under the consideration of the source-plus-channel distortion. Through both theoretical analysis and image simulation, we indeed find an optimal operating point that achieves the lowest total mean squared error at the receiver.
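An illustrative (not the paper's) search for the step size minimizing the total source-plus-channel distortion is sketched below; the rate model, the loss-versus-rate curve, and the concealment distortion are simplified assumptions.

```python
import numpy as np

def total_distortion(q, signal_var=100.0, capacity=2.0):
    """Toy model: source MSE falls with step size q, channel MSE rises as rate nears capacity."""
    d_source = q ** 2 / 12.0                                  # uniform-quantizer MSE model
    rate = max(0.5 * np.log2(signal_var / d_source), 0.0)     # bits/sample (assumed model)
    p_loss = min(1.0, np.exp(-(capacity - rate)))             # assumed loss-vs-rate curve
    d_channel = p_loss * signal_var                           # lost data concealed by zero
    return d_source + d_channel

def optimal_step(qs=np.linspace(0.5, 20.0, 200)):
    """Grid search for the step size with the lowest total distortion."""
    return qs[np.argmin([total_distortion(q) for q in qs])]
```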
Data compression algorithms are developed to transmit massive image data under limited channel capacity. When the channel rate is not sufficient to transmit good-quality compressed images, a degraded image after compression is reconstructed at the decoder. In this situation, a postprocessor can be used to improve the receiver image quality. Ideally, the objective of postprocessing is to restore the original pictures from the received distorted pictures. However, when the received pictures are heavily distorted, there may not exist enough information to restore the original images. Then, what a postprocessor can do is to reduce the subjective artifact rather than to minimize the differences between the received and the original images. In this paper, we propose two postprocessing techniques, namely, error pattern compensation and inter-block transform coefficient adjustment. Since Discrete Cosine Transform (DCT) coding is widely adopted by the international image transmission standards, our postprocessing schemes are proposed in the DCT domain. When the above schemes are applied to highly distorted images, quite noticeable subjective improvement can be observed.
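A hedged sketch of inter-block DCT coefficient adjustment follows: each block's DC coefficient is pulled toward the average of its neighbours while being kept inside its original quantization cell, so the adjusted image stays consistent with the received data; the weighting and the restriction to the DC term are assumptions.

```python
import numpy as np

def adjust_dc(dc, qstep):
    """dc: (H_blocks, W_blocks) dequantized DC coefficients; qstep: DC quantizer step."""
    smoothed = dc.astype(float)
    pad = np.pad(smoothed, 1, mode='edge')
    neighbour_avg = (pad[:-2, 1:-1] + pad[2:, 1:-1] +
                     pad[1:-1, :-2] + pad[1:-1, 2:]) / 4.0
    adjusted = 0.5 * smoothed + 0.5 * neighbour_avg          # pull toward neighbours
    lo, hi = smoothed - qstep / 2.0, smoothed + qstep / 2.0  # stay within the quantization cell
    return np.clip(adjusted, lo, hi)
```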
Motion estimation techniques are widely used in today's video coding systems. The most often-used techniques are the block (template) matching method and the differential (pel-recursive) method. In this paper, we study this topic from a viewpoint different from the above methods to explore the fundamental limits and tradeoffs in image motion estimation. The underlying principles behind the two conflicting requirements, accuracy and unambiguity, become clear when they are analyzed using this tool, frequency component analysis. This analysis may lead us to invent new motion estimation algorithms and suggest ways to improve the existing algorithms.
KEYWORDS: Quantization, Image processing, Visual communications, Computer programming, Distortion, Parallel processing, Signal processing, Televisions, Digital signal processing, Standards development
High Definition Television (HDTV) promises to offer wide-screen pictures of much better quality than today's television. However, without compression a digital HDTV channel may require up to one Gbit/s of transmission bandwidth. We suggest a parallel processing structure, using the proposed international standard for visual telephony (the CCITT Px64 kb/s standard) as processing elements, to compress digital HDTV pictures. The basic idea is to partition an HDTV picture into smaller sub-pictures and then compress each sub-picture using a CCITT Px64 kb/s coder, which, with today's technology, is cost-effective only for small-size pictures.
Since each sub-picture is processed by an independent coder, without coordination these coded sub-pictures may have unequal picture quality. To maintain a uniform-quality HDTV picture, the following two issues are studied: (1) sub-channel control strategy (bits allocated to each sub-picture), and (2) quantization and buffer control strategy for each individual sub-picture coder. Algorithms to resolve the above problems and their computer simulations are presented.
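A minimal sketch of the sub-channel control idea follows: the channel rate is split among the sub-picture coders in proportion to a per-sub-picture complexity measure so that all coders can run at roughly equal quality; the complexity measure used here is an assumption.

```python
import numpy as np

def allocate_subchannels(subpictures, total_bits):
    """subpictures: list of 2-D pixel arrays, one per sub-picture coder.
    Returns the bit budget assigned to each sub-picture for this frame."""
    complexity = np.array([float(np.var(sp)) for sp in subpictures]) + 1e-9
    shares = complexity / complexity.sum()
    return (shares * total_bits).astype(int)
```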