Wyner-Ziv video coding targets video compression under low computational resources. Since encoding may stop at any time on mobile devices with limited computing power and bandwidth, scalable Wyner-Ziv video coding is also desirable. Bit-plane coding is a natural route to scalability; however, the conventional bit-plane representation used in hybrid video coding does not work well in the Wyner-Ziv setting. Because the bit-plane representation is closely tied to quantization, we propose a new bit-plane representation that is optimally quantized at any bit-plane in the Wyner-Ziv sense. In particular, for DCT-domain Wyner-Ziv video coding, the distribution of DCT coefficients and their conditional distribution given the side information can be modeled with symmetric Laplacian functions. Accordingly, a simplified adaptive bit-plane representation is proposed that does not require prior knowledge of the Laplacian parameters. A DCT-domain scalable Wyner-Ziv video coding scheme is then developed, in which encoding can stop at any bit-plane and the bit-stream can be flexibly truncated. Experiments show no performance penalty from unpredicted bit-plane truncation.
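The Laplacian modeling step can be illustrated with a small sketch. The scale of a zero-mean Laplacian is estimated from the residual between a DCT coefficient and its side-information prediction; the function names and toy residual values below are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def laplacian_mle(samples, mu=0.0):
    """ML estimate of the Laplacian scale b, with f(x) = exp(-|x-mu|/b) / (2b)."""
    return sum(abs(x - mu) for x in samples) / len(samples)

def laplacian_pdf(x, b, mu=0.0):
    """Symmetric Laplacian density used to model DCT residuals."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

# Toy residuals: DCT coefficient minus its side-information prediction,
# assumed zero-mean Laplacian as in the correlation-noise model above.
residuals = [2, -1, 0, 3, -2, 1, 0, -1]
b = laplacian_mle(residuals)  # mean absolute residual
```

The MLE for the scale is simply the mean absolute residual, which is why such a model can be fitted on the fly without transmitting distribution parameters.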
In existing video coding schemes with spatial scalability based on pyramid frame representation, such as the ongoing H.264/MPEG-4 SVC (scalable video coding) standard, a video frame at a high resolution is predicted mainly either from the lower-resolution image of the same frame or from temporally neighboring frames at the same resolution. Most of these prediction techniques fail to exploit the two correlations simultaneously and efficiently. This paper extends the in-scale prediction technique developed for wavelet video coding to a generalized in-scale motion compensation framework for H.264/MPEG-4 SVC. In this framework, for a video frame at a high-resolution layer, the lowpass content is predicted from information already coded in the lower-resolution layer, while the highpass content is predicted from neighboring frames at the current resolution. In this way, the cross-resolution and temporal correlations are exploited simultaneously, which leads to much more efficient prediction. Preliminary experimental results demonstrate that the proposed framework improves the spatial-scalability performance of current H.264/MPEG-4 SVC; the improvement is significant especially for high-fidelity video coding. In addition, unlike the wavelet-based in-scale scheme, the proposed framework supports arbitrary down-sampling and up-sampling filters.
In this paper, we present a new image compression scheme specially designed for computer-generated compound color images. We first classify the image content into two kinds, text/graphics content and picture content, and then apply different compression schemes to blocks of each type. We propose a two-stage segmentation scheme that combines thresholding of block features with rate-distortion optimization. The text/graphics compression scheme consists of two parts: color quantization and lossless coding of the quantized images. The input images are first color-quantized and converted to codebooks and labels, introducing constrained distortion into the color-quantized images; the generated labels and codebooks are then losslessly compressed. We propose a rate-distortion optimized color quantization algorithm for text/graphics content, which introduces distortion into the text content so as to minimize the bit rate produced by the subsequent lossless entropy coder. The picture content is compressed with conventional image coding algorithms such as JPEG. The results show that the proposed scheme achieves better coding performance than other image compression algorithms such as JPEG2000 and DjVu.
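The codebook/label decomposition for text/graphics blocks can be sketched as nearest-neighbor color quantization; this minimal sketch omits the paper's rate-distortion optimization, and the names and sample values are illustrative.

```python
def quantize_colors(pixels, codebook):
    """Map each RGB pixel to the index of the nearest codebook entry
    (squared-error metric); the resulting label map is what the
    subsequent lossless entropy coder would compress."""
    def dist(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2))
    return [min(range(len(codebook)), key=lambda k: dist(p, codebook[k]))
            for p in pixels]

# Toy block: three pixels quantized against a three-color codebook.
codebook = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
labels = quantize_colors([(10, 10, 10), (250, 250, 250), (200, 30, 20)], codebook)
```

The rate-distortion optimized variant in the paper would additionally perturb label assignments when doing so lowers the entropy-coded rate at acceptable distortion.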
In this paper, we propose a joint power-distortion optimization scheme for real-time H.264 video encoding under a power constraint. First, the power constraint is translated into a complexity constraint based on dynamic voltage scaling (DVS) technology. Second, a computation allocation model (CAM) with virtual buffers is proposed to facilitate the optimal allocation of the constrained computational resources to each frame. Third, a complexity-adjustable encoder based on optimal motion estimation and mode decision is proposed to meet the allocated resources. The proposed scheme takes advantage of new features of the H.264/AVC video coding tools, such as the early-termination strategy in fast motion estimation. Moreover, it avoids the high overhead of parametric power-control algorithms and achieves fine complexity scalability over a wide range with stable rate-distortion performance. The proposed scheme also shows the potential for a further reduction of computation and power consumption in decoding without any change to existing decoders.
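The virtual-buffer allocation idea can be sketched as follows; the formula and the gain constant are illustrative assumptions, not the paper's actual CAM. The nominal per-frame complexity budget is corrected by the surplus or deficit accumulated so far, analogous to buffer-based rate control.

```python
def allocate_complexity(total_budget, n_frames, used_so_far, frame_idx,
                        buffer_gain=0.5):
    """Virtual-buffer computation allocation (illustrative sketch).
    total_budget: total complexity units for the sequence.
    used_so_far:  complexity consumed by frames 0..frame_idx-1.
    A positive surplus (we are under budget) raises the next frame's
    allocation; a deficit lowers it."""
    nominal = total_budget / n_frames
    surplus = frame_idx * nominal - used_so_far
    return max(0.0, nominal + buffer_gain * surplus)
```

For example, with a 1000-unit budget over 10 frames, a frame arriving after 180 units were spent on the first two frames gets 110 units (100 nominal plus half of the 20-unit surplus).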
Two key factors largely affect the efficiency of a multi-view video capture and transmission system: communication between cameras and the computational complexity of the encoder. In this paper, we propose a practical framework for distributed multi-view video coding (DMVC), in which inter-camera communication is avoided and the heavy computation is moved from the encoder to the decoder. In this scheme, multi-camera video sources are encoded separately and decoded jointly, and the traditional inter frame is replaced by the Wyner-Ziv frame. To reach this goal, the Wyner-Ziv theory on source coding with side information is employed as the basic coding principle, with a Wyner-Ziv coding method based on the wavelet transform and turbo codes at the core of the scheme. To further improve coding performance, we also exploit the large redundancy between adjacent views: a more flexible prediction method that jointly uses temporal and inter-view correlations is proposed to generate the side information at the decoder. Experimental results show that the coding performance of the proposed DMVC scheme is very promising compared to traditional intra coding.
Free viewpoint switching is one of the most important features of multi-view video streaming. The key problem lies in how to achieve the best performance when the cameras' processing capability and the network bandwidth are limited. In this paper, we propose a novel free viewpoint switching scheme for the multi-view video scenario that employs distributed video coding. In this scheme, the multi-camera video sources are encoded separately with a traditional hybrid video coding scheme, and in addition an alternative bitstream is produced for every frame with the Wyner-Ziv coding method, for error correction when viewpoint switching occurs. When switching happens, the Wyner-Ziv bits corresponding to the actual reference frame at the switching point are transmitted and used to recover the true reference. Instead of completely removing the mismatch, the proposed scheme reduces the mismatch to an acceptable level so as to save bits for the switching frame. A wavelet-transform-domain Wyner-Ziv coding method is proposed to produce the Wyner-Ziv bits for the switching frame. With the proposed scheme, inter-camera communication is avoided and drifting error is controlled efficiently when viewpoint switching occurs.
In this paper, we present a learning-based approach to mining the capture intention of camcorder users, aiming to provide a novel viewpoint on home video content analysis. In contrast to existing approaches to video analysis designed from the viewer's standpoint, this approach models the capture intention from the camcorder user's point of view by investigating a set of effective intention-oriented features. With this approach, not only is the capture intention effectively mined, but a set of intention probability curves is also produced for efficient browsing of home video content. Experimental evaluations indicate that the intention-based approach is an effective complement to existing home video content analysis schemes.
The shift-variant property of the discrete wavelet transform leads to intrinsic coupling across spatial subbands during motion-aligned temporal filtering. This poses a major challenge: how to achieve a better trade-off between subband independence and motion-alignment efficiency when providing spatial scalability in 3D wavelet video coding. This paper first investigates the issue of subband coupling in depth. From investigations of the analysis and synthesis filters, we verify the existence and causes of the subband coupling phenomenon and illustrate the subband leakage due to motion shift; we further propose a method to measure the strength of this coupling. Based on these investigations, we focus on schemes that preserve most of the subband coupling relationships and recommend that spatial highpass subbands should not be dropped completely. The issue of rate allocation for spatial highpass subbands is also considered: an error propagation model is proposed to describe the effect of subband coupling on video reconstruction, and the synthesis gain of each subband, estimated from this model, guides the rate-allocation algorithm. Experimental results demonstrate the promise of the proposed techniques in improving both the objective and subjective quality of low-resolution video, especially at middle and high bit rates, for 3D video coding schemes with spatial scalability.
This paper makes a comparative study of various spatially scalable coding frameworks. Frameworks with multiple image-domain motion-aligned temporal filtering stages at various spatial resolutions, referred to as multi-T+2D, are mainly investigated. We first investigate a multi-T+2D scheme based on a redundant frame representation and discuss the cross-spatial-layer redundancy and prediction methods; this redundancy brings significant performance loss for schemes providing wide-range SNR scalability. To remove the redundancy produced in multi-resolution temporal filtering while retaining the advantage of spatial-domain motion compensation, a novel non-redundant multi-T+2D scheme is proposed. A performance comparison among the discussed frameworks shows that the proposed non-redundant multi-T+2D framework performs well for fully scalable video coding. We also verify that the redundant multi-T+2D framework with cross-spatial-layer reconstruction feedback is practical for providing narrow-range SNR scalability for each spatial layer.
With intensifying academic interest in video streaming over peer-to-peer (P2P) networks, more and more streaming protocols have been proposed to address different problems, such as QoS, load balancing, transmission reliability, and bandwidth efficiency. However, to the best of our knowledge, an important component of any practical P2P streaming system, streaming-service discovery, i.e., discovering potential peers from which a newcomer could receive the requested stream, has rarely been given particular consideration. In this paper, inspired by a special data structure, the segment tree (ST), we propose a protocol to address this problem. Furthermore, we fully decouple the control plane and the data plane in video streaming, and hence provide more flexibility in designing protocols for both.
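A minimal sketch shows how a segment tree can support streaming-service discovery; the interface here is a hypothetical illustration, not the paper's protocol. Each peer registers the half-open range of stream segments it buffers, and a newcomer queries a single segment index to obtain candidate serving peers in O(log n).

```python
class SegmentTree:
    """Segment tree over stream-segment indices [0, n). Peers register
    the intervals they buffer; a point query walks root-to-leaf and
    collects every peer whose interval covers that index."""
    def __init__(self, n):
        self.n = n
        self.peers = {}  # tree-node id -> set of peer names

    def _insert(self, node, lo, hi, a, b, peer):
        if b <= lo or hi <= a:          # no overlap with this node
            return
        if a <= lo and hi <= b:         # node fully inside [a, b)
            self.peers.setdefault(node, set()).add(peer)
            return
        mid = (lo + hi) // 2
        self._insert(2 * node, lo, mid, a, b, peer)
        self._insert(2 * node + 1, mid, hi, a, b, peer)

    def register(self, peer, a, b):
        """Peer announces it buffers segments [a, b)."""
        self._insert(1, 0, self.n, a, b, peer)

    def query(self, idx):
        """Return all peers that buffer segment idx."""
        node, lo, hi, found = 1, 0, self.n, set()
        while True:
            found |= self.peers.get(node, set())
            if hi - lo == 1:
                return found
            mid = (lo + hi) // 2
            if idx < mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid

st = SegmentTree(8)
st.register("A", 0, 4)
st.register("B", 2, 8)
```

Here a newcomer wanting segment 3 discovers both peers, while one wanting segment 5 discovers only peer B; the actual data transfer is then negotiated on the decoupled data plane.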
Scalable coding is a technology that encodes a multimedia signal in a scalable manner, so that various representations can be extracted from a single codestream to fit a wide range of applications. Many new scalable coders, such as JPEG 2000 and MPEG-4 FGS, offer fine-granularity scalability to provide a near-continuous optimal trade-off between quality and rate over a large range. This fine-granularity scalability poses great new challenges to the design of encryption and authentication systems for scalable media in Digital Rights Management (DRM) and other applications. It may be desirable or even mandatory to maintain a certain level of scalability in the encrypted or signed codestream so that no decryption or re-signing is needed when legitimate adaptations are applied. In other words, the encryption and authentication should be scalable, i.e., adaptation-friendly. Otherwise, secrets have to be shared with every intermediate stage of the content delivery system that performs adaptation manipulations, and sharing secrets with many parties would jeopardize the overall security of a system, since its security depends on the weakest component. In this paper, we first describe general requirements and desirable features of an encryption or authentication system for scalable media, especially those not encountered in the non-scalable case. Then we present an overview of the current state of the art in scalable encryption and authentication. These technologies include full and selective encryption schemes that maintain the original or a coarser granularity of scalability offered by an unencrypted scalable codestream, as well as layered access control and block-level authentication that reduce the fine granularity of scalability to the block level, among others. Finally, we summarize existing challenges and propose future research directions.
The image authentication system SARI proposed by Lin and Chang accepts JPEG compression yet rejects other malicious manipulations. Some vulnerabilities of the system have been reported recently. In this paper, we propose two new attacks that can compromise the SARI system. The first is a histogram attack, which modifies DCT coefficients while maintaining the same relationship between any two DCT coefficients and the same mean values of the DCT coefficients; such a modified image passes the SARI authentication system. The second is an oracle attack, which uses an oracle to efficiently find the secret pairs used by SARI in its signature generation; a single image plus an oracle suffices to launch it. Fixes to thwart the proposed attacks are also given.
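A toy model shows why relation-preserving edits defeat such signatures; this is neither the actual SARI scheme nor the paper's attack, only an illustration of the principle. If the signature depends on the ordering within coefficient pairs and on their sums, then widening each pair symmetrically changes the coefficients while leaving those features, and hence the toy signature, untouched.

```python
def toy_invariants(pairs):
    """Toy signature features in the spirit of SARI: the ordering sign
    within each DCT-coefficient pair, plus the overall coefficient sum."""
    signs = tuple((a > b) - (a < b) for a, b in pairs)
    total = sum(a + b for a, b in pairs)
    return signs, total

def symmetric_widening(pairs, d=1):
    """Widen each unequal pair symmetrically: the pair sum is preserved
    and the ordering can only grow stronger, so toy_invariants is
    unchanged even though the coefficients are modified."""
    out = []
    for a, b in pairs:
        if a > b:
            out.append((a + d, b - d))
        elif a < b:
            out.append((a - d, b + d))
        else:
            out.append((a, b))
    return out

pairs = [(5, 2), (1, 4), (3, 3)]
tampered = symmetric_widening(pairs)
```

The tampered coefficients differ from the originals, yet the toy invariants match, which is the essence of why a signature built only on pairwise relations and means can be fooled.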
In this paper we introduce a new inter macroblock type within the H.264 (or MPEG-4 AVC) video coding standard that can further improve coding efficiency by exploiting the temporal correlation of motion within a sequence. This leads to a reduction in the bits required for encoding motion information, while retaining or even improving quality under a rate-distortion optimization framework. An extension of this concept to the skip macroblock type of the same standard is also presented. Simulation results show that the proposed semantic changes can lead to up to 7.6% average bitrate reduction, or equivalently 0.39 dB quality improvement, over the current H.264 standard.
This paper proposes an adaptive block-size motion alignment technique for 3D wavelet coding to further exploit temporal correlations across pictures. Similar to B pictures in traditional video coding, each macroblock can be motion-aligned forward and/or backward for temporal wavelet decomposition. In each direction, a macroblock may select its partition from one of seven modes - 16x16, 8x16, 16x8, 8x8, 8x4, 4x8 and 4x4 - to allow accurate motion alignment. Furthermore, rate-distortion optimization criteria are proposed to select the motion mode, motion vectors and partition mode. Although the proposed technique greatly improves the accuracy of motion alignment, it does not directly bring a coding-efficiency gain, because of the smaller block sizes and additional block boundaries. Therefore, an overlapped block motion alignment is further proposed to cope with block boundaries and suppress spatial high-frequency components. The experimental results show that the proposed adaptive block-size motion alignment with overlapped block motion alignment can achieve up to 1.0 dB gain in 3D wavelet video coding. Our 3D wavelet coder outperforms MC-EZBC on most sequences by 1-2 dB, and is up to 1.5 dB better than H.264.
In this paper, we introduce a new motion vector prediction method for multiple-reference-picture codecs such as the H.264 (MPEG-4 AVC) video coding standard. For each candidate motion vector, our method considers the temporal distance of its corresponding reference picture from the current one when generating the predictor motion vector. This allows more accurate motion vector prediction and better exploitation of the temporal correlation that may exist within a video sequence. Furthermore, we introduce a modification to the SKIP macroblock mode, according to which not only the motion vectors but also the reference indices are adaptively generated. Simulation results suggest that our proposed methods, combined with an improved rate-distortion optimization strategy, when implemented within the existing H.264 codec, can yield a considerable performance improvement of up to 8.6% bitrate reduction compared to the current H.264 standard.
This paper proposes an advanced motion-threading technique to improve the coding efficiency of 3D wavelet coding. We extend the original motion-threading technique to the lifting wavelet structure. This extension solves the artificial motion-thread truncation problem in long-support temporal wavelet filtering and enables fractional-pixel motion alignment with guaranteed perfect reconstruction. Furthermore, the mismatch problem in motion threading caused by occlusion or scene changes is considered. In general, the temporal wavelet decomposition consists of multiple layers; unlike the original motion-threading scheme, in the proposed scheme each layer owns one set of motion vectors, so as to achieve both high coding efficiency and temporal scalability. To reduce the motion cost, a direct mode is used to exploit motion vector correlation, and an R-D optimized technique is introduced to estimate motion vectors and select the proper prediction mode for each macroblock. The proposed advanced motion-threading scheme outperforms the original motion-threading scheme by 1.5 to 5.0 dB. The experimental results also demonstrate that the 3D wavelet coding scheme is competitive with the state-of-the-art JVT video standard in coding efficiency.
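The R-D optimized selection of motion vectors and prediction modes reduces to a Lagrangian comparison; the mode names and numbers below are hypothetical. Each candidate mode offers a (distortion, rate) pair, and the encoder picks the minimizer of J = D + lambda * R.

```python
def rd_mode_decision(candidates, lam):
    """Lagrangian mode decision: pick the mode minimizing J = D + lam * R.
    candidates maps a mode name to a (distortion, rate_bits) pair."""
    return min(candidates,
               key=lambda m: candidates[m][0] + lam * candidates[m][1])

# Hypothetical per-macroblock candidates: a cheap direct mode, a single
# motion vector, and a finely partitioned mode with many vectors.
candidates = {
    "direct":     (120.0, 2),   # high distortion, almost free
    "inter16x16": (80.0, 10),
    "intra4x4":   (40.0, 60),   # low distortion, expensive
}
```

Note how the chosen mode shifts with lambda: a small lambda (rate is cheap) favors the finely partitioned mode, while a large lambda favors the direct mode, which is exactly why direct mode helps at low motion budgets.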
In this paper, we propose a bitstream switching scheme for progressive fine granularity scalable (PFGS) video coding that aims to improve the streaming performance of scalable bitstreams. By switching among PFGS bitstreams that share the same base layer but use different enhancement-layer reference rates, significant performance gain can be achieved, especially when the bandwidth fluctuates widely. Furthermore, compared with other bitstream switching schemes, the proposed scheme offers greater flexibility and various advantages owing to the common base layer. Experimental results illustrate that the proposed scheme can significantly improve streaming performance.
In this paper, a multi-metric evaluation protocol is proposed to evaluate the performance of user-assisted video object extraction systems. Evaluation metrics are the essential element of a performance evaluation methodology, yet recent work on video object segmentation/extraction is mostly restricted to a single objective metric for judging the overall performance of algorithms. Motivated by a novel framework for performance evaluation of image segmentation using the Pareto front, we propose a multi-metric evaluation protocol that includes metrics for contour-based spatial matching, temporal consistency, user workload and time consumption. Taking the characteristics of a user-assisted video object extraction system into consideration, we formulate the metrics in a way that is simple yet close to human visual assessment. For spatial matching, we define three types of errors - sharp error, smooth error and mass error - which can precisely score an extraction result. Temporal consistency is introduced to evaluate the stability of a system over time, and, as a user-assisted system is concerned, the workload of users is also included in our metrics. Incorporating the multiple metrics into one 4-D fitness space, we adopt the Pareto front to find the best configuration of a system with optimal parameters. Tests of our evaluation method show that the multi-metric protocol is effective.
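Finding the non-dominated systems in the 4-D fitness space is a standard Pareto-front computation. A minimal sketch, assuming all four metrics (spatial error, temporal inconsistency, workload, time) are to be minimized:

```python
def dominates(q, p):
    """q dominates p if q is no worse in every metric and strictly
    better in at least one (all metrics minimized)."""
    return (all(qi <= pi for qi, pi in zip(q, p))
            and any(qi < pi for qi, pi in zip(q, p)))

def pareto_front(points):
    """Return the non-dominated points of the fitness space."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical 4-D fitness tuples for four system configurations.
systems = [(1, 2, 3, 4), (2, 1, 3, 4), (2, 2, 3, 4), (3, 3, 3, 5)]
front = pareto_front(systems)
```

Here the third configuration is dominated by the first and the fourth by all others, so only the first two survive; the protocol then compares systems by their whole fronts rather than a single scalar score.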
A flexible and effective macroblock-based framework for hybrid spatial and fine-grain SNR scalable video coding is proposed in this paper. In the proposed framework, the base layer is of low resolution and is generally encoded at low bit rates with traditional prediction-based coding schemes. Two enhancement layers, i.e., the low-resolution enhancement layer and the high-resolution enhancement layer, are generated to improve the video quality of the low-resolution base layer and to evolve smoothly from low-resolution to high-resolution video with increasingly better quality, respectively. Since bit-plane coding and drifting-control techniques are applied to the two enhancement layers, each enhancement bitstream is fine-grain scalable and can be arbitrarily truncated to fit the available channel bandwidth. In order to improve the coding efficiency and reduce the drifting errors at the high-resolution enhancement layer, five macroblock coding modes with different forms of motion compensation and reconstruction are proposed in this paper. Furthermore, a mode decision algorithm is developed to select the appropriate coding mode for each macroblock at the high-resolution enhancement layer. Compared with the traditional spatial scalable coding scheme, the proposed framework provides not only spatial scalability but also fine-granularity quality scalability at the same resolution.
3-D wavelet-based scalable video coding provides a viable alternative to standard MC-DCT coding. However, many current 3-D wavelet coders suffer severe boundary effects across group-of-pictures (GOP) boundaries. This paper proposes a memory-efficient transform technique, via lifting, that computes the wavelet transform of a video sequence continuously on the fly, thus eliminating the boundary effects due to the limited length of individual GOPs. Coding results show that the proposed scheme completely eliminates the boundary effects and gives superb video playback quality.
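The lifting idea can be sketched with the simplest temporal filter, a Haar lift; the paper's transform is more general, and this sketch only shows why lifting enables on-the-fly computation with perfect reconstruction. Each predict/update step touches only a local pair of frames, so the transform can run continuously without segmenting the sequence into GOPs.

```python
def haar_lift_forward(x):
    """One level of the lifting Haar transform (predict then update).
    Returns (lowpass, highpass); assumes len(x) is even. In temporal
    filtering, x would be a sequence of co-located pixel values
    across consecutive frames."""
    s = list(x[0::2])
    d = list(x[1::2])
    for i in range(len(d)):
        d[i] -= s[i]           # predict: highpass = odd - even
        s[i] += d[i] / 2.0     # update:  lowpass = pair average
    return s, d

def haar_lift_inverse(s, d):
    """Exact inverse: undo update, then undo predict."""
    x = [0.0] * (2 * len(s))
    for i in range(len(s)):
        even = s[i] - d[i] / 2.0
        x[2 * i] = even
        x[2 * i + 1] = even + d[i]
    return x
```

Because each output pair depends only on two adjacent inputs, frames can be consumed as they arrive and old frames discarded, which is the memory-efficiency argument behind the scheme.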
In this paper, we present an object-based coding scheme using 3D shape-adaptive discrete wavelet transforms (SA-DWT). Rather than a straightforward extension of the 2D SA-DWT, a novel way of handling the temporal wavelet transform using a motion model is proposed to achieve higher coding efficiency. Corresponding to this transform, we use a 3D entropy coding algorithm called motion-based Embedded Subband Coding with Optimized Truncation (ESCOT) to code the wavelet coefficients. Results show that ESCOT achieves coding performance comparable to the state-of-the-art MPEG-4 verification model 13.0 in low bit-rate object-based video coding, while offering scalability and bitstream flexibility. At relatively higher bit rates, our approach outperforms MPEG-4 VM 13.0 by about 2.5 dB.
This paper presents a shape-adaptive discrete wavelet transform (SA-DWT) scheme for coding arbitrarily shaped texture, suitable for object-oriented image coding. The number of coefficients after the SA-DWT is identical to the number of pels contained in the arbitrarily shaped image objects, and the locality property of the wavelet transform and the self-similarity among subbands are well preserved throughout the process. For a rectangular region, the SA-DWT is identical to a standard wavelet transform. With the SA-DWT, conventional wavelet-based coding schemes can be readily extended to the coding of arbitrarily shaped objects. The proposed transform is not unitary, but the small energy increase is restricted to the object boundaries in the subbands. Two approaches to using the SA-DWT for object-oriented image and video coding are presented: one combines the scalar SA-DWT with the embedded zerotree wavelet (EZW) coding technique; the other extends normal vector wavelet coding (VWC) to arbitrarily shaped objects. Results of applying SA-VWC to real arbitrarily shaped texture coding are also given at the end of this paper.
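The "same number of coefficients as pels" property can be sketched on a single arbitrary-length row segment with a Haar lift; the paper's SA-DWT uses longer filters and more careful boundary handling, so this is only an illustration. An odd trailing pel is carried into the lowpass band unchanged, keeping the coefficient count equal to the pel count.

```python
def sa_haar(segment):
    """Shape-adaptive Haar sketch for one run of pels inside an object.
    Pairs are lifted into (lowpass, highpass); an isolated trailing pel
    in an odd-length segment passes to the lowpass band unchanged, so
    len(low) + len(high) == len(segment)."""
    low, high = [], []
    i = 0
    while i + 1 < len(segment):
        a, b = segment[i], segment[i + 1]
        high.append(b - a)            # predict
        low.append(a + (b - a) / 2.0)  # update (pair average)
        i += 2
    if i < len(segment):               # odd-length segment
        low.append(float(segment[i]))
    return low, high
```

On a full object, each row and column run inside the shape would be transformed this way, which is why a rectangular region degenerates to the standard transform.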
Vector quantization (VQ) always outperforms scalar quantization. Recently, vector transform coding (VTC) has been introduced to better combine signal processing with vector quantization and has been shown to achieve better performance in image coding. How much rate-distortion advantage can a vector transform coding scheme gain over other coding schemes? What is the optimal vector transform (VT) under a complexity constraint on the VQ? These are the questions we try to answer in this paper. Based on results from high-resolution (asymptotic-in-rate) quantization theory, we obtain a general rate-distortion formula for signal processing combined with vector quantization for a first-order Gaussian-Markov source. We prove that VTC indeed performs better than other existing coding schemes of the same or lower complexity under the rate-distortion measure. A new mirror-sampling-based vector transform that involves only additions and subtractions is proposed, and for the high-rate case we show that the new VTC scheme achieves the optimal performance under the complexity constraint. A 2D version of the new vector transform is applied to image coding, and the results show that it consistently outperforms the subsampling-based vector transform.
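A sum/difference pair transform illustrates the "additions and subtractions only" idea; this toy transform is for illustration and is not the paper's actual mirror-sampling VT. The sum channel compacts the correlated part of a vector pair, the difference channel carries the residual, and the transform is exactly invertible.

```python
def pair_transform(x1, x2):
    """Component-wise sum/difference of two vectors, using only
    additions and subtractions (no multiplications)."""
    y1 = [a + b for a, b in zip(x1, x2)]  # correlated (lowpass-like) part
    y2 = [a - b for a, b in zip(x1, x2)]  # residual (highpass-like) part
    return y1, y2

def pair_inverse(y1, y2):
    """Exact inverse of pair_transform."""
    x1 = [(a + b) / 2 for a, b in zip(y1, y2)]
    x2 = [(a - b) / 2 for a, b in zip(y1, y2)]
    return x1, x2
```

For a strongly correlated source the difference channel has small energy and can be quantized with a cheaper VQ, which is the intuition behind gaining performance under a complexity constraint.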