Distributed video coding (DVC) is an emerging video coding paradigm for systems that require low-complexity encoders supported by high-complexity decoders, as in, for example, real-time video capture and streaming from one mobile phone to another. Under the assumption of an error-free transmission channel, the coding efficiency of current DVC systems is still below that of the latest video codecs, such as H.264/AVC. To increase coding efficiency, we propose that every Wyner-Ziv frame be downsampled by a factor of two prior to encoding and subsequent transmission. However, this would necessitate upsampling combined with interpolation at the decoder. Simple interpolation (e.g., a bilinear or bicubic filter) would be insufficient because the high-frequency (HF) spatial image content would be missing. Instead, we propose incorporating a super-resolution (SR) technique based on the example-based, scene-specific method to allow this HF content to be recovered. The SR technique adds computational complexity to the decoder side of the DVC system, which is allowable within the DVC framework. Rate-distortion curves show that this novel combination of SR and DVC improves the system's peak signal-to-noise ratio (PSNR) by up to several decibels and can even exceed the performance of the H.264/AVC codec with GOP = IP for some video sequences.
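To make the case for SR concrete, the sketch below decimates a frame by two in each dimension, restores it with plain bilinear interpolation, and scores the result with PSNR. It is a toy illustration, not the authors' pipeline: frames are assumed to be 8-bit grayscale lists of lists with even dimensions, no anti-aliasing filter is applied before decimation, and the helper names (`downsample2`, `upsample2_bilinear`, `psnr`) are hypothetical.

```python
import math

def downsample2(img):
    """Keep every second row and column (factor-of-two decimation, no pre-filter)."""
    return [row[::2] for row in img[::2]]

def _upsample_1d(v, n):
    """Linear interpolation of v to length n; sample i of v sits at position 2*i."""
    out = []
    for j in range(n):
        i, r = divmod(j, 2)
        if r == 0 or i + 1 >= len(v):
            out.append(float(v[i]))            # original sample, or replicated edge
        else:
            out.append((v[i] + v[i + 1]) / 2)  # midpoint between two samples
    return out

def upsample2_bilinear(img, h, w):
    """Separable (horizontal, then vertical) linear interpolation back to h x w."""
    rows = [_upsample_1d(row, w) for row in img]
    cols = [_upsample_1d([r[x] for r in rows], h) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    n = sum(len(row) for row in ref)
    mse = sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)) / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

if __name__ == "__main__":
    ramp = [[10 * x for x in range(8)] for _ in range(8)]
    checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
    for name, frame in (("ramp", ramp), ("checkerboard", checker)):
        restored = upsample2_bilinear(downsample2(frame), 8, 8)
        print(f"{name}: {psnr(frame, restored):.2f} dB")
```

A smooth horizontal ramp survives the round trip almost intact (about 37 dB for this 8x8 example), whereas a one-pixel checkerboard, which is pure HF content, collapses to a flat image and scores only 10·log10(2), roughly 3 dB. That lost HF content is what the example-based SR stage is meant to restore.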
The increasing popularity of 3D TV creates demand for more 3D video content. Unfortunately, it will take considerable time
for an abundance of 3D video content captured with stereoscopic cameras to become available. However, there currently exists a
vast quantity of 2D video material that can potentially be converted to 3D. Converting 2D into 3D is a complex process,
and so can be costly. Thus, an automated, low-complexity solution would be desirable. Our
past research has already resulted in a real-time 2D-to-3D conversion technique, but it generates a surrogate
depth map that yields pseudo-3D rather than necessarily accurate 3D. Our current research focuses on improving the
accuracy of the 3D effect with a multi-step process that determines the depth order of
objects, with respect to the camera, in each frame of a video sequence, and on incorporating this process into our existing technique.
The multi-step process can be summarized as follows: detect pixels that belong to an edge; use block-based motion
estimation to determine whether an edge pixel is moving and thus belongs to a moving edge (i.e., an occlusion boundary);
determine which of the left or right side blocks moves with the moving edge pixel and, by deduction, determine the
occluding object; select seed points from the moving edge pixels; perform color-only region growing from each seed;
cluster regions into objects based on their proximity; globally assign depth order to the objects based on the perceived
viewing perspective of the frame; and modify the original surrogate depth map to create a more accurate one. Test
results show that this is a very effective and fast technique for deriving the depth order of objects and generating more
accurate depth map values.
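The motion-estimation step of this process can be sketched as a full-search block matcher that minimizes the sum of absolute differences (SAD). This is a generic illustration, not the authors' implementation: frames are assumed to be grayscale lists of lists whose dimensions are multiples of the block size, the block size (4) and search range (+/-2 pixels) are placeholder values, and `block_motion` and `is_moving_edge_pixel` are hypothetical helpers. An edge pixel is treated as moving when the grid-aligned block containing it has a nonzero motion vector.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(bs) for x in range(bs))

def block_motion(cur, ref, bx, by, bs=4, search=2):
    """Full search: return (dx, dy) minimizing SAD, i.e. cur[by+y][bx+x] ~ ref[by+dy+y][bx+dx+x]."""
    h, w = len(cur), len(cur[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, bs)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = 0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w
            if not inside:
                continue  # skip candidates that fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost < best_cost:  # strict '<' keeps (0, 0) on ties, favoring "static"
                best, best_cost = (dx, dy), cost
    return best

def is_moving_edge_pixel(cur, ref, px, py, bs=4, search=2):
    """Hypothetical test: an edge pixel is 'moving' if its grid-aligned block has a nonzero vector."""
    bx, by = (px // bs) * bs, (py // bs) * bs
    return block_motion(cur, ref, bx, by, bs, search) != (0, 0)
```

For example, if each row of `cur` is the corresponding row of `ref` shifted one pixel to the right, `block_motion` returns (-1, 0) for an interior block and `is_moving_edge_pixel` reports True for pixels inside it. The left/right side comparison in the next step could reuse the same matcher on the blocks on either side of the edge pixel.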
In the stereoscopic frame-compatible format, the separate high-definition left and right views are reduced
in resolution and packed to fit within the same video frame as a conventional two-dimensional high-definition signal.
This format has been suggested for 3DTV since it does not require additional transmission bandwidth and entails only
small changes to the existing broadcasting infrastructure. In some instances, the frame-compatible format might be used
to deliver both 2D and 3D services, e.g., for over-the-air television services. In those cases, the video quality of the 2D
service is bound to decrease since the 2D signal will have to be generated by up-converting one of the two views. In this
study, we investigated this loss by measuring the perceptual image quality of 1080i and 720p up-converted video
compared with that of the full-resolution original 2D video. The video was encoded with either an MPEG-2 or an H.264/AVC
codec at different bit rates and presented for viewing either without polarized glasses (2D viewing mode) or with
polarized glasses (3D viewing mode). The results confirmed a loss of video quality in the up-converted 2D
material. The loss due to the sampling processes inherent to the frame-compatible format was rather small for both the 1080i
and 720p video formats; the loss became more substantial with encoding, particularly with MPEG-2 encoding. The 3D
viewing mode yielded higher quality ratings, possibly because the degradations were less visible.
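As an illustration of the sampling that the 2D service inherits, the sketch below packs two views side by side by keeping every second column of each, then up-converts the left half back to full width by linear interpolation. The side-by-side arrangement, the plain column decimation (practical systems normally low-pass filter first), and the helper names are assumptions for illustration; the abstract does not specify the packing or filters used in the study.

```python
def pack_side_by_side(left, right):
    """Halve the horizontal resolution of each view and place both in one frame."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]

def upconvert_left_view(frame):
    """Rebuild a full-width 2D picture from the left half by linear interpolation."""
    half = len(frame[0]) // 2
    out = []
    for row in frame:
        v = row[:half]
        up = []
        for i in range(half):
            nxt = v[i + 1] if i + 1 < half else v[i]  # replicate the right edge
            up.extend([float(v[i]), (v[i] + nxt) / 2])
        out.append(up)
    return out

if __name__ == "__main__":
    left = [[0, 10, 20, 30, 40, 50, 60, 70]]
    right = [[5, 15, 25, 35, 45, 55, 65, 75]]
    packed = pack_side_by_side(left, right)
    print(packed)
    print(upconvert_left_view(packed))
```

Only the even columns of the left view survive packing; the odd columns are re-estimated from their neighbors, so any detail that lived only in them (and the replicated last column here) cannot be recovered. This is one source of the sampling loss measured before MPEG-2 or H.264/AVC coding adds its own degradations.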
Distributed Video Coding (DVC) is an emerging video coding paradigm for systems that require low-complexity
encoders supported by high-complexity decoders, as would be required for, say, real-time video
capture and streaming from one mobile phone to display on another. Under the assumption of an error-free transmission
channel, the coding efficiency of current DVC systems is still below that of the latest conventional video codecs, such as
H.264/AVC. To increase coding efficiency, we propose in this paper that either every second key frame or every
Wyner-Ziv frame be downsampled by a factor of two in both dimensions prior to encoding and subsequent transmission.
However, this would necessitate upsampling coupled with interpolation at the decoder. Simple interpolation (e.g.,
a bilinear or FIR filter) would not suffice since high-frequency (HF) spatial image content would be missing. Instead, we
propose incorporating a super-resolution (SR) technique based on example high-resolution images whose
content is specific to the low-resolution scene, so that the scene's missing HF content can be recovered. The example-based
scene-specific SR technique will add computational complexity to the decoder side of the DVC system, which is
allowable within the DVC framework. Rate-distortion curves show that this novel combination of SR with DVC
improves the system performance by up to several decibels, as measured by PSNR, and can actually exceed the
performance of an H.264/AVC codec, using GOP = IP, for some video sequences.
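The example-based, scene-specific idea can be sketched in one dimension: learn, from a high-resolution example, which high-frequency correction accompanies each local pattern of the interpolated signal, then add the correction whose pattern best matches the interpolated low-resolution input. This is a simplified nearest-neighbor sketch in the spirit of example-based SR, not the method used in the paper; 1-D signals stand in for images, and the 3-sample mean-removed patches, linear interpolation, and all function names are assumptions.

```python
def downsample2(x):
    """Factor-of-two decimation of a 1-D signal."""
    return x[::2]

def upsample2_linear(lr, n):
    """Linear interpolation to length n; sample i of lr sits at position 2*i."""
    out = []
    for j in range(n):
        i, r = divmod(j, 2)
        if r == 0 or i + 1 >= len(lr):
            out.append(float(lr[i]))
        else:
            out.append((lr[i] + lr[i + 1]) / 2)
    return out

def _patch(sig, c, radius):
    """Mean-removed window of sig centered on c (the matching key)."""
    window = sig[c - radius:c + radius + 1]
    mean = sum(window) / len(window)
    return [s - mean for s in window]

def build_dictionary(hr_examples, radius=1):
    """Pairs of (interpolated-patch key, missing HF value) learned from HR example signals."""
    pairs = []
    for hr in hr_examples:
        blurry = upsample2_linear(downsample2(hr), len(hr))
        for c in range(radius, len(hr) - radius):
            pairs.append((_patch(blurry, c, radius), hr[c] - blurry[c]))
    return pairs

def super_resolve(lr, n, pairs, radius=1):
    """Interpolate lr to length n, then add the HF value of the closest dictionary patch."""
    blurry = upsample2_linear(lr, n)
    out = list(blurry)
    for c in range(radius, n - radius):
        key = _patch(blurry, c, radius)
        _, hf = min(pairs, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], key)))
        out[c] = blurry[c] + hf
    return out

if __name__ == "__main__":
    example = [0, 0, 0, 4, 10, 10, 10, 10]  # HR example containing a sharp edge profile
    pairs = build_dictionary([example])
    scene_lr = [0, 10, 10, 10]              # LR version of a similar edge at another position
    print(upsample2_linear(scene_lr, 8))    # plain interpolation
    print(super_resolve(scene_lr, 8, pairs))
```

In this toy case, plain interpolation places the value 5 at the edge, while the dictionary transfers the example's edge profile and yields values equal to [0, 4, 10, 10, 10, 10, 10, 10]. On real video, the benefit depends on how closely the example images match the scene, which is why the method is scene-specific.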
Distributed Video Coding (DVC) is an emerging video coding paradigm for systems that require low-complexity
encoders supported by high-complexity decoders. A typical real-world application for a DVC system is mobile phones
with video capture hardware that have limited encoding capability, supported by base stations with high decoding
capability. Generally speaking, a DVC system operates by dividing a source image sequence into two streams, key
frames and Wyner-Ziv (W) frames. The key frames are used to represent the source and to generate an approximation of the
W frames, called S frames (where S stands for side information), while the W frames are used to correct the bit errors in
the S frames. This paper presents an effective algorithm to reduce the bit errors in the side information of a DVC
system. The algorithm uses maximum likelihood estimation to help predict future bits to be decoded. The
reduction in bit errors in turn reduces the number of parity bits needed for error correction. Thus, higher coding
efficiency is achieved since fewer parity bits need to be transmitted from the encoder to the decoder. The algorithm is
called inter-bit prediction because it predicts the bit-plane to be decoded from previously decoded bit-planes, one bit-plane
at a time, starting from the most significant bit-plane. Experiments using real-world image
sequences show that the inter-bit prediction algorithm reduces the bit rate by up to 13% for our test
sequences. This bit rate reduction corresponds to a PSNR gain of about 1.6 dB for the W frames.
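To show why conditioning on previously decoded bit-planes can help, the sketch below compares two ways of guessing each bit of a W-frame pixel, from the most significant bit-plane down: copying the bit from the side-information value S, versus choosing the bit that is most likely given the bits already decoded. The Laplacian correlation model with parameter `alpha`, the 8-bit sample depth, and all function names are assumptions for illustration, not the paper's estimator. The decoder is assumed to know each true bit once parity bits have corrected it.

```python
import math

NBITS = 8  # assumed 8-bit samples

def bit_at(value, plane, nbits=NBITS):
    """Bit of `value` in bit-plane `plane` (0 = most significant)."""
    return (value >> (nbits - 1 - plane)) & 1

def ml_next_bit(prefix, plane, s, alpha=0.5, nbits=NBITS):
    """Most likely bit for `plane`, given decoded MSBs `prefix` and side information `s`.

    Assumes P(v | s) proportional to exp(-alpha * |v - s|) (Laplacian model).
    """
    width = 1 << (nbits - plane)       # values still consistent with the decoded prefix
    lo = prefix << (nbits - plane)
    half = width // 2

    def mass(a, b):
        return sum(math.exp(-alpha * abs(v - s)) for v in range(a, b))

    p0 = mass(lo, lo + half)           # next bit = 0
    p1 = mass(lo + half, lo + width)   # next bit = 1
    return 1 if p1 > p0 else 0

def bit_errors(w, s, nbits=NBITS):
    """Mispredicted bits for (copying S bits, inter-bit ML prediction)."""
    naive = ml = 0
    prefix = 0
    for plane in range(nbits):
        true_bit = bit_at(w, plane, nbits)
        naive += bit_at(s, plane, nbits) != true_bit
        ml += ml_next_bit(prefix, plane, s, nbits=nbits) != true_bit
        prefix = (prefix << 1) | true_bit  # corrected bit is known after parity decoding
    return naive, ml

if __name__ == "__main__":
    print(bit_errors(128, 127))
```

For a W pixel of 128 and side information of 127 (a one-level difference straddling the MSB boundary), copying S bits mispredicts all 8 bit-planes, whereas the conditional guess errs only on the MSB: once that bit is known to be 1, every lower bit is predicted as 0 because the remaining candidate values lie just above S. Fewer mispredicted bits mean fewer parity bits to request.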
This paper presents a fast implementation of a wavelet-based video codec. The codec consists of motion-compensated temporal filtering (MCTF), a 2-D spatial wavelet transform, and SPIHT for wavelet coefficient coding. It offers compression efficiency that is competitive with H.264. The codec is implemented in software running on a general-purpose PC, using the C programming language and Streaming SIMD Extensions (SSE) intrinsics, without assembly language. This high-level software implementation makes the codec portable to other general-purpose computing platforms. Testing on a 3.6 GHz Pentium 4 HT (running Linux and using GCC version 4) shows that the software decoder can decode 4CIF video in real time, more than twice as fast as software written only in C. This paper describes the structure of the codec, the fast algorithms chosen for the most computationally intensive elements in the codec, and the use of SIMD to implement these algorithms.
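At its core, MCTF applies a temporal wavelet along motion trajectories. The sketch below shows one level of a temporal Haar transform in integer lifting form (a predict step producing a high-pass frame and an update step producing a low-pass frame), together with its exact inverse. It assumes zero motion, so it omits the motion compensation that real MCTF applies in both steps; frames are flat lists of integers, the abstract does not say which temporal filter the codec uses, and the function names are hypothetical.

```python
def haar_lift_forward(a, b):
    """One temporal Haar level on frames a, b (zero motion assumed): returns (low, high)."""
    high = [bj - aj for aj, bj in zip(a, b)]           # predict: residual of b given a
    low = [aj + (hj >> 1) for aj, hj in zip(a, high)]  # update: approximately (a + b) / 2
    return low, high

def haar_lift_inverse(low, high):
    """Undo the update step, then the predict step; reconstruction is exact."""
    a = [lj - (hj >> 1) for lj, hj in zip(low, high)]
    b = [hj + aj for hj, aj in zip(high, a)]
    return a, b

if __name__ == "__main__":
    frame0 = [10, 20, 30, 40]
    frame1 = [12, 15, 30, 47]
    low, high = haar_lift_forward(frame0, frame1)
    print(low, high)
    assert haar_lift_inverse(low, high) == (frame0, frame1)
```

Python's `>>` floors negative values, and the inverse uses the same shift, so rounding cancels and reconstruction is lossless. Because each step is the same element-wise operation over a whole frame, loops of this shape are natural candidates for SIMD intrinsics that process several samples per instruction.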
We report on a hierarchical design for extracting ship features and recognizing ships from SAR images, which will eventually feed a multisensor data fusion system for airborne surveillance. The target is segmented from the image background using directional thresholding and region merging. Ship end-points are then identified through ship centerline detection performed with a Hough transform. A ship length estimate is calculated assuming that the ship heading and/or the cross-range resolution are known. A high-level ship classification identifies whether the target belongs to the Line (mainly combatant military ships) or Merchant ship category. Category discrimination is based on the distribution of radar scatterers in 9 ship sections along the ship's range profile. A 3-layer neural network (NN) has been trained on simulated scatterer distributions and supervised by a rule-based expert system to perform this task. The NN 'smooths out' the rules and the confidence levels on the category declaration. The Line ship type (Frigate, Destroyer, Cruiser, Battleship, Aircraft Carrier) is then estimated using a Bayes classifier based on the ship length. Classifier performance on simulated images is presented.
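The centerline and end-point step can be illustrated with a minimal Hough transform over the segmented target pixels: each pixel votes for every line (theta, rho) passing through it, the most-voted line is taken as the centerline, and the end-points are the pixels with the extreme projections along that line. This is a generic sketch, not the authors' implementation; the 1-degree angle step, 1-pixel rho bins, and function names are assumptions, and converting the pixel length to meters would additionally require the range and cross-range pixel spacings and, as the abstract notes, the heading.

```python
import math
from collections import Counter

def hough_centerline(pixels, n_theta=180):
    """Return (theta, rho) of the most-voted line x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    for x, y in pixels:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            votes[(k, round(x * math.cos(theta) + y * math.sin(theta)))] += 1
    (k, rho), _ = votes.most_common(1)[0]
    return math.pi * k / n_theta, rho

def ship_endpoints(pixels, theta):
    """Extreme target pixels along the centerline direction, plus their separation in pixels."""
    dx, dy = -math.sin(theta), math.cos(theta)  # unit vector along the line

    def along(p):
        return p[0] * dx + p[1] * dy

    first, last = min(pixels, key=along), max(pixels, key=along)
    return first, last, along(last) - along(first)

if __name__ == "__main__":
    target = [(x, 5) for x in range(2, 13)]  # toy 'ship': a horizontal streak of 11 pixels
    theta, rho = hough_centerline(target)
    print(math.degrees(theta), rho, ship_endpoints(target, theta))
```

Because rho is quantized to whole pixels, several neighboring angles can tie for the peak on a short target; here the returned angle lies within a few degrees of the true 90 degrees, so the length estimate is close to, rather than exactly, 10 pixels.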