In depth image based rendering, video sequences and their associated depth maps are used to render new camera
viewpoints for stereoscopic applications. In this study, we examined the effect of temporal downsampling of the
depth maps on stereoscopic depth quality and visual comfort. The depth maps of four eight-second video sequences
were temporally downsampled by dropping all frames, except the first, for every 2, 4, or 8 consecutive frames. The
dropped frames were then replaced by the retained frame. Test stereoscopic sequences were generated by using the
original image sequences for the left-eye view and the rendered image sequences for the right-eye view. The
downsampled versions were compared to a reference version with full depth maps that were not downsampled.
Based on the data from 21 viewers, ratings of depth quality for the downsampled versions were lower. Importantly,
ratings depended on the content characteristics of the stereoscopic video sequences. Results were similar for visual
comfort, except that the differences in ratings between sequences were larger. The present results suggest that more
processing, such as interpolation of depth maps, might be required to counter the negative effects of temporal
downsampling, especially beyond a downsampling of two.
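The frame-dropping scheme described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `temporally_downsample` and the array layout (frames along the first axis) are assumptions.

```python
import numpy as np

def temporally_downsample(depth_frames, factor):
    """Keep the first frame of every `factor` consecutive frames and
    replace the dropped frames with that retained frame."""
    depth_frames = np.asarray(depth_frames)
    n = depth_frames.shape[0]
    # Index of the retained frame for each position: 0,0,...,factor,factor,...
    retained = (np.arange(n) // factor) * factor
    return depth_frames[retained]

# Toy sequence: 8 "depth maps" of size 1x1 whose value equals the frame index.
frames = np.arange(8).reshape(8, 1, 1)
held = temporally_downsample(frames, 4)
print(held[:, 0, 0].tolist())  # [0, 0, 0, 0, 4, 4, 4, 4]
```

With a factor of 1 the sequence is returned unchanged, which corresponds to the reference condition with full depth maps.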
We investigate the issue of efficient data organization and representation of the coefficients of the curved wavelet transform (WT). We present an adaptive zero-tree structure that exploits the cross-subband similarity of the curved wavelet transform. Whereas in the embedded zero-tree wavelet (EZW) and set partitioning in hierarchical trees (SPIHT) coders the parent-child relationship is defined such that a parent has four children restricted to a square of 2×2 pixels, the parent-child relationship in the adaptive zero-tree structure varies according to the curves along which the curved WT is performed. Five child patterns were determined based on different combinations of curve orientation. A new image coder was then developed based on this adaptive zero-tree structure and the set-partitioning technique. Experimental results using synthetic and natural images showed the effectiveness of the proposed adaptive zero-tree structure for encoding the curved wavelet coefficients. The coding gain of the proposed coder can be up to 1.2 dB in terms of peak SNR (PSNR) compared to the SPIHT coder. Subjective evaluation shows that the proposed coder preserves lines and edges better than the SPIHT coder.
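For reference, the conventional 2×2 parent-child mapping that the adaptive structure relaxes can be sketched as below. The function name `conventional_children` is hypothetical, the special handling of the coarsest subband in EZW and SPIHT is ignored, and the five adaptive child patterns are not reproduced because the abstract does not specify them.

```python
def conventional_children(row, col):
    """Four children of a parent coefficient at (row, col) in the usual
    EZW/SPIHT-style tree: a 2x2 square at the next finer scale."""
    return [
        (2 * row, 2 * col),
        (2 * row, 2 * col + 1),
        (2 * row + 1, 2 * col),
        (2 * row + 1, 2 * col + 1),
    ]

print(conventional_children(3, 5))  # [(6, 10), (6, 11), (7, 10), (7, 11)]
```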
Depth image based rendering (DIBR) is useful for multiview autostereoscopic systems because it can produce a set of new images with different camera viewpoints, based on a single two-dimensional (2D) image and its corresponding depth map. In this study we investigated the role of object boundaries in depth maps for DIBR. Using a standard subjective assessment method, we asked viewers to evaluate the depth and the image quality of stereoscopic images in which the view for the right eye was rendered using (a) full depth maps, (b) partial depth maps containing full depth information only at object boundaries and edges, and (c) partial depth maps containing binary depth information at object boundaries and edges. Results indicate that depth quality was enhanced and image quality was slightly reduced for all test conditions, compared to a reference condition consisting of 2D images. The present results confirm previous observations indicating that depth information at object boundaries is sufficient in DIBR to create new views that produce a stereoscopic effect. However, depth ratings for the partial depth maps tended to be slightly lower than those for the full depth maps. The present study also indicates that more research is needed to increase the depth and image quality of the rendered stereoscopic images based on DIBR before the technique can be of wide and practical use.
Recursive wavelet filters and an alternative algorithm for implementing the wavelet transform are presented in this paper. The recursive filters use previously calculated (past) wavelet coefficients as inputs to calculate the current wavelet coefficient, and provide the same transform results as convolutional FIR and lifting wavelet filters. The coefficients of the recursive filters are derived from those of conventional FIR wavelet filters. The wavelet transform with recursive filters requires a smaller amount of memory and is easy to implement in hardware. Another important advantage of the recursive filters is that perfect reconstruction can be easily achieved if the sequence of pixels to be transformed is extended by boundary pixel repetition. Boundary pixel repetition can be more efficient than the widely used method of symmetric extension for image and video coding.
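The two boundary-extension schemes mentioned above can be compared on a toy row of pixels. This sketch shows only the extensions, not the recursive filters themselves; using NumPy's `reflect` mode (whole-sample mirroring) to stand in for symmetric extension is an assumption, since the abstract does not say which variant is meant.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Boundary pixel repetition: the edge pixel is copied outward.
repeated = np.pad(row, 2, mode="edge")

# Symmetric extension (whole-sample variant): the signal is mirrored
# about the edge pixel without duplicating it.
mirrored = np.pad(row, 2, mode="reflect")

print(repeated.tolist())  # [1, 1, 1, 2, 3, 4, 4, 4]
print(mirrored.tolist())  # [3, 2, 1, 2, 3, 4, 3, 2]
```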
A dense disparity map is required for intermediate view reconstruction from stereoscopic images. A popular approach to obtaining a dense disparity map is maximum a-posteriori (MAP) disparity estimation. The MAP approach requires statistical models for both a likelihood term and an a-priori term. Normally, a Gaussian model is used. In this contribution, block-wise MAP disparity estimators using different statistical models are compared in terms of the Peak Signal-to-Noise Ratio (PSNR) of disparity-compensation errors and the number of corresponding matches. It was found that, among the Cauchy, Laplacian, and Gaussian models, the Laplacian model is the best for the likelihood term while the Cauchy model is the best for the a-priori term. Experimental results show that a reconstruction algorithm with MAP disparity estimation using the determined models can improve the image quality of the intermediate views reconstructed from stereoscopic image pairs.
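One way to picture the combination reported above is as a block-matching cost that adds a Laplacian negative log-likelihood to a Cauchy negative log-prior. The sketch below is an illustration under assumptions, not the paper's formulation: the function names, the scale parameters `scale` and `gamma`, the weight `lam`, and the choice to apply the prior to the difference from a neighboring block's disparity are all hypothetical.

```python
import numpy as np

def laplacian_nll(residual, scale=1.0):
    """Negative log of a Laplacian density (constants dropped)."""
    return np.sum(np.abs(residual)) / scale

def cauchy_nll(delta, gamma=1.0):
    """Negative log of a Cauchy density (constants dropped)."""
    return np.log1p((delta / gamma) ** 2)

def map_block_disparity(left, right, x0, width, neighbor_disp, max_disp, lam=1.0):
    """Pick the disparity d minimizing likelihood cost + lam * prior cost
    for the left-image block starting at column x0."""
    block = left[:, x0:x0 + width]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x0 - d < 0:
            break  # candidate block would fall outside the right image
        candidate = right[:, x0 - d:x0 - d + width]
        cost = laplacian_nll(block - candidate) + lam * cauchy_nll(d - neighbor_disp)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Toy pair: the right view is the left view shifted by 3 pixels.
rng = np.random.default_rng(0)
left = rng.uniform(0, 255, size=(8, 32))
right = np.roll(left, -3, axis=1)
print(map_block_disparity(left, right, x0=10, width=4, neighbor_disp=3, max_disp=6))  # 3
```

Swapping `laplacian_nll` for a squared-error cost would give the Gaussian likelihood that the abstract describes as the usual choice.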
The curved wavelet transform performs 1-D filtering along curves and exploits orientation features of edges and lines in an image to improve the compactness of the wavelet transform. This paper investigates the issue of efficient data organization and representation of the curved wavelet coefficients. We present an adaptive zero-tree structure that exploits the cross-subband similarity of the curved wavelet transform. The child positions in the adaptive zero-tree structure are not restricted to a square of 2×2 pixels; they vary with the curves along which the WT is performed. Five child patterns have been determined according to different combinations of curve orientations. A new image coder, using the curved wavelet transform, is then developed based on this adaptive zero-tree structure and the set partitioning technique. Experimental results using synthetic and natural images show the effectiveness of the proposed adaptive zero-tree structure for encoding the curved wavelet coefficients. The coding gain of the proposed coder can be as high as 1.2 dB in terms of PSNR compared to the SPIHT coder.
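The coding gain above is stated in PSNR, which can be computed as in the following sketch. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions; the abstract does not state the bit depth of the test images.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Under this definition, a gain of 1.2 dB at the same bit rate corresponds to a mean squared error that is lower by a factor of about 1.32, that is, roughly 24% less squared error.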
In a technique called depth image based rendering, new images are generated using information from an original source image and its corresponding depth map, such that the new images appear to have been taken from different camera viewpoints. This technique is bandwidth-efficient and is ideal for multiview display systems, such as autostereoscopic 3D-TV. In a previous study, we demonstrated that uniform smoothing of depth maps through Gaussian filtering helps improve the image quality of the rendered images. In the present study we investigated the potential benefits of two non-uniform smoothing methods: asymmetric smoothing, where the horizontal extent of smoothing was smaller than that in the vertical direction, and adaptive smoothing, where the level and extent of smoothing were based on the local depth magnitude. Ten viewers assessed the image quality and depth quality of four stereoscopic images in which the view to one eye was a rendered image based on one of the three smoothing methods: uniform, asymmetric, or adaptive. The experimental results showed an improvement in ratings of image quality for all three methods as the level of smoothing was increased. The results also indicated a slight advantage in image quality for asymmetric smoothing over the other two methods. Ratings of overall depth quality were significantly higher than corresponding non-stereoscopic references for all three methods, although the ratings decreased at the highest level of smoothing that was used in the present study. In general, ratings of depth quality tended to be marginally lower for the asymmetric method.
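Asymmetric smoothing of a depth map can be sketched as a separable Gaussian blur with a smaller standard deviation horizontally than vertically. This is an illustrative sketch only: the function names, the kernel truncation at three standard deviations, and the edge-replication padding are assumptions rather than details taken from the study.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_axis(img, sigma, axis):
    """Blur a 2-D array along one axis, replicating edge pixels."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (radius, radius)
    padded = np.pad(img, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), axis, padded
    )

def asymmetric_smooth(depth, sigma_x, sigma_y):
    """Smooth a depth map with sigma_y vertically and sigma_x horizontally."""
    out = smooth_axis(np.asarray(depth, dtype=float), sigma_y, axis=0)
    return smooth_axis(out, sigma_x, axis=1)
```

Calling `asymmetric_smooth(depth, sigma_x=1.0, sigma_y=3.0)` blurs more strongly in the vertical direction; passing equal values for both parameters reduces to uniform Gaussian smoothing.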
A technique to improve the image quality of stereoscopic pictures generated from depth maps (depth image based rendering or DIBR) is examined. In general, there are two fundamental problems with DIBR: a depth map could contain artifacts (e.g., noise or "blockiness") and there is no explicit information on how to render newly exposed regions ("holes") in the rendered image as a result of new virtual camera positions. We hypothesized that smoothing depth maps before rendering will not only minimize the effects of noise and distortions in the depth maps but will also reduce areas of newly exposed regions where potential artifacts can arise. A formal subjective assessment of four stereoscopic sequences of natural scenes was conducted with 23 viewers. The stereoscopic sequences consisted of source images for the left-eye view and rendered images for the right-eye view. The depth maps were smoothed with a Gaussian blur filter at different levels of strength before depth image based rendering. Results indicated that ratings of perceived image quality improved with increasing levels of smoothing of the depth maps. Even though the depth maps were smoothed, a negative effect on ratings of overall perceived depth quality was not found.
Current binocular stereoscopic displays cause visual discomfort when objects with large disparities are present in the scene. One reported technique improves visual comfort by blurring the far background and foreground of the scene. However, this technique has the drawback of degrading overall image quality. To lessen visual discomfort caused by large disparities while maintaining high perceived image quality, we use a novel disparity-based asymmetrical filtering technique. Asymmetrical filtering, which refers to filtering applied to the image of one eye only, has been shown to maintain the sharpness of a stereoscopic image, provided that the amount of filtering is low. Disparity-based asymmetrical filtering uses the disparity information in a stereoscopic image to control the severity of blurring. We investigated the effects of this technique on stereoscopic video by measuring visual comfort and apparent sharpness. Our results indicate that disparity-based asymmetrical filtering does not always improve visual comfort, but it maintains image quality.
Automatic extraction of facial feature points is one of the main problems for semantic coding of videophone sequences at very low bit rates. In this contribution, an approach for estimating the eye and mouth corner point positions is presented. For this purpose, the location information of the face model is exploited to define search areas for the eye and mouth corner points. Then, the eye and mouth corner point positions are estimated using a template matching technique with eye and mouth corner templates. Finally, in order to verify these estimated corner point positions, some geometric conditions between the corner point positions and the center point positions of the eyes and the mouth are exploited. The proposed algorithm has been applied to the test sequences Claire and Miss America with a spatial resolution corresponding to CIF and a frame rate of 10 Hz.
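The template matching step restricted to a search area can be sketched as follows. The abstract does not state the matching criterion, so the sum of squared differences used here is an assumption, as are the function name `match_template_ssd` and the `(top, left, bottom, right)` convention for the search area.

```python
import numpy as np

def match_template_ssd(image, template, search_area):
    """Return the (row, col) of the top-left corner of the best match of
    `template` inside `search_area` = (top, left, bottom, right), where
    bottom and right are exclusive bounds."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    top, left, bottom, right = search_area
    th, tw = template.shape
    best_pos, best_ssd = None, np.inf
    for r in range(top, bottom - th + 1):
        for c in range(left, right - tw + 1):
            patch = image[r:r + th, c:c + tw]
            ssd = np.sum((patch - template) ** 2)  # sum of squared differences
            if ssd < best_ssd:
                best_pos, best_ssd = (r, c), ssd
    return best_pos
```

Limiting the loops to the search area, rather than scanning the whole frame, is what keeps the cost low and reduces false matches; in the described approach the search areas come from the face model.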