Nowadays, a precise video quality assessment (VQA) model is essential to maintain the quality of service (QoS). However, most existing VQA metrics are designed for specific purposes and ignore the spatiotemporal features of natural video. This paper proposes a novel general-purpose no-reference (NR) VQA metric, named VQA-LSTM, that adopts Long Short-Term Memory (LSTM) modules with a masking layer and a pre-padding strategy to address these issues. First, we divide the distorted video into frames and extract significant yet universal spatial and temporal features that effectively reflect frame quality. Second, a data preprocessing stage and the pre-padding strategy are applied to ease the training of our VQA-LSTM. Finally, a three-layer LSTM model incorporating a masking layer is designed to learn the sequence of spatial features as spatiotemporal features and the sequence of temporal features as the gradient of temporal features to evaluate video quality. Two widely used VQA databases, MCL-V and LIVE, are tested to verify the robustness of our VQA-LSTM, and the experimental results show that it correlates better with human perception than some state-of-the-art approaches.
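A minimal sketch of how such a masked, pre-padded, three-layer LSTM regressor could be assembled in Keras is shown below; the feature dimension, sequence length, layer widths, and training settings are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of a VQA-LSTM-style regressor (Keras).
# Feature dimension D, MAX_LEN, layer widths, and hyperparameters are assumptions.
import tensorflow as tf

D = 36          # assumed per-frame feature dimension (spatial + temporal features)
MAX_LEN = 600   # assumed maximum number of frames per video

def build_vqa_lstm():
    model = tf.keras.Sequential([
        # The masking layer skips the zero vectors introduced by pre-padding.
        tf.keras.layers.Masking(mask_value=0.0, input_shape=(MAX_LEN, D)),
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1)     # predicted quality score (e.g., MOS/DMOS)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def prepad(feature_sequences):
    # Pre-padding: shorter per-frame feature sequences are zero-padded at the
    # front so every video becomes a (MAX_LEN, D) array.
    return tf.keras.preprocessing.sequence.pad_sequences(
        feature_sequences, maxlen=MAX_LEN, dtype="float32",
        padding="pre", value=0.0)
```

Training would then fit the model on pairs of pre-padded feature sequences and subjective quality scores.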
With the recent advances in high computing capabilities and reliable delivery networks, it is realistic to expect emerging immersive media applications such as virtual reality (VR) and 360-deg video. The ISO/IEC Moving Picture Experts Group (MPEG) has launched the MPEG-I project, which focuses not only on current but also on next-generation immersive media applications. Light field (LF) is one of the candidates under consideration by MPEG-I. An LF lenslet image contains plentiful repeating patterns, since each microlens captures the same scene. On the other hand, intrablock copy (IBC) in screen content coding (SCC) increases the coding efficiency of screen content (SC) by exploiting such repeating patterns, so SCC can serve as an efficient encoder for LF image compression. However, SCC is not optimized for compressing lenslet images. Therefore, we propose an efficient lenslet image coding (LIC) model that uses SCC to encode lenslet images based on their characteristics. Experimental results show that our LIC model further improves the coding efficiency of SCC on lenslet images by 3.72% on average and up to 7.77% in bitrate reduction. With the fast approach, LIC is 44.12% faster than SCC. With our proposed LIC model, the encoded bitstreams remain compliant with the SCC standard and decode even faster.
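To illustrate the general idea of exploiting lenslet periodicity with IBC, the sketch below prioritizes block-vector candidates at multiples of the microlens pitch; this is a hypothetical illustration under assumed pitch and SAD cost, not the paper's LIC model.

```python
# Hypothetical sketch: biasing intrablock-copy (IBC) block-vector candidates
# toward multiples of the microlens pitch, since neighboring microlens images
# repeat the same scene content. Pitch, search range, and SAD cost are assumptions.
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def lenslet_ibc_candidates(pitch, max_range=64):
    """Block vectors into the already-coded area, stepping by whole lenslets."""
    cands = []
    for k in range(1, max_range // pitch + 1):
        cands.append((0, -k * pitch))   # one or more lenslets to the left
        cands.append((-k * pitch, 0))   # one or more lenslets above
    return cands

def best_ibc_vector(frame, y, x, blk, pitch):
    cur = frame[y:y + blk, x:x + blk]
    best, best_cost = None, None
    for dy, dx in lenslet_ibc_candidates(pitch):
        ry, rx = y + dy, x + dx
        if ry < 0 or rx < 0:                     # reference must be inside the frame
            continue
        ref = frame[ry:ry + blk, rx:rx + blk]
        if ref.shape != cur.shape:
            continue
        cost = sad(cur, ref)
        if best_cost is None or cost < best_cost:
            best, best_cost = (dy, dx), cost
    return best, best_cost
```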
A screen content coding (SCC) extension to high-efficiency video coding has been developed to improve the coding efficiency for videos with computer-generated text and graphics. It employs additional coding tools, such as intrablock copy and palette modes, for intraprediction. Although these modes provide high coding efficiency for screen content videos, they increase computational complexity. This paper proposes a fast intraprediction algorithm for SCC based on content analysis and dynamic thresholding. To skip unnecessary modes for a coding unit (CU), a rough CU classification is performed as a preprocessing step. Then, two early mode decision techniques are proposed by performing a fine-granular CU classification and deriving a content-dependent rule with adaptive thresholding based on the background color. To terminate CU partitioning early, another content-dependent rule with adaptive thresholding on the rate-distortion cost is derived to skip unnecessary partitions. In addition, a scene change detection method is adopted to update all content-dependent thresholds for different scenes in a sequence. Experimental results show that the proposed algorithm achieves 35.95% computational complexity reduction on average with 1.20% Bjøntegaard delta bitrate loss under the all-intra configuration, outperforming state-of-the-art algorithms in the literature.
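The sketch below outlines how such content-dependent early decisions could look: a rough CU classification from color statistics, a background-color-based mode shortlist, and an RD-cost threshold for early split termination. The threshold values and the classification rule are assumptions for illustration, not the paper's derived rules.

```python
# Hypothetical sketch of content-dependent early decisions for an SCC encoder.
# Distinct-color and background-ratio thresholds are illustrative assumptions.
import numpy as np

def classify_cu(cu_pixels, distinct_color_thr=32):
    """Rough CU classification: few distinct colors suggests screen content."""
    colors, counts = np.unique(cu_pixels, return_counts=True)
    background_ratio = counts.max() / cu_pixels.size   # share of the dominant color
    is_screen_content = colors.size <= distinct_color_thr
    return is_screen_content, background_ratio

def select_candidate_modes(cu_pixels, bg_thr=0.6):
    """Skip unlikely intra modes based on the classification and background color."""
    is_sc, bg_ratio = classify_cu(cu_pixels)
    if not is_sc:
        return ["intra"]                    # camera-captured CU: skip IBC/palette
    if bg_ratio >= bg_thr:                  # flat, text-like CU: palette/IBC likely
        return ["palette", "ibc"]
    return ["intra", "ibc", "palette"]      # ambiguous CU: test everything

def early_terminate_split(rd_cost, depth, scene_thresholds):
    """Skip further CU partitioning when the RD cost is already below an adaptive
    threshold (assumed to be updated per depth at each detected scene change)."""
    return rd_cost < scene_thresholds[depth]
```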
Weighted prediction (WP) is an efficient video coding tool introduced in the H.264/AVC video coding standard to compensate for temporal illumination changes in motion estimation and compensation. The WP parameters, a multiplicative weight and an additive offset for each reference frame, must be estimated and transmitted to the decoder in the slice header, which adds extra bits to the coded video bitstream. High efficiency video coding (HEVC) provides WP parameter prediction to reduce this overhead, so WP parameter prediction is crucial to research and applications related to WP. Prior art further improves WP parameter prediction through implicit prediction from image characteristics and direct derivation of the parameters. By exploiting both temporal and interlayer redundancies, we propose three WP parameter prediction algorithms, enhanced implicit WP parameter prediction, enhanced direct WP parameter derivation, and interlayer WP parameter prediction, to further improve the coding efficiency of HEVC. Results show that our proposed algorithms achieve up to 5.83% and 5.23% bitrate reduction in the base layer compared with conventional scalable HEVC for SNR scalability and 2× spatial scalability, respectively.
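As a rough illustration of implicit WP parameter derivation from image characteristics, the sketch below estimates the weight from the ratio of frame mean luma values and the offset from the remaining DC mismatch, in HEVC-style fixed point. It shows the general idea only; the paper's enhanced implicit, direct, and interlayer predictions are not reproduced, and the log2 weight denominator is an assumption.

```python
# Hypothetical sketch: deriving WP parameters from frame statistics.
# weight ~ mean(current) / mean(reference), offset absorbs the residual DC change.
import numpy as np

LOG2_DENOM = 6                      # weight denominator = 64 (an assumption)

def derive_wp_params(cur_luma, ref_luma):
    dc_cur = float(np.mean(cur_luma))
    dc_ref = float(np.mean(ref_luma))
    weight = int(round((dc_cur / max(dc_ref, 1e-6)) * (1 << LOG2_DENOM)))
    # The offset compensates whatever DC change the quantized weight missed.
    offset = int(round(dc_cur - (weight * dc_ref) / (1 << LOG2_DENOM)))
    return weight, offset

def weighted_pred(ref_block, weight, offset):
    """Apply the multiplicative weight and additive offset to a reference block."""
    pred = (ref_block.astype(np.int32) * weight) >> LOG2_DENOM
    return np.clip(pred + offset, 0, 255).astype(np.uint8)
```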
The three-dimensional (3-D) video extension of high-efficiency video coding is an emerging coding standard for multiview-plus-depth video that enables view synthesis for multiple displays using depth information. To avoid mixing the foreground and background, the depth discontinuities at object boundaries should be retained. To this end, a depth intramode, the depth-modeling mode (DMM), is introduced in 3-D high-efficiency video coding as an edge predictor; the test model HTM 8.1 includes DMM1 and DMM3. However, the DMM mode-decision strategy increases complexity drastically. Therefore, we propose a fast DMM1 decision algorithm that estimates sharp edges by a subregional search so that the optimal wedgelet pattern of DMM1 is searched only in the most probable region. Additionally, another fast method is proposed to skip DMM3 when a mismatch occurs between the depth prediction unit (PU) and its colocated texture PU. Simulation results show that the proposed algorithm achieves slightly greater complexity reduction than a wedgelet-pattern-reducing algorithm from the literature while better maintaining coding performance. It also performs similarly to an existing DMM-skipping algorithm and can be integrated with that category of algorithms for additional time savings.
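A possible way to picture the subregional wedgelet search is sketched below: locate the strongest depth jumps along the PU borders and restrict the wedgelet start and end points to small windows around them. The window size and the exact pruning rule are assumptions; the paper's subregional search may differ in detail.

```python
# Hypothetical sketch: restricting the DMM1 wedgelet search to the most probable
# region, identified from the strongest intensity jumps on the depth PU borders.
import numpy as np

def strongest_edge_position(border_samples):
    """Index and magnitude of the largest jump between neighboring border samples."""
    diffs = np.abs(np.diff(border_samples.astype(np.int32)))
    return int(np.argmax(diffs)), int(diffs.max())

def candidate_wedgelet_endpoints(depth_pu, window=2):
    borders = {
        "top": depth_pu[0, :], "left": depth_pu[:, 0],
        "bottom": depth_pu[-1, :], "right": depth_pu[:, -1],
    }
    candidates = {}
    for name, border in borders.items():
        pos, strength = strongest_edge_position(border)
        lo, hi = max(0, pos - window), min(len(border) - 1, pos + window)
        candidates[name] = (range(lo, hi + 1), strength)
    # Only wedgelet patterns whose endpoints fall inside the windows of the two
    # strongest borders would be rate-distortion tested; the rest are skipped.
    return candidates
```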
Multiple reference frame motion estimation (MRF-ME) is one of the most important tools in H.264/AVC for improving coding efficiency. However, it burdens the encoder with extra computational complexity, and the required computation grows proportionally with the number of reference frames used for motion estimation. To reduce the computational complexity of the encoder, various motion vector (MV) composition algorithms for MRF-ME have been proposed. However, these algorithms perform well only over a limited range of reference frames; their performance deteriorates when MV composition is carried out from the current frame to a distant reference frame. In this paper, a reliable tracking mechanism for MV composition is proposed that utilizes only the relevant areas in the target macroblock and takes different paths through a novel selection process over a set of candidate motion vectors. The proposed algorithm is especially suited for temporally remote reference frames in MRF-ME. Experimental results show that, compared with existing MV composition algorithms, the proposed one delivers a remarkable improvement in rate-distortion performance with similar computational complexity.
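For context, basic MV composition chains single-frame motion vectors toward a distant reference: the MV to reference n+1 is approximated by adding the MV of the block that the current composed MV points to in reference n. The sketch below shows this baseline chaining only; the block size, zero-MV fallback, and integer rounding are simplifying assumptions, and the paper's relevance-based candidate selection is not reproduced.

```python
# Hypothetical sketch of baseline MV composition toward a distant reference frame.
# mv_fields[n][(by, bx)] is the MV of block (by, bx) in frame t-n pointing one
# frame further back (to frame t-n-1); missing blocks fall back to the zero MV.
def compose_mv(mv_fields, block_pos, target_ref, blk=16):
    y, x = block_pos                 # top-left pixel of the current macroblock
    total_dy, total_dx = 0, 0
    for n in range(target_ref):
        # Block that the composed vector currently points to in frame t-n.
        by, bx = (y + total_dy) // blk, (x + total_dx) // blk
        dy, dx = mv_fields[n].get((by, bx), (0, 0))
        total_dy += dy
        total_dx += dx
    return total_dy, total_dx        # composed MV from frame t to frame t-target_ref
```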