The growing demand for immersive experiences has significantly influenced research on the quality assessment of light field images (LFIs). However, LFIs are susceptible to distortion during encoding, transmission, and compression, making accurate distortion measurement a pressing concern. In full-reference (FR) quality assessment of LFIs, networks typically learn the discrepancy between the distorted and reference images directly, overlooking the potential of an explicit error-map input to improve learning efficiency and accuracy. To address this, we propose a framework for FR quality evaluation of LFIs based on multi-feature interactive fusion. The framework comprises three components: a feature encoding network, a spatial angle interactive fusion network, and a score regression network, which together produce the quality score of an LFI. The feature encoding network uses separate extraction networks to capture spatial, angular, and error information, enabling the model to focus on key areas. In the spatial angle interactive fusion network, a feature fusion network integrates features from the different encoders, enriching and unifying the information, and a spatial angle interactive network then extracts distortion information from the enriched feature maps. The resulting framework demonstrates strong agreement between subjective and objective quality assessment of LFIs, offering theoretical and practical implications for algorithm optimization and application.
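The three-component pipeline described above can be pictured with a minimal PyTorch sketch. All module names, layer sizes, and the concatenation-based fusion below are illustrative assumptions, not the authors' implementation; the sketch only shows how spatial, angular, and error-map streams might feed a shared fusion and regression head.

    # Minimal sketch of an FR LFI quality pipeline with an error-map branch.
    # All shapes and module names are assumptions for illustration.
    import torch
    import torch.nn as nn

    class Branch(nn.Module):
        """Feature encoder for one input stream (spatial, angular, or error)."""
        def __init__(self, in_ch):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
        def forward(self, x):
            return self.net(x)

    class FRLFIQA(nn.Module):
        def __init__(self):
            super().__init__()
            # Feature encoding network: separate encoders per information type.
            self.spatial = Branch(3)
            self.angular = Branch(3)
            self.error = Branch(1)
            # Fusion: concatenate the three streams and mix with a 1x1 conv.
            self.fuse = nn.Sequential(
                nn.Conv2d(64 * 3, 128, 1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            # Score regression network.
            self.regress = nn.Sequential(nn.Flatten(), nn.Linear(128, 1))

        def forward(self, spatial_view, angular_view, error_map):
            f = torch.cat([self.spatial(spatial_view),
                           self.angular(angular_view),
                           self.error(error_map)], dim=1)
            return self.regress(self.fuse(f))

    # One simple choice of error map (an assumption) is the per-pixel
    # absolute difference between distorted and reference views:
    # error_map = (distorted - reference).abs().mean(dim=1, keepdim=True)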
In scenes with similar textures, dark backgrounds, or complex layouts, RGB images often cannot provide discriminative information for model training, which leads to inaccurate predictions. Compared with RGB salient object detection (SOD) methods, RGB-T SOD methods have thermal infrared (TIR) information as a supplement. As a result, RGB-T SOD can adapt to more complex environments and achieve better results. However, existing methods neither efficiently integrate features between the two modalities nor fully exploit the spatial information in shallow-level features. Accordingly, we propose EDGE-Net. First, we propose an edge extraction module that captures edge information in the shallow-level features and uses it to guide subsequent decoding; the original edge features are then weighted after channel attention processing. Second, to further suppress noise in the shallow-level features, we design a global information extraction module. In this module, multiple convolutions are used in place of a single convolution to reduce computation, and convolutions with different dilation rates are used to obtain different receptive fields. We conduct extensive experiments on RGB-T datasets and show that the proposed method achieves superior performance compared to several state-of-the-art algorithms. The code and results of our method are available at: https://github.com/BorreloadD/EDGE-Net.git.
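The multi-dilation idea in the global information extraction module can be sketched as parallel 3x3 branches whose dilation rates set their receptive fields. The channel counts and dilation rates below are assumptions, not the paper's configuration; the sketch only illustrates the pattern of replacing one convolution with several cheaper ones.

    # Sketch of a dilated multi-branch module; ch must be divisible by 4.
    # Dilation rates (1, 2, 4, 8) and channel split are assumptions.
    import torch
    import torch.nn as nn

    class GlobalInfoExtraction(nn.Module):
        def __init__(self, ch):
            super().__init__()
            # Parallel 3x3 branches with different dilation rates give
            # different receptive fields at the same parameter cost
            # (padding = dilation keeps the spatial size unchanged).
            self.branches = nn.ModuleList([
                nn.Conv2d(ch, ch // 4, 3, padding=d, dilation=d)
                for d in (1, 2, 4, 8)
            ])
            # A 1x1 convolution fuses the concatenated branch outputs.
            self.fuse = nn.Conv2d(ch, ch, 1)

        def forward(self, x):
            feats = torch.cat([b(x) for b in self.branches], dim=1)
            return self.fuse(feats)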
Light field (LF) imaging, which can capture spatial and angular information of light rays in one shot, has received increasing attention. However, the well-known LF spatio-angular trade-off restricts many applications of LF imaging. To alleviate this problem, this paper puts forward a dual-level LF reconstruction network that improves LF angular resolution from sparsely sampled LF inputs. Instead of using a 2D or 3D LF representation in the reconstruction process, this paper proposes an LF directional EPI volume representation to synthesize the full LF. The proposed representation encourages interaction between the spatial and angular dimensions during convolution, which benefits the recovery of lost texture details in the synthesized sub-aperture images (SAIs). To extract the high-dimensional geometric features that map low-angular-resolution inputs to the full high-angular-resolution LF, a dual-level deep network is introduced. It consists of an SAI synthesis sub-network and a detail refinement sub-network, which constrain LF reconstruction at two levels (i.e., from coarse to fine). Our network model is evaluated on several real-world LF scene datasets, and extensive experiments validate that the proposed model outperforms the state of the art and achieves better perceptual quality in the reconstructed SAIs.
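A directional EPI volume couples one angular axis with the matching spatial axis, so that a convolution over the volume mixes spatial and angular information. The sketch below assembles horizontal and vertical EPI volumes from an SAI array; the tensor layout (U, V, H, W, C) is an assumption for illustration and is not necessarily the paper's layout.

    # Assemble horizontal/vertical EPI volumes from a sub-aperture image array.
    import numpy as np

    def epi_volumes(lf):
        """lf: light field of shape (U, V, H, W, C), angular (U, V),
        spatial (H, W)."""
        U, V, H, W, C = lf.shape
        # Horizontal EPI volume: for each angular row v and image row h,
        # the (U, W) slice couples the horizontal angular and spatial axes.
        horiz = lf.transpose(1, 2, 0, 3, 4).reshape(V * H, U, W, C)
        # Vertical EPI volume: the (V, H) slice couples the vertical
        # angular and spatial axes.
        vert = lf.transpose(0, 3, 1, 2, 4).reshape(U * W, V, H, C)
        return horiz, vert

Convolving over these slices (e.g., with 3D convolutions across the stacked volumes) is one way to realize the spatial-angular interaction the abstract describes.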
Three-dimensional (3-D) holoscopic imaging is a promising candidate 3-D technology that can overcome some drawbacks of current 3-D technologies. Owing to its particular optical structure, a holoscopic image consists of an array of two-dimensional microimages (MIs) that represent different perspectives of the scene. To address the data-intensive nature and specific structure of holoscopic images, efficient coding schemes are of utmost importance for storage and transmission. We propose a 3-D holoscopic image-coding scheme using a sparse viewpoint image (VI) array and disparities. In the proposed scheme, a holoscopic image is first fully decomposed into a VI array, which is then subsampled into a sparse VI array. To reconstruct the full holoscopic image, disparities between adjoining MIs are calculated. From the retained VIs and the disparities, a full holoscopic image is reconstructed and encoded as a reference frame for coding the original holoscopic image. Building on this representation, we propose a multiview-plus-depth compression scheme for 3-D holoscopic image coding. Experimental results show that the proposed coding scheme achieves an average 51% bit-rate reduction compared with High Efficiency Video Coding (HEVC) intra coding.
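The disparity step can be pictured with a heavily simplified sketch: an adjoining MI shifted by the estimated integer disparity serves as a prediction of its neighbor. Both functions below are illustrative assumptions (whole-MI block matching, wrap-around shift via np.roll), not the paper's reconstruction procedure.

    # Simplified illustration of disparity estimation/prediction between
    # adjoining microimages (MIs). Real schemes handle borders properly.
    import numpy as np

    def estimate_disparity(mi_a, mi_b, max_d=8):
        """Integer shift minimising MSE between two adjoining MIs."""
        errors = [np.mean((np.roll(mi_a, d, axis=1) - mi_b) ** 2)
                  for d in range(-max_d, max_d + 1)]
        return int(np.argmin(errors)) - max_d

    def predict_mi(neighbor_mi, disparity):
        """Predict an MI by shifting its neighbour by the disparity."""
        return np.roll(neighbor_mi, disparity, axis=1)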
KEYWORDS: 3D image processing, Image compression, Statistical analysis, Video coding, 3D displays, Visualization, Error analysis, Computer programming, Prototyping, Imaging systems
Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent 3-D holoscopic content, so efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In this scheme, each coding block is predicted by an MMSE estimator under a statistical model. To obtain the statistical behavior of the signal, kernel density estimation (KDE) is used to estimate the probability density function of the model. Because bandwidth estimation (BE) is a key issue in KDE, we also propose a BE method based on the kernel trick. Experimental results demonstrate that the proposed scheme achieves better rate-distortion performance and visual rendering quality.
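The core idea can be sketched as follows: the MMSE predictor of a pixel given its causal context is the conditional mean, and KDE over previously reconstructed (context, value) pairs yields the Nadaraya-Watson estimate of that mean. The rule-of-thumb bandwidth below is a stand-in assumption for the paper's kernel-trick bandwidth estimation, and the function name is hypothetical.

    # Kernel-based MMSE prediction via Nadaraya-Watson regression.
    import numpy as np

    def kernel_mmse_predict(contexts, samples, query, bandwidth=None):
        """contexts: (N, d) causal-neighbourhood vectors; samples: (N,)
        pixel values; query: (d,) context of the pixel to predict."""
        n, d = contexts.shape
        if bandwidth is None:
            # Silverman-style rule of thumb (an assumption; the paper
            # instead estimates bandwidth with a kernel-trick method).
            bandwidth = contexts.std() * n ** (-1.0 / (d + 4))
        dist2 = np.sum((contexts - query) ** 2, axis=1)
        w = np.exp(-0.5 * dist2 / bandwidth ** 2)
        # Conditional mean estimate: kernel-weighted average of samples.
        return np.sum(w * samples) / (np.sum(w) + 1e-12)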