Removing shadows from a single image is a challenging problem because shadows appear in many forms under complex physical conditions, influenced by factors such as the light sources and the transparency of materials. To remove shadows precisely, most previous works rely on a shadow mask, a binary map that indicates the shadow region in a given image. However, relying on a shadow mask inevitably introduces problems: removal performance depends on the quality of the mask, and an additional shadow detection step is required. To avoid these problems, the proposed algorithm is based on image-to-image translation and does not require a shadow mask. In such deep neural networks, normalization layers are commonly used to speed up convergence. However, in tasks that are highly sensitive to the spatial features of the input image, such as shadow removal, normalization discards a large amount of the information contained in the input image. We therefore use spatially adaptive denormalization (SPADE) to prevent the loss of spatial features of the input image. This not only fundamentally addresses the loss of feature information during normalization, but also enables precise removal of shadow regions by combining multi-resolution feature maps with the feature maps of the decoder. In evaluation on the large ISTD dataset, the proposed algorithm outperforms the existing approach by about 20 to 30% in both PSNR and RMSE.
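The role of SPADE described above can be illustrated with a minimal NumPy sketch. It assumes the commonly published SPADE form, in which activations are normalized without learned parameters and then rescaled and shifted by per-pixel values predicted from a conditioning map. The function name `spade`, the use of 1x1 convolutions (plain per-pixel linear maps), and the choice of conditioning map are assumptions made for illustration; the abstract does not specify the network details.

```python
import numpy as np

def spade(x, cond, w_gamma, w_beta, eps=1e-5):
    """Spatially adaptive denormalization (illustrative sketch).

    x:       (C, H, W) activations of some layer.
    cond:    (K, H, W) conditioning map, e.g. a resized copy of the input
             image (an assumption, not taken from the abstract).
    w_gamma, w_beta: (C, K) weights of hypothetical 1x1 convolutions.
    """
    # Parameter-free normalization of each channel over spatial positions.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    # Per-pixel scale and shift predicted from the conditioning map, so
    # spatial information removed by the normalization can be re-injected.
    gamma = np.einsum("ck,khw->chw", w_gamma, cond)
    beta = np.einsum("ck,khw->chw", w_beta, cond)
    return gamma * x_norm + beta
```

Because `gamma` and `beta` vary from pixel to pixel, the output can differ between shadow and non-shadow positions even after the activations themselves have been normalized.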
Removing reflections from a single image is a challenging problem in image processing and computer vision, because defining an elaborate physical model that separates irregular reflections is almost impossible. While human vision automatically focuses on the transmitted object, basic deep neural networks are limited in their ability to learn such an attention mechanism. To solve this problem, this paper proposes a generative adversarial network (GAN) guided by depth of field (DoF). The DoF is formulated from image statistics and indicates the focused region of the image. By supplying this information to both the generative and discriminative networks, the generator focuses on the transmitted layer and the discriminator can estimate the local consistency of the restored areas. Since it is intractable to obtain the ground-truth transmitted layer for real images, a dataset with synthetic reflections is used for quantitative evaluation. The experimental results demonstrate that the proposed method outperforms existing approaches in both PSNR and SSIM. The visual results indicate that the proposed network convincingly eliminates reflections and produces satisfactory transmitted layers compared with previous methods.
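For readers unfamiliar with the PSNR metric used in these evaluations, the following is a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE). The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for illustration; the abstracts do not state the exact evaluation protocol.

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)  # mean squared error over all pixels
    if mse == 0.0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher value means the restored image is closer to the reference.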
In background subtraction, principal component analysis (PCA) based algorithms have shown a remarkable ability to separate foreground and background in video acquired by a static camera. The algorithm based on the closed-form solution of L1-norm Tucker-2 decomposition is one such real-time background subtraction algorithm. The closed-form solution is obtained as a linear combination of the video frame vectors with a coefficient vector composed only of +1 and -1. However, since the optimal coefficient vector is unknown, finding it becomes a complicated combinatorial optimization problem when the number of input frames is large. To solve this problem, this paper proposes a background subtraction method based on Bayesian optimization (BayesOpt), a black-box, derivative-free global optimization technique. The method finds the optimal coefficient combination without evaluating the linear combinations for all possible coefficient combinations, using a Bayesian statistical model and the Expected Improvement (EI) acquisition function. Here, the Bayesian statistical model measures the uncertainty at unsampled coefficient combinations, and the EI function is a surrogate function that indicates which coefficient combination to sample next. The experimental results confirm the efficiency of the proposed method.
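The Expected Improvement criterion mentioned above has a well-known closed form when the statistical model gives a Gaussian prediction at each candidate point. The sketch below assumes a maximization problem and takes the predicted mean `mu` and standard deviation `sigma` as given; the surrogate model that would produce them, and the example candidate values, are hypothetical and not taken from the abstract.

```python
import math

def expected_improvement(mu, sigma, best):
    """EI of a candidate with Gaussian prediction N(mu, sigma^2),
    relative to the best objective value observed so far (maximization)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no uncertainty left at this point
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

# Hypothetical predictions (mean, std) for two unsampled coefficient vectors.
candidates = {"b1": (0.9, 0.05), "b2": (0.7, 0.40)}
best_so_far = 1.0
next_point = max(
    candidates,
    key=lambda name: expected_improvement(*candidates[name], best_so_far),
)
```

EI rewards both a high predicted value and a high uncertainty, which is why a candidate with a lower mean but a much larger standard deviation can be selected as the next point to sample.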
Color constancy is the feature of the human vision system (HVS) that ensures the relative constancy of the perceived color of objects under varying illumination conditions. The Retinex theory of machine vision systems is based on the HVS. Among Retinex algorithms, the physics-based algorithms are efficient; however, they generally do not satisfy the locality characteristic of the original Retinex theory because they eliminate global illumination from their optimization. We apply the sparse source separation technique to the Retinex theory to present a physics-based algorithm that satisfies the locality characteristic of the original Retinex theory. Previous Retinex algorithms have limited use in image enhancement because the total variation Retinex results in an overly enhanced image and the sparse source separation Retinex cannot completely restore the original image. In contrast, our proposed method preserves image edges and can very nearly replicate the original image without any special operation.
Intra coding of RGB video is important to many high-fidelity multimedia applications because video acquisition is mostly done in RGB space, and coding of decorrelated color video loses its advantage at high quality ranges. To improve the compression performance for RGB video, this paper proposes an inter-color prediction using adaptive weights. To make full use of both the spatial and the inter-color correlation of RGB video, the proposed scheme is based on a residual prediction approach: the prediction is performed on the transformed frequency components of the spatially predicted residual data of each color plane. With the aid of efficient prediction that exploits the inter-color residual correlation in the frequency domain, the proposed scheme achieves a bitrate reduction of up to 24.3% compared to the common mode of the H.264/AVC High 4:4:4 Intra profile.
The performance of intra coding is important in many broadcasting and professional applications, because repeatedly inserted intra frames usually account for a large portion of the video data. To improve intra coding performance, this paper proposes an advanced intra prediction scheme that systematically increases prediction precision via optimal linear predictors for the weakly predicted subspace. Experimental results show that the proposed scheme outperforms the conventional and the recent best intra prediction methods, with bitrate reductions of up to 11.41% and 9.08%, respectively.
The main difficulty in segmenting a cell image arises when red blood cells touch the leukocyte. The similar brightness of the touching red blood cells and the leukocyte makes it quite difficult to separate the cytoplasm from the red blood cells. Conventional approaches search for the concavities created where two round boundaries meet and connect the two resulting points to perform the separation. Here, we exploit the fact that the boundary of a leukocyte is normally round and that only a small portion of it is disconnected by the touching red blood cells. Specifically, at an initial central point of the nucleus in the leukocyte, we generate the largest possible circle that covers a circular portion of the composite of the nucleus and cytoplasm areas. Then, by perturbing the initial central point and selecting only those central points that do not cross the boundary, we can cover most of the interior regions of the nucleus and the cytoplasm, separating the leukocyte from the touching red blood cells.
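The circle-growing idea can be sketched as follows, under simplifying assumptions that are not taken from the abstract: the detected part of the leukocyte boundary is given as a list of (row, column) points, the largest circle at a center is taken to be the one whose radius equals the distance to the nearest boundary point, and the perturbation is a small square grid of integer shifts. The function names and the `max_shift` parameter are hypothetical.

```python
import numpy as np

def largest_disk_radius(center, boundary_pts):
    """Radius of the largest disk at `center` that touches no boundary point."""
    d = np.hypot(boundary_pts[:, 0] - center[0], boundary_pts[:, 1] - center[1])
    return d.min()

def cover_with_disks(shape, center, boundary_pts, max_shift=3):
    """Union of the largest disks grown from centers perturbed around `center`."""
    rows, cols = np.indices(shape)
    covered = np.zeros(shape, dtype=bool)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            c = (center[0] + dr, center[1] + dc)
            r = largest_disk_radius(c, boundary_pts)
            # Mark every pixel strictly inside the disk of radius r at c.
            covered |= np.hypot(rows - c[0], cols - c[1]) < r
    return covered
```

Because each disk is limited only by the boundary points that were actually detected, a short gap in the boundary (where a red blood cell touches) does not stop the disks from covering the interior, while the union over perturbed centers covers more of a non-circular cell than a single disk would.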
A two-stage algorithm is proposed for locating smooth and detailed disparity vector fields in a stereo image pair. The algorithm consists of hierarchical disparity estimation using a region-dividing technique and edge-preserving regularization. The hierarchical region-dividing disparity estimation increases the efficiency and reliability of the estimation process. At the second stage, the vector fields are regularized with an energy model that produces smooth fields while preserving discontinuities resulting from object boundaries. The minimization problem is addressed by solving a corresponding partial differential equation using a finite-difference method. Experiments show that the proposed algorithm provides accurate and spatially correlated disparity vector fields in various types of stereo images, even in the case of images with large displacements.
Transcoding of Digital Video (DV) for the Digital Video Cassette Recorder (DVCR) into MPEG-2 intra coding is performed in the DCT domain to reduce the number of conversion steps. The 4:1:1-to-4:2:0 chroma format conversion and the conversion from 2-4-8 DCT mode to 8-8 DCT mode are carried out by multiplying the transformed data by matrices, which is suited to parallel processing. The M_quant parameter of MPEG-2 rate control is also computed in the DCT domain. For MPEG-2 inter coding, fast motion estimation (ME) methods that take advantage of data in the DCT domain are studied for transcoding. Among them, ME with an overlapped search range shows better PSNR performance than ME without overlapping.
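The reason a mode conversion can be done by a single matrix multiplication in the DCT domain is that both transforms are linear. The sketch below illustrates this with a simplified 2-4-8 transform that is an assumption for illustration: 4-point DCTs applied vertically to the sums and differences of adjacent row pairs, with orthonormal scaling rather than the exact scaling of the DV specification. All names (`dct_matrix`, `M248`, `T`, `convert_248_to_88`) are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C8, C4 = dct_matrix(8), dct_matrix(4)

# Vertical sum/difference of adjacent rows (simplified, orthonormal scaling).
S = np.zeros((8, 8))
for z in range(4):
    S[z, 2 * z] = S[z, 2 * z + 1] = 1.0 / np.sqrt(2.0)   # row-pair sums
    S[4 + z, 2 * z] = 1.0 / np.sqrt(2.0)                  # row-pair differences
    S[4 + z, 2 * z + 1] = -1.0 / np.sqrt(2.0)

# 4-point DCT on the sums and on the differences.
B = np.block([[C4, np.zeros((4, 4))], [np.zeros((4, 4)), C4]])
M248 = B @ S        # vertical part of the simplified 2-4-8 transform

# M248 is orthogonal, so its inverse is its transpose. The horizontal 8-point
# DCT is shared by both modes, so only the vertical part must be converted.
T = C8 @ M248.T

def convert_248_to_88(y248):
    """Convert simplified 2-4-8 coefficients to 8-8 DCT coefficients."""
    return T @ y248
```

The matrix `T` can be precomputed once, so each block is converted with one 8x8 matrix product and no inverse transform back to the pixel domain.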
This paper proposes an algorithm that removes blocking artifacts in DCT-based images and can be implemented on a DSP chip. In general, a low-pass filter is used to remove blocking artifacts. However, such a filter also removes information in the high-frequency region and thus degrades image quality. Conventional image restoration approaches improve image quality but require too much computation, so they are not suitable for real-time processing. The CLS filter is one such approach: it enhances degraded images very well, but it requires heavy computation and sometimes uses an iterative method. Therefore, this paper reformulates the CLS filter as an FIR filter. To obtain good image quality, blocks are classified according to their characteristics and then filtered. The proposed algorithm removes blocking artifacts in images coded with JPEG, MPEG, H.261, H.263, and similar standards. The algorithm is implemented on the TMS320C31 DSP chip.
In this paper, we present a panorama system that extends the camera's field of view to match that of a human. Because the camera's field of view is much smaller than a human's, an object often cannot be captured in a single frame. A fish-eye lens can be used to solve this problem, but the images obtained in this way have image quality problems. Another solution is to build a panorama by aligning and pasting the frames of a video sequence. The panorama system acquires real-time images from a panning video camera and performs the align-and-paste process using a registration algorithm. The system can display the frontal scene in real time over a human's field of view and can be used in area-observation applications. Experimental results demonstrate the effectiveness and feasibility of the system.