Proc. SPIE. 10795, Electro-Optical and Infrared Systems: Technology and Applications XV
KEYWORDS: Signal to noise ratio, Imaging systems, Sensors, Image processing, Image sensors, Signal processing, Modulation transfer functions, Performance modeling, Electro optical modeling, Systems modeling
Image-based electro-optical system simulation including an end-to-end performance test is a powerful tool to characterize a camera system before it has been built. In particular, it can be used in the design phase to make an optimal trade-off between performance on the one hand and SWaPC (Size, Weight, Power and Cost) criteria on the other. During the design process, all components can be simulated in detail, including optics, sensor array properties, chromatic and geometrical lens corrections, signal processing, and compression. Finally, the overall effect on the outcome can be visualized, evaluated and optimized. In this study, we developed a detailed model of the CMOS camera system imaging chain (including scene, image processing and display). The model simulation was evaluated by comparing simulated (display) imagery with recorded imagery using both physical (SNR) and psychophysical measures (acuity and contrast thresholds using the TOD methodology with a human observer) for a range of conditions: different light levels, moving stimuli with different speeds, movies and single frames. The performance analysis shows that the model simulations are largely in line with the recorded sensor images, with some minor deviations. The result of the study is a detailed, validated and powerful sensor performance prediction model. This project has received funding from the Electronic Component Systems for European Leadership Joint Undertaking.
In numerous applications, such as surveillance, industrial inspection, medical imaging and security, high resolution is of crucial importance for the performance of computer vision systems. Besides spatial resolution, a high frame rate is also important in these applications. While the resolution of CMOS imaging sensors follows Moore's law, it is becoming increasingly challenging for optics to keep up with this development.
To keep up with the reduction in pixel size, lenses have to be constructed with much more precision, while their physical size increases dramatically. Moreover, the expertise needed to construct a lens of sufficient quality is available at only a few locations in the world. The use of lower-quality lenses with high-resolution imaging sensors leads to numerous artifacts.
Due to the different refractive indices for different wavelengths, primary color components do not reach their targeted pixels in the sensor plane, which causes lateral chromatic aberration artifacts. These artifacts manifest as false colors in high-contrast regions around edges. Moreover, due to the variable refractive indices, light rays do not focus on the imaging sensor plane but in front of or behind it, which leads to blur due to axial aberration. Because of the increased resolution, the size of the pixel is significantly reduced, which reduces the amount of light it receives. As a consequence, the amount of noise increases dramatically. The noise further increases due to higher frame rates and therefore shorter exposure times. To reduce complexity and price, most cameras today are built using one imaging sensor with a spatial color-multiplexing filter array (CFA). This way, camera manufacturers avoid using three imaging sensors and beam splitters, which significantly reduces the price of the system. Since not all color components are present at each pixel location, it is necessary to interpolate them, i.e. to perform demosaicking. In the presence of lateral chromatic aberration, this task becomes more complex, since many pixels in the CFA do not receive the proper color, which creates occlusions that in turn create additional artifacts after demosaicking. To prevent these artifacts, occlusion inpainting has to be performed.
In this paper we propose a new method for the simultaneous correction of all the artifacts mentioned above. We define operators representing spatially variant blur, subsampling and noise applied to the unknown artifact-free image, and reconstruct that artifact-free image. First, we perform a lens calibration step to acquire the lens point spread function (PSF) at each pixel in the image using a point source. Once the PSFs are obtained, we perform joint deconvolution using the spatially variant kernels obtained in the previous step.
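As a simplified illustration of the deconvolution step (not the authors' implementation, which uses spatially variant kernels), the core non-blind restoration for a single known PSF can be sketched as a Wiener deconvolution in the frequency domain; the regularization constant `k` is a hypothetical tuning parameter:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=0.01):
    """Non-blind Wiener deconvolution with a single known PSF.

    A toy stand-in for spatially variant joint deconvolution: the PSF
    is zero-padded to the image size, centred, and inverted in the
    frequency domain with Tikhonov-style regularization k.
    """
    psf_pad = np.zeros_like(blurred, dtype=float)
    h, w = psf.shape
    psf_pad[:h, :w] = psf
    # Centre the PSF so the restored image is not spatially shifted.
    psf_pad = np.roll(psf_pad, (-(h // 2), -(w // 2)), axis=(0, 1))
    H = np.fft.fft2(psf_pad)
    G = np.fft.fft2(blurred)
    F = np.conj(H) / (np.abs(H) ** 2 + k) * G
    return np.real(np.fft.ifft2(F))
```

A spatially variant scheme would apply a different kernel per tile or per pixel; this global version only conveys the inversion idea.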
The human visual system registers electromagnetic waves in the 390 to 700 nm wavelength range. While visible light provides humans with sufficient guidance for everyday activities, a large amount of information remains unregistered. However, electromagnetic radiation outside the visible range can be registered using cameras and sensors. Due to the multiplexing of visible light and additional wavelengths, the resolution drops significantly. To improve the resolution, we propose a GPU-based joint method for demosaicking, denoising and super-resolution. To interpolate the missing pixel values for all four wavelengths, we first extract high-pass image features from all types of pixels in the mosaic. Using this information we perform directional interpolation, to preserve the continuity of edges present in all four component images. After the initial interpolation, we introduce high spatial content from other frequency bands, giving preference to original over interpolated edges. Moreover, we refine and upsample the demosaicked image by introducing information from previous frames. Motion compensation relies on a subpixel block-based motion estimation algorithm that uses all four chromatic bands and performs regularization to reduce estimation errors and related artifacts in the interpolated images. We perform experiments using a mosaic consisting of red, green, blue and near-infrared (850 nm) pixels. The proposed algorithm is implemented on the Jetson TX2 platform, achieving 120 fps at QVGA resolution. It operates recursively, requiring only one additional frame buffer for the previous results. The results of the proposed method compare favorably to state-of-the-art multispectral demosaicking methods.
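The idea behind directional interpolation can be conveyed with a minimal sketch, under assumptions not taken from the paper: a single band sampled on a checkerboard pattern, with each missing pixel filled along the direction of the smaller local gradient so that interpolation does not cross edges:

```python
import numpy as np

def directional_interp(chan, mask):
    """Fill missing samples (mask == False) of one mosaic band.

    Toy edge-aware interpolation: at each missing interior pixel,
    average along the direction (horizontal or vertical) with the
    smaller intensity difference, i.e. along the edge, not across it.
    Assumes a checkerboard sampling so all 4-neighbours are present.
    """
    out = chan.astype(float).copy()
    h, w = chan.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mask[y, x]:
                continue  # original sample, keep it
            hz = abs(out[y, x - 1] - out[y, x + 1])
            vt = abs(out[y - 1, x] - out[y + 1, x])
            if hz <= vt:  # weaker horizontal gradient -> average horizontally
                out[y, x] = 0.5 * (out[y, x - 1] + out[y, x + 1])
            else:
                out[y, x] = 0.5 * (out[y - 1, x] + out[y + 1, x])
    return out
```

The published method additionally shares high-pass features across all four bands and refines the result temporally; none of that is reproduced here.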
High-resolution video capture is crucial for numerous applications such as surveillance, security, industrial inspection, medical imaging and digital entertainment. In the last two decades, we have witnessed a dramatic increase in the spatial resolution and the maximal frame rate of video capturing devices.
Further resolution increases face numerous challenges. The reduced size of the pixel decreases the amount of light it receives, leading to an increased noise level. Moreover, the reduced pixel size makes lens imprecisions more pronounced, which especially applies to chromatic aberrations; even when high-quality lenses are used, some chromatic aberration artifacts will remain. The noise level additionally increases due to higher frame rates.
To reduce the complexity and the price of the camera, one sensor captures all three colors by relying on a color filter array (CFA). To obtain a full-resolution color image, the missing color components have to be interpolated, i.e. demosaicked, which is more challenging than at lower resolutions due to the increased noise and aberrations.
In this paper, we propose a new method that jointly performs chromatic aberration correction, denoising and demosaicking. By jointly reducing all artifacts, we reduce both the overall complexity of the system and the introduction of new artifacts. To reduce possible flicker, we also perform temporal video enhancement. We evaluate the proposed method on a number of publicly available UHD sequences and on sequences recorded in our studio.
High dynamic range (HDR) image generation from a number of differently exposed low dynamic range (LDR) images has been extensively explored in the past few decades, and as a result of these efforts a large number of HDR synthesis methods have been proposed. Since HDR images are synthesized by combining well-exposed regions of the input images, one of the main challenges is dealing with camera or object motion. In this paper we propose a method for the synthesis of HDR video from a single camera using multiple, differently exposed video frames, with circularly alternating exposure times. One of the potential applications of the system is in driver assistance systems and autonomous vehicles, involving significant camera and object movement, non-uniform and temporally varying illumination, and the requirement of real-time performance. To achieve these goals simultaneously, we propose an HDR synthesis approach based on weighted averaging of aligned radiance maps. The computational complexity of high-quality optical flow methods for motion compensation is still prohibitively high for real-time applications. Instead, we rely on more efficient global projective transformations to handle camera movement, while moving objects are detected by thresholding the differences between the transformed and brightness-adapted images in the set. To attain temporal consistency of the camera motion in consecutive HDR frames, the parameters of the perspective transformation are stabilized over time by means of computationally efficient temporal filtering. We evaluated our results on several reference HDR videos, on synthetic scenes, and using 14-bit raw images taken with a standard camera.
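The weighted averaging of radiance maps can be sketched under two simplifying assumptions not guaranteed by the paper: a linear camera response and frames that are already aligned. Each frame votes for a radiance estimate `pixel / exposure`, weighted by a triangle ("hat") function that favours well-exposed pixels:

```python
import numpy as np

def merge_hdr(frames, exposures):
    """Merge aligned LDR frames (values in [0, 1]) into a radiance map.

    Weighted average of per-frame radiance estimates; the hat weight
    1 - |2x - 1| down-weights under- and over-exposed pixels and is
    exactly zero for clipped ones. Linear camera response assumed.
    """
    num = np.zeros_like(frames[0], dtype=float)
    den = np.zeros_like(frames[0], dtype=float)
    for img, t in zip(frames, exposures):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # favour mid-range pixels
        num += w * (img / t)               # per-frame radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)     # guard against all-zero weights
```

In the paper, the frames are first aligned with global projective transformations and moving objects are masked out; here the alignment step is deliberately omitted.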
Realistic visualization is crucial for a more intuitive representation of complex data, medical imaging, simulation, and entertainment systems. In this respect, multiview autostereoscopic displays are a great step toward achieving the complete immersive user experience, although providing high-quality content for these types of displays is still a great challenge. Due to the different characteristics/settings of the cameras in the multiview setup and varying photometric characteristics of the objects in the scene, the same object may have a different appearance in the sequences acquired by the different cameras. Images representing views recorded using different cameras, in practice, have different local noise, color, and sharpness characteristics. View synthesis algorithms introduce artifacts due to errors in disparity estimation/bad occlusion handling or due to an erroneous warping function estimation. If the input multiview images are not of sufficient quality and have mismatching color and sharpness characteristics, these artifacts may become even more disturbing. Accordingly, the main goal of our method is to simultaneously perform multiview image sequence denoising, color correction, and the improvement of sharpness in slightly defocused regions. Results show that the proposed method significantly reduces the amount of the artifacts in multiview video sequences, resulting in a better visual experience.
The digital revolution has reached hospital operating rooms, giving rise to new opportunities such as tele-surgery and tele-collaboration. Applications such as minimally invasive and robotic surgery generate large video streams that demand gigabytes of storage and transmission capacity. While lossy data compression can offer large size reduction, high compression levels may significantly reduce image quality. In this study we assess the quality of compressed laparoscopic video using a subjective evaluation study and three objective measures. Test sequences were full high-definition video captures of four laparoscopic surgery procedures acquired with two camera types. Raw sequences were processed with H.264/AVC IPPP-CBR at four compression levels (19.5, 5.5, 2.8, and 1.8 Mbps). Sixteen non-experts and nine laparoscopic surgeons evaluated the subjective quality and the suitability for surgery (surgeons only) using the Single Stimulus Continuous Quality Evaluation methodology. The VQM, HDR-VDP-2, and PSNR objective measures were evaluated. The results suggest that laparoscopic video may be lossy compressed approximately 30 to 100 times (19.5 to 5.5 Mbps) without sacrificing perceived image quality, potentially enabling real-time streaming of surgical procedures even over wireless networks. Surgeons were sensitive to content but had large variances in quality scores, whereas non-experts judged all scenes similarly and overestimated the quality of some sequences. There was a high correlation between surgeons' scores for quality and "suitability for surgery". The objective measures had moderate to high correlation with the subjective scores, especially when analyzed separately by camera type. Future studies should evaluate surgeons' task performance to determine the clinical implications of conducting surgery with lossy compressed video.
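Of the three objective measures, PSNR is the simplest and can be stated exactly; the following is the generic definition for 8-bit video (peak value 255), not a detail taken from the study:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference frame
    and a distorted (e.g. compressed) frame."""
    mse = np.mean((ref.astype(float) - dist.astype(float)) ** 2)
    if mse == 0:
        return float('inf')  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

VQM and HDR-VDP-2, by contrast, are perceptual models with no comparably compact closed form.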
In this paper we present a new method for super-resolution of depth video sequences using high-resolution color video. Here we assume that the depth sequence does not contain the outlier points that can be present in depth images. Our method is based on multiresolution decomposition and uses multiple frames to search for the most similar depth segments to improve the resolution of the current frame. The first step is the wavelet decomposition of both the color and the depth images. The scaling images of the depth wavelet decomposition are super-resolved using previous and future frames of the depth video sequence, owing to their different nature. The wavelet bands, on the other hand, are improved using both the wavelet bands of previous frames and the wavelet bands of the color images, since similar edges may appear in both. Our method shows significant improvements over some recent depth image interpolation methods.
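The split into a scaling image and wavelet detail bands can be illustrated with one level of a 2-D Haar transform; this is an illustrative stand-in, since the abstract does not specify which wavelet family the method uses:

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar wavelet decomposition.

    Returns the scaling (low-pass) image LL and the three detail
    bands LH, HL, HH, each at half the input resolution. Assumes
    even image dimensions.
    """
    a = img.astype(float)
    # 1-D averaging/differencing along rows, then along columns.
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh
```

In the method above, the LL band of the depth video would be treated temporally while the detail bands additionally borrow edge information from the color decomposition.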
This paper presents a new method for unsupervised video segmentation based on mean shift clustering in the spatio-temporal domain. The main novelty of the proposed approach is the dynamic temporal adaptation of clusters, owing to which the segmentation evolves quickly and smoothly over time. The proposed method consists of a short initialization phase and an update phase, and significantly reduces the computational load of the mean shift clustering. In the update phase, only the positions of a relatively small number of cluster centers are updated, and new frames are segmented based on the segmentation of previous frames. The method segments video in real time and tracks video objects effectively.
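The core mean shift iteration that moves a cluster center to a density mode can be sketched as follows; a flat kernel and a fixed bandwidth are simplifying assumptions, and in the method above the feature vectors would combine spatial, temporal and color coordinates:

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, n_iter=50):
    """Move a cluster center to the nearest density mode.

    Flat-kernel mean shift: repeatedly replace the center with the
    mean of all points within `bandwidth`, until convergence.
    `points` is an (N, d) array of feature vectors.
    """
    c = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(points - c, axis=1)
        near = points[d < bandwidth]
        if len(near) == 0:
            break                      # no support: stay put
        new = near.mean(axis=0)
        if np.linalg.norm(new - c) < 1e-6:
            break                      # converged to a mode
        c = new
    return c
```

The update phase described above exploits the fact that, from frame to frame, only these center positions need to be re-run rather than the full clustering.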
In this paper we present a new method for the joint denoising of depth and luminance images produced by a time-of-flight camera. Here we assume that the sequence does not contain the outlier points that can be present in depth images. Our method first estimates the noise and signal covariance matrices and then performs vector denoising. The luminance image is segmented into similar contexts using the k-means algorithm, and these contexts are used for the calculation of the covariance matrices. The denoising results are compared with ground truth images obtained by averaging multiple frames of a still scene.
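Given signal and noise covariance matrices, the classical linear MMSE (Wiener) estimator is one way to realize such vector denoising; this generic sketch assumes zero-mean vectors and additive noise, and is not claimed to match the paper's exact estimator:

```python
import numpy as np

def wiener_vector_denoise(y, c_signal, c_noise):
    """Linear MMSE estimate of a zero-mean signal vector.

    Observation model: y = s + n, with signal covariance c_signal
    and noise covariance c_noise. The Wiener gain shrinks each
    component according to its signal-to-noise ratio.
    """
    gain = c_signal @ np.linalg.inv(c_signal + c_noise)
    return gain @ y
```

In a context-based scheme, a separate `c_signal` would be estimated per k-means context, so smooth and textured regions receive different amounts of shrinkage.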
Optical coherence tomography produces high-resolution medical images based on the spatial and temporal coherence of the optical waves backscattered from the scanned tissue. However, the same coherence also introduces speckle noise, which degrades the quality of the acquired images.
In this paper we propose a technique for noise reduction in 3D OCT images, where the 3D volume is considered as a sequence of 2D images, i.e., 2D slices in the depth-lateral projection plane. In the proposed method we first perform recursive temporal filtering along the motion trajectory estimated between the 2D slices, using a noise-robust motion estimation/compensation scheme previously proposed for video denoising. The temporal filtering scheme reduces the noise level and adapts the motion compensation to it. Subsequently, we apply a spatial filter for speckle reduction to remove the remaining noise in the 2D slices. In this scheme the spatial (2D) speckle nature of the noise in OCT is modeled and used for spatially adaptive denoising. Both the temporal and the spatial filter are wavelet-based techniques, where the temporal filter uses two resolution scales and the spatial filter four.
The evaluation of the proposed denoising approach is performed on demodulated 3D OCT images from different sources and of different resolutions. Phantom OCT images were used to optimize the parameters for the best denoising performance. The denoising performance of the proposed method was measured in terms of SNR, edge sharpness preservation and contrast-to-noise ratio. A comparison was made with state-of-the-art methods for noise reduction in 2D OCT images, and the proposed approach proved advantageous in terms of both objective and subjective quality measures.
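The effect of recursive temporal filtering can be seen in a stripped-down sketch that omits both the wavelet domain and the motion compensation (i.e. it assumes a static scene, which the real method does not):

```python
import numpy as np

def recursive_temporal_filter(frames, alpha=0.3):
    """First-order recursive (IIR) temporal filter over a frame list.

    Each new frame is blended into a running estimate:
        acc <- alpha * frame + (1 - alpha) * acc
    Smaller alpha means stronger noise suppression but slower
    adaptation. Motion compensation is omitted here for brevity.
    """
    acc = frames[0].astype(float)
    for f in frames[1:]:
        acc = alpha * f + (1.0 - alpha) * acc
    return acc
```

In steady state this filter reduces the noise variance by roughly a factor alpha / (2 - alpha), which is why the motion compensation in the full scheme can in turn be adapted to the lowered noise level.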
It is believed by many that three-dimensional (3D) television will be the next logical development toward a more natural and vivid home entertainment experience. While the classical 3D approach requires the transmission of two video streams, one for each view, 3D TV systems based on depth-image-based rendering (DIBR) require a single stream of monoscopic images and a second stream of associated images, usually termed depth images or depth maps, that contain per-pixel depth information. A depth map is a two-dimensional function that gives the distance from the camera to a certain point of the object as a function of the image coordinates. Using this depth information and the original image, it is possible to reconstruct a virtual image from a nearby viewpoint by projecting the pixels of the available image to their locations in 3D space and finding their positions in the desired view plane. One of the most significant advantages of DIBR is that depth maps can be coded more efficiently than two streams corresponding to the left and right views of the scene, thereby reducing the bandwidth required for transmission, which makes it possible to reuse existing transmission channels for 3D TV. This technique can also be applied to other 3D technologies such as multimedia systems.
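For a rectified stereo setup, the projection described above degenerates to a horizontal shift by the disparity d = baseline * focal / depth. The following toy forward-warp illustrates this (integer disparities, no hole filling, all parameters hypothetical):

```python
import numpy as np

def dibr_warp(image, depth, baseline, focal):
    """Forward-warp a single-channel view to a nearby viewpoint.

    Rectified-camera simplification of DIBR: each pixel moves
    horizontally by disparity round(baseline * focal / depth).
    Pixels that no view pixel maps to remain 0 (disocclusion holes),
    which real DIBR systems must inpaint.
    """
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    disp = np.round(baseline * focal / depth).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x + disp[y, x]
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out
```

The holes left at disocclusions are precisely the regions where the wavelet-domain reconstruction proposed below has to synthesize plausible content.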
In this paper we propose an advanced wavelet-domain scheme for the reconstruction of stereoscopic images, which solves some of the shortcomings of the existing methods discussed above. We perform the wavelet transform of both the luminance and the depth images in order to obtain significant geometric features, which enable a more sensible reconstruction of the virtual view. The motion estimation employed in our approach uses a Markov random field smoothness prior for the regularization of the estimated motion field.
The evaluation of the proposed reconstruction method is performed on two video sequences that are typically used for the comparison of stereo reconstruction algorithms. The results demonstrate the advantages of the proposed approach with respect to state-of-the-art methods, in terms of both objective and subjective performance.