The development of an image processing pipeline for each new camera design can be time-consuming. To speed
camera development, we developed a method named L3 (Local, Linear, Learned) that automatically creates an
image processing pipeline for any design. In this paper, we describe how we used the L3 method to design and
implement an image processing pipeline for a prototype camera with five color channels. The process includes
calibrating and simulating the prototype, learning local linear transforms and accelerating the pipeline using
graphics processing units (GPUs).
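As a concrete illustration of the learning step, the sketch below fits one linear transform per pixel class by regularized least squares and applies it at render time. It is a minimal reading of the local, linear, learned idea, with hypothetical array shapes, not the prototype's actual pipeline.

```python
import numpy as np

def learn_class_transforms(patches, targets, labels, n_classes, ridge=1e-3):
    """Fit one linear transform per pixel class by regularized least squares.

    patches : (n, p) flattened local sensor patches
    targets : (n, c) desired rendered values (e.g., display RGB)
    labels  : (n,)   class index per patch (in L3: CFA position x response level)
    """
    transforms = []
    for k in range(n_classes):
        A = patches[labels == k]
        b = targets[labels == k]
        # Ridge term keeps the solve stable when a class has few samples.
        W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ b)
        transforms.append(W)
    return transforms

def render(patches, labels, transforms):
    """Apply the learned transform for each patch's class."""
    out = np.empty((patches.shape[0], transforms[0].shape[1]))
    for k, W in enumerate(transforms):
        out[labels == k] = patches[labels == k] @ W
    return out
```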
The human visual system is exquisitely engineered and can serve as a model and inspiration for the design of many imaging systems. Optics and optical engineering play a key role in developing new techniques and approaches for both the study of human vision and the design of novel imaging systems. For example, advances in optical sensing and imaging have led to important discoveries about retinal image processing, and optical design tools are necessary for improving vision in patients. While advances in optics are improving our understanding of the human visual system, this understanding has also led to improvements in artificial vision systems, image processing algorithms, visual displays, and even modern optical elements and systems.
We describe a model for underwater illumination that is based on how light is absorbed and scattered by water,
phytoplankton and other organic and inorganic matter in the water. To test the model, we built a color rig using a
commercial point-and-shoot camera in an underwater housing and a calibrated color target. We used the measured
spectral reflectance of the calibration color target and the measured spectral sensitivity of the camera to estimate the
spectral power of the illuminant at the surface of the water. We then used this information, along with spectral basis
functions describing light absorbance by water, phytoplankton, non-algal particles (NAP) and colored dissolved organic
matter (CDOM), to estimate the spectral power of the illuminant and the amount of scattered light at each depth. Our
results lead to insights about color correction, as well as the limitations of consumer digital cameras for monitoring water quality.
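To make the model concrete, here is a minimal sketch of depth-dependent attenuation under a simple Beer-Lambert assumption. The absorption spectra and coefficients below are illustrative placeholders, not the measured basis functions from the paper.

```python
import numpy as np

# Illustrative Beer-Lambert sketch; all spectra are placeholders.
wl = np.linspace(400, 700, 31)                        # wavelength, nm
a_water = 0.01 * np.exp((wl - 400) / 80.0)            # rises toward the red
a_phyto = 0.03 * np.exp(-(((wl - 440) / 40.0) ** 2))  # chlorophyll-like peak
a_cdom = 0.05 * np.exp(-0.014 * (wl - 440))           # CDOM exponential decay
a_nap = 0.02 * np.exp(-0.011 * (wl - 440))            # non-algal particles
E_surface = np.ones_like(wl)                          # flat surface illuminant

def illuminant_at_depth(depth_m, c_phyto=1.0, c_cdom=1.0, c_nap=1.0):
    """Downwelling spectral power after absorption over depth_m meters."""
    a_total = a_water + c_phyto * a_phyto + c_cdom * a_cdom + c_nap * a_nap
    return E_surface * np.exp(-a_total * depth_m)

E_10m = illuminant_at_depth(10.0)   # long wavelengths attenuate fastest
```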
The high density of pixels in modern color sensors provides an opportunity to experiment with new color filter
array (CFA) designs. A significant bottleneck in evaluating new designs is the need to create demosaicking,
denoising and color transform algorithms tuned for the CFA. To address this issue, we developed a method (local, linear, learned, or L3) for automatically creating an image processing pipeline. In this paper we describe the L3 algorithm and illustrate how we created a pipeline for a CFA organized as a 2×2 RGB/W block containing a clear
(W) pixel. Under low light conditions, the L3 pipeline developed for the RGB/W CFA produces images that are
superior to those from a matched Bayer RGB sensor. We also use L3 to learn pipelines for other RGB/W CFAs
with different spatial layouts. The L3 algorithm shortens the development time for producing a high quality
image pipeline for novel CFA designs.
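For readers who want to experiment, the following sketch mosaics a scene through one possible 2×2 RGB/W layout. The specific arrangement (R, G / B, W) is an illustrative choice, not necessarily the layout used in the paper.

```python
import numpy as np

def mosaic_rgbw(rgb, w):
    """rgb: (H, W, 3) linear scene image; w: (H, W) clear-channel image."""
    H, Wd = w.shape
    raw = np.zeros((H, Wd))
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G
    raw[1::2, 0::2] = rgb[1::2, 0::2, 2]   # B
    raw[1::2, 1::2] = w[1::2, 1::2]        # clear (W) pixel
    return raw
```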
A set of hyperspectral image data is made available, intended for use in modeling of imaging systems. The set contains
images of faces, landscapes and buildings. The data cover wavelengths from 0.4 to 2.5 micrometers, spanning the
visible, NIR and SWIR electromagnetic spectral ranges. The images have been recorded with two HySpex line-scan
imaging spectrometers covering the spectral ranges 0.4 to 1 micrometers and 1 to 2.5 micrometers. The hyperspectral
data set includes measured illuminants and software for converting the radiance data to estimated reflectance. The
images are being made available for download at http://scien.stanford.edu.
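The radiance-to-reflectance conversion performed by the included software can be sketched as follows, assuming a Lambertian surface and an illuminant measured as the radiance reflected from a perfect white reference; the function and array names are hypothetical.

```python
import numpy as np

def radiance_to_reflectance(radiance, white_radiance, eps=1e-12):
    """radiance: (H, W, B) scene radiance cube; white_radiance: (B,)
    radiance reflected from a perfect white reference under the same
    illuminant. The ratio gives reflectance for Lambertian surfaces."""
    return radiance / (white_radiance + eps)
```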
Computer simulations have played an important role in the design and evaluation of imaging sensors with applications in
remote sensing and consumer photography. In this paper, we provide an example of computer simulations used
to guide the design of imaging sensors for a biomedical application: We consider how sensor design, illumination,
measurement geometry, and skin type influence the ability to detect blood oxygen saturation from non-invasive
measurements of skin reflectance. The methodology we describe in this paper can be used to design, simulate and
evaluate the design of other biomedical imaging systems.
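A hedged sketch of why spectral design matters for this application: under a simple Beer-Lambert skin model, oxygen saturation can be estimated from absorbance at two wavelengths where oxy- and deoxyhemoglobin extinction coefficients differ. The extinction values below are illustrative, and this two-wavelength inversion is a textbook simplification, not the paper's simulation methodology.

```python
# Extinction coefficients: illustrative values only (arbitrary units).
eps_hbo2 = {660: 0.32, 940: 1.21}   # oxyhemoglobin
eps_hb = {660: 3.20, 940: 0.69}     # deoxyhemoglobin

def so2_from_absorbance(A660, A940):
    """Solve A(l) = [S*eps_hbo2(l) + (1-S)*eps_hb(l)] * k for saturation S,
    eliminating the unknown concentration/path-length factor k."""
    r = A660 / A940
    num = eps_hb[660] - r * eps_hb[940]
    den = r * (eps_hbo2[940] - eps_hb[940]) - (eps_hbo2[660] - eps_hb[660])
    return num / den

print(so2_from_absorbance(1.0, 1.0))   # ~0.74 with these illustrative values
```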
The availability of multispectral scene data makes it possible to simulate a complete imaging pipeline for digital
cameras, beginning with a physically accurate radiometric description of the original scene followed by optical
transformations to irradiance signals, models for sensor transduction, and image processing for display. Certain scenes
with animate subjects, e.g., humans, pets, etc., are of particular interest to consumer camera manufacturers because of
their ubiquity in common images, and the importance of maintaining colorimetric fidelity for skin. Typical multispectral
acquisition methods rely on multiple acquisitions of a scene with a number of different optical filters
or illuminants. Such schemes require long acquisition times and are best suited for static scenes. In scenes where animate
objects are present, movement leads to problems with registration and methods with shorter acquisition times are
needed. To address the need for shorter image acquisition times, we developed a multispectral imaging system that
captures a rapid sequence of images under differently colored LED illuminants. In this paper, we describe the
design of the LED-based lighting system and report results of our experiments capturing scenes with human subjects.
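Spectral reconstruction from such LED-sequenced captures is typically posed as a regularized linear inversion. The sketch below uses synthetic sensitivities and LED spectra as stand-ins for calibrated quantities; it illustrates the idea rather than the system's actual calibration.

```python
import numpy as np

B = 31                       # spectral bands (e.g., 400-700 nm in 10 nm steps)
K, C = 6, 3                  # LED illuminants, camera channels
rng = np.random.default_rng(0)
S = rng.random((C, B))       # camera spectral sensitivities (synthetic)
L = rng.random((K, B))       # LED spectral power distributions (synthetic)

# Forward model: response[k, c] = sum_b S[c, b] * L[k, b] * reflectance[b]
A = np.stack([S * L[k] for k in range(K)]).reshape(K * C, B)

def reconstruct(responses, lam=1e-2):
    """Ridge-regularized inversion of the stacked forward model."""
    r = responses.reshape(K * C)
    return np.linalg.solve(A.T @ A + lam * np.eye(B), A.T @ r)

true_refl = 0.5 + 0.4 * np.sin(np.linspace(0, np.pi, B))
est = reconstruct((A @ true_refl).reshape(K, C))
print(np.abs(est - true_refl).max())   # small reconstruction error
```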
In this study, we evaluated the effect that pixel size has upon people's preferences for images. We used multispectral
images of faces as the scene data and simulated the responses of sensors with different pixel sizes while the other
sensor parameters were kept constant. Subjects were asked to choose between pairs of images; we found that
preference judgments were primarily influenced by the visibility of uncorrelated noise in the images. We used the S-CIELAB metric (ΔE) to predict the visibility of the uncorrelated image noise. The S-CIELAB difference between a
test image and an ideal reference image was monotonically related to the preference score.
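A simplified S-CIELAB-style difference can be sketched as follows: blur each channel to approximate the eye's spatial contrast sensitivity, then take per-pixel CIELAB differences. True S-CIELAB filters opponent-color channels with calibrated kernels; the Gaussian here is a stand-in.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lab_f(t):
    d = 6.0 / 29.0
    return np.where(t > d ** 3, np.cbrt(t), t / (3 * d * d) + 4.0 / 29.0)

def xyz_to_lab(xyz, white=(0.9505, 1.0, 1.089)):
    f = lab_f(xyz / np.asarray(white))
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def scielab_de(xyz_test, xyz_ref, sigma_px=3.0):
    """Per-pixel delta E after a CSF-like spatial blur (simplified)."""
    blur = lambda im: gaussian_filter(im, sigma=(sigma_px, sigma_px, 0))
    d = xyz_to_lab(blur(xyz_test)) - xyz_to_lab(blur(xyz_ref))
    return np.sqrt((d ** 2).sum(axis=-1))
```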
We introduce a new metric, the visible signal-to-noise ratio (vSNR), to analyze how pixel-binning and resizing methods
influence noise visibility in uniform areas of an image. The vSNR is the inverse of the standard deviation of the S-CIELAB
representation of a uniform field; its units are 1/ΔE. The vSNR metric can be used in simulations to predict
how imaging system components affect noise visibility. We use simulations to evaluate two image rendering methods:
pixel binning and digital resizing. We show that vSNR increases with scene luminance, pixel size and viewing distance
and decreases with read noise. Under low illumination conditions and for pixels with relatively high read noise, images
generated with the binning method have less noise (higher vSNR) than resized images, although the binning method has noticeably lower spatial resolution. The binning method also reduces demands on the ADC rate and channel throughput.
When comparing binning and resizing, there is an image quality tradeoff between noise and blur. Depending on the
application, users may prefer one error over the other.
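The metric itself is compact enough to sketch. The version below is simplified, operating on the lightness channel only, with a Gaussian standing in for the calibrated S-CIELAB spatial filters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vsnr(noisy_L, sigma_px=3.0):
    """noisy_L: (H, W) CIELAB L* image of a nominally uniform field.
    Returns the inverse of the pooled standard deviation (units 1/dE)."""
    filtered = gaussian_filter(noisy_L, sigma_px)
    return 1.0 / filtered.std()

# Example: more read noise makes the field less uniform, lowering vSNR.
rng = np.random.default_rng(4)
field = 50 + rng.normal(0, 2.0, (256, 256))
print(vsnr(field))
```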
As the number of imaging pixels in camera phones increases, users expect camera phone image quality to be comparable to that of digital still cameras. The mobile imaging industry is aware, however, that simply packing more pixels into the very limited camera module size need not improve image quality. When the size of a sensor array is fixed, increasing the number of imaging pixels decreases pixel size and thus photon count. Attempts to compensate for the reduction in light sensitivity by increasing exposure durations increase the amount of handheld camera motion blur, which effectively reduces spatial resolution. Perversely, what started as an attempt to increase spatial resolution by increasing the number of imaging pixels may result in a reduction of effective spatial resolution. In this paper, we evaluate how the performance of mobile imaging systems changes with shrinking pixel size, and we propose to replace the widely misused "physical pixel count" with a new metric that we refer to as the "effective pixel count" (EPC). We use this new metric to analyze design tradeoffs for four different pixel sizes (2.8 µm, 2.2 µm, 1.75 µm and 1.4 µm) and two different imaging arrays (1/3.2-inch and 1/8-inch). We show that optical diffraction and camera motion make 1.4 µm pixels less perceptually effective than larger pixels and that this problem is exacerbated by the introduction of zoom optics. Image stabilization optics can increase the effective pixel count and are, therefore, important features to include in a mobile imaging system.
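The two blur sources discussed above have standard closed-form MTFs, sketched below for a circular aperture (diffraction) and linear handheld motion. EPC itself is the paper's metric and its exact definition is not reproduced here; this only computes the ingredients.

```python
import numpy as np

def mtf_diffraction(f_cyc_per_mm, wavelength_mm=550e-6, f_number=2.8):
    """Diffraction MTF of a circular aperture; cutoff is 1/(lambda*N)."""
    fc = 1.0 / (wavelength_mm * f_number)
    x = np.clip(f_cyc_per_mm / fc, 0, 1)
    return (2 / np.pi) * (np.arccos(x) - x * np.sqrt(1 - x * x))

def mtf_motion(f_cyc_per_mm, blur_mm=0.002):
    """Linear motion blur over blur_mm; np.sinc(x) = sin(pi*x)/(pi*x)."""
    return np.abs(np.sinc(f_cyc_per_mm * blur_mm))

f = np.linspace(0, 400, 401)           # spatial frequency on the sensor
combined = mtf_diffraction(f) * mtf_motion(f)
# Nyquist for a 1.4 um pixel is ~357 cyc/mm; compare the combined MTF there.
```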
We describe a method for simulating the output of an image sensor to a broad array of test targets. The method uses a modest set of sensor calibration measurements to define the sensor parameters; these parameters are used by an integrated suite of Matlab software routines that simulate the sensor and create output images. We compare the simulations of specific targets to measured data for several different imaging sensors with very different imaging properties. The simulation captures the essential features of the images created by these different sensors. Finally, we show that by specifying the sensor properties, the simulations can predict sensor performance for natural scenes that are difficult to measure with a laboratory apparatus, such as natural scenes with high dynamic range or low light levels.
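A minimal sketch of the pixel-response portion of such a simulation, combining photon (shot) noise, dark current, read noise, and quantization; the parameter values are illustrative, not calibrations of any particular sensor.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_pixel(photons_per_s, exposure_s=0.01, qe=0.5,
                   dark_e_per_s=5.0, read_noise_e=3.0,
                   full_well_e=8000, bits=10):
    """Return the digital number (DN) for a pixel given incident photons."""
    electrons = rng.poisson((photons_per_s * qe + dark_e_per_s) * exposure_s)
    electrons = electrons + rng.normal(0, read_noise_e, np.shape(electrons))
    electrons = np.clip(electrons, 0, full_well_e)      # saturation
    gain = (2 ** bits - 1) / full_well_e                # ADC gain
    return np.round(electrons * gain).astype(int)
```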
Simulation of the imaging pipeline is an important tool for the design and evaluation of imaging systems. One of
the most important requirements for an accurate simulation tool is the availability of high quality source scenes.
The dynamic range of images depends on multiple elements in the imaging pipeline including the sensor, digital
signal processor, display device, etc. High dynamic range (HDR) scene spectral information is critical for an
accurate analysis of the effect of these elements on the dynamic range of the displayed image. Also, typical digital
imaging sensors are sensitive well beyond the visible range of wavelengths. Spectral information with support
across the sensitivity range of the imaging sensor is required for the analysis and design of imaging pipeline
elements that are affected by IR energy. Although HDR scene data with visible and infrared content are available from remote sensing resources, there is a scarcity of such imagery representing more conventional everyday scenes. In this paper, we address both of these issues and present a method to generate a database of
HDR images that represent radiance fields in the visible and near-IR range of the spectrum. The proposed
method uses only conventional consumer-grade equipment and is very cost-effective.
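One common way to assemble such HDR radiance maps with consumer-grade gear is exposure bracketing with linear raw output. The fusion step below, inverse-exposure weighting of trusted pixels, is a standard approach offered as an assumption, not a description of the paper's exact procedure.

```python
import numpy as np

def fuse_exposures(frames, exposure_times, sat_level=0.95):
    """frames: list of (H, W) linear raw images scaled to [0, 1].
    Returns a relative radiance map fused across exposures."""
    num = np.zeros_like(frames[0], dtype=float)
    den = np.zeros_like(frames[0], dtype=float)
    for img, t in zip(frames, exposure_times):
        w = (img < sat_level) & (img > 0.01)   # trust mid-range pixels only
        num += w * img / t                     # per-frame radiance estimate
        den += w
    return num / np.maximum(den, 1)
```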
When the size of a CMOS imaging sensor array is fixed, the only way to increase sampling density and spatial resolution is to reduce pixel size. But reducing pixel size reduces light sensitivity. Hence, under these constraints, there is a tradeoff between spatial resolution and light sensitivity. Because this tradeoff involves the interaction of many different system components, we used a full system simulation to characterize performance. This paper describes system simulations that predict the output of imaging sensors with the same die size but different pixel sizes and presents metrics that quantify the spatial resolution and light sensitivity of these different imaging sensors.
In many imaging applications, there is a tradeoff between sensor spatial resolution and dynamic range. Increasing sampling density by reducing pixel size decreases the number of photons each pixel can capture before saturation. Hence, imagers with small pixels operate at levels where photon noise limits image quality. To understand the impact of these noise sources on image quality, we conducted a series of psychophysical experiments. The data revealed two general principles. First, the luminance amplitude of the noise standard deviation predicts threshold, independent of color. Second, this threshold is 3-5% of the mean background luminance across a wide range of background luminance levels (ranging from 8 cd/m² to 5594 cd/m²). The relatively constant noise threshold across a wide range of conditions has specific implications for imaging sensor design and the image processing pipeline. An ideal image capture device, limited only by photon noise, must capture at least 1000 photons/pixel (1/sqrt(1000) ≈ 3%) to render photon noise invisible. The ideal capture device should also be able to achieve this SNR or higher across the whole dynamic range.
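The 1000 photons/pixel figure follows directly from Poisson statistics, as this short computation shows.

```python
import math

# Shot-noise contrast is 1/sqrt(N), so invisibility at a threshold c
# requires 1/sqrt(N) <= c, i.e. N >= 1/c**2.
threshold = 0.03                          # 3% of mean background luminance
n_required = math.ceil(1 / threshold ** 2)
print(n_required)                         # 1112; ~1000 corresponds to ~3.2%
```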
The Image Systems Evaluation Toolkit (ISET) is an integrated suite of software routines that simulate the capture and processing of visual scenes. ISET includes a graphical user interface (GUI) for users to control the physical characteristics of the scene and many parameters of the optics, sensor electronics and image-processing pipeline. ISET also includes color tools and metrics based on international standards (chromaticity coordinates, CIELAB and others) that assist the engineer in evaluating the color accuracy and quality of the rendered image.
The commercial success of color sequential displays is limited by the
fact that people perceive multiple color images during pursuit and
saccadic eye movements. We conducted a psychophysical experiment to
quantify the visibility of these color artifacts for different saccadic speeds, display background brightnesses, and target sizes. An InFocus sequential-color projector was placed behind a projection screen to simulate a normal desktop display. Saccadic eye movements were induced by requiring subjects to recognize text targets displayed at two different screen locations in rapid succession. The speed of saccadic movements was varied by manipulating the distance between the two target locations. A white bar, either with or without a yellow and red color fringe on the right edge, was displayed as subjects moved their eyes for the text recognition task. The two versions of the white bar are not distinguishable when color break-up is present, so performance of this task can be used as a measure of color break-up. The visibility of sequential color break-up decreases with background intensity and the size of the white target, and increases with saccadic speed.
When rendering photographs, it is important to preserve the gray tones despite variations in the ambient illumination. When the illuminant is known, white balancing that preserves gray tones can be performed in many different color spaces; the choice of color space influences the renderings of other colors. In this behavioral study, we ask whether users have a preference for the color space in which white balancing is performed. Subjects compared images rendered using a white balancing transformation that preserved gray tones, but the transformation was applied in one of four different color spaces: XYZ, Bradford, a camera sensor RGB, and the sharpened RGB color space. We used six scenes (four portraits, fruit, and toys) acquired under three calibrated illumination environments (fluorescent, tungsten, and flash). For all subjects, transformations applied in XYZ and sharpened RGB were preferred to those applied in Bradford and device color space.
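The manipulation in this study can be sketched as a diagonal (von Kries style) scaling applied in a chosen color space. The Bradford matrix below is the standard one; a camera sensor RGB condition would substitute a calibrated sensor matrix.

```python
import numpy as np

M_bradford = np.array([[ 0.8951,  0.2664, -0.1614],
                       [-0.7502,  1.7135,  0.0367],
                       [ 0.0389, -0.0685,  1.0296]])

def white_balance(xyz_pixels, xyz_white, xyz_target, M_space=M_bradford):
    """xyz_pixels: (n, 3). Scales each channel of the chosen space so the
    scene white maps to the target white; grays are preserved exactly."""
    src = M_space @ xyz_white
    dst = M_space @ xyz_target
    D = np.diag(dst / src)
    T = np.linalg.inv(M_space) @ D @ M_space
    return xyz_pixels @ T.T
```

Passing M_space=np.eye(3) applies the same balancing directly in XYZ; substituting other matrices reproduces the conditions compared in the study.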
This paper summarizes the results of a visual psychophysical investigation of the relationship between two important printer parameters: addressability (expressed in terms of dots per inch or DPI) and grayscale capability (expressed in terms of the number of graylevels per pixel). The photographic image quality of print output increases with both the printer DPI and the number of graylevels per pixel. The experiments described in this paper address the following questions: At what point is there no longer a perceptual advantage of DPI or graylevels, and how do these two parameters trade off?
We describe computational experiments to predict the perceived quality of multilevel halftone images. Our computations were based on a spatial color difference metric, S-CIELAB, that is an extension of CIELAB, a widely used industry standard. CIELAB predicts the discriminability of large uniform color patches. S-CIELAB includes a pre-processing stage that accounts for certain aspects of the spatial sensitivity to different colors. From simulations applied to multilevel halftone images, we found that (a) for grayscale images, L*-spacing of the halftone levels results in better halftone quality than linear spacing of the levels; (b) for color images, increasing the number of halftone levels for magenta ink results in the most significant improvement in halftone quality. Increasing the number of halftone levels of the yellow ink resulted in the least improvement.
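Finding (a) is easy to make concrete: L*-spaced levels are uniform steps in CIELAB lightness, mapped back to luminance through the inverse lightness function, which packs more levels into the dark tones where the eye is more sensitive.

```python
import numpy as np

def lstar_spaced_levels(n_levels):
    """Output levels equally spaced in CIELAB L*, as relative luminance Y."""
    L = np.linspace(0, 100, n_levels)   # uniform steps in L*
    f = (L + 16) / 116                  # inverse CIELAB lightness function
    d = 6 / 29
    return np.where(f > d, f ** 3, 3 * d * d * (f - 4 / 29))

print(np.round(lstar_spaced_levels(5), 4))
# [0. 0.0437 0.1841 0.4825 1.] : levels cluster in the dark tones
```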
This paper describes the performance of an image capture simulator. The general model underlying the simulator assumes that (a) the image capture device contains multiple classes of sensors with different spectral sensitivities and (b) each sensor responds linearly to light intensity over most of its operating range. We place no restrictions on the number of sensor classes, their spectral sensitivities, or their spatial arrangement. The input to the simulator is a set of narrow-band images of the scene taken with a custom-designed hyperspectral camera system. The parameters for the simulator are the number of sensor classes, the sensor spectral sensitivities, the noise statistics and number of quantization levels for each sensor class, the spatial arrangement of the sensors, and the exposure duration. The output of the simulator is the raw image data that would have been acquired by the simulated image capture device. To test the simulator, we acquired images of the same scene both with our hyperspectral camera and with a calibrated Kodak DCS-200 digital color camera. We used the simulator to predict the DCS-200 output from the hyperspectral data. The agreement between simulated and acquired images validated the image capture response model, the spectral calibrations, and our simulator implementation. We believe the simulator will provide a useful tool for understanding the effect of varying the design parameters of an image capture device.
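The linear response model in assumptions (a) and (b) reduces to an inner product per sensor class, sketched here with synthetic placeholders for the calibrated sensitivities and scene radiance.

```python
import numpy as np

rng = np.random.default_rng(2)
bands = 31
sensitivities = rng.random((3, bands))   # one row per sensor class
radiance = rng.random(bands)             # narrow-band scene measurement

def capture(radiance, sensitivities, noise_sd=0.01, levels=256):
    """Linear response + additive noise + quantization, per sensor class."""
    r = sensitivities @ radiance
    r = r + rng.normal(0, noise_sd, r.shape)
    r = np.clip(r / r.max(), 0, 1)       # exposure set to fill the range
    return np.round(r * (levels - 1)).astype(int)
```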
GPA is an expression that describes how the number of dots/inch and the number of graylevels/dot trade off in determining the number of graylevels per area (GPA). The metric is based on the assumption that anything falling within a visual angle of approximately 3 minutes of arc is spatially integrated by the optical blur of our eyes.
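A worked example of the integration-area arithmetic, under the assumption that achievable levels combine as dots × (graylevels − 1) + 1 within the patch; the exact GPA expression is the paper's.

```python
import math

# At a 12-inch viewing distance, 3 arcmin subtends ~0.0105 inches, so a
# 300 DPI printer places roughly a 3x3 block of dots in one patch.
viewing_in = 12.0
patch_in = 2 * viewing_in * math.tan(math.radians(3 / 60) / 2)
dpi, graylevels_per_dot = 300, 4
dots = (dpi * patch_in) ** 2                      # dots inside the patch
levels_per_area = dots * (graylevels_per_dot - 1) + 1
print(round(patch_in, 4), round(dots, 1), round(levels_per_area, 1))
```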
A simple method of converting scanner (RGB) responses to estimates of object tristimulus (XYZ) coordinates is to apply a linear transformation to the RGB values. The transformation parameters are selected to minimize some relevant error measure. While the linear method is easy, it can be quite imprecise. Linear methods are only guaranteed to work when the scanner sensor responsivities are within a linear transformation of the human color-matching functions. In studying the linear transformation methods, we have observed that the error distribution between the true and estimated XYZ values is often quite regular: plotted in tristimulus coordinates, the error cloud is a highly eccentric ellipse, often nearly a line. We will show that this observation is expected when the collection of surface reflectance functions is well described by a low-dimensional linear model, as is often the case in practice. We will discuss the implications of our observation for scanner design and for color correction algorithms that encourage operator intervention.
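The eccentric error cloud can be demonstrated in a few lines: when reflectances lie in a four-dimensional linear model and a 3×3 matrix is fit, the residuals are constrained to (nearly) a single direction. All data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, bands = 200, 31
basis = rng.random((bands, 4))                 # 4-dimensional surface model
reflectances = basis @ rng.random((4, n))
sens_rgb = rng.random((3, bands))              # scanner responsivities
cmf_xyz = rng.random((3, bands))               # color-matching functions

RGB = (sens_rgb @ reflectances).T              # (n, 3)
XYZ = (cmf_xyz @ reflectances).T

M, *_ = np.linalg.lstsq(RGB, XYZ, rcond=None)  # best 3x3 transform
residuals = XYZ - RGB @ M
# One singular value dominates: the error cloud is essentially a line.
print(np.linalg.svd(residuals, compute_uv=False))
```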
We describe a linear scanner model that provides a useful characterization of the response of a scanner to diffusely reflecting surfaces. We show how the linear model can be used to estimate the portion of the scanner sensor responsivities that falls within the linear space spanned by the input signals. We also describe how the model can be extended to characterize a scanner's response to surfaces that fluoresce under the scanner illuminant.
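The "portion within the linear span" idea is an orthogonal projection, sketched here with synthetic spectra.

```python
import numpy as np

rng = np.random.default_rng(5)
bands, n_surfaces = 31, 8
surfaces = rng.random((bands, n_surfaces))   # input signal spectra
responsivity = rng.random(bands)

# Orthogonal projection onto the column space of the input spectra:
# only this component of the responsivity is recoverable from measurements.
Q, _ = np.linalg.qr(surfaces)
recoverable = Q @ (Q.T @ responsivity)
hidden = responsivity - recoverable          # invisible to the scanner
```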
SC762: Device Simulation for Image Quality Evaluation
Customers judge the image quality of a digital camera by viewing the final rendered output. Achieving high quality output depends on multiple system components, including the optical system, imaging sensor, image processor and display device. Consequently, analyzing components singly, without reference to the characteristics of the other components, provides only a limited view of system performance. An integrated simulation environment that models the entire imaging pipeline is a useful tool that improves understanding and guides design.
This course will introduce computational models to simulate the scene, optics, sensor, processor, display, and human observer. Example simulations of calibrated devices and imaging algorithms will be used to clarify how specific system components influence the perceived quality of the final output.