In this paper, we propose an eye tracking and gaze estimation system for mobile phones. We integrate an eye detector with eye-corner and iso-center cues to improve pupil detection, and use optical flow information for eye tracking. The result is a robust eye tracking system that integrates eye detection and optical-flow based image tracking. In addition, we incorporate the orientation sensor information from the mobile phone to further improve the eye tracking for accurate gaze estimation. We demonstrate the accuracy of the proposed eye tracking and gaze estimation system through experiments on public video sequences as well as videos acquired directly from a mobile phone.
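The optical-flow tracking step can be illustrated with a minimal Lucas-Kanade translation estimate over an eye patch; this is a sketch in NumPy under a small-displacement assumption, not the paper's actual tracker:

```python
import numpy as np

def lucas_kanade_flow(prev, curr):
    """Estimate a single (dx, dy) translation for a patch by solving the
    least-squares optical flow equations  Ix*dx + Iy*dy = -It  (Lucas-Kanade)."""
    Ix = np.gradient(prev, axis=1)          # horizontal image gradient
    Iy = np.gradient(prev, axis=0)          # vertical image gradient
    It = curr - prev                        # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v                                # (dx, dy)
```

In a tracker, this estimate would be applied per frame to shift the eye region, with the eye detector re-initializing the patch when tracking drifts.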
Magnetic resonance (MR) imaging is frequently used to diagnose abnormalities in the spinal intervertebral discs. Owing to the non-isotropic resolution of typical MR spinal scans, physicians prefer to align the scanner plane with the disc in order to maximize the diagnostic value and to facilitate comparison with prior and follow-up studies. Commonly, a planning scan of the whole spine is acquired first, followed by a diagnostic scan aligned with selected discs of interest. Manual determination of the optimal disc plane is tedious and prone to operator variation. A fast and accurate method to automatically determine the disc alignment can decrease examination time and increase the reliability of diagnosis. We present a validation study of an automatic spine alignment system for determining the orientation of intervertebral discs in MR studies. To measure the effectiveness of the automatic alignment system, we compared its performance with that of human observers. Twelve MR scans of adult spines were tested. Two observers independently indicated the intervertebral plane for each disc and then repeated the procedure on another day, in order to determine the inter- and intra-observer variability associated with manual alignment. Results were also collected for the observers using the automatic spine alignment system, in order to determine the method's consistency and its accuracy with respect to human observers. We found that the results from the automatic alignment system are comparable with the alignments determined by human observers, with the computer showing greater speed and consistency.
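One natural way to quantify inter- and intra-observer variability for a disc is the angle between the plane normals indicated in two sessions; the angular metric below is an assumed illustration, not necessarily the study's exact error measure:

```python
import numpy as np

def plane_angle_deg(n1, n2):
    """Angle in degrees between two disc-plane normals. The sign of a plane
    normal is arbitrary, so the absolute cosine is used."""
    n1 = np.asarray(n1, float)
    n2 = np.asarray(n2, float)
    c = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```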
In this paper, we present a printed circuit board (PCB) inspection system based on the Hausdorff distance for image alignment and defect detection. In addition, we apply a support vector machine (SVM) for defect classification and metal classification in this system. The three major components of the proposed PCB inspection system are image alignment, defect detection, and defect classification. In image alignment, a coarse-to-fine search technique is applied to accelerate finding the minimal Hausdorff distance between the reference and inspection images. For defect detection, we first calculate the Hausdorff distance at every pixel in the inspection image and compare the result with a predefined threshold. Where the computed Hausdorff distance is greater than the threshold, the pixel is labeled as a defect suspect. The existence of a defect can then be confirmed by merging nearby suspects into one object. For defect classification, local image features are extracted and passed to the support vector machine for training and for identifying defect types. In this work, we focus on distinguishing a defect as one of the open, short, pinhole, over-etch, or under-etch types. The support vector machine can be applied to metal classification as well. At the current stage, we supply the support vector machine with RGB color information as the feature vector for metal classification. Experimental results show that the Hausdorff distance based method detects defects on a printed circuit board efficiently and accurately, and that the support vector machine approach also gives satisfactory results for both defect and metal classification.
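A minimal sketch of the Hausdorff distance between two edge point sets, the core measure behind the alignment and detection steps (the thresholding and suspect-merging stages are omitted):

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of the distance from a to its nearest b in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A and B."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

In practice A and B would be the (row, col) coordinates of edge pixels from the reference and inspection images, and a distance above threshold flags a defect suspect.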
In this paper, we propose a coarse-to-fine image comparison algorithm based on the Hausdorff distance for PCB inspection. The Hausdorff distance can be used in a geometry-based inspection framework for comparing binary edge maps extracted from the inspection images. To use the Hausdorff distance for image alignment, we first compute the edge map from the input image. In some cases, one may use the directed Hausdorff distance as a similarity measure in order to reduce the computational cost of the image alignment. Moreover, a modified version of the directed Hausdorff distance is employed to enforce robustness against random noise introduced by edge detection. The search for the optimal alignment by minimizing the associated Hausdorff distance is accomplished by an efficient multi-resolution downhill simplex search algorithm. In addition to image alignment, we also apply a modified Hausdorff distance to detect defects in PCBs. In our inspection system, we apply the partial Hausdorff distance in a local circuit window to reduce the inspection area dramatically, making it very efficient for PCB inspection. Experimental results on several PCB inspection examples demonstrate the accuracy and efficiency of the proposed Hausdorff-distance based inspection system.
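The partial directed Hausdorff distance replaces the maximum with the k-th ranked nearest-neighbor distance, so that a fraction of spurious edge points from noisy edge detection cannot dominate the measure; a sketch, with an illustrative fraction parameter:

```python
import numpy as np

def partial_directed_hausdorff(A, B, frac=0.8):
    """k-th ranked nearest-neighbour distance from A to B, with
    k = ceil(frac * |A|). frac = 1.0 recovers the directed Hausdorff distance;
    smaller values tolerate a fraction of outlier edge points."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).min(axis=1)
    k = max(1, int(np.ceil(frac * len(d))))
    return np.sort(d)[k - 1]
```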
Three-dimensional digital preservation of historical treasures has recently become a major focus of research in computer vision and graphics. It offers the advantages of permanent preservation, remote display, ease of browsing and study, 3D model copying, etc. It is particularly important for the digital library systems that have been successfully established in many countries. Furthermore, there has been pioneering research on preserving cultural and historical relics, e.g. famous paintings, stone carvings, and well-known architecture and landscapes. There are many priceless Chinese treasures made of jadeite, but existing 3D scanning techniques cannot be applied to such curios because of the semi-transparent and reflective material properties as well as safety considerations. In this paper, we present a novel semi-automatic system to reconstruct three-dimensional models of jadeite objects from image sequences. There are two major challenges in reconstructing 3D models of jadeite treasures from uncalibrated image sequences. The first is the semi-transparency and the highly specular property of jadeite materials, and the other is the unknown camera information for the given image sequences, including intrinsic (calibration) and extrinsic (position and orientation) parameters.
The proposed modeling process first recovers the camera information and a rough structure through a structure from motion algorithm, and then extracts the fine details of the model from dense correspondences between image patches. We have developed three techniques for this challenging task: a structure from motion algorithm, image registration, and dense depth computation.
First, for the highly specular material, we manually select corresponding feature points between adjacent images, because it is very difficult to reliably establish the correspondences automatically from the image sequences. These correspondences supply the information to recover the camera parameters and the initial guess for the dense matching of the image patches. The structure from motion algorithm consists of two steps: projective reconstruction, followed by self-calibration and metric update. Considering the high feature missing rate due to the highly specular material, we propose a robust method for projective reconstruction that recovers the missing points, which greatly reduces the traditional error accumulation problem. The self-calibration and metric update process makes use of the image acquisition assumptions to obtain the camera parameters. It iteratively performs two steps: first, a closed-form solution from the linear constraints on the camera calibration matrix based on the absolute conic; second, an optimization process to fit the nonlinear constraints. The obtained solution then provides the initial guess for the subsequent bundle adjustment algorithm. As for image registration, existing techniques fail due to the complex lighting effects on jadeite material. By including brightness variation factors in the model and accounting for the reflective highlight effect, we developed a novel optical flow computation technique to reliably compute dense matches between the image patches. Based on the extracted camera information and registered image patches, the dense depth information of the jadeite object can be computed. We then refine the original rough model with the dense depth information through subdivision and adaptation of the rough 3D mesh model.
Finally, experimental results of 3D model reconstruction from the image sequence of the Chinese treasure, Jadeite Cabbage with Insects, are shown to demonstrate the performance of the developed system.
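Once camera matrices and correspondences are available, recovering a 3D point reduces to triangulation; a minimal linear (DLT) two-view triangulation sketch, not the paper's exact dense-depth formulation:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each image point x = (u, v) observed under
    a 3x4 projection matrix P contributes two homogeneous equations
    u*P[2] - P[0] and v*P[2] - P[1]; the 3D point is the null vector of the
    stacked system, taken from the SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # de-homogenize
```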
In this paper, we present a 3D object reconstruction system that recovers 3D models of general objects from video. We assume the video of the object is captured from multiple viewpoints. The proposed system is composed of the following components: feature trajectory extraction, 3D structure from motion, surface reconstruction, and texture computation. In feature trajectory extraction, we compute dense optical flow fields between adjacent frames and aggregate them at the interest points to obtain reliable feature trajectories. In the structure from motion stage, we develop a robust algorithm to recover dense 3D structure from several viewpoints for uncalibrated image sequences. For surface reconstruction from the recovered 3D data points, we develop a new cluster-based radial-basis-function (RBF) algorithm, which overcomes the extensive computational cost through a divide-and-conquer strategy. In the final texture computation process, we combine multi-view images to form the texture map of the 3D object model. Finally, experimental results are given to show the performance of the proposed 3D reconstruction system.
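The cluster-based divide-and-conquer scheme itself is not reproduced here, but the basic RBF interpolation it builds on can be sketched with Gaussian basis functions (the kernel choice and shape parameter are illustrative):

```python
import numpy as np

def rbf_fit(centers, values, eps=1.0):
    """Solve Phi @ w = values for RBF weights, where Phi holds the Gaussian
    kernel exp(-(eps*r)^2) evaluated between all pairs of centers."""
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(eps * r) ** 2)
    return np.linalg.solve(Phi, values)

def rbf_eval(X, centers, w, eps=1.0):
    """Evaluate the interpolant f(x) = sum_i w_i * exp(-(eps*||x - c_i||)^2)."""
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(eps * r) ** 2) @ w
```

The divide-and-conquer idea is then to run such a fit per point cluster and blend the local interpolants, which avoids solving one dense system over all data points.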
Camera motion estimation is very important for indexing and retrieving video information. In this paper, we propose a robust camera motion estimation and classification algorithm. Our camera motion estimation algorithm consists of optical flow estimation, iterative RANSAC (RANdom SAmple Consensus) multiple motion estimation, and long-term camera motion estimation through a shortest-path search. In this approach, we first estimate multiple global affine motions from the computed optical flow field for every frame in the video sequence. Then, the long-term camera motion is determined by searching for a shortest path in a graph of cascaded global motion nodes. After the camera motion is determined for the whole video, we apply an artificial neural network, trained on a large set of different types of camera motion data, to classify the camera motion type. We show accurate camera motion classification results through experiments on real videos.
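The per-frame global affine motion estimation can be sketched as RANSAC over flow correspondences; this is a generic single-model sketch with illustrative parameters, omitting the iterative extraction of multiple motions:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine model: dst ~ [src, 1] @ sol, sol is 3x2."""
    M = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return sol

def ransac_affine(src, dst, iters=200, tol=1.0, seed=0):
    """RANSAC: repeatedly fit an affine model to 3 random correspondences,
    count inliers within tol pixels, and refit on the best inlier set."""
    rng = np.random.default_rng(seed)
    ones = np.ones((len(src), 1))
    best, best_inl = None, -1
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        sol = fit_affine(src[idx], dst[idx])
        inl = np.linalg.norm(np.hstack([src, ones]) @ sol - dst, axis=1) < tol
        if inl.sum() > best_inl and inl.sum() >= 3:
            best_inl, best = int(inl.sum()), fit_affine(src[inl], dst[inl])
    return best, best_inl
```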
Statistical modeling of signal and image data has been used extensively for recognition and estimation, with principal component analysis being especially popular for statistical signal modeling and analysis. In this paper, we present a system to build a 3D statistical head model from incomplete data. In this system, we first transform the 3D head scan data points into cylindrical coordinates to obtain 2D surface maps. After these 2D surface maps are aligned, we compute the associated mean vector and covariance matrix. Then, the principal component analysis technique is applied to compute the principal components and the corresponding eigenvalues of the covariance matrix. Experimental results are given to show the 3D head shape variations captured by the computed 3D statistical model.
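The model construction can be sketched as PCA via eigen-decomposition of the covariance of the aligned surface maps, each flattened to a vector:

```python
import numpy as np

def pca(data, k):
    """PCA of stacked samples (n_samples, n_dims): returns the mean vector,
    the top-k principal components (columns), and their eigenvalues,
    obtained by eigen-decomposition of the sample covariance matrix."""
    mean = data.mean(axis=0)
    C = np.cov(data - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1][:k]
    return mean, vecs[:, order], vals[order]
```

New head shapes are then synthesized as the mean plus a weighted sum of the principal components, with weights bounded by the eigenvalues.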
Video segmentation is fundamental to a number of applications related to video retrieval and analysis. Shot change detection is the initial step of video segmentation and indexing. There are two basic types of shot changes: one is the abrupt change, or cut, and the other is the gradual shot transition. The smooth variations of video feature values in a gradual transition produced by editing effects are often confused with those caused by camera or object motion. To overcome this difficulty, it is reasonable to estimate the motions and suppress the disturbance they cause. In this thesis, we exploit motion and illumination estimation in a video sequence to detect both abrupt and gradual shot changes. A generalized optical flow constraint that includes an illumination parameter to model local illumination changes is employed in the motion and illumination estimation. An iterative process refines the generalized optical flow constraints step by step. A robust measure, the likelihood ratio of corresponding motion-compensated blocks in consecutive frames, is used for detecting abrupt changes. For the detection of gradual shot transitions, we compute the average monotony of intensity variations on the stable pixels in the images within a twin-comparison framework. We test the proposed algorithm on a number of video sequences from TREC 2001 and compare the detection results with the best results reported in the TREC 2001 benchmark. The comparisons indicate that the proposed shot change detection algorithm is competitive with the best existing algorithms.
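A common form of the block likelihood-ratio measure, assuming Gaussian block intensities, is sketched below; the thesis's exact variant may differ. The ratio is 1 for statistically identical blocks and grows as the block statistics diverge:

```python
import numpy as np

def block_likelihood_ratio(b1, b2, eps=1e-8):
    """Likelihood ratio between two corresponding blocks from their means and
    variances; equals 1 when the blocks have identical statistics."""
    m1, m2 = b1.mean(), b2.mean()
    s1, s2 = b1.var() + eps, b2.var() + eps   # eps guards flat blocks
    return (((s1 + s2) / 2.0 + ((m1 - m2) / 2.0) ** 2) ** 2) / (s1 * s2)
```

An abrupt change is declared when the ratio, averaged over motion-compensated blocks, exceeds a threshold.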
Temporal segmentation of a video sequence into different shots is fundamental to a number of video retrieval and analysis applications. Motion estimation has been widely used in video processing, since it provides the most essential information about an image sequence. In this paper, we exploit motion and illumination estimation in a video sequence to detect various types of shot changes. Optical flow is the motion vector computed at each pixel of an image sequence from its intensity variation. Traditionally, optical flow algorithms were derived from the brightness constancy assumption. In this paper, we employ a generalized optical flow constraint that includes an illumination parameter to model local illumination changes. An iterative optical flow and illumination estimation algorithm is developed to refine the generalized optical flow constraints step by step, leading to a very accurate estimation of the optical flow and illumination parameters. Two robust measures are defined from the mean and standard deviation of the estimated intensity compensation values for all the blocks in the same image. Each of these two measures responds significantly to various types of shot changes. We show the usefulness of these two measures through experiments.
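A minimal sketch of solving a generalized optical flow constraint with a multiplicative illumination factor over a patch, in the spirit of (but not identical to) the paper's iterative algorithm. Linearizing I(x+u, y+v, t+1) = (1+g) I(x, y, t) gives Ix*u + Iy*v - g*I + It = 0, which is solved in the least-squares sense for (u, v, g):

```python
import numpy as np

def flow_with_illumination(prev, curr):
    """Least-squares solution of the generalized optical flow constraint
    Ix*u + Iy*v - g*I = -It over a patch, where (1 + g) is a local
    multiplicative illumination factor."""
    Ix = np.gradient(prev, axis=1)
    Iy = np.gradient(prev, axis=0)
    It = curr - prev
    A = np.stack([Ix.ravel(), Iy.ravel(), -prev.ravel()], axis=1)
    b = -It.ravel()
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    u, v, g = sol
    return u, v, g
```

The per-block g values are exactly the intensity compensation values whose mean and standard deviation form the two shot-change measures.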
In this paper, we propose a system to reconstruct 3D face models from monocular image sequences. Our approach is based on adapting a generic 3D face model to a set of sparse 3D face feature points recovered from a video sequence. In our system, structure from motion is accomplished by a robust least-square minimization approach based on dynamically minimizing a weighted least-square energy function. A small number of face feature points are selected and tracked along the video sequence. The face poses at all frames in the sequence are approximated by a pose estimation process with the generic 3D face model. A structure from motion algorithm based on robust least-square minimization is then applied to the entire video sequence to recover the face structure. The adaptation of the generic 3D head model to the recovered 3D face structure is achieved by radial basis function interpolation. Experimental results of 3D face model recovery using the proposed algorithm are shown.
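Robust weighted least-square minimization of this kind is often realized as iteratively reweighted least squares (IRLS), where observations with large residuals are progressively down-weighted; a generic sketch with an L1-style weight function (the paper's actual energy function is not reproduced):

```python
import numpy as np

def irls(A, b, iters=50, eps=1e-6):
    """Iteratively reweighted least squares for A @ x ~ b with weights
    w = 1 / max(|r|, eps), which approximates a robust L1 fit: outlying
    observations receive vanishing influence as the iterations proceed."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    for _ in range(iters):
        r = A @ x - b
        sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x
```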
Image alignment is a crucial problem in industrial visual inspection. Traditional intensity-matching based methods, including the normalized correlation method, are not robust against non-uniform illumination variations. In this paper, we present a generalized intensity-based matching approach that accomplishes accurate and robust image alignment under high-level noise and large non-uniform illumination variations. This generalization extends our previous FLASH (Fast Localization with Advanced Search Hierarchy) algorithm for image alignment.
The image reference approach is very popular in industrial inspection due to its generality across different inspection tasks. Unfortunately, this approach is sensitive to illumination variations. A novel illumination compensation algorithm is proposed in this paper for correcting smooth intensity variations due to illumination changes. By using the proposed algorithm as a preprocessing step in image reference based inspection or localization, we can make the inspection or localization algorithm robust against spatially smooth illumination changes. This technique is very useful for achieving a reliable automated visual inspection system under different illumination conditions. The proposed illumination compensation algorithm is based on the assumption that the underlying image reflectance function is approximately piecewise constant and the image irradiance function is spatially smooth. Reliable gradient constraints on the smooth irradiance function are computed and selected from the image brightness function by using a local uniformity test. Two surface fitting algorithms are presented to recover the smooth image irradiance function from the selected reliable gradient constraints: one is a polynomial surface fitting algorithm and the other is a spline surface fitting algorithm. The spline surface fitting formulation leads to a large linear system, which is solved by an efficient preconditioned conjugate gradient algorithm. Once the image irradiance function is estimated, the spatial intensity inhomogeneities can be easily compensated. Experimental results are shown to demonstrate the usefulness of the proposed algorithm.
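The polynomial surface fitting component can be sketched as a linear least-squares fit of a low-degree 2D polynomial; for simplicity this sketch fits all pixels directly, whereas the paper fits only the selected reliable gradient constraints:

```python
import numpy as np

def fit_poly_illumination(img, deg=2):
    """Fit a 2D polynomial of total degree <= deg to the image brightness as
    a smooth irradiance estimate; dividing the image by this surface
    compensates spatially smooth illumination."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    x = x / (w - 1.0)   # normalize coordinates for conditioning
    y = y / (h - 1.0)
    terms = [x**i * y**j for i in range(deg + 1) for j in range(deg + 1 - i)]
    A = np.stack([t.ravel() for t in terms], axis=1)
    c, *_ = np.linalg.lstsq(A, img.ravel(), rcond=None)
    return (A @ c).reshape(h, w)
```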
The need for accurate and efficient object localization prevails in many industrial applications, such as automated visual inspection and factory automation. The image reference approach is very popular in automatic visual inspection due to its general applicability to a variety of inspection tasks. However, it requires very precise alignment of the inspection pattern in the image. To achieve very precise pattern alignment, traditional template matching is extremely time-consuming when the search space is large. In this paper, we present a new FLASH (Fast Localization with Advanced Search Hierarchy) algorithm for fast and accurate object localization in a large search space. This object localization algorithm is very useful for automated visual inspection and for pick-and-place systems in automatic factory assembly. It is based on the assumption that the regions surrounding the pattern within the search range are fixed, which is valid for most industrial inspection applications. The FLASH algorithm comprises a hierarchical nearest-neighbor search algorithm and an optical-flow based energy minimization algorithm. The hierarchical nearest-neighbor search produces a rough estimate of the transformation parameters as the initial guess for the iterative optical-flow based energy minimization, which provides very accurate estimation results and associated confidence measures. Experimental results demonstrate the accuracy and efficiency of the proposed FLASH algorithm.
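The benefit of a coarse-then-refine strategy can be illustrated with a simple two-level SSD template search; this is a generic pyramid search for intuition, not the FLASH nearest-neighbor hierarchy itself:

```python
import numpy as np

def ssd_search(image, template):
    """Brute-force best (row, col) position by sum of squared differences."""
    th, tw = template.shape
    best, pos = np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            d = np.sum((image[r:r + th, c:c + tw] - template) ** 2)
            if d < best:
                best, pos = d, (r, c)
    return pos

def coarse_to_fine_search(image, template, factor=2):
    """Search on a decimated level first, then refine the estimate in a small
    full-resolution neighbourhood, shrinking the search space dramatically."""
    r0, c0 = ssd_search(image[::factor, ::factor], template[::factor, ::factor])
    r0, c0 = r0 * factor, c0 * factor
    th, tw = template.shape
    best, pos = np.inf, (r0, c0)
    for r in range(max(0, r0 - factor), min(image.shape[0] - th, r0 + factor) + 1):
        for c in range(max(0, c0 - factor), min(image.shape[1] - tw, c0 + factor) + 1):
            d = np.sum((image[r:r + th, c:c + tw] - template) ** 2)
            if d < best:
                best, pos = d, (r, c)
    return pos
```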
Blind image restoration recovers the original image from a blurred image when the blurring function in the image formation process is unknown. In this paper, we present an efficient and practical blind image restoration algorithm based on total variation (TV) regularization. TV regularization employs the TV norm of the image as the smoothness constraint, whereas traditional regularization uses the H1 norm. TV regularization provides a larger functional space for the image functions and is known for allowing discontinuities in the recovered image function. The blur functions considered in this paper are combinations of a Gaussian defocus blur and a uniform motion blur, each of which can be approximated by a parametric function of one or two parameters. The use of this parametric form intrinsically imposes a constraint on the blur function, and the small number of parameters involved makes the resulting optimization problem tractable. The above formulation for restoration from a single image is then extended to blind restoration from an image sequence by introducing motion parameters into the multi-frame data constraints. An iterative alternating numerical algorithm is developed to solve the nonlinear optimization problems. Each iteration involves Fourier preconditioned conjugate gradient iterations to update the restored image and quasi-Newton steps to update the blur and motion parameters. Experimental results are shown to demonstrate the usefulness of our algorithm.
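The parametric blur family can be illustrated by constructing the combined kernel directly; since a Gaussian defocus and a horizontal uniform motion blur are both separable, their convolution is an outer product (kernel size and parameter values below are illustrative):

```python
import numpy as np

def blur_kernel(sigma, length, size=15):
    """Combined parametric blur: a Gaussian defocus (parameter sigma)
    convolved with a horizontal uniform motion blur of odd length
    (parameter length), normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-ax**2 / (2.0 * sigma**2))      # 1D Gaussian profile
    box = np.ones(length) / length             # 1D uniform motion profile
    gx = np.convolve(g, box, mode="same")      # blur the horizontal factor
    k = np.outer(g, gx)                        # separable 2D kernel
    return k / k.sum()
```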
Proc. SPIE. 3337, Medical Imaging 1998: Physiology and Function from Multidimensional Images
A novel Local Principal Component Analysis (LPCA) technique is presented in this paper for activation detection in functional Magnetic Resonance Imaging (fMRI) without explicit knowledge of the shape of the activation signal. The proposed LPCA method differs in principle from traditional PCA methods for fMRI signal detection. First, our LPCA algorithm does not require any orthogonality assumption between the activation signal and other signal components, whereas traditional PCA methods are based on this assumption. In addition, our LPCA algorithm applies PCA to the temporal sequence of each individual voxel instead of applying PCA to the whole data set. In our algorithm, we first apply a linear regression procedure to alleviate the common baseline drift artifact. The baseline-corrected temporal signals are then partitioned into active and inactive segments according to the paradigm used for the fMRI data acquisition. For each voxel, the most dominant principal components are computed from all these segments by PCA. By projecting the segments of each voxel onto the linear subspace formed by the corresponding dominant principal components, two separate clusters are formed from the active and inactive segments. An activation measure is defined based on the degree of separation between these two clusters in the projection space. Experimental results of applying our LPCA algorithm to detect fMRI activation signals on various data sets are given. In our experiments, the LPCA algorithm in general provides a 4 to 6 times signal-to-noise ratio (SNR) improvement over the standard t-test method.
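A per-voxel sketch of the LPCA idea, assuming equal-length segments, a known active/inactive labeling from the paradigm, and centroid distance as the separation measure (details such as the exact separation measure are assumptions):

```python
import numpy as np

def lpca_activation(ts, seg_len, labels, k=2):
    """LPCA sketch for one voxel: split a (baseline-corrected) time series
    into equal-length segments, project the segments onto their top-k
    principal components, and score activation as the distance between the
    active and inactive cluster centroids in the projection space.
    `labels` marks each segment as active (1) or inactive (0)."""
    segs = ts.reshape(-1, seg_len)
    segs = segs - segs.mean(axis=0)              # center the segments
    _, _, Vt = np.linalg.svd(segs, full_matrices=False)
    proj = segs @ Vt[:k].T                       # project onto top-k PCs
    mu_active = proj[labels == 1].mean(axis=0)
    mu_inactive = proj[labels == 0].mean(axis=0)
    return np.linalg.norm(mu_active - mu_inactive)
```

An activated voxel produces two well-separated clusters (large score); a noise-only voxel produces overlapping clusters (small score).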
The display of a 12-bit MR image on a common 8-bit computer monitor is usually achieved by linearly mapping the image values through a display window, which is determined by its width and center values. The adjustment of the display window for a variety of MR images involves considerable user interaction. In this paper, we present an algorithm with a hierarchical neural network structure for robust and automatic adjustment of the display window width and center for a wide range of MR images. This algorithm consists of a feature generator utilizing both histogram and spatial information computed from an MR image, a wavelet transform for compressing the feature vector, a competitive-layer neural network for clustering MR images into different subclasses, a bi-modal linear estimator and an RBF (radial basis function) network based estimator for each subclass, as well as a data fusion process that integrates the estimates from both estimators across subclasses to compute the final display parameters. Both estimators can adapt to new types of MR images simply by being trained with those images, making the algorithm adaptive and extendable. This trainability also makes possible advanced future developments such as adaptation of the display parameters to a user's personal preference. While the RBF network based estimators perform very well for images similar to those in the training data set, the bi-modal linear estimators provide reasonable estimates for a wide range of images that may not be included in the training data set. The data fusion step makes the final estimation of the display parameters accurate for trained images and robust for unknown images. The algorithm has been tested on a wide range of MR images and has shown satisfactory results. Although the proposed algorithm is comprehensive, its execution time remains within a reasonable range.
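The window/level mapping that the estimated parameters drive is a simple linear transform with clipping; a sketch of mapping 12-bit MR values to an 8-bit display:

```python
import numpy as np

def apply_window(img12, center, width):
    """Linear window/level mapping from 12-bit MR values to 8-bit display:
    values below center - width/2 map to 0, values above center + width/2
    map to 255, and values in between are mapped linearly."""
    lo = center - width / 2.0
    out = (np.asarray(img12, float) - lo) / width * 255.0
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```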