Error diffusion is one of the most widely used algorithms for halftoning gray scale and color images. It works by distributing the thresholding error of each pixel to unprocessed neighboring pixels, while maintaining the average value of the image. Error diffusion results in inter-pixel data dependencies that prohibit a simple data-pipelining approach and increase the memory requirements of the system. In this paper, we present a multiprocessing approach to overcome these difficulties, which results in a novel architecture for high-performance hardware implementation of error diffusion algorithms. The proposed architecture is scalable, flexible, cost effective, and may be adopted for processing gray scale or color images. The key idea in this approach is to simultaneously process pixels in separate rows and columns in a diagonal arrangement, so that data dependencies across processing elements are avoided. The processor was realized using an FPGA implementation and may be used for real-time image rendering in high-speed scanning or printing. The entire system runs at the input clock rate, allowing the performance to scale linearly with the clock rate. The higher data rates required by future applications will automatically be supported by more advanced high-speed FPGA technologies.
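The serial data dependency described above can be seen in a minimal sketch of classical error diffusion. The Floyd–Steinberg kernel weights used here are an assumption for illustration (the abstract does not specify a kernel); note how each pixel's error feeds pixels to its right and below, which is exactly the dependency the diagonal multiprocessing scheme is designed to break.

```python
def error_diffuse(image, threshold=128):
    """Halftone a grayscale image (list of rows, values 0-255).

    Mutates `image` with propagated error; returns the binary output.
    Kernel weights are the classic Floyd-Steinberg choice (assumed).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = image[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Distribute error: right 7/16, down-left 3/16,
            # down 5/16, down-right 1/16 -- the inter-pixel dependency.
            for dx, dy, wgt in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    image[ny][nx] += err * wgt / 16
    return out
```

Because pixel (y, x) depends on errors from (y, x-1) and from row y-1, two processing elements can safely work in parallel only when they are offset diagonally, which motivates the arrangement proposed in the paper.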
Color dropout refers to the process of converting color form documents to black and white by removing the colors that are part of the blank form and maintaining only the information entered in the form. In this paper, no prior knowledge of the form type is assumed. Color dropout is performed by associating darker non-dropout colors with information that is entered in the form and needs to be preserved. The color dropout filter parameters include the color values of the non-dropout colors, e.g. black and blue, the distance metric, e.g. Euclidean, and the tolerances allowed around these colors. Color dropout is accomplished by converting pixels that have color within the tolerance sphere of the non-dropout colors to black and all others to white. This approach lends itself to high-speed hardware implementation with low memory requirements, such as an FPGA platform. Processing may be performed in RGB or a luminance-chrominance space, such as YCbCr. The color space transformation from RGB to YCbCr involves a matrix multiplication, and the dropout filter implementation is similar in both cases. Results for color dropout processing in both RGB and YCbCr space are presented.
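The per-pixel dropout rule described above can be sketched directly from the abstract: a pixel within the Euclidean tolerance sphere of any non-dropout color maps to black, everything else to white. The function name and the specific tolerance value in the usage example are illustrative, not from the paper.

```python
def dropout_filter(pixel, keep_colors, tolerance):
    """Map a pixel to black if it lies within the tolerance sphere of
    any non-dropout color (Euclidean distance in RGB), else to white."""
    for c in keep_colors:
        dist = sum((p - q) ** 2 for p, q in zip(pixel, c)) ** 0.5
        if dist <= tolerance:
            return (0, 0, 0)      # entered information: preserve as black
    return (255, 255, 255)        # blank-form color: drop to white
```

For example, with non-dropout colors black `(0, 0, 0)` and blue `(0, 0, 255)` and a tolerance of 80, a dark gray pen stroke `(10, 10, 10)` is preserved while a red form line `(255, 0, 0)` is dropped. Since each pixel is classified independently with a few multiply-accumulates and a compare, the filter maps naturally onto the low-memory FPGA pipeline the paper targets.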
We are developing a video see-through head-mounted display (HMD) augmented reality (AR) system for image-guided neurosurgical planning and navigation. The surgeon wears an HMD that presents the augmented stereo view. The HMD is custom fitted with two miniature color video cameras that capture a stereo view of the real-world scene. At this point we are concentrating specifically on cranial neurosurgery, so the images will be of the patient's head. A third video camera, operating in the near infrared, is also attached to the HMD and is used for head tracking. The pose (i.e., position and orientation) of the HMD is used to determine where to overlay anatomic structures segmented from preoperative tomographic images (e.g., CT, MR) on the intraoperative video images. Two SGI 540 Visual Workstation computers process the three video streams and render the augmented stereo views for display on the HMD. The AR system operates in real time at 30 frames/sec with a temporal latency of about three frames (100 ms) and zero relative lag between the virtual objects and the real-world scene. For an initial evaluation of the system, we created AR images using a head phantom with actual internal anatomic structures (segmented from CT and MR scans of a patient) realistically positioned inside the phantom. When using shaded renderings, many users had difficulty appreciating overlaid brain structures as being inside the head. When using wire frames and texture-mapped dot patterns, most users correctly visualized brain anatomy as being internal and could generally appreciate spatial relationships among various objects. The 3D perception of these structures is based on both stereoscopic depth cues and kinetic depth cues, with the user looking at the head phantom from varying positions. The perception of the augmented visualization is natural and convincing. The brain structures appear rigidly anchored in the head, manifesting little or no apparent swimming or jitter.
The initial evaluation of the system is encouraging, and we believe that AR visualization might become an important tool for image-guided neurosurgical planning and navigation.
Autonomous mobile robots rely on multiple sensors to perform a variety of tasks in a given environment. Different tasks may need different sensors to estimate different subsets of world state. Also, different sensors can cooperate in discovering common subsets of world state. This paper presents a new approach to multimodal sensor fusion using dynamic Bayesian networks and an occupancy grid. The environment in which the robot operates is represented with an occupancy grid. This occupancy grid is asynchronously updated using probabilistic data obtained from multiple sensors and combined using Bayesian networks. Each cell in the occupancy grid stores multiple probability density functions representing combined evidence for the identity, location and properties of objects in the world. The occupancy grid also contains probabilistic representations for moving objects. Bayes nets allow information from one modality to provide cues for interpreting the output of sensors in other modalities. Establishing correlations or associations between sensor readings or interpretations leads to learning the conditional relationships between them. Thus bottom-up, reflexive, or even accidentally-obtained information can provide top-down cues for other sensing strategies. We present early results obtained for a mobile robot navigation task.
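The asynchronous per-cell fusion described above can be illustrated with a standard log-odds Bayes update, a common simplification of occupancy-grid filtering; this sketch is an assumption for illustration, since the paper's cells store richer distributions (object identity, location, and properties), not just a single occupancy probability.

```python
import math

def logodds(p):
    """Convert a probability to log-odds form."""
    return math.log(p / (1 - p))

def prob(l):
    """Convert log-odds back to a probability."""
    return 1 - 1 / (1 + math.exp(l))

def update_cell(cell_logodds, p_occ_given_reading):
    """Fuse one sensor reading into a cell's occupancy estimate.

    In log-odds form the Bayes update is a simple addition, so readings
    from different sensors can be folded in asynchronously, in any order.
    """
    return cell_logodds + logodds(p_occ_given_reading)
```

Starting from an uninformative prior (log-odds 0, i.e. p = 0.5), two independent sonar-like readings each suggesting occupancy with probability 0.8 raise the cell's belief to 16/17 ≈ 0.94. The additivity is what makes asynchronous multi-sensor updates cheap.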
Selective perception sequentially collects evidence to support a specified hypothesis about a scene, as long as the additional evidence is worth the effort of obtaining it. Efficiency comes from selecting the best scene locations, resolution, and vision operators, where `best' is defined as some function of benefit and cost (typically, their ratio or difference). Selective vision implies knowledge about the scene domain and the imaging operators. We use Bayes nets for representation and benefit-cost analysis in a selective vision system with both visual and non-visual actions in real and simulated static and dynamic environments. We describe sensor fusion, dynamic scene, and multi-task applications.
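The benefit-cost selection rule above can be sketched in a few lines. The field names and the choice of ratio for ranking and difference for the stopping test are illustrative assumptions; in the system described, the benefit term would come from a Bayes-net value-of-information computation rather than a stored number.

```python
def select_action(actions):
    """Pick the candidate action (location/resolution/operator choice)
    with the best benefit-to-cost ratio, and act only if the expected
    benefit exceeds the cost -- otherwise stop gathering evidence.

    `actions` is a list of dicts with 'benefit' and 'cost' keys
    (illustrative schema, not from the paper).
    """
    if not actions:
        return None
    best = max(actions, key=lambda a: a["benefit"] / a["cost"])
    return best if best["benefit"] > best["cost"] else None
```

For example, given a cheap coarse operator (benefit 2, cost 1) and an expensive fine one (benefit 5, cost 4), the coarse operator wins on ratio; if every remaining action costs more than it is expected to gain, the selector returns nothing and evidence gathering halts, matching the "worth the effort" criterion in the abstract.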