Performance standards for industrial mobile robots and mobile manipulators (robot arms onboard mobile robots) have only recently begun development. Low-cost, standardized measurement techniques are needed to characterize system performance, compare different systems, and determine whether recalibration is required. This paper discusses work at the National Institute of Standards and Technology (NIST) and within the ASTM Committee F45 on Driverless Automatic Guided Industrial Vehicles. This includes standards for terminology (F45.91) and for navigation performance test methods (F45.02). The paper defines terms that are being considered. Additionally, the paper describes navigation test methods that are near ballot and docking test methods being designed for consideration within F45.02. This includes the use of low-cost artifacts that can provide alternatives to relatively expensive measurement systems.
This paper describes a concept for measuring the reproducible performance of mobile manipulators to be used for assembly or other similar tasks. An automatic guided vehicle with an onboard robot arm was programmed to repeatedly move to and stop at a novel, reconfigurable mobile manipulator artifact (RMMA), sense the RMMA, and detect targets on the RMMA. The manipulator moved a laser retroreflective sensor to detect small reflectors that can be reconfigured to measure various manipulator positions and orientations (poses). This paper describes calibration of a multi-camera motion capture system using a 6-degree-of-freedom metrology bar, and the subsequent use of the camera system as a ground-truth measurement device to validate the mobile manipulator's reproducibility experiments and test method. Static performance measurement of a mobile manipulator using the RMMA has proved useful for relatively high-tolerance pose estimation and for other metrics that support standard test method development for indexed and dynamic mobile manipulator applications.
Low-cost 3D depth and range sensors are steadily becoming more widely available and affordable, and thus popular with robotics enthusiasts. As basic research tools, however, their accuracy and performance are relatively unknown. In this paper, we describe a framework for performance evaluation and measurement error analysis of 6-degree-of-freedom pose estimation systems using traceable ground-truth instruments. Characterizing sensor drift and variance, and quantifying range, spatial, and angular accuracy, our framework focuses on artifact surface fitting and static pose analysis, reporting testing and environmental conditions in compliance with the upcoming ASTM E57.02 standard.
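The artifact-surface-fitting idea can be illustrated with a minimal sketch (not the framework's actual implementation; function names are hypothetical): a least-squares plane fit and its residual RMS give a simple accuracy metric for a depth sensor viewing a planar artifact.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit via SVD; returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]          # smallest singular vector = plane normal

def flatness_rms(points):
    """RMS of point-to-plane residuals: a simple surface-fit error metric."""
    centroid, normal = fit_plane(points)
    return float(np.sqrt(np.mean(((points - centroid) @ normal) ** 2)))
```

Run against point clouds of a known flat artifact, the residual RMS characterizes range noise; repeating the fit over time exposes sensor drift.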
Vast amounts of video footage are being continuously acquired by surveillance systems on private premises, commercial
properties, government compounds, and military installations. Facial recognition systems have the potential to identify
suspicious individuals on law enforcement watchlists, but accuracy is severely hampered by the low resolution of typical
surveillance footage and the far distance of suspects from the cameras. To improve accuracy, super-resolution can
enhance suspect details by utilizing a sequence of low resolution frames from the surveillance footage to reconstruct a
higher resolution image for input into the facial recognition system. This work measures the improvement of face
recognition with super-resolution in a realistic surveillance scenario. Low-resolution and super-resolved query sets were generated using a video database at different eye-to-eye distances corresponding to different distances of subjects from the camera. Performance of a face recognition algorithm using the super-resolved and baseline query sets was calculated
by matching against galleries consisting of frontal mug shots. The results show that super-resolution improves
performance significantly at the examined mid and close ranges.
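The matching step above can be summarized as a rank-1 identification rate: each query is assigned the identity of its best-matching gallery entry, and the fraction of correct assignments is reported. A minimal sketch (hypothetical names, assuming L2-normalized feature vectors rather than any specific recognition algorithm):

```python
import numpy as np

def rank1_rate(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose best-matching gallery entry shares their ID.

    Features are L2-normalized so the dot product is cosine similarity.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    best = np.asarray(gallery_ids)[np.argmax(q @ g.T, axis=1)]
    return float(np.mean(best == np.asarray(query_ids)))
```

Comparing this rate for the low-resolution and super-resolved query sets quantifies the improvement due to super-resolution.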
Flash laser detection and ranging (LADAR) systems are increasingly used in robotics applications
for autonomous navigation and obstacle avoidance. Their compact size, high frame rate, wide field
of view, and low cost are key advantages over traditional scanning LADAR devices. However,
these benefits are achieved at the cost of spatial resolution. Super-resolution enhancement can be
applied to improve the resolution of flash LADAR devices, making them ideal for small robotics
applications. Previous work by Rosenbush et al. applied the super-resolution algorithm of
Vandewalle et al. to flash LADAR data, and observed quantitative improvement in image quality in
terms of number of edges detected. This study uses the super-resolution algorithm of Young et al. to
enhance the resolution of range data acquired with a SwissRanger SR-3000 flash LADAR camera.
To improve the accuracy of sub-pixel shift estimation, a wavelet preprocessing stage was developed
and applied to flash LADAR imagery. The authors used the triangle orientation discrimination
(TOD) methodology for a subjective evaluation of the performance improvement (measured in terms
of probability of target discrimination and subject response times) achieved with super-resolution.
Super-resolution of flash LADAR imagery resulted in superior probabilities of target discrimination at all investigated ranges while reducing subject response times.
Flash LADAR systems are becoming increasingly popular for robotics applications. However, they generally
provide a low-resolution range image because of the limited number of pixels available on the focal plane array.
In this paper, the application of image super-resolution algorithms to improve the resolution of range data is
examined. Super-resolution algorithms are compared for their use on range data and the frequency-domain
method is selected. Four low-resolution range images, each slightly shifted and rotated from the reference image, are registered using Fourier transform properties, and the super-resolution image is built using non-uniform interpolation. Image super-resolution algorithms are typically rated subjectively, based on the perceived visual
quality of their results. In this work, quantitative methods for evaluating the performance of these algorithms
on range data are developed. Edge detection in the range data is used as a benchmark of the data improvement
provided by super-resolution. The results show that super-resolution of range data provides the same advantage
as image super-resolution, namely increased image fidelity.
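The frequency-domain registration step rests on the Fourier shift theorem. A minimal sketch (integer-pixel translations only, with hypothetical names; the paper's method also handles rotation and sub-pixel shifts):

```python
import numpy as np

def phase_correlation_shift(ref, shifted):
    """Estimate the integer (dy, dx) translation taking `ref` to `shifted`.

    The cross-power spectrum of a translated image pair is a pure phase
    ramp; its inverse FFT is an impulse located at the shift.
    """
    cross = np.fft.fft2(shifted) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12              # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:
        dy -= h                                  # unwrap to signed shifts
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Once the relative shifts of the low-resolution frames are known, their samples can be placed on a common high-resolution grid and interpolated.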
The National Institute of Standards and Technology (NIST) has been studying pallet visualization for the automated guided vehicle (AGV) industry. Through a cooperative research and development agreement with Transbotics, an AGV manufacturer, NIST has developed advanced sensor processing and world modeling algorithms to verify pallet location and orientation with respect to the AGV. Sensor processing utilizes two onboard, single-scan-line laser range units. The "Safety" sensor is a safety unit located at the base of a forktruck AGV, and the "Panner" sensor is a laser ranger rotated 90 degrees and mounted on a rotating motor at the top front of the AGV. The Safety sensor, typically used to detect obstacles such as humans, was also used to detect pallets and their surrounding area, such as the walls of a truck to be loaded with pallets. The Panner was used to acquire many scan lines of range data, which were processed into a 3D point cloud from which the pallet was segmented using a priori, approximate pallet-load or remaining-truck volumes. A world model was then constructed and output to the vehicle for pallet/truck volume verification. This paper explains this joint government/industry project and the results of using LADAR imaging methods.
As unmanned ground vehicles take on more intelligent tasks, determination of potential obstacles and accurate estimation of their position become critical for successful navigation and path planning. The performance analysis of obstacle mapping and unmanned vehicle positioning in outdoor environments is the subject of this paper. Recently, the National Institute of Standards and Technology's (NIST) Intelligent Systems Division has been part of the Defense Advanced Research Projects Agency's LAGR (Learning Applied to Ground Robots) program. NIST's objective for the LAGR project is to insert learning algorithms into the modules that make up the NIST 4D/RCS (Four-Dimensional/Real-Time Control System) standard reference model architecture, which has been successfully applied to many intelligent systems. We detail world modeling techniques used in the 4D/RCS architecture and then analyze the high-precision maps generated by the vehicle's world modeling algorithms as compared to ground truth obtained from an independent differential GPS system operable throughout most of the NIST campus. This work has implications not only for outdoor vehicles but also for indoor automated guided vehicles, where future systems will have more onboard intelligence requiring non-contact sensors to provide accurate vehicle and object positioning.
This paper describes and evaluates a vision system that accurately segments unstructured, non-homogeneous roads of arbitrary shape under various lighting conditions. The idea behind the road following algorithm is the segmentation of road from background through the use of color models. Data are collected from a video camera mounted on a moving vehicle. In each frame, color models of the road and background are constructed. The color models are used to calculate the probability that each pixel in a frame is a member of the road class. Temporal fusion of these road probabilities helps to stabilize the models, resulting in a probability map that can be thresholded to determine areas of road and non-road. Performance evaluation follows the approach described in Hong et al. [1]. We evaluate the algorithm's performance with annotated frames of video data. This allows us to compute the false positive and false negative ratios. False positives refer to non-road areas in the image that were classified by the system as road, while false negatives refer to road areas classified as non-road. We use the sum of false positives and false negatives as an overall classification error calculated for each frame of the video sequence. After the error is calculated for each frame, we determine the statistics of the classification error throughout the whole video sequence. The overall classification error per frame allows us to compare the performance of several algorithms on the same frame, and we can analyze the overall performance of individual algorithms using their classification statistics.
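The temporal fusion and per-frame error metric described above can be sketched as follows (an illustrative simplification with hypothetical names; the paper's color models are not reproduced here):

```python
import numpy as np

def fuse(prev_prob, obs_prob, alpha=0.7):
    """Exponential temporal fusion of per-pixel road probabilities."""
    return alpha * prev_prob + (1.0 - alpha) * obs_prob

def classification_error(prob, truth, thresh=0.5):
    """FP + FN ratio for a thresholded road-probability map.

    FP: non-road pixels labeled road; FN: road pixels labeled non-road.
    `truth` is a boolean ground-truth road mask.
    """
    pred = prob >= thresh
    fp = np.mean(pred & ~truth)
    fn = np.mean(~pred & truth)
    return float(fp + fn)
```

Accumulating `classification_error` over every frame of a sequence yields the per-sequence statistics used to compare algorithms.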
The performance evaluation of an obstacle detection and segmentation algorithm for Automated Guided Vehicle (AGV) navigation in factory-like environments using a new 3D real-time range camera is the subject of this paper. Our approach expands on the US ASME B56.5 safety standard, which now allows for non-contact safety sensors, by performing tests on objects sized as specified in both the US and British safety standards. These tests placed objects of the recommended sizes, as well as smaller, material-covered objects, on the vehicle path for static measurement. The segmented (mapped) obstacles were then verified for range and size using simultaneous, absolute measurements obtained with a relatively accurate 2D scanning laser rangefinder. These 3D range cameras are expected to be relatively inexpensive and to be used indoors, and possibly outdoors, in a vast number of mobile robot applications, building on the experimental results explained in this paper.
We describe a methodology for evaluating algorithms to provide quantitative information about how well road detection and road following algorithms perform. The approach relies on generating a set of standard data sets annotated with ground truth. We evaluate the algorithms used to detect roads by comparing the output of the algorithms with ground truth, which we obtain by having humans annotate the data sets used to test the algorithms. Ground truth annotations are acquired from more than one person to reduce systematic errors. Results are quantified by looking at false positive and false negative regions of the image sequences when compared with the ground truth. We describe the evaluation of a number of variants of a road detection system based on neural networks.
Progress in algorithm development and the transfer of results to practical applications such as military robotics requires the setup of standard tasks and of standard qualitative and quantitative measurements for performance evaluation and validation. Although the evaluation and validation of algorithms have been discussed for over a decade, the research community still faces a lack of well-defined and standardized methodology. The fundamental problems include a lack of quantifiable measures of performance, a lack of data from state-of-the-art sensors in calibrated real-world environments, and a lack of facilities for conducting realistic experiments. In this research, we propose three methods for creating ground truth databases and benchmarks using multiple sensors. The databases and benchmarks will provide researchers with high-quality data from suites of sensors operating in complex environments representing real problems of great relevance to the development of autonomous driving systems. At NIST, we have prototyped a High Mobility Multi-purpose Wheeled Vehicle (HMMWV) system with a suite of sensors including a Riegl ladar, GDRS ladar, stereo CCD, several color cameras, Global Positioning System (GPS), Inertial Navigation System (INS), pan/tilt encoders, and odometry. All sensors are calibrated with respect to each other in space and time. This allows a database of features and terrain elevation to be built. Ground truth for each sensor can then be extracted from the database. The main goal of this research is to provide ground truth databases for researchers and engineers to evaluate algorithms for effectiveness, efficiency, reliability, and robustness, thus advancing the development of algorithms.
We describe a project to collect and disseminate sensor data for autonomous mobility research. Our goals are to provide data of known accuracy and precision to researchers and developers to enable algorithms to be developed using realistically difficult sensory data. This enables quantitative comparisons of algorithms by running them on the same data, allows groups that lack equipment to participate in mobility research, and speeds technology transfer by providing industry with metrics for comparing algorithm performance. Data are collected using the NIST High Mobility Multi-purpose Wheeled Vehicle (HMMWV), an instrumented vehicle that can be driven manually or autonomously both on roads and off. The vehicle can mount multiple sensors and provides highly accurate position and orientation information as data are collected. The sensors on the HMMWV include an imaging ladar, a color camera, color stereo, and inertial navigation (INS) and Global Positioning System (GPS). Also available are a high-resolution scanning ladar, a line-scan ladar, and a multi-camera panoramic sensor. The sensors are characterized by collecting data from calibrated courses containing known objects. For some of the data, ground truth will be collected from site surveys. Access to the data is through a web-based query interface. Additional information stored with the sensor data includes navigation and timing data, sensor to vehicle coordinate transformations for each sensor, and sensor calibration information. Several sets of data have already been collected and the web query interface has been developed. Data collection is an ongoing process, and where appropriate, NIST will work with other groups to collect data for specific applications using third-party sensors.
A robotic vehicle needs to understand the terrain and features around it if it is to be able to navigate complex environments such as road systems. By taking advantage of the fact that such vehicles also need accurate knowledge of their own location and orientation, we have developed a sensing and object recognition system based on information about the area where the vehicle is expected to operate. The information is collected through aerial surveys, from maps, and by previous traverses of the terrain by the vehicle. It takes the form of terrain elevation information, feature information (roads, road signs, trees, ponds, fences, etc.) and constraint information (e.g., one-way streets). We have implemented such an a priori database using One Semi-Automated Forces (OneSAF), a military simulation environment. Using the Inertial Navigation System and Global Positioning System (GPS) on the NIST High Mobility Multi-purpose Wheeled Vehicle (HMMWV) to provide indexing into the database, we extract all the elevation and feature information for a region surrounding the vehicle as it moves about the NIST campus. This information has also been mapped into the sensor coordinate systems. For example, processing the information from an imaging Laser Detection And Ranging (LADAR) that scans a region in front of the vehicle has been greatly simplified by generating a prediction image by scanning the corresponding region in the a priori model. This allows the system to focus the search for a particular feature in a small region around where the a priori information predicts it will appear. It also permits immediate identification of features that match the expectations. Results indicate that this processing can be performed in real time.
As part of the Army's Demo III project, a sensor-based system has been developed to identify roads and to enable a mobile robot to drive along them. A ladar sensor, which produces range images, and a color camera are used in conjunction to locate the road surface and its boundaries. Sensing is used to constantly update an internal world model of the road surface. The world model is used to predict the future position of the road and to focus the attention of the sensors on the relevant regions in their respective images. The world model also determines the most suitable algorithm for locating and tracking road features in the images based on the current task and sensing information. The planner uses information from the world model to determine the best path for the vehicle along the road. Several different algorithms have been developed and tested on a diverse set of road sequences. The road types include some paved roads with lanes, but most of the sequences are of unpaved roads, including dirt and gravel roads. The algorithms compute various features of the road images, including smoothness in the world model map and in the range domain, and color features and texture in the color domain. Performance in road detection and tracking is described, and examples are shown of the system in action.
This paper describes a world model that combines a variety of sensed inputs and a priori information and is used to generate on-road and off-road autonomous driving behaviors. The system is designed in accordance with the principles of the 4D/RCS architecture. The world model is hierarchical, with the resolution and scope at each level designed to minimize computational resource requirements and to support planning functions for that level of the control hierarchy. The sensory processing system that populates the world model fuses inputs from multiple sensors and extracts feature information, such as terrain elevation, cover, road edges, and obstacles. Feature information from digital maps, such as road networks, elevation, and hydrology, is also incorporated into this rich world model. The various features are maintained in different layers that are registered together to provide maximum flexibility in generation of vehicle plans depending on mission requirements. The paper includes discussion of how the maps are built and how the objects and features of the world are represented. Functions for maintaining the world model are discussed. The world model described herein is being developed for the Army Research Laboratory's Demo III Autonomous Scout Vehicle experiment.
This paper describes a real-time hierarchical system that fuses data from vision and touch sensors to improve the performance of a coordinate measuring machine (CMM) used for dimensional inspection tasks. The system consists of sensory processing, world modeling, and task decomposition modules. It uses the strengths of each sensor -- the precision of the CMM scales and the analog touch probe and the global information provided by the low resolution camera -- to improve the speed and flexibility of the inspection task. In the experiment described, the vision module performs all computations in image coordinate space. The part's boundaries are extracted during an initialization process and then the probe's position is continuously updated as it scans and measures the part surface. The system fuses the estimated probe velocity and distance to the part boundary in image coordinates with the estimated velocity and probe position provided by the CMM controller. The fused information provides feedback to the monitor controller as it guides the touch probe to scan the part. We also discuss integrating information from the vision system and the probe to autonomously collect data for 2-D to 3-D calibration, and work to register computer aided design (CAD) models with images of parts in the workplace.
It is a challenge to develop autonomous vehicles capable of operating in complicated, unpredictable, and hazardous environments. To navigate autonomous vehicles safely, obstacles such as protrusions, depressions, and steep terrains must be discriminated from terrain before any path planning and obstacle avoidance activity is undertaken. A purposive and direct solution to obstacle detection for safe navigation has been developed. The method finds obstacles in a 2-D image-based space, as opposed to a 3-D reconstructed space, using optical flow. The theory derives from new visual linear invariants based on optical flow. Employing the linear invariance property, obstacles can be directly detected by using a reference flow line obtained from measured optical flow. The main features of this approach are that (1) 2-D visual information (i.e., optical flow) is directly used to detect obstacles; no range, 3-D motion, or 3-D scene geometry is recovered; (2) the method, which finds protrusions and depressions, is valid for a vehicle (or camera) undergoing general motion (both translation and rotation); (3) the error sources involved are reduced to a minimum, since the only information required is one component of optical flow. Experiments using both synthetic and real image data suggest that the approach is effective and robust. The method is demonstrated on both ground and air vehicles.
For many applications in computer vision, it is important to recover range, 3-D motion, and/or scene geometry from a sequence of images. However, many robot behaviors can be achieved by extracting relevant 2-D information from the imagery and using it directly, without such recovery. In this paper, we focus on two behaviors: obstacle avoidance and terrain navigation. A novel method for achieving these behaviors without 3-D reconstruction has been developed; this approach is often called purposive active vision. A linear relationship, plotted as a line and called a reference flow line, has been found. The difference between a plotted line and the reference flow line can be used to detect discrete obstacles above or below the reference terrain. For terrain characterization, slopes of surface regions can be calculated directly from optical flow. Some error analysis is also done. The main features of this approach are that (1) discrete obstacles are detected directly from 2-D optical flow; no 3-D reconstruction is performed; (2) terrain slopes are also calculated from 2-D optical flow; (3) knowledge of the terrain model, the camera-to-ground coordinate transformation, or the vehicle (or camera) motion is not required; (4) the error sources involved are reduced to a minimum, since the only information required is a component of optical flow. An initial experiment using noisy synthetic data is also included to demonstrate the applicability and robustness of the method.
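The reference-flow-line idea can be illustrated with a minimal sketch (hypothetical names; a plain least-squares fit stands in for whatever fitting the authors used): for flat terrain, one flow component varies roughly linearly with image row, so residuals from the fitted line flag protrusions and depressions.

```python
import numpy as np

def detect_obstacles(rows, flow, tol=1.0):
    """Flag flow samples that deviate from the reference flow line.

    For flat terrain, one optical-flow component is approximately linear
    in image row; a least-squares line fit gives the reference flow line,
    and residuals larger than `tol` mark candidate obstacles (positive
    residuals suggest protrusions, negative ones depressions).
    """
    a, b = np.polyfit(rows, flow, 1)        # reference flow line
    residual = flow - (a * rows + b)
    return np.abs(residual) > tol
```

Note that only one flow component and no 3-D reconstruction is needed, mirroring feature (1) above.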
Image flow, the apparent motion of brightness patterns on the image plane, can provide important visual information such as distance, shape, surface orientation, and boundaries. It can be determined by either feature tracking or spatio-temporal analysis. The optical flow thus determined can be used to reconstruct the 3-D scene by determining the depth from the camera of every point in the scene. However, the optical flow determined by either of the methods mentioned above will be noisy. As a result, the depth information obtained from optical flow cannot be successfully used in practical applications such as image segmentation, 3-D reconstruction, and path planning. By using temporal integration, we can increase the accuracy of both the optical flow and the depth determined from it. In this work, we describe an incremental integration scheme, called the running average method, to temporally integrate the image flow. We integrate the depth from the camera obtained using optical flow determined from gradient-based methods, and show that the results of temporal integration are much more useful in practical applications than the results from local edge operators. Finally, we consider an image segmentation example and show the advantages of temporal integration.
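The running average method reduces to the standard incremental mean update, applied per pixel to successive depth (or flow) maps; a minimal sketch with hypothetical names:

```python
import numpy as np

def running_average(frames):
    """Incrementally integrate a sequence of per-pixel depth maps.

    avg_n = avg_{n-1} + (frame - avg_{n-1}) / n  -- equivalent to the
    arithmetic mean, but it needs only the previous estimate, so noisy
    flow-derived depth can be smoothed frame by frame as data arrive.
    """
    avg = None
    for n, frame in enumerate(frames, start=1):
        frame = np.asarray(frame, dtype=float)
        avg = frame.copy() if avg is None else avg + (frame - avg) / n
    return avg
```

Because each update touches only the current frame and the running estimate, the scheme suits incremental, real-time integration.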
Image flow, the apparent motion of brightness patterns on the image plane, can provide important visual information such as distance, shape, surface orientation, and boundaries. It can be determined by either feature tracking or spatio-temporal analysis. We consider spatio-temporal methods and show how differential range can be estimated from time-space imagery. We generate a time-space image by considering only one scan line of the image obtained from a camera moving in the horizontal direction at each time interval. At the next instant of time, we shift the previous line up by one pixel and obtain another line from the image. We continue the procedure to obtain a time-space image, where each horizontal line represents the spatial relationship of the pixels and each vertical line the temporal relationship. Each feature along the horizontal scan line generates an edge in the time-space image, the slope of which depends upon the distance of the feature from the camera. We apply two mutually perpendicular edge operators to the time-space image and determine the slope of each edge. We show that this corresponds to optical flow. We use the result to obtain the differential range, and show how this can be implemented on the Pipelined Image Processing Engine (PIPE). We use a simple technique to calibrate the camera and show how the depth can be obtained from optical flow.
Image flow, the apparent motion of brightness patterns on the image plane, can provide important visual information such as distance, shape, surface orientation, and boundaries. It can be determined by either feature tracking or spatio-temporal analysis. We consider spatio-temporal methods, and show how differential range can be estimated from time-space imagery.
We generate a time-space image by considering only one scan line of the image obtained from a camera moving in the horizontal direction at each time interval. At the next instant of time, we shift the previous line up by one pixel, and obtain another line from the image. We continue the procedure to obtain a time-space image, where each horizontal line represents the spatial relationship of the pixels, and each vertical line the temporal relationship.
Each feature along the horizontal scan line generates an edge in the time-space image, the slope of which depends upon the distance of the feature from the camera. We apply two mutually perpendicular edge operators to the time-space image, and determine the slope of each edge. We show that this corresponds to optical flow. We use the result to obtain the differential range, and show how this can be implemented on the Pipelined Image Processing Engine (PIPE). We discuss several kinds of edge operators, show how using the zero crossings reduces the noise, and demonstrate how better discrimination can be achieved by knowledge of range.
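The slope-to-flow relationship can be illustrated with a small sketch (hypothetical names; `np.gradient` stands in for the paper's perpendicular edge operators): for a time-space image I(t, x), the edge slope, and hence the flow, is recovered as -I_t / I_x.

```python
import numpy as np

def timespace_flow(ts_image):
    """Estimate per-pixel flow from a time-space image.

    Rows are successive time steps of one scan line; the slope of an
    edge trace is dx/dt, recovered from two perpendicular derivative
    operators as -I_t / I_x (valid only where I_x is nonzero).
    """
    gt, gx = np.gradient(ts_image.astype(float))   # d/dt (rows), d/dx (cols)
    with np.errstate(divide="ignore", invalid="ignore"):
        flow = np.where(np.abs(gx) > 1e-6, -gt / gx, np.nan)
    return flow
```

Larger flow magnitudes correspond to nearer features, which is how differential range is read off the time-space image.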
A scheme is presented for the estimation of differential ranges on the basis of optical flow along a known direction, giving attention to the factors affecting the accuracy of results and various spatial and temporal smoothing algorithms employed to enhance the method's accuracy. It is found that while the use of edge detectors reduces noise, a priori knowledge of the environment improves the method's range discriminability. The method has been implemented on a real-time, high speed pipelined image processing engine capable of processing 60 image frames/sec, using a horizontally moving camera which generates optical flow along a scan line.