Robots are ideal surrogates for performing tasks that are dull, dirty, and dangerous. To fully achieve this ideal, a robotic teammate should be able to autonomously perform human-level tasks in unstructured environments where we do not want humans to go. In this paper, we take a step toward realizing that vision by integrating state-of-the-art advancements in intelligence, perception, and manipulation on the RoMan (Robotic Manipulation) platform. RoMan comprises two 7 degree-of-freedom (DoF) limbs connected to a 1 DoF torso and mounted on a tracked base. Multiple lidars are used for navigation, and a stereo depth camera provides point clouds for grasping. Each limb has a 6 DoF force-torque sensor at the wrist, with a dexterous 3-finger gripper on one limb and a stronger 4-finger claw-like hand on the other. Tasks begin with an operator specifying a mission type, a desired final destination for the robot, and a general region where the robot should look for grasps. All other portions of the task are completed autonomously. This includes navigation, object identification and pose estimation (if the object is known) via deep learning or perception through search, fine maneuvering, grasp planning via a grasp library, arm motion planning, and manipulation planning (e.g., dragging if the object is deemed too heavy to lift freely). Finally, we present initial test results on two notional tasks: clearing a road of debris, such as a heavy tree or a pile of unknown light debris, and opening a hinged container to retrieve a bag inside it.
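The autonomy pipeline described above can be sketched as a sequence of stages plus a strategy choice for heavy objects. This is a minimal illustrative sketch, not the RoMan implementation; the stage names, `plan_manipulation`, and the 10 kg payload threshold are assumptions for illustration only.

```python
from enum import Enum, auto

class Stage(Enum):
    """Autonomous stages executed after the operator specifies the mission
    (names are illustrative; the paper does not define this enumeration)."""
    NAVIGATE = auto()
    PERCEIVE = auto()        # object identification and pose estimation, or search
    FINE_MANEUVER = auto()
    PLAN_GRASP = auto()      # grasp selection from a grasp library
    PLAN_ARM_MOTION = auto()
    MANIPULATE = auto()

def plan_manipulation(estimated_mass_kg: float, payload_limit_kg: float = 10.0) -> str:
    """Choose a manipulation strategy: drag objects deemed too heavy to lift freely.
    The payload limit here is a placeholder, not RoMan's actual rating."""
    return "drag" if estimated_mass_kg > payload_limit_kg else "lift"
```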
In this work, we provide an overview of vision-based control for perching and grasping for Micro Aerial Vehicles. We investigate perching on flat, inclined, or vertical surfaces as well as visual servoing techniques for quadrotors to enable autonomous perching by hanging from cylindrical structures using only a monocular camera and an appropriate gripper. The challenges of visual servoing are discussed, and we focus on the problems of relative pose estimation, control, and trajectory planning for maneuvering a robot with respect to an object of interest. Finally, we discuss future challenges to achieve fully autonomous perching and grasping in more realistic scenarios.
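The visual servoing control problem mentioned above is commonly addressed with the classical image-based visual servoing (IBVS) law, which drives image-feature error to zero via the pseudo-inverse of the interaction matrix. The following is a generic textbook sketch, not the controller from this work; the function names and the gain value are assumptions.

```python
import numpy as np

def interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2x6 interaction (image Jacobian) matrix for a normalized image
    point (x, y) observed at depth Z, relating feature velocity to the
    camera twist [vx, vy, vz, wx, wy, wz]."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(features, desired, depths, gain: float = 0.5) -> np.ndarray:
    """Classical IBVS law v = -gain * L^+ * e for a set of point features,
    where e stacks the errors between current and desired image points."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    e = (np.asarray(features) - np.asarray(desired)).ravel()
    return -gain * np.linalg.pinv(L) @ e
```

When the features already coincide with their desired positions, the commanded camera twist is zero, as expected for a regulation law.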
We consider the problem of generating temporally consistent point cloud segmentations from streaming RGB-D data, where every incoming frame extends existing labels to new points or contributes new labels while maintaining the labels for pre-existing segments. Our approach generates an over-segmentation based on voxel cloud connectivity, where a modified k-means algorithm selects supervoxel seeds and associates similar neighboring voxels to form segments. Given the data stream from a potentially mobile sensor, we solve for the camera transformation between consecutive frames using a joint optimization over point correspondences and image appearance. The aligned point cloud may then be integrated into a consistent model coordinate frame. Previously labeled points are used to mask incoming points from the new frame, while new and previous boundary points extend the existing segmentation. We evaluate the algorithm on newly-generated RGB-D datasets.
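The label-propagation step described above (previously labeled points mask incoming points, while genuinely new points receive new labels) can be illustrated with a heavily simplified voxel-lookup sketch. This assumes points are already aligned into the model frame and collapses segments to one label per voxel; the function names and the 5 cm voxel size are assumptions, not the paper's algorithm.

```python
import numpy as np

def voxel_key(point, voxel_size: float = 0.05):
    """Quantize a 3D point into an integer voxel index."""
    return tuple(np.floor(np.asarray(point) / voxel_size).astype(int))

def propagate_labels(points, label_map, next_label):
    """Assign each aligned incoming point the label of its voxel if that
    voxel was segmented in a previous frame; otherwise start a new label.
    `label_map` (voxel index -> label) persists across frames, which is
    what keeps the segmentation temporally consistent in this sketch."""
    labels = []
    for p in points:
        k = voxel_key(p)
        if k not in label_map:
            label_map[k] = next_label
            next_label += 1
        labels.append(label_map[k])
    return labels, next_label
```

A second frame revisiting an already-labeled voxel reuses the existing label instead of relabeling it, mimicking the masking behavior described above.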
Semantic perception involves naming objects and features in the scene, understanding the relations between them, and
understanding the behaviors of agents, e.g., people, and their intent from sensor data. Semantic perception is a central
component of future UGVs to provide representations which 1) can be used for higher-level reasoning and tactical
behaviors, beyond the immediate needs of autonomous mobility, and 2) provide an intuitive description of the robot's
environment in terms of semantic elements that can be shared effectively with a human operator. In this paper, we
summarize the main approaches that we are investigating in the RCTA as initial steps toward the development of
perception systems for UGVs.
Most of today's robot vehicles are equipped with omnidirectional
sensors which provide surround awareness and easier navigation.
Because appearance persists across omnidirectional images,
many global navigation or formation control tasks need only
reference images of target positions or objects rather than
landmarks or fiducials. In this paper, we study the problem of template
matching in spherical images. The natural transformation of a pattern
on the sphere is a 3D rotation and template matching is the
localization of a target in any orientation given by a reference
image. Unfortunately, the support of the template is space variant on
the Euler angle parameterization. Here we propose a new method
which matches the gradients of the
image and the template using a space-invariant operation.
Using properties of angular momentum, we prove that the
gradient correlation can be computed efficiently as the
3D inverse Fourier transform of a linear combination of spherical
harmonics. An exhaustive search localizes the maximum of this
correlation. Experimental results on real data show very accurate
localization for a variety of targets. In future work, we plan to
address targets appearing at different scales.