Accurate object detection and depth estimation is critical for a variety of applications such as autonomous driving and robotics. In the context of object avoidance, one may use a LiDAR sensor to determine the position of nearby objects but, due to a lack of resolution, these sensors cannot be used to accurately categorize and label the object being detected. To contrast this, RGB cameras can provide rich semantic information, which can be used to categorize and segment an object but cannot provide accurate depth data. To overcome this, an abundance of algorithms has been created which are capable of fusing the two sensors, among others, allowing for accurate depth detection and segmentation of a given object. The problem with many of these systems is that they are complex in their approach and create 3D bounding boxes, which can result in an agent taking a less optimal path due to the size of the perceived object. The proposed approach in this paper simply determines the position of an object in an RGB image, using a CNN, and then translates two dimensions, found through the center pixel of the bounding box, to a point cloud to identify and segment point clusters.
|