Stereo matching is one of the most important computer vision tasks. Several methods can be used to compute the matching cost between two images. This paper proposes a method that uses convolutional neural networks to compute the matching cost. The network architecture and the training process are described. The matching cost metric derived from the network output is applied to a base method that uses a grid of support points (ELAS). The proposed method was tested on Middlebury benchmark images and showed an accuracy improvement over the base method.
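As an illustration of how per-pixel descriptors can be turned into a matching cost, the following is a minimal sketch and not the network or metric of the paper. It assumes that some feature extractor (for example a CNN) has already produced an (H, W, C) descriptor map for each rectified image; the function names `cosine_cost_volume` and `winner_take_all`, and the choice of cosine distance as the cost, are assumptions made for this example only.

```python
import numpy as np

def cosine_cost_volume(feat_left, feat_right, max_disp):
    """Return a cost volume of shape (max_disp + 1, H, W); lower cost = better match.

    feat_left, feat_right: (H, W, C) per-pixel descriptors. Here they are
    stand-ins for the output of a learned feature extractor (an assumption).
    """
    # L2-normalise descriptors so that the dot product is the cosine similarity.
    fl = feat_left / (np.linalg.norm(feat_left, axis=-1, keepdims=True) + 1e-8)
    fr = feat_right / (np.linalg.norm(feat_right, axis=-1, keepdims=True) + 1e-8)
    h, w, _ = fl.shape
    # Pixels without a valid counterpart keep the maximum cost of 2 (similarity -1).
    cost = np.full((max_disp + 1, h, w), 2.0)
    for d in range(min(max_disp, w - 1) + 1):
        # Left pixel at column x is compared with right pixel at column x - d.
        sim = np.sum(fl[:, d:, :] * fr[:, : w - d, :], axis=-1)
        cost[d, :, d:] = 1.0 - sim
    return cost

def winner_take_all(cost):
    """Pick, for every pixel, the disparity with the lowest cost."""
    return np.argmin(cost, axis=0)
```

In a pipeline such as the one described, a cost of this kind would replace the hand-crafted descriptor distance of the base method rather than be used with plain winner-take-all selection.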
The article describes an approach for reconstructing the image formed by a video see-through mixed reality system so that it corresponds to the convergence of the user's eyes. Convergence is determined from the positions of the user's pupils, acquired by the eye-tracking system of the mixed reality device. The image reconstruction method relies on an extended (2.5-dimensional) representation of the image, obtained, for example, with a 3D scanner that builds a depth map of the scene. In the proposed solution, the lens systems that form images of the real world on the LCD screens and the eyepieces that project these images into the user's eyes keep their characteristics and positions unchanged. The image is reconstructed by projecting the points of the original image onto the image points corresponding to the required convergence, "refocusing" each point at its own distance. The advantages and disadvantages of this method are shown. An approach is also proposed that reduces the visual discomfort caused by an ambiguous distance to an image point, for example in the case of mirror or transparent objects. Virtual prototyping of the mixed reality system showed that the proposed approach reduces the visual discomfort caused by the mismatch between the convergence of the human eyes and the images formed by the lenses of the mixed reality system.
This paper proposes a stereo matching method that uses a grid of support points to compute the prior disparity. Convolutional neural networks are used to compute the matching cost between pixels in the two images. The network architecture and the training process are described. The method was evaluated on Middlebury benchmark images. Accuracy estimates are also reported for the case in which LIDAR data are used as input for the support point grid. This approach can be used in multi-sensor devices and can improve accuracy by up to 15%.
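The role of the support points can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the paper: ELAS interpolates the prior over a triangulation of the support points, whereas this sketch uses nearest-neighbour assignment to stay within NumPy. The names `nearest_support_prior` and `disparity_with_prior`, the quadratic penalty, and the `weight` parameter are hypothetical.

```python
import numpy as np

def nearest_support_prior(shape, pts_rc, pts_disp):
    """Dense prior disparity: each pixel takes the disparity of its nearest support point.

    shape: (H, W); pts_rc: (N, 2) array of (row, col); pts_disp: (N,) disparities,
    e.g. from robust matches or from projected LIDAR measurements (an assumption).
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Squared distance from every pixel to every support point, shape (H, W, N).
    # This uses O(H * W * N) memory, so it is only suitable for small examples.
    d2 = (rows[..., None] - pts_rc[:, 0]) ** 2 + (cols[..., None] - pts_rc[:, 1]) ** 2
    return pts_disp[np.argmin(d2, axis=-1)]

def disparity_with_prior(cost, prior, weight=0.1):
    """Argmin over the cost volume plus a quadratic penalty on distance from the prior."""
    d = np.arange(cost.shape[0])[:, None, None]
    penalised = cost + weight * (d - prior[None]) ** 2
    return np.argmin(penalised, axis=0)
```

With an uninformative (constant) cost volume the result falls back to the prior, which is the behaviour one would expect in textureless regions; with an informative cost, `weight` controls how far the estimate may move away from the support points.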