A hydrodynamic tunnel is an effective means for studying flow over a wing in aerodynamics and hydrodynamics. It allows flow characteristics to be studied under controlled conditions and makes it possible to model conditions that cannot be examined in real flight, such as aerodynamic behaviour at critical angles of attack or in icing conditions. Flow visualisation techniques such as coloured jets or small tracer particles provide qualitative data about flow behaviour and are a valuable means for understanding it. However, quantitative flow characteristics are even more important, as they allow the evolution of the process to be predicted and safety measures and recommendations to be developed.
The presented study addresses the development of a system for optical 3D measurements in a hydrodynamic tunnel based on photogrammetric techniques. To provide accurate measurements in the presence of two optical media interfaces (air-glass and air-liquid), an accurate model of image formation that accounts for refraction is developed.
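As an illustration of the kind of refraction correction such a model has to include, the sketch below traces a ray through a flat interface using Snell's law. The interface geometry, refractive indices, and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def refract(d, n, n1, n2):
    """Refract unit direction d at a flat interface with unit normal n
    (pointing into the incident medium), using Snell's law.
    Returns None on total internal reflection."""
    d, n = d / np.linalg.norm(d), n / np.linalg.norm(n)
    cos_i = -np.dot(n, d)
    eta = n1 / n2
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    if k < 0:
        return None  # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(k)) * n

# Example: a ray leaving the camera in air (n=1.0) and entering water (n=1.33)
# through a flat window whose normal is the z axis.
ray = np.array([0.2, 0.0, 1.0])
bent = refract(ray, np.array([0.0, 0.0, -1.0]), 1.0, 1.33)
print(bent / np.linalg.norm(bent))
```

In a full multi-media model such a refraction step would be applied at every interface along each camera ray before intersecting rays for 3D measurement.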
The developed photogrammetric system includes several high-speed cameras (from two to four) mounted in fixed positions relative to the working space, and a structured light projector. An original technique is applied for system calibration.
Two metrics have been used to measure the calibration accuracy: the first is based on measurements of test field points, and the second uses point-to-point distances for the surfaces of a reference object.
The key contributions of this paper are: (1) an accurate model of image formation in the presence of several media interfaces; (2) a technique for photogrammetric system calibration for 3D measurements in a hydrodynamic tunnel; (3) an experimental evaluation of calibration accuracy for multi-media 3D measurements.
The experimental evaluation of the developed photogrammetric system confirmed the high accuracy of system calibration and optical 3D measurements in a multi-media optical environment. The developed technique for photogrammetric system calibration and 3D measurements demonstrated its applicability to 3D flow analysis in a hydrodynamic tunnel.
KEYWORDS: 3D modeling, Image segmentation, 3D image processing, 3D image reconstruction, Visual process modeling, Systems modeling, 3D metrology, Performance modeling, Clouds, Neural networks
Simultaneous 3D scene reconstruction and semantic segmentation are required in many applications such as autonomous driving, robotics, and optical metrology. Classic 3D reconstruction methods usually perform these operations in two stages. First, a 3D scanner or laser scanner acquires a point cloud. Second, semantic segmentation of the point cloud is performed. Recently, a new kind of 3D model representation was proposed that utilizes trapezium-shaped voxels aligned with the camera’s frustum and pixels [1]. Frustum voxel models have proved effective for 3D scene reconstruction and segmentation from monocular images [2]. Still, many existing 3D scanning systems readily provide stereo cameras, and the performance of frustum voxel model-based methods with stereo input remains an open question. This paper is focused on evaluating the 3D reconstruction quality of a volumetric neural network with monocular and stereo input. We leverage the SSZ [2] volumetric neural network as a starting point for our research and develop a modified version, termed Stereo-SSZ, that receives a stereo pair as input. We compare the performance of the original SSZ model and our Stereo-SSZ model on different real and synthetic 3D shape datasets. Specifically, we generate a stereo version of the SemanticVoxels [2] dataset and capture stereo pairs of multiple real objects using a structured light scanner. The results of our experiments are encouraging and demonstrate that the model with stereo input outperforms the original monocular SSZ network. Specifically, the frustum voxel models generated by our Stereo-SSZ model have lower surface distance errors and exhibit finer detail in the reconstructed 3D models.
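A minimal sketch of how a stereo pair could be fed to a volumetric network of this kind: two weight-sharing encoder branches whose features are concatenated before a decoder that outputs a per-pixel column of depth cells. Layer sizes, module names, and the class/depth counts are illustrative assumptions, not the actual SSZ or Stereo-SSZ architecture.

```python
import torch
import torch.nn as nn

class StereoVolumetricNet(nn.Module):
    """Toy stereo-input volumetric network: a shared 2D encoder for the
    left and right images, feature concatenation, and a decoder that
    outputs D depth cells per pixel (a frustum-voxel-style grid)."""
    def __init__(self, depth_cells=32, classes=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, depth_cells * classes, 4, stride=2, padding=1),
        )
        self.depth_cells, self.classes = depth_cells, classes

    def forward(self, left, right):
        f = torch.cat([self.encoder(left), self.encoder(right)], dim=1)
        out = self.decoder(f)                      # B x (D*C) x H x W
        b, _, h, w = out.shape
        return out.view(b, self.classes, self.depth_cells, h, w)

# Usage: a random stereo pair of 128x128 images.
net = StereoVolumetricNet()
vox = net(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
print(vox.shape)  # torch.Size([1, 8, 32, 128, 128])
```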
KEYWORDS: 3D modeling, Unmanned aerial vehicles, Image segmentation, Object recognition, 3D image processing, Neural networks, Motion models, Data modeling, Data processing, Computing systems
Impressive progress in the technical characteristics of modern unmanned aerial vehicles (UAVs) provides new opportunities for their use in applications and missions that were previously impossible. The growing applicability of UAVs rests on the high performance of modern computers and the latest advances in sensor data processing techniques.
In recent years, convolutional neural network (CNN) models have demonstrated state-of-the-art performance in many computer vision problems that previously seemed solvable only by a human. This study is aimed at developing deep learning techniques for autonomous UAV navigation with obstacle avoidance in complex environments. Such navigation is required for cargo delivery or rescue missions in urban, industrial, or forest environments where a global positioning system may be unavailable.
To navigate in a complex environment, a UAV has to recognize the objects in the observed scene and estimate the distance to possible obstacles. The proposed technique solves these tasks with a deep learning approach that performs image segmentation and depth map estimation from an image of the observed scene.
A convolutional neural network model is developed that predicts a depth map of the observed scene along with scene segmentation according to predefined object classes. The proposed architecture is based on a generative adversarial model whose generative part translates an input color image into an output voxel model. The discriminative part estimates how close the output is to real data and penalizes false outputs. Both the generative and discriminative parts are trained simultaneously on a specially prepared dataset.
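The sketch below shows one training step of a generic conditional adversarial setup of this kind: a generator maps the input image to a dense prediction (a single-channel depth map here, for brevity), and a discriminator scores (image, prediction) pairs. Network bodies, losses, and weights are placeholder assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))
D = nn.Sequential(nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, stride=2, padding=1))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(image, depth_gt):
    # Discriminator: real pairs -> 1, generated pairs -> 0.
    fake = G(image)
    d_real = D(torch.cat([image, depth_gt], dim=1))
    d_fake = D(torch.cat([image, fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator and stay close to ground truth.
    d_fake = D(torch.cat([image, fake], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, depth_gt)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

print(train_step(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)))
```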
Evaluation on the test part of the prepared dataset has demonstrated the ability of the developed model to segment previously unobserved complex scenes containing several objects and to estimate a depth map for such scenes. The proposed neural network architecture generalizes well to new scenes.
KEYWORDS: Clouds, Systems modeling, Roads, Geodesy, Image processing, Agriculture, Chemical analysis, Statistical analysis, Soil science, Chemical elements
Advances in unmanned aerial vehicles (UAVs) and optical sensors of various types provide new opportunities for collecting and processing remote sensing data of a new quality. Using UAVs to acquire high-resolution imagery makes it possible to produce a digital elevation model (DEM) of high quality and resolution. This new quality of available DEMs makes it possible to analyse small details of the land surface and to retrieve valuable information about hidden archaeological content. Our study addresses the creation and analysis of large-scale, high-resolution DEMs for detecting traces of hidden ancient artefacts at archaeological sites. The survey that acquired the imagery for this study was carried out on the Taman Peninsula (Russia) as part of a Russian State Historical Museum expedition aimed at studying the Bosporan Kingdom (VI-I centuries BC). We present the developed techniques for UAV imagery processing, which provide improved accuracy of photogrammetric 3D measurements compared with standard photogrammetric image processing in commercial software. These approaches have been developed for interpreting terrain models to predict the possible spatial distribution of archaeological artefacts. The proposed techniques allow the creation of large-scale digital terrain models of archaeological sites, which can serve for more reliable archaeological prediction and accurate geo-positioning of possible findings. It has been shown that the developed techniques provide accurate, high-quality DEMs and serve as a useful tool for the analysis of archaeological sites and for prediction.
Image-based 3D scene reconstruction is one of the key tasks for many machine vision applications such as scene understanding, object pose estimation, and autonomous navigation. A set of reliable and accurate methods for multi-view 3D scene reconstruction has been developed over the last decades. A significant drawback of such 3D reconstruction techniques, however, is the need to acquire a large number of images in the processed sequence to obtain an acceptable 3D scene representation. Recently, convolutional neural network (CNN) models have achieved the best quality for object recognition, image segmentation, image translation, and other challenging computer vision problems. The paper proposes a convolutional neural network architecture and a technique for training data preparation that provide a prediction of a voxel model of a 3D scene with several objects. For CNN training and evaluation, a special dataset was collected and annotated. It contains image sequences of several scenes together with the corresponding depth images and 3D models of these scenes. The image sequences serve as the primary data for scene 3D reconstruction by the structure-from-motion (SfM) technique. Structure-from-motion processing yields surface 3D models of all objects in the scene as well as camera positions and orientations for every image in a sequence. The surface 3D model is then transformed into a voxel 3D model and segmented into separate objects. A conditional generative adversarial network architecture was developed for 3D reconstruction from a single image. Its generative part translates an input color image into an output voxel model. The discriminative part distinguishes the correct output (close to the real voxel model) from false output (a wrong output voxel model). Both parts are trained simultaneously on the prepared dataset. Evaluation on the test part of the prepared dataset has demonstrated the ability to predict 3D models of previously unobserved complex scenes containing several objects. The proposed neural network architecture provides high generalization ability and improved resolution of the predicted voxel 3D models.
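A minimal sketch of the surface-to-voxel conversion step mentioned above: points densely sampled from the reconstructed surface are mapped into a regular occupancy grid. The grid resolution and sampling density are arbitrary illustrative choices, not those of the paper's pipeline.

```python
import numpy as np

def voxelize_points(points, grid=64):
    """Convert a surface point cloud (N x 3, e.g. vertices densely sampled
    from an SfM mesh) into a binary occupancy grid of size grid^3."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    scale = (grid - 1) / np.maximum(hi - lo, 1e-9)
    idx = np.floor((points - lo) * scale).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=bool)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return vox

# Usage: points sampled on a unit sphere.
pts = np.random.randn(20000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(voxelize_points(pts).sum(), "occupied cells")
```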
Object detection and recognition are among the important problems in many remote sensing applications such as monitoring, security, rescue missions, and other data analysis tasks, and a large number of approaches and techniques have been developed to improve detection quality. Recently, deep learning methods have made significant progress in object detection. However, when dealing with objects that occupy only a small area in the image (which is often the case in monitoring or rescue missions), the detection quality decreases noticeably. To study the performance and the limits of deep learning for object detection in remote sensing imagery with degrading object resolution, a special dataset of aerial images containing objects of interest (humans, cars) at different resolutions has been collected. The dataset consists of images acquired at different distances from the objects of interest, providing representative subsets of object images at various scales. Two state-of-the-art object detection convolutional neural networks (Faster R-CNN and SSD) were evaluated on the collected dataset. The aim of the study was to find out how the object size in the image influences detection performance and to estimate the object image size at which performance drops significantly. Approaches for improving small object recognition were also developed and evaluated: the first uses multimodal image fusion, and the second applies deep learning to increase the resolution of small objects in the image. The performed tests have shown that the developed approaches improve the quality of object recognition when dealing with low-resolution object images.
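A rough sketch of the kind of size-stratified evaluation this implies: detections are matched to ground truth by IoU and recall is reported per object-size bin. The IoU threshold and bin edges are illustrative assumptions, not the values used in the study.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def recall_by_size(gt_boxes, det_boxes, bins=(0, 16, 32, 64, 1e9), thr=0.5):
    """Per-size-bin recall: fraction of ground-truth boxes (binned by the
    square root of their pixel area) matched by a detection with IoU >= thr."""
    hits = np.zeros(len(bins) - 1)
    totals = np.zeros(len(bins) - 1)
    for g in gt_boxes:
        size = np.sqrt((g[2] - g[0]) * (g[3] - g[1]))
        k = np.searchsorted(bins, size, side="right") - 1
        totals[k] += 1
        hits[k] += any(iou(g, d) >= thr for d in det_boxes)
    return hits / np.maximum(totals, 1)

gt = [(10, 10, 20, 20), (100, 100, 180, 180)]
det = [(101, 98, 178, 182)]
print(recall_by_size(gt, det))  # small object missed, large one found
```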
KEYWORDS: 3D modeling, RGB color model, Cameras, Object recognition, 3D image processing, Thermal modeling, Data fusion, Thermography, Image fusion, Data modeling
Multi-spectral imagery offers wide possibilities for improving the quality of object detection and recognition, because different scene features are visible in different spectral ranges. To exploit the advantage of multi-spectral data, the relation between the different types of data is required. This relation is provided by capturing data with calibrated, aligned, and synchronized sensors. Geo-spatial data in the form of geo-referenced digital terrain models can also be used for establishing geometric and semantic relations between the different types of data. The presented study considers the problem of object recognition based on two data sources: visible and thermal imagery. The main aim of the study was to evaluate the performance of different convolutional neural network models for multimodal object recognition. For this purpose a special dataset was collected. It contains synchronized visible and thermal images acquired by several sensors mounted on an unmanned aerial vehicle: colour and thermal images of urban and suburban scenes gathered in different seasons, at different times of day, and in various weather conditions. For convolutional neural network training, the dataset was augmented with synthetic images created from object 3D models textured with real visible and thermal images. Several convolutional neural network architectures were trained and evaluated on the created dataset using different splits to estimate the influence of the training data on object recognition performance.
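One simple way to feed aligned visible and thermal imagery to a single CNN is early fusion, sketched below: the RGB and thermal channels are stacked into a 4-channel tensor. The layer sizes and class count are illustrative assumptions, not one of the architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Toy early-fusion classifier for aligned RGB + thermal input."""
    def __init__(self, classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, classes)

    def forward(self, rgb, thermal):
        x = torch.cat([rgb, thermal], dim=1)      # B x 4 x H x W
        return self.head(self.features(x).flatten(1))

net = EarlyFusionNet()
logits = net(torch.randn(2, 3, 96, 96), torch.randn(2, 1, 96, 96))
print(logits.shape)  # torch.Size([2, 5])
```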
KEYWORDS: Machine learning, 3D modeling, Data modeling, 3D image processing, 3D image reconstruction, Performance modeling, Unmanned aerial vehicles, 3D scanning, 3D acquisition, Digital imaging
Traditional methods of photogrammetric image processing allow a digital terrain model to be reconstructed with the high accuracy needed for producing geographic information system (GIS) products such as maps and orthophotos. Under good imaging conditions, the accuracy of the correspondence problem solution can reach hundredths of a pixel. Recently, deep learning techniques have been applied to dense stereo matching of images. They derive depth information directly from stereo images and show impressive results, as they do for tasks such as scene segmentation, object detection, and classification. The aim of this study was to evaluate the performance of deep learning techniques for accurate digital terrain model generation. Several state-of-the-art deep learning stereo matching models were evaluated along with an original convolutional network. To obtain a reliable accuracy estimate for the considered techniques, the 3D reconstruction results were compared with a reference surface obtained by 3D scanning. In addition, imagery acquired from an unmanned aerial vehicle during a mission to digitize an archaeological site was used for performance evaluation by comparison with a digital terrain model generated by photogrammetric techniques.
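A minimal sketch of how a reconstructed terrain model can be compared against a reference surface, assuming both have already been resampled onto a common grid; the chosen statistics (bias, RMSE, median absolute deviation) are a common but illustrative choice, not necessarily those used in the paper.

```python
import numpy as np

def dem_error_stats(dem_pred, dem_ref, mask=None):
    """Compare a reconstructed DEM against a reference DEM on the same grid:
    mean signed error, RMSE, and robust (median absolute) error."""
    diff = dem_pred - dem_ref
    if mask is not None:
        diff = diff[mask]
    diff = diff[np.isfinite(diff)]
    return {
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "mad": float(np.median(np.abs(diff - np.median(diff)))),
    }

ref = np.random.rand(200, 200)
pred = ref + np.random.normal(0.0, 0.05, ref.shape) + 0.02  # noise + bias
print(dem_error_stats(pred, ref))
```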
The application area of unmanned aerial vehicles has expanded significantly in recent years owing to progress in hardware and in algorithms for data acquisition and processing. Object detection and classification (recognition) in imagery acquired by an unmanned aerial vehicle are key tasks for many applications, and in practice these tasks are usually solved by an operator. The growing amount of data of different types and natures makes deep machine learning feasible, and it now delivers state-of-the-art results for object detection and recognition. Two key problems must be solved to apply deep learning to object recognition with multi-spectral imagery: (a) the availability of a representative dataset for neural network training and testing, and (b) an effective way of fusing multi-spectral data during neural network training. The paper proposes approaches for solving these problems. To create a representative dataset, synthetic infra-red images are generated using several real infra-red images and a 3D model of a given object. A technique for realistic infra-red texturing based on accurate infra-red image exterior orientation and 3D model pose estimation is developed. It makes it possible to produce datasets of the required size for deep learning in an automated mode and to generate ground truth data for neural network training and testing automatically. Two approaches to multi-spectral data fusion for object recognition are developed and evaluated: data-level fusion and result-level fusion. The results of the evaluation of both techniques on the generated multi-spectral dataset are presented and discussed.
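As a minimal illustration of result-level (late) fusion, the sketch below combines per-class scores predicted independently from the visible and infra-red channels by a weighted average; the weight and class count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def result_level_fusion(scores_vis, scores_ir, w_vis=0.5):
    """Toy late fusion: average per-class scores from two single-modality
    classifiers and take the argmax as the fused decision."""
    fused = w_vis * scores_vis + (1.0 - w_vis) * scores_ir
    return fused.argmax(axis=-1), fused

# Usage: softmax scores for 3 classes from visible and infra-red classifiers.
vis = np.array([[0.6, 0.3, 0.1]])
ir = np.array([[0.2, 0.7, 0.1]])
label, fused = result_level_fusion(vis, ir)
print(label, fused)
```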
KEYWORDS: 3D modeling, Data modeling, 3D image processing, 3D acquisition, Data fusion, Unmanned aerial vehicles, Data acquisition, Digital cameras, Calibration, Information visualization
The quality of archaeological site documentation is of great importance for the preservation and investigation of cultural heritage. Progress in developing new techniques and systems for data acquisition and processing creates an excellent basis for achieving a new quality of archaeological site documentation and visualization. Archaeological data has some specific features which have to be taken into account during acquisition, processing, and management. First of all, it is necessary to gather information about the findings that is as complete as possible, with no loss of information and no damage to the artifacts. Remote sensing technologies are the most adequate and powerful means of satisfying this requirement. An approach to archaeological data acquisition and fusion based on remote sensing is proposed. It combines a set of photogrammetric techniques for obtaining geometric and visual information at different scales and levels of detail with a pipeline for archaeological data documentation, structuring, fusion, and analysis. The proposed approach is applied to documenting the Bosporus archaeological expedition of the Russian State Historical Museum.
The structure-from-motion approach has become a powerful means of scene 3D reconstruction using only a sequence of images from a moving camera as initial data. Such a technique has significant potential for the navigation of unmanned aerial or ground vehicles in unknown environments. Different techniques are used for estimating the 3D structure of a scene, such as optical flow, feature detection and matching across a set of images, and feature tracking through an image sequence. Robustness and accuracy of the measured scene 3D coordinates are important characteristics of structure-from-motion algorithms, as they determine the reliability of navigation. A technique for scene 3D reconstruction from unmanned aerial vehicle imagery is developed based on preliminary feature detection and matching in a set of stereo pairs with an appropriate baseline, which allows a reasonable accuracy of 3D measurements to be reached. The results of an accuracy evaluation for two variants of surface 3D reconstruction from an image sequence are presented and discussed: for the case of uncalibrated images and for images with known interior orientation. Ways of improving the accuracy of the developed 3D reconstruction technique are discussed.
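A compact sketch of the feature-matching and triangulation step for one calibrated stereo pair, using OpenCV; the choice of ORB features, brute-force matching, and the assumption that the camera matrix and relative pose are known are illustrative, not the paper's actual pipeline.

```python
import cv2
import numpy as np

def sparse_stereo_points(img_l, img_r, K, R, t):
    """Toy feature-based reconstruction for one calibrated stereo pair:
    detect and match ORB features, then triangulate the matches.
    K is the camera matrix, (R, t) the pose of the right camera
    relative to the left; all are assumed known here."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img_l, None)
    k2, d2 = orb.detectAndCompute(img_r, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

    pts_l = np.float32([k1[m.queryIdx].pt for m in matches]).T  # 2 x N
    pts_r = np.float32([k2[m.trainIdx].pt for m in matches]).T
    P_l = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_r = K @ np.hstack([R, t.reshape(3, 1)])
    X_h = cv2.triangulatePoints(P_l, P_r, pts_l, pts_r)   # 4 x N homogeneous
    return (X_h[:3] / X_h[3]).T                            # N x 3 points
```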
More than 80% of video surveillance systems are used for monitoring people. Older human detection algorithms based on background and foreground modelling could not even cope with a group of people, to say nothing of a crowd. Recent robust and highly effective pedestrian detection algorithms mark a new milestone for video surveillance systems. Based on modern deep learning approaches, these algorithms produce highly discriminative features that support robust inference in real visual scenes. They handle tasks such as distinguishing individual persons in a group, coping with substantial occlusion of human bodies by the foreground, and detecting people in various poses. In our work we use an approach that combines the detection and classification tasks into a single problem solved by a convolutional neural network. As a starting point we choose the YOLO CNN, whose authors propose a very efficient way of combining the above tasks by learning a single neural network. This approach showed results competitive with state-of-the-art models such as Fast R-CNN while significantly surpassing them in speed, which allows us to apply it in real-time video surveillance and other video monitoring systems. Despite all its advantages, it suffers from known drawbacks related to the fully connected layers, which prevent applying the CNN to images of different resolutions. It also limits the ability to distinguish small, closely spaced human figures in groups, which is crucial for our task, since we work with rather low-quality images that often include dense small groups of people. In this work we gradually change the network architecture to overcome these problems, train it on a complex pedestrian dataset, and finally obtain a CNN that detects small pedestrians in real scenes.
The paper proposes a semantic segmentation algorithm based on convolutional neural networks (CNNs), addressing the problem of presenting multispectral sensor-derived images in Enhanced Vision Systems (EVS). A CNN architecture based on residual SqueezeNet with deconvolutional layers is presented. To create an in-domain training dataset for the CNN, a semi-automatic scenario based on a photogrammetric technique is described. Experimental results are shown for problem-oriented images obtained by the TV and IR sensors of an EVS prototype in a set of flight experiments.
Existing image fusion methods based on morphological image analysis, which expresses the geometrical idea of image shape as a label image, are quite sensitive to the quality of image segmentation and therefore not sufficiently robust to noise and high-frequency distortions. On the other hand, a number of methods in the field of dimensionality reduction and data comparison make it possible to avoid the image segmentation step by using diffusion map techniques. The paper proposes a new approach to multispectral image fusion based on the combination of morphological image analysis and diffusion map theory (i.e. diffusion morphology). A new image fusion algorithm is described that uses a matched diffusion filtering procedure instead of morphological projection. The algorithm is implemented for a three-channel Enhanced Vision System prototype. Comparative image fusion results are shown on real images acquired in flight experiments.
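To convey the underlying idea of filtering one channel with the structure of another, the sketch below applies an affinity-weighted (diffusion-like) smoothing to an IR channel guided by a TV channel. This is only a loose illustrative stand-in, not the matched diffusion filtering procedure of the paper, and all parameters are assumptions.

```python
import numpy as np

def guided_diffusion_filter(src, guide, radius=3, sigma_g=0.1, sigma_s=2.0):
    """Replace each pixel of `src` by a weighted neighbourhood average, with
    weights derived from intensity affinity in `guide` and spatial distance."""
    h, w = src.shape
    out = np.zeros_like(src, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            gy, gx = np.mgrid[y0:y1, x0:x1]
            w_space = np.exp(-((gy - y) ** 2 + (gx - x) ** 2) / (2 * sigma_s ** 2))
            w_guide = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2) / (2 * sigma_g ** 2))
            wgt = w_space * w_guide
            out[y, x] = (wgt * src[y0:y1, x0:x1]).sum() / wgt.sum()
    return out

# Usage: smooth a noisy IR channel using the structure of a TV channel.
tv = np.kron(np.array([[0.2, 0.8], [0.8, 0.2]]), np.ones((16, 16)))
ir = tv + np.random.normal(0, 0.1, tv.shape)
print(np.abs(guided_diffusion_filter(ir, tv) - tv).mean())
```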
KEYWORDS: Algorithm development, Detection and tracking algorithms, 3D image processing, Motion analysis, Calibration, Reliability, 3D acquisition, Cameras, Video, 3D metrology, Imaging systems
Automated and accurate capture of the spatial motion of an object is necessary for a wide variety of applications, including industry and science, virtual reality and film, medicine and sports. For most applications, the reliability and accuracy of the obtained data, together with convenience for the user, are the main characteristics defining the quality of a motion capture system. Among existing systems for 3D data acquisition based on different physical principles (accelerometry, magnetometry, time-of-flight, vision-based), optical motion capture systems have a set of advantages such as high acquisition speed, potential for high accuracy, and automation based on advanced image processing algorithms. For vision-based motion capture, accurate and robust detection and tracking of object features through the video sequence are the key elements, along with the level of automation of the capture process. To provide high accuracy of the obtained spatial data, the developed vision-based motion capture system "Mosca" is based on photogrammetric principles of 3D measurement and supports high-speed image acquisition in synchronized mode. It includes from two to four technical vision cameras for capturing video sequences of object motion. Original camera calibration and exterior orientation procedures provide the basis for high accuracy of 3D measurements. A set of algorithms for detecting, identifying, and tracking similar targets, as well as for marker-less motion capture, has been developed and tested. The evaluation results show high robustness and reliability for various motion analysis tasks in technical and biomechanical applications.
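A minimal sketch of the target identification and tracking idea: previously labelled marker positions are matched to detections in the next frame by solving a nearest-neighbour assignment problem. The assignment method and distance gate are illustrative assumptions, not the algorithms of the Mosca system.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def track_markers(prev_pts, curr_pts, max_jump=20.0):
    """Toy frame-to-frame marker tracking: assign previous markers to current
    detections by minimizing total Euclidean distance, then gate by max_jump
    (pixels) to drop implausible jumps."""
    cost = np.linalg.norm(prev_pts[:, None, :] - curr_pts[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_jump]

prev = np.array([[10.0, 10.0], [50.0, 50.0], [90.0, 10.0]])
curr = np.array([[52.0, 49.0], [11.0, 12.0], [300.0, 300.0]])
print(track_markers(prev, curr))  # [(0, 1), (1, 0)] -- third marker lost
```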
An improved stereo-based approach for dynamic road scene understanding in a Driver Assistance System (DAS) is presented, and system calibration is addressed. Algorithms for road lane detection, road 3D model generation, obstacle pre-detection, and object (vehicle) detection are described. Lane detection is based on evidence analysis. The obstacle pre-detection procedure compares radial orthophotos obtained from the left and right stereo images. The object detection algorithm is based on recognizing the rear of cars using histograms of oriented gradients. The Car Stereo Sequences (CSS) dataset was captured by a vehicle-based laboratory and published for testing DAS algorithms.
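A minimal sketch of a histogram-of-oriented-gradients classifier for car-rear patches, using scikit-image and scikit-learn; the patch size, HOG parameters, and linear SVM are standard illustrative choices, not the paper's actual detector, and the random training data is only a stand-in.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patch):
    """HOG descriptor of a grayscale image patch (e.g. 64x64 pixels)."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Train on labelled patches (1 = car rear, 0 = background) and score a
# candidate window. Random data stands in for real training patches here.
rng = np.random.default_rng(0)
patches = rng.random((40, 64, 64))
labels = np.array([0, 1] * 20)
clf = LinearSVC().fit([hog_features(p) for p in patches], labels)
print(clf.decision_function([hog_features(rng.random((64, 64)))]))
```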
Most existing face recognition systems are based on two-dimensional images, and recognition quality is rather high for frontal face images. For other kinds of images, however, the quality decreases significantly. For such systems to work correctly, the effect of a change in the person's pose (the camera angle) must be compensated. There are methods for transforming a 2D image of a person to a canonical orientation; their efficiency depends on the accuracy with which specific anthropometric points are determined, and problems can arise when the person's face is partly occluded. Another approach is to keep a set of images of the person taken from different view angles for further processing, but the need to store and process a large number of two-dimensional images makes this method considerably time-consuming. The proposed technique uses a stereo system for fast generation of a 3D model of the person's face and for obtaining a face image in a given orientation from this 3D model. Real-time performance is achieved by implementing graph cut methods for face surface 3D reconstruction and by using the CUDA library for parallel computation.
An important stage of rapid prototyping technology is generating a computer 3D model of the object to be reproduced. A wide variety of techniques for 3D model generation exists, from manual 3D modelling to fully automated reverse engineering systems. Progress in CCD sensors and computers provides the background for integrating photogrammetry, as an accurate source of 3D data, with CAD/CAM. The paper presents the results of developing photogrammetric methods for non-contact measurement of spatial coordinates and generation of computer 3D models of real objects. The technology is based on processing convergent images of the object to calculate its 3D coordinates and reconstruct its surface. The hardware used for spatial coordinate measurements is based on a PC as the central processing unit and a video camera as the image acquisition device. Original software for Windows 9X implements the complete 3D reconstruction technology for rapid input of geometric data into CAD/CAM systems. Technical characteristics of the developed systems are given, along with the results of applying them to various 3D reconstruction tasks. The paper describes the techniques used for non-contact measurements and the methods providing the metric characteristics of the reconstructed 3D model. The results of applying the system to 3D reconstruction of complex industrial objects are also presented.
The problem of reconstructing the shape of a human face from information about the skull is very important for many applications, from archaeology to forensic expertise. Recent progress in the development of 3D scanning systems provides the means and techniques for generating metric 3D models of a skull, thus creating the background for developing a method for automated virtual reconstruction of an unknown human face. The paper presents the method, hardware, and original software for reconstructing a human face from a metric skull 3D model. The skull 3D model is obtained using a PC-based digital photogrammetric system for automated skull 3D reconstruction. It generates a complete textured metric skull 3D model in a single step, using images from three CCD cameras acquired while the turntable rotates. The proposed method for 3D reconstruction of an unknown face is based on the suppositions that every skull can be described by a set of reference points corresponding to a set of reference points on a human face, and that an arbitrary face 3D model can be transformed into the reconstructed face by a deformation constrained by given tissue depths at the reference points. A description of the developed method and some results of applying the proposed technique are presented.
Progress in imaging sensors and computers creates the background for numerous 3D imaging applications in a wide variety of manufacturing activities. The timber industry has many demands for automated precise measurements. One of them is accurate volume estimation for cut trees carried on a truck, and the key point for volume estimation is determining the front area of the package of cut trees. To eliminate the slow and inaccurate manual measurements currently used in practice, an experimental system for automated non-contact wood measurements has been developed. The system includes two non-metric CCD video cameras, a PC as the central processing unit, frame grabbers, and original software for image processing and 3D measurements. The proposed measurement method is based on capturing a stereo pair of the front of the tree package and performing an orthotransformation of the image into the front plane. The transformed image can then be processed for circle recognition and calculation of the circle areas. The metric characteristics of the system are provided by a special camera calibration procedure. The paper presents the developed 3D measurement method, describes the hardware used for image acquisition and the software implementing the developed algorithms, and gives the productivity and precision characteristics of the system.
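A toy sketch of the circle recognition and area calculation step on an already orthorectified image of the stack front, using the OpenCV Hough transform; the radius limits, Hough parameters, and file name are illustrative assumptions, not values from the described system.

```python
import cv2
import numpy as np

def log_end_area(ortho_gray, px_per_mm, min_r_mm=40, max_r_mm=300):
    """Estimate the total front area of a log stack: detect circular log ends
    on an orthorectified grayscale image and sum their areas (in mm^2)."""
    blurred = cv2.medianBlur(ortho_gray, 5)
    circles = cv2.HoughCircles(
        blurred, cv2.HOUGH_GRADIENT, dp=1.2,
        minDist=int(min_r_mm * px_per_mm),
        param1=100, param2=40,
        minRadius=int(min_r_mm * px_per_mm),
        maxRadius=int(max_r_mm * px_per_mm))
    if circles is None:
        return 0.0, 0
    radii_mm = circles[0, :, 2] / px_per_mm
    return float(np.pi * (radii_mm ** 2).sum()), len(radii_mm)

# Usage (assuming a precomputed orthophoto of the stack front):
# area_mm2, count = log_end_area(cv2.imread("stack_front.png", 0), px_per_mm=0.5)
```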
A wide variety of medical and archaeological applications demand measurements of skull geometric parameters. Traditional contact measurement techniques have disadvantages such as low accuracy and the need for the real skull during processing. Applying photogrammetric methods for non-contact determination of spatial coordinates and 3D model generation provides high precision and a convenient interface for the expert. However, the problem of textured human skull 3D reconstruction is rather complicated in the following respects. The human skull is a truly three-dimensional object, which cannot be reconstructed from a single stereo pair. Reconstructing the whole 3D model by acquiring a set of stereo images covering the entire object surface is time-consuming and requires special means for integrating the obtained 2.5D fragments into a unified 3D model. Another requirement on the skull 3D model is to give the expert an easy way of finding the object point to be measured; accurate photorealistic texture mapping can satisfy this requirement. The paper presents an approach that provides high-performance automated skull 3D reconstruction along with accurate texture generation. The developed system includes three CCD cameras, a Pentium personal computer equipped with frame grabbers, a structured light projector, and a PC-controlled turntable.