Estimating building height from satellite imagery is important for digital surface modeling and also provides rich information for change detection and building footprint detection. Acquiring building height usually requires a LiDAR system, which is often unavailable on satellite systems. In this paper, we describe a building height estimation method that does not require building height annotation. Given a single RGB satellite image, our method estimates building height from building shadows and satellite image metadata. To reduce the data annotation needed, we design a multi-stage instance detection method for building and shadow detection with both supervised and semi-supervised training. Given the detected building and shadow instances, we then estimate the building height with satellite image metadata. Building height estimation is done by maximizing the overlap between the detected shadow region and the shadow region projected for a query height. We evaluate our method on the xView2 and Urban Semantic 3D datasets and show that the proposed method achieves accurate building detection, shadow detection, and height estimation.
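The overlap-maximization step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a north-up image, flat terrain, a nadir view, and that the metadata supplies the sun azimuth, sun elevation, and ground sample distance. The function names `project_shadow` and `estimate_height`, the candidate height grid, and the use of intersection-over-union as the overlap score are all illustrative choices.

```python
import numpy as np

def project_shadow(footprint, height_m, sun_azimuth_deg, sun_elevation_deg, gsd_m):
    """Sweep the footprint mask along the shadow direction for a query height."""
    footprint = footprint.astype(bool)
    # Shadow length on flat ground, converted from meters to pixels.
    length_px = height_m / np.tan(np.radians(sun_elevation_deg)) / gsd_m
    # The shadow points away from the sun (azimuth measured clockwise from north).
    az = np.radians(sun_azimuth_deg + 180.0)
    d_row, d_col = -np.cos(az), np.sin(az)  # north is -row, east is +col
    rows, cols = np.nonzero(footprint)
    shadow = np.zeros_like(footprint)
    # Translate the footprint in steps of at most one pixel along the direction.
    for t in np.linspace(0.0, length_px, int(np.ceil(length_px)) + 1):
        r = np.round(rows + t * d_row).astype(int)
        c = np.round(cols + t * d_col).astype(int)
        ok = (r >= 0) & (r < footprint.shape[0]) & (c >= 0) & (c < footprint.shape[1])
        shadow[r[ok], c[ok]] = True
    # The roof itself is not counted as shadow.
    return shadow & ~footprint

def estimate_height(footprint, detected_shadow, candidate_heights,
                    sun_azimuth_deg, sun_elevation_deg, gsd_m):
    """Return the candidate height whose projected shadow best overlaps the detection."""
    detected = detected_shadow.astype(bool) & ~footprint.astype(bool)
    best_height, best_iou = None, -1.0
    for h in candidate_heights:
        projected = project_shadow(footprint, h, sun_azimuth_deg, sun_elevation_deg, gsd_m)
        union = np.logical_or(projected, detected).sum()
        iou = np.logical_and(projected, detected).sum() / union if union else 0.0
        if iou > best_iou:
            best_height, best_iou = float(h), float(iou)
    return best_height, best_iou
```

An exhaustive search over a height grid keeps the sketch simple. An off-nadir view would additionally displace the roof from the footprint, which this sketch ignores.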
Standard automatic target recognition (ATR) algorithms suffer from a lack of transparency into why the algorithm recognized a particular object as a target. We present an enhanced Explainable ATR (XATR) algorithm that utilizes super-resolution networks to provide increased robustness. XATR is a two-level network, with the lower level using Region-based Convolutional Neural Networks (R-CNNs) to recognize major parts of the target, known as the vocabulary. The upper level employs Markov Logic Networks (MLN) and structure learning to learn the geometric and spatial relationships between the parts in the vocabulary that best describe the objects. Image degradation due to noise, blurring, decimation, etc., can severely impact XATR performance because feature content is irrevocably lost. We address this by introducing a novel super-resolution network based on a dynamic U-Net design: a ResNet forms the encoder path, while the imagery is reconstructed with dynamically linked upsampling heads in the decoder path. The network is trained on pairs of high-resolution and degraded imagery to super-resolve the degraded imagery. The trained dynamic U-Net then super-resolves unseen degraded imagery, improving XATR performance relative to its performance on the degraded imagery. In this paper, we perform experiments to 1) determine the sensitivity of XATR to image corruption, 2) improve XATR performance with super-resolution, and 3) demonstrate XATR robustness to image degradation and occlusion. Our experiments demonstrate improved recall (+40%) and accuracy (+20%) on degraded images when super-resolution is applied.
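As a rough illustration of how high-resolution and degraded training pairs might be produced, the sketch below blurs and decimates an image by block averaging and then adds Gaussian noise. The abstract does not specify the degradation model, so the block-average blur, the decimation factor, the noise level, and the names `degrade` and `make_pair` are assumptions made for illustration only.

```python
import numpy as np

def degrade(image, factor=4, noise_sigma=0.05, rng=None):
    """Blur and decimate a 2D image by block averaging, then add Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Crop so both dimensions are divisible by the decimation factor.
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    low_res = blocks.mean(axis=(1, 3))  # box blur and decimation in one step
    return low_res + rng.normal(0.0, noise_sigma, size=low_res.shape)

def make_pair(image, **kwargs):
    """Return a (degraded, high-resolution) pair for supervised training."""
    return degrade(image, **kwargs), image
```

A super-resolution network would then be trained to map the first element of each pair back to the second.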
An explainable automatic target recognition (XATR) algorithm with part-based representation of 2D and 3D objects is presented. The algorithm employs a two-phase approach. In the first phase, a collection of Convolutional Neural Networks (CNNs) recognizes major parts of these objects, also known as the vocabulary. In the second phase, a Markov Logic Network (MLN) and structure learning mechanism are used to learn the geometric and spatial relationships between the parts in the vocabulary that best describe the objects. The resultant network offers three unique features: 1) the inference results are explainable with qualitative information involving the vocabulary that makes up the object; 2) the part-based approach achieves robust recognition performance in cases of partially occluded objects or images of objects hidden under canopy; and 3) different object representations can be created by varying the vocabulary and permuting learned relationships.
This work presents a novel fusion mechanism for estimating the three-dimensional trajectory of a moving target using images collected by multiple imaging sensors. The proposed projective particle filter avoids explicit target detection prior to fusion. In the projective particle filter, particles that represent the posterior density (of the target state in a high-dimensional space) are projected onto the lower-dimensional observation space. Measurements are generated directly in the observation space (image plane) and a marginal (sensor) likelihood is computed. The particle states and their weights are updated using the joint likelihood computed from all the sensors. The 3D state estimate of the target (system track) is then generated from the states of the particles. This approach is similar to track-before-detect particle filters, which are known to perform well in tracking dim and stealthy targets in image collections. Our approach extends the track-before-detect approach to 3D tracking using the projective particle filter. The performance of this measurement-level fusion method is compared with that of a track-level fusion algorithm using the projective particle filter. In the track-level fusion algorithm, the 2D sensor tracks are generated separately and transmitted to a fusion center, where they are treated as measurements to the state estimator. The 2D sensor tracks are then fused to reconstruct the system track. A realistic synthetic scenario with a boosting target was generated and used to study the performance of the fusion mechanisms.
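The measurement-level update described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' filter: each sensor is modeled by a hypothetical 3x4 projection matrix, the target adds a known amplitude to a single pixel, and pixel noise is zero-mean Gaussian, so the marginal likelihood is a per-pixel likelihood ratio. The names `project`, `sensor_log_likelihood`, and `fuse_update` are illustrative.

```python
import numpy as np

def project(P, xyz):
    """Project N 3D points to pixel coordinates (u, v) with a 3x4 matrix P."""
    homog = np.hstack([xyz, np.ones((len(xyz), 1))])
    uvw = homog @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def sensor_log_likelihood(image, uv, amplitude, sigma):
    """Log likelihood ratio (target present vs. noise only) at each projected pixel."""
    h, w = image.shape
    col = np.round(uv[:, 0]).astype(int)
    row = np.round(uv[:, 1]).astype(int)
    inside = (row >= 0) & (row < h) & (col >= 0) & (col < w)
    # Particles that fall outside the field of view get a neutral ratio (log 1 = 0).
    loglik = np.zeros(len(uv))
    z = image[row[inside], col[inside]]
    # log N(z; A, sigma^2) - log N(z; 0, sigma^2) = A * (z - A / 2) / sigma^2
    loglik[inside] = amplitude * (z - 0.5 * amplitude) / sigma**2
    return loglik

def fuse_update(particles, weights, images, cameras, amplitude=1.0, sigma=1.0):
    """Update particle weights with the joint likelihood over all sensors."""
    log_w = np.log(weights)
    for image, P in zip(images, cameras):
        uv = project(P, particles[:, :3])
        # Independent sensors: the joint likelihood is the product of marginals.
        log_w = log_w + sensor_log_likelihood(image, uv, amplitude, sigma)
    log_w -= log_w.max()  # stabilize before exponentiating
    new_weights = np.exp(log_w)
    new_weights /= new_weights.sum()
    estimate = new_weights @ particles  # weighted-mean state (system track)
    return new_weights, estimate
```

Working in the log domain avoids underflow when many sensors or frames are combined. A complete filter would also propagate the particles through a motion model and resample them, which this sketch omits.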
In this work we study the problem of detecting and tracking challenging targets that exhibit low signal-to-noise ratios (SNR). We have developed a particle filter-based track-before-detect (TBD) algorithm for tracking such dim targets. The approach incorporates the most recent state estimates to control the particle flow accounting for target dynamics. The flow control enables accumulation of signal information over time to compensate for target motion. The performance of this approach is evaluated using a sensitivity analysis based on varying target speed and SNR values. This analysis was conducted using high-fidelity sensor and target modeling in realistic scenarios. Our results show that the proposed TBD algorithm is capable of tracking targets in cluttered images with SNR values much less than one.
Target detection and tracking with passive infrared (IR) sensors can be challenging due to significant degradation and corruption of the target signature by atmospheric transmission and clutter effects. This paper summarizes our efforts in phenomenology modeling of boosting targets with IR sensors, and in developing algorithms for tracking targets in the presence of background clutter. On the phenomenology modeling side, the clutter images are generated using a high-fidelity end-to-end simulation testbed. It models atmospheric transmission, structured clutter and solar reflections to create realistic background images. The dynamics and intensity of a boosting target are modeled and injected onto the background scene. Pixel-level images are then generated with respect to the sensor characteristics. On the tracking analysis side, a particle filter for tracking targets in a sequence of clutter images is developed. The particle filter is augmented with a mechanism to control particle flow. Specifically, velocity feedback is used to constrain and control the particles. The performance of the developed “adaptive” particle filter is verified with tracking of a boosting target in the presence of clutter and occlusion.
Binocular reconstruction of a 3D shape is an ill-conditioned inverse problem: in the presence of visual and oculomotor noise the reconstructions based solely on visual data are very unstable. A question, therefore, arises about the nature of a priori constraints that would lead to accurate and stable solutions. Our previous work showed that planarity of contours, symmetry of an object and minimum variance of angles are useful priors in binocular reconstruction of polyhedra. Specifically, our algorithm begins with producing a 3D reconstruction from one retinal image by applying priors. The second image (binocular disparity) is then used to correct the monocular reconstruction. In our current study, we performed psychophysical experiments to test the importance of these priors. The subjects were asked to recognize shapes of 3D polyhedra from unfamiliar views. Hidden edges of the polyhedra were removed. The recognition performance, measured by the detectability measure d′, was high when shapes satisfied regularity constraints, and was low otherwise. Furthermore, the binocular recognition performance was highly correlated with the monocular one. The main aspects of our model will be illustrated by a demo, in which binocular disparity and monocular priors are put in conflict.
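For readers unfamiliar with the detectability measure (d-prime), the sketch below shows the standard equal-variance signal detection formula, the difference between the z-transformed hit rate and false-alarm rate. The abstract does not state how the rates were obtained or corrected, so the function name `d_prime` and its inputs are illustrative assumptions.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance detectability: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Rates of exactly 0 or 1 are outside the domain of inv_cdf and raise an error.
    return z(hit_rate) - z(false_alarm_rate)
```

A value of zero corresponds to chance performance, and larger values indicate that the shapes were easier to discriminate.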
In missile defense target recognition applications, knowledge about the problem may be imperfect, imprecise, and incomplete. Consequently, complete probabilistic models are not available. In order to obtain robust inference results and avoid making inaccurate assumptions, the probabilistic argumentation system (PAS) is employed. In PAS, knowledge is encoded as logical rules with probabilistically weighted assumptions. These rules map directly to Dempster-Shafer belief functions, which allow for uncertainty reasoning in the absence of complete probabilistic models. The PAS can be used to compute arguments for and against hypotheses of interest, and numerical answers that quantify these arguments. These arguments can be used as explanations that describe how inference results are computed. This explanation facility can also be used to validate intelligent information, which can in turn improve inference results. This paper presents a Java implementation of the probabilistic argumentation system as well as a number of new features. A rule-based syntax is defined as a problem encoding mechanism and for Monte Carlo simulation purposes. In addition, a graphical user interface (GUI) is implemented so that users can encode the knowledge database, and visualize relationships among rules and probabilistically weighted assumptions. Furthermore, a graphical model is used to represent these rules, which in turn provides graphical explanations of the inference results. We provide examples that illustrate how classical pattern recognition problems can be solved using canonical rule sets, as well as examples that demonstrate how this new software can be used as an explanation facility that describes how the inference results are determined.
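Because the rules map to Dempster-Shafer belief functions, a small example of Dempster's rule of combination may help. The sketch below is in Python rather than the Java of the described software and does not reproduce the PAS rule syntax. The two-element frame, the mass values, and the names `combine`, `belief`, and `plausibility` are illustrative assumptions.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset focal elements."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

def belief(m, hypothesis):
    """Total mass of focal elements that imply the hypothesis (the argument for it)."""
    return sum(w for s, w in m.items() if s <= hypothesis)

def plausibility(m, hypothesis):
    """Total mass of focal elements that do not contradict the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)

# Hypothetical evidence from two sources about whether an object is a target.
FRAME = frozenset({"target", "decoy"})
TARGET = frozenset({"target"})
m_a = {TARGET: 0.6, FRAME: 0.4}
m_b = {TARGET: 0.5, frozenset({"decoy"}): 0.3, FRAME: 0.2}
m_ab = combine(m_a, m_b)
```

Mass left on the whole frame represents ignorance, which is what allows reasoning without a complete probabilistic model. The gap between `belief(m_ab, TARGET)` and `plausibility(m_ab, TARGET)` quantifies that remaining ignorance.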