Object detection, a critical task in computer vision, has been revolutionized by deep learning, especially convolutional neural networks (CNNs). These techniques are increasingly deployed in infrared imaging systems for long-range target detection, localization, and identification. The performance of such detectors depends heavily on the training procedure, network architecture, and computing resources. In contrast, human-in-the-loop task performance can be reliably predicted using well-established models. Here we model the performance of a CNN developed for mid-wave and long-wave infrared (MWIR and LWIR) sensors and compare it against human perception models. We focus on tower detection for vision-based geolocation, a task that presents novel scenarios with high-aspect-ratio, unresolved targets in low clutter.