Deep neural networks achieve state-of-the-art performance on object detection tasks with RGB data. However, multi-modal imagery offers many advantages for detection in defence and security operations. For example, the infrared (IR) modality offers persistent surveillance and is essential in poor lighting conditions and for 24-hour operation. It is therefore crucial to create an object detection system that can use IR imagery. Collecting and labelling large volumes of thermal imagery is incredibly expensive and time-consuming. Consequently, we propose to mobilise labelled RGB data to achieve detection in the IR modality. In this paper, we present a method for multi-modal object detection using unsupervised transfer learning and adaptation techniques. We train Faster R-CNN on RGB imagery and test on imagery from a thermal imager. The images contain the object classes people and land vehicles, and represent real-life scenes which include clutter and occlusions. We improve the baseline F1-score by up to 20% through training with an additional loss function, which reduces the difference between RGB and IR feature maps. This work shows that unsupervised modality adaptation is possible, and that there is an opportunity to maximise the use of labelled RGB imagery for detection in multiple modalities. The novelty of this work includes the use of IR imagery, modality adaptation from RGB to IR for object detection, and the ability to use real-life imagery in uncontrolled environments. The practical impact of this work for the defence and security community is an increase in performance and a saving of time and money in data collection and annotation.
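The additional loss term described above can be illustrated with a minimal sketch. The abstract does not state which distance is used between the RGB and IR feature maps, so the mean squared difference below is an assumption, as are the names `feature_alignment_loss`, `total_loss` and the `weight` parameter. Feature maps are represented as flat lists of floats purely for illustration.

```python
from typing import Sequence


def feature_alignment_loss(rgb_features: Sequence[float],
                           ir_features: Sequence[float]) -> float:
    """Mean squared difference between two flattened feature maps.

    Assumption: the source only says the loss "reduces the difference"
    between RGB and IR feature maps; MSE is one plausible choice.
    """
    if len(rgb_features) != len(ir_features):
        raise ValueError("feature maps must have the same number of elements")
    if not rgb_features:
        raise ValueError("feature maps must not be empty")
    squared = ((r - i) ** 2 for r, i in zip(rgb_features, ir_features))
    return sum(squared) / len(rgb_features)


def total_loss(detection_loss: float,
               rgb_features: Sequence[float],
               ir_features: Sequence[float],
               weight: float = 1.0) -> float:
    """Detection loss plus a weighted alignment term (hypothetical weighting)."""
    return detection_loss + weight * feature_alignment_loss(rgb_features, ir_features)
```

Because the alignment term needs no IR labels, only IR feature maps, a term of this kind is compatible with the unsupervised setting the abstract describes.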
The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid- to long-range detection system utilising a Long Wave Infrared (LWIR) sensor. By exploiting high-quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained, and the classifier achieves an overall accuracy of > 95% for 6 object classes related to land defence. While the highly accurate CNN struggles to recognise long-range target classes, due to low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the methodology presented is assessed using human ground truth information, generating classifier evaluation metrics for thermal image sequences.
The aim of this paper is to describe the progress and results of an imaging system designed to optimise the performance
of human operator tasks through exploitation of multimodal sensors and scene context. The performance of tasks such as
surveillance, target detection and situational awareness is dependent on the scene content, the sensors available and the
algorithms deployed. Intelligent analysis of the scene into contextual regions allows specific algorithms to be optimised
and appropriate sensors to be selected, thereby increasing the performance of the operator's tasks. Context-specific
algorithms, which will adapt as the scene changes, are required. In the case discussed in this paper, the contextual
regions include road, sky and vegetation, and the dynamic detection of each region utilises different sensor modalities.
The paper will describe the overall system concept and a real-time imaging demonstrator using GPUs, which will be
used for future demonstrations of the context-specific processing. Simulations of the context-specific scene analysis will
be described using sensor data from a vehicle in a rural environment. The performance of a motion detection system with
and without context will also be illustrated using measured image data.
This paper presents a method for the detection of faces (via skin regions) in images where faces may be low-resolution and no
assumptions are made about fine facial features being visible. This type of data is challenging because changes in appearance
of skin regions occur due to changes in both lighting and resolution. We present a non-parametric classification scheme based
on a histogram similarity measure. By comparing the performance of commonly used colour spaces, we find that the YIQ colour
space with 16 histogram bins (in both 1 and 2 dimensions) gives the most accurate performance over a wide range of imaging
conditions for non-parametric skin classification. We demonstrate better performance of the non-parametric approach vs.
colour thresholding and a Gaussian classifier. Face detection is subsequently achieved via a simple aspect-ratio test, and we show
results from indoor and outdoor scenes.
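A minimal sketch of non-parametric skin classification with 16-bin histograms in YIQ space is given here for illustration. The abstract does not name the similarity measure or the decision rule, so histogram intersection and the comparison against a non-skin model are assumptions, as are the function names and the use of only the I and Q channels for the 2-D histogram. The colour conversion uses the standard library function `colorsys.rgb_to_yiq`.

```python
import colorsys

BINS = 16           # 16 bins per dimension, as reported in the abstract
CHROMA_RANGE = 0.6  # I and Q from colorsys lie within about [-0.6, 0.6]


def _bin_index(value, bins=BINS):
    """Map a chrominance value to a bin index, clamped to the valid range."""
    index = int((value + CHROMA_RANGE) / (2 * CHROMA_RANGE) * bins)
    return max(0, min(bins - 1, index))


def iq_histogram(pixels, bins=BINS):
    """Normalised sparse 2-D histogram over I and Q for 8-bit RGB pixels."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    counts = {}
    for r, g, b in pixels:
        _, i, q = colorsys.rgb_to_yiq(r / 255.0, g / 255.0, b / 255.0)
        key = (_bin_index(i, bins), _bin_index(q, bins))
        counts[key] = counts.get(key, 0) + 1
    return {key: n / len(pixels) for key, n in counts.items()}


def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1]: sum of bin-wise minima (assumed measure)."""
    return sum(min(v, hist_b.get(k, 0.0)) for k, v in hist_a.items())


def is_skin(patch_pixels, skin_model, non_skin_model):
    """Label a patch as skin if it is more similar to the skin model."""
    patch = iq_histogram(patch_pixels)
    return (histogram_intersection(patch, skin_model)
            > histogram_intersection(patch, non_skin_model))
```

In use, `skin_model` and `non_skin_model` would be built with `iq_histogram` from labelled training pixels, and `is_skin` would then be applied to candidate image regions before any aspect-ratio check.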
In surveillance and remote sensing applications, images are often subject to processing both at the sensor and for
display/storage. This paper presents a step towards measuring and understanding how these processes impact on human
interpretation of regions of similar statistics within images. The paper describes the methods involved, including image
generation, image processing algorithm application, and algorithm impact measurement techniques. A comparison is
then made between the impact measurements and human observations. It is suggested that data on mathematical
measures of the impact of processing algorithms on statistical regions of images may be usable for intelligent algorithm