With the latest advances in image sensor technology, cameras can generate video with tens of megapixels per frame. These high-resolution video streams offer great potential for the surveillance domain. For ground-based systems, gigapixel streams are already used to great effect, as illustrated by the ICME 2019 crowd counting challenge. However, for Unmanned Aerial Vehicles (UAVs), this vast stream of data exceeds the available transmission bandwidth for sending it back to the ground. On-board data analysis and selection are thus required to benefit from high-resolution cameras. This paper presents a result of the CAVIAR project, in which a combination of hardware and algorithms was designed to answer the question: ‘how to exploit a high-resolution, high-frame-rate camera on board a UAV?’. Within the associated size, weight and power limitations, we implement data reduction by deploying deep learning on hardware to find the relevant information and transmit it to an operator station. The proposed solution applies the high-resolution potential of the sensor only to objects of interest. We encode and transmit the identified regions of interest (ROIs) containing those objects at the original resolution and frame rate, while also transmitting the downscaled background to provide context for an operator. Using a 35 fps, 65-megapixel camera, we demonstrate that this set-up indeed saves considerable bandwidth while retaining all important video data at high quality.
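The bandwidth saving of the ROI-plus-downscaled-background scheme can be illustrated with a back-of-the-envelope calculation. The sketch below assumes hypothetical numbers not stated in the abstract (five 512×512 ROIs, an 8× per-dimension background downscale, 8-bit pixels) and ignores compression; it only shows why the reduction is substantial.

```python
# Rough bandwidth comparison for ROI-based transmission (uncompressed).
# Assumed, illustrative parameters: 65 MP sensor at 35 fps, 8-bit pixels,
# five ROIs of 512x512 pixels, background downscaled 8x per dimension.

def raw_bandwidth_mb_s(pixels_per_frame, fps, bytes_per_pixel=1):
    """Uncompressed data rate in megabytes per second."""
    return pixels_per_frame * bytes_per_pixel * fps / 1e6

full_stream = raw_bandwidth_mb_s(65e6, 35)        # full-resolution stream
rois = raw_bandwidth_mb_s(5 * 512 * 512, 35)      # five full-resolution ROIs
background = raw_bandwidth_mb_s(65e6 / 8**2, 35)  # downscaled context image

reduced = rois + background
print(f"full: {full_stream:.0f} MB/s, reduced: {reduced:.1f} MB/s, "
      f"saving factor: {full_stream / reduced:.1f}x")
```

Even with these conservative assumptions the raw rate drops by more than an order of magnitude, which is what makes on-board selection worthwhile.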
Hyperspectral imaging sensors acquire images in a large number of spectral bands, unlike traditional electro-optical and infrared sensors, which sample only one or a few bands. Hyperspectral mosaic sensors acquire an image of all spectral bands in one shot. Using a patterned array of spectral filters, they measure different wavelength bands at different pixel locations, but this comes at the cost of a lower spatial resolution, as the sampling per spectral band is lower. Software algorithms can compensate for this loss in spatial sampling in each spectral channel. Here we compare the image quality obtained with spatial bicubic interpolation and two categories of super-resolution algorithms: two single-frame super-resolution algorithms, which exploit spectral redundancies in the data, and two multi-frame super-resolution algorithms, which exploit spatio-temporal structure. We make a quantitative assessment of the spatial and spectral image reconstruction quality on synthetic data as well as on semi-synthetic mosaic sensor data for applications in the security and medical domains. Our results show that multi-frame super-resolution provides the best spatial and signal-to-noise quality. The single-frame super-resolution approaches score lower on spatial sharpness but still provide a substantial improvement over mere spatial interpolation, while in some cases providing the best spectral quality.
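The mosaic-sensor sampling problem can be made concrete with a toy example: a hypothetical 2×2 filter pattern (four bands) is split into per-band planes, each of which covers only a quarter of the pixel grid and must be interpolated back to full resolution. Nearest-neighbour replication stands in here for the bicubic interpolation baseline; the principle of per-band upsampling is the same.

```python
# Minimal sketch of mosaic demosaicing with an assumed 2x2 spectral pattern:
# split the mosaic into one subsampled plane per filter position, then
# upsample each plane back to the full grid (nearest-neighbour replication
# stands in for bicubic interpolation).

def split_bands(mosaic, pattern=2):
    """Extract one subsampled plane per filter position of the mosaic."""
    h, w = len(mosaic), len(mosaic[0])
    bands = {}
    for dy in range(pattern):
        for dx in range(pattern):
            bands[(dy, dx)] = [[mosaic[y][x] for x in range(dx, w, pattern)]
                               for y in range(dy, h, pattern)]
    return bands

def upsample_nearest(plane, factor=2):
    """Replicate each sample factor x factor times (zeroth-order hold)."""
    out = []
    for row in plane:
        expanded = [v for v in row for _ in range(factor)]
        out.extend([list(expanded) for _ in range(factor)])
    return out

mosaic = [[10, 20, 10, 20],
          [30, 40, 30, 40],
          [10, 20, 10, 20],
          [30, 40, 30, 40]]
bands = split_bands(mosaic)
full_band0 = upsample_nearest(bands[(0, 0)])  # band at filter position (0,0)
```

The super-resolution methods compared in the paper replace the `upsample_nearest` step with reconstructions that exploit spectral redundancy (single-frame) or motion across frames (multi-frame).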
For military operations, the availability of high-quality imaging information from Electro-Optical / Infrared (EO/IR) sensors is of vital importance. This information can be used for timely detection and identification of threatening vessels in an environment with a large number of neutral vessels. EO/IR sensors provide imagery of all vessels at different moments in time. It is challenging to interpret the images of the different vessels within a larger region of interest. It is therefore helpful to automatically detect and track vessels, and to save the detections of the vessels, called snapshots, for identification purposes.
Of all available snapshots, only the best and most representative snapshots should be selected for the operator. In this paper, we present two different approaches for snapshot selection from a vessel track: the first is based on directional track information, and the second on snapshot appearance. We present results for both methods on IR recordings containing vessels with different track patterns in a harbor scenario.
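The track-based selection idea can be sketched as follows: from a vessel track, pick the snapshot where the vessel's course is closest to perpendicular to the sensor's line of sight, i.e. the most broadside and thus most informative view. The track format and the broadside criterion below are assumptions for illustration, not the paper's exact method.

```python
# Illustrative sketch of direction-based snapshot selection: prefer the
# snapshot with the most broadside view of the vessel. Track format and
# scoring function are assumed for the example.
import math

def broadside_score(course_deg, bearing_deg):
    """1.0 for a perfect broadside view, 0.0 for a bow- or stern-on view."""
    rel = abs((course_deg - bearing_deg + 180) % 360 - 180)  # 0..180 degrees
    return abs(math.sin(math.radians(rel)))

def select_snapshot(track):
    """track: list of (snapshot_id, course_deg, sensor_bearing_deg) tuples."""
    return max(track, key=lambda t: broadside_score(t[1], t[2]))[0]

track = [("snap0", 10, 10),    # sailing straight toward the sensor
         ("snap1", 100, 10),   # broadside view
         ("snap2", 45, 10)]    # oblique view
```

An appearance-based selector would instead score the snapshot images themselves, e.g. on sharpness or contrast.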
Automatic detection and tracking of maritime targets in imagery can greatly increase situation awareness on naval vessels. Various methods for detection and tracking have been proposed so far, both reasoning-based and learning-based. Learning approaches promise to outperform reasoning approaches. They typically detect targets in a single frame, followed by a tracking step in order to follow targets over time. However, such approaches are sub-optimal for the detection of small or distant objects, because these are hard to distinguish in single frames. We propose a new spatiotemporal learning approach that detects targets directly from a series of frames. This method is based on a deep-learning segmentation model, here applied to temporal input data. This way, targets are detected based not only on their appearance in a single frame, but also on their movement over time. Detection thereby becomes more similar to how the human eye performs it: by focusing on structures that move differently from their surroundings. The performance of the proposed method is compared to both ground-truth detections and the detections of a contrast-based detector that detects targets per frame. We investigate the performance on a variety of infrared video datasets, recorded with static and moving cameras, different types of targets, and different scenes. We show that spatiotemporal detection of small objects overall obtains similar or slightly better performance than the state-of-the-art frame-wise detection method, while generalizing better with fewer adjustable parameters and reducing clutter more effectively.
Imaging systems can be used to obtain situational awareness in maritime situations. Important tools for these systems are automatic detection and tracking of objects in the acquired imagery, for which numerous methods are being developed. When designing a detection or tracking algorithm, its quality should be ensured by a comparison with existing algorithms and/or with a ground truth. Detection and tracking methods are often designed for a specific task, so evaluation with respect to this task is crucial, which demands different evaluation measures for different tasks. We therefore propose a variety of quantitative measures for the performance evaluation of detectors and trackers for a variety of tasks. The proposed measures form a rich set from which an algorithm designer can choose in order to optimally design and assess a detection or tracking algorithm for a specific task. We compare these evaluation measures by using them to assess detection and tracking quality in different maritime situations, obtained from three real-life infrared video data sets. With the proposed set of evaluation measures, a user can quantitatively assess the performance of a detector or tracker, which enables an optimal design of the approach.
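Two task-level evaluation measures that such a set typically includes are detection probability (recall over ground-truth objects) and the false alarm rate per frame. The sketch below assumes that matching of detections to ground truth has already been done, e.g. by spatial overlap; the counts are invented for illustration.

```python
# Sketch of two common detector evaluation measures: detection probability
# and false alarm rate per frame. Matching of detections to ground truth
# (e.g. by overlap) is assumed to have been performed beforehand.

def detection_probability(num_matched, num_ground_truth):
    """Fraction of ground-truth objects that received a matching detection."""
    return num_matched / num_ground_truth if num_ground_truth else 0.0

def false_alarm_rate(num_false_alarms, num_frames):
    """Average number of unmatched detections per frame."""
    return num_false_alarms / num_frames if num_frames else 0.0

# Hypothetical counts over a 500-frame IR sequence:
pd = detection_probability(num_matched=180, num_ground_truth=200)
far = false_alarm_rate(num_false_alarms=25, num_frames=500)
```

Tracker-oriented measures, by contrast, operate on track-level events such as breaks and identity switches rather than on per-frame counts.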
Detecting maritime targets with electro-optical (EO) sensors is an active area of research. One current trend is to automate target detection through image processing or computer vision. Automation of target detection will decrease the number of people required for lower-level tasks, which frees capacity for higher-level tasks. A second trend is that the targets of interest are changing; more distributed and smaller targets are of increasing interest. Technological trends enable combined detection and identification of targets through machine learning. These trends and new technologies require a new approach in target detection strategies with specific attention to choosing which sensors and platforms to deploy.
In our current research, we propose a ‘maritime detection framework 2.0’, in which multi-platform sensors are combined with detection algorithms. In this paper, we present a comparison of detection algorithms for EO sensors within our developed framework and quantify the performance of this framework on representative data.
Automatic detection can be performed within the proposed framework in three ways: 1) using existing detectors, such as detectors based on movement or local intensities; 2) using a newly developed detector based on saliency at the scene level; and 3) using a state-of-the-art deep learning method. After detection, false alarms are suppressed in subsequent tracking steps. The performance of these detection methods is compared by evaluating the detection probability versus the false alarm rate for realistic multi-sensor data.
New types of maritime targets require new target detection strategies. Combining new detection strategies with existing tracking technologies shows a potential increase in the detection performance of the complete framework.
The bottleneck in situation awareness is no longer in the sensing domain but rather in the data-interpretation domain, since the number of sensors is rapidly increasing and it is not affordable to increase human data-analysis capacity at the same rate. Automatic image analysis can assist a human analyst by raising an alert when an event of interest occurs. However, common state-of-the-art image recognition systems learn representations in high-dimensional feature spaces, which makes them less suitable for generating a message that is comprehensible to the user. Such data-driven approaches rely on large amounts of training data, which are often not available for the rare but high-impact incidents in the security domain. The key contribution of this paper is a novel real-time system for image understanding based on generic instantaneous low-level processing components (symbols) and flexible, user-definable and user-understandable combinations of these components (sentences) at a higher level, for the recognition of specific relevant events in the security domain. We show that the detection of an event of interest can be enhanced by recognizing multiple short-term preparatory actions.
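The symbol/sentence idea can be sketched with a toy rule engine: low-level detector outputs (symbols) active in each frame are combined by a user-definable temporal rule (a sentence) into an event of interest. The symbol names and the example rule below are invented for illustration; they are not the paper's actual vocabulary.

```python
# Toy sketch of user-definable "sentences" over low-level "symbols": an
# event fires when all required symbols occur within a short time window.
# Symbol names and the example rule are assumptions for illustration.

def sentence_holds(symbols_over_time, required, window):
    """True if all required symbols occur within `window` consecutive frames.

    symbols_over_time: list of sets, one set of active symbols per frame.
    """
    for start in range(max(1, len(symbols_over_time) - window + 1)):
        seen = set().union(*symbols_over_time[start:start + window])
        if required <= seen:
            return True
    return False

# Example sentence: an unattended-object event built from short-term
# preparatory actions recognized in consecutive frames.
frames = [{"person_present"},
          {"person_present", "person_static"},
          {"person_static", "object_placed"},
          {"person_leaving"}]
alarm = sentence_holds(frames, {"object_placed", "person_leaving"}, window=3)
```

Because the rule is expressed over named symbols, an operator can read and adjust it directly, unlike a high-dimensional learned representation.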
Naval ships have camera systems available to assist in performing their operational tasks. Some include automatic detection and tracking, assisting an operator by keeping a ship in view or by maintaining collected information about ships. Tracking errors limit the use of camera information. When keeping a ship in view, an operator has to re-target a tracked ship if it is no longer automatically followed due to a track break, or if it is out of view. When following several ships, track errors require the operator to re-label objects.
Trackers make errors, for example due to inaccuracies in detection or motion that is not modeled correctly. Instead of improving the tracking using the limited information available from a single measurement, we propose a method in which tracks are merged at a later stage, using information collected over a small time interval. This merging is based on spatiotemporal matching. To limit incorrect connections, unlikely connections are identified and excluded. For this we propose two different approaches: spatiotemporal cost functions are used to exclude connections with unlikely motion, and appearance cost functions are used to exclude connections between tracks of dissimilar objects. In addition, spatiotemporal cost functions are also used to select tracks for merging. For the appearance filtering, we investigated different descriptive features and developed a method for indicating the similarity between tracks. This method handles variations in features due to noisy detections and changes in appearance.
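A minimal sketch of such a spatiotemporal cost, assuming constant-velocity motion: extrapolate the ended track over the time gap and compare the predicted position with the start of the candidate track, gating out connections whose cost is too high. The gate value and track representation are assumptions for illustration.

```python
# Sketch of a spatiotemporal cost for track merging: extrapolate the ended
# track with constant velocity across the time gap and measure the distance
# to the start of the candidate track. Gate value and units are assumed.
import math

def spatiotemporal_cost(end_pos, end_vel, end_t, start_pos, start_t):
    """Distance between the extrapolated position and the new track start."""
    dt = start_t - end_t
    pred = (end_pos[0] + end_vel[0] * dt, end_pos[1] + end_vel[1] * dt)
    return math.hypot(start_pos[0] - pred[0], start_pos[1] - pred[1])

def may_merge(cost, gate=10.0):
    """Exclude unlikely connections: only costs below the gate may merge."""
    return cost < gate

# Track A ends at (100, 50) at t=10 moving 5 px/frame in x; track B starts
# 4 frames later at (121, 52). Predicted position is (120, 50), so the
# cost is small and the pair remains a merge candidate.
cost = spatiotemporal_cost((100, 50), (5, 0), 10, (121, 52), 14)
```

An appearance cost function would complement this gate by comparing descriptive features of the two tracks, excluding merges between dissimilar objects even when their motion is consistent.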
We tested this method on real data with nine different targets. We show that track merging results in a significant reduction in the number of tracks per ship. With our method, we significantly reduce the incorrect track merges that would occur with naïve merging functions.