Deep metric learning is an approach to learning a distance metric over data in order to measure similarity. Metric learning methods map an input image to a representative feature space where semantically similar samples lie close together and dissimilar samples lie far apart. The use of metric learning in fine-grained recognition has been widely studied in recent years. Fine-Grained Recognition (FGR) focuses on categorizing hard-to-distinguish classes such as bird species and car models. In FGR datasets, the intra-class variance is high while the inter-class variance is low, which makes them challenging to annotate and leads to erroneous labels. Labeling such data is particularly costly in defense applications, since the work must be done by experts. The performance of metric learning methods is directly related to the loss function used during training. Loss functions fall into two categories: pair-based and proxy-based approaches. A proxy is a representative of a class distribution in the feature space. Whereas pair-based loss functions exploit data-to-data relations, proxy-based loss functions exploit data-to-proxy relations. In this paper, we analyze the effect of label noise on open-set fine-grained recognition performance. Pair-based and proxy-based methods are evaluated on three widely adopted benchmark datasets: CUB 200-2011, Stanford Cars 196, and FGVC Aircraft.
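To make the distinction concrete, the sketch below contrasts a pair-based loss (a standard triplet formulation) with a proxy-based loss (a Proxy-NCA-style softmax over data-to-proxy distances). This is a minimal PyTorch illustration; the margin value, the normalization choices, and the function names are assumptions, not the exact losses evaluated in the paper.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pair-based loss: built purely from data-to-data distances within a batch."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

def proxy_nca_loss(embeddings, labels, proxies):
    """Proxy-based loss: each class is summarized by a learnable proxy vector,
    and every sample is pulled toward its own proxy and pushed from the others."""
    embeddings = F.normalize(embeddings, dim=1)
    proxies = F.normalize(proxies, dim=1)
    sq_dists = torch.cdist(embeddings, proxies).pow(2)   # (batch, num_classes)
    # Softmax over negative distances: minimize the distance to the correct proxy.
    return F.cross_entropy(-sq_dists, labels)
```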
Real-time face verification from a live stream remains an open problem, even though it has become quite popular in recent years. Biometric techniques of this kind are widely used for authentication purposes in both military and civilian domains. Face verification is essentially the task of detecting a candidate face and comparing it with the faces in a database to decide whether they belong to the same person. Typically, a face verification pipeline is composed of four stages: face detection, alignment, recognition, and matching. In the first stage, the faces in the frames of the live stream are detected by a deep neural network (DNN). The detected faces are then aligned, and another DNN extracts the face features. The feature vector of each face is matched against the other vectors in the database to validate the identity. Recent developments in deep learning have led to human-level performance on these tasks. However, the networks used in the individual stages require high computational power. To achieve real-time performance on resource-limited devices, lightweight networks should be preferred. Unfortunately, such networks can degrade detection and recognition performance dramatically in some frames of a live stream. As a result, the set of feature vectors collected for an individual from the live stream contains outliers, which complicates obtaining a robust reference feature vector, an essential ingredient for high-confidence verification. In this work, a conditional generative network is utilized to generate these reference vectors for a given candidate. We conduct experiments on a real-life scenario to demonstrate the performance improvement provided by the proposed generative network.
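As a point of reference for the matching stage, the sketch below shows a cosine-similarity verification step together with a naive, outlier-trimmed aggregation of stream features into a reference vector. The aggregation shown here is a plain statistical baseline, not the conditional generative network proposed in the work, and the threshold and trim ratio are illustrative assumptions.

```python
import numpy as np

def robust_reference(stream_features, trim_ratio=0.2):
    """Aggregate the per-person feature vectors collected from the stream into a
    single reference vector, discarding the lowest-similarity outliers first."""
    feats = stream_features / np.linalg.norm(stream_features, axis=1, keepdims=True)
    centroid = feats.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = feats @ centroid
    keep = sims >= np.quantile(sims, trim_ratio)  # drop the most atypical samples
    reference = feats[keep].mean(axis=0)
    return reference / np.linalg.norm(reference)

def verify(probe_feature, reference, threshold=0.5):
    """Matching stage: cosine similarity against the reference vector."""
    probe = probe_feature / np.linalg.norm(probe_feature)
    return float(probe @ reference) >= threshold
```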
Deep learning is a widely utilized approach, particularly for computer vision applications, and visual recognition is one of the applications that benefits from it. Several challenges limit the performance of visual recognition methods; one of the most important is the insufficient amount of labeled data in the datasets. To overcome this challenge, recent studies propose sophisticated methods that require high computational resources, which creates another problem: implementing such algorithms on mobile devices is quite challenging. These issues are encountered especially in surveillance systems that utilize drones and/or CCTV cameras. To solve these problems and obtain high accuracy, the network should be able to extract both representative and discriminative features from such a small amount of data. In this paper, we propose a generative adversarial semi-supervised training method for visual recognition. Experiments are performed to evaluate a lightweight deep convolutional neural network that is trained as the classifier by the proposed method, along with conditional and unconditional generator networks examined in the adversarial training.
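For clarity, the sketch below shows one common way to phrase the classifier side of semi-supervised adversarial training, using the logsumexp formulation in which the K class logits double as a real/fake discriminator. It is a minimal sketch under assumed tensor shapes, not necessarily the exact objective used in the paper.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(logits_labeled, labels, logits_unlabeled, logits_fake):
    """Classifier/discriminator objective for semi-supervised adversarial training.
    The classifier outputs K class logits; a sample is judged 'real' when the
    summed class evidence Z = sum(exp(logits)) is large, so D(x) = Z / (Z + 1)."""
    # Supervised term on the small labeled subset.
    loss_supervised = F.cross_entropy(logits_labeled, labels)

    lse_real = torch.logsumexp(logits_unlabeled, dim=1)
    lse_fake = torch.logsumexp(logits_fake, dim=1)

    # Unsupervised terms: -log D(x_real) and -log(1 - D(G(z))).
    loss_real = -(lse_real - F.softplus(lse_real)).mean()
    loss_fake = F.softplus(lse_fake).mean()

    return loss_supervised + loss_real + loss_fake
```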
Face recognition is a key task of computer vision research that has been employed in various security and surveillance applications. Recently, the importance of this task has grown with improvements in camera sensor quality, as well as with the increasing coverage of camera networks set up throughout cities. Moreover, biometry-based technologies have been developed over the last three decades and are now available on many devices such as mobile phones. The goal is to identify people based on specific physiological landmarks. Faces are among the most commonly used landmarks because, unlike other biometric methods, facial recognition systems do not require voluntary actions such as placing a hand or finger on a sensor. The development of effective methods is necessary to inhibit cyber-crime and identity theft. In this paper, we address the face recognition problem by visually matching any face image with previously captured ones. First, considering the challenges posed by optical artifacts and environmental factors such as illumination changes and low resolution, we deal with these problems by using convolutional neural networks (CNNs) with a state-of-the-art architecture, ResNet. Second, we make use of a large amount of face image data and train these networks with the help of our proposed loss function. CNNs have proven effective for visual recognition compared to traditional methods based on hand-crafted features. In this work, we further improve performance by introducing a novel training policy that utilizes quadruplets. To improve the learning process, we exploit several methods for generating quadruplets from the dataset and define a new loss function corresponding to the generation policy. With the help of the proposed selection methods, we obtain improvements in classification accuracy, recall, and normalized mutual information. Finally, we report results for the end-to-end face recognition system, performing both detection and classification.
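To illustrate what a quadruplet-based objective typically looks like, the sketch below implements a widely used quadruplet loss in which a second margin term also separates pairs of negatives drawn from different identities. The margins and the exact formulation are assumptions for illustration; the paper defines its own loss tied to its quadruplet generation policy.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=0.3, margin2=0.15):
    """Quadruplet loss: the usual triplet term plus a second term that also
    separates the anchor-positive distance from the distance between two
    negatives belonging to different identities."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative1).pow(2).sum(dim=1)
    d_nn = (negative1 - negative2).pow(2).sum(dim=1)
    triplet_term = F.relu(d_ap - d_an + margin1)
    extra_term = F.relu(d_ap - d_nn + margin2)
    return (triplet_term + extra_term).mean()
```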
The need for automated visual content analysis has substantially increased due to the large number of images captured by surveillance cameras. With a focus on developing practical methods for extracting effective visual data representations, deep neural network based representations have received great attention owing to their success in visual categorization of generic images. Fine-grained image categorization, a closely related yet more challenging research problem than generic image categorization because of the high visual similarity within subgroups, has seen diverse applications such as classifying images of vehicles, birds, food, and plants. Here, we propose the use of deep neural network based representations for categorizing and identifying marine vessels for defense and security applications. First, we gather a large number of marine vessel images from online sources and group them into four coarse categories: naval, civil, commercial, and service vessels. Next, we subgroup naval vessels into fine categories such as corvettes, frigates, and submarines. To distinguish the images, we extract state-of-the-art deep visual representations and train support vector machines. Furthermore, we fine-tune the deep representations on marine vessel images. The experiments address two scenarios: classification and verification of naval marine vessels. The classification experiment aims at coarse categorization as well as learning models of the fine categories. The verification experiment involves identifying specific naval vessels by determining whether a pair of images belongs to the same vessel with the help of the learned deep representations. Given the promising performance obtained, we believe the presented capabilities will be essential components of future coastal and on-board surveillance systems.
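As an illustration of the representation-plus-SVM pipeline described above, the sketch below extracts pooled CNN features with a pretrained backbone and feeds them to a linear SVM. The ResNet-50 backbone, the preprocessing values, and the variable names are assumptions for illustration (and require a recent torchvision); the paper's actual representations and training details may differ.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

# Pretrained backbone used as a fixed feature extractor (ResNet-50 is an assumption).
backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()   # drop the classifier head, keep 2048-d pooled features
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    """List of PIL images -> (N, 2048) NumPy feature matrix."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch).numpy()

# Coarse categorization: a linear SVM over the extracted features.
# train_images and coarse_labels below are hypothetical placeholders.
svm = LinearSVC(C=1.0)
# svm.fit(extract_features(train_images), coarse_labels)
# predictions = svm.predict(extract_features(test_images))
```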