Poster + Paper
A frequency-driven deep learning technique for bird segmentation and detection from RGB video
4 October 2023
Conference Poster
Abstract
The convolutional neural network (CNN) performs spatial learning on two-dimensional data (e.g., images), using filters to learn features from the images. Hence, it requires many images that have highly discriminant spatial and longitudinal features, within and between classes, for comprehensive learning. When this requirement is not met, CNN models suffer from the data paucity problem, which leads to limited learning and poor classification performance. The segmentation and detection of birds from RGB videos to study the behavior of backyard birds is one application that suffers from this data paucity problem. This paper first presents a new backyard bird dataset that is extracted from RGB videos and consists of images of a cardinal and a sparrow, and then uses it to develop an artificial neural network (ANN) model with a frequency-driven feature learning approach. It was observed that the images of these birds and their discriminant textures are geometrically distorted by the birds’ rapid movements and postures. These geometrical distortions bury the true representations of the main and side lobes of the frequency spectrum of the bird images. To extract these latent features at different frequency bands and construct feature vectors for training an ANN model, a Kaiser–Bessel window is used in the frequency domain along with the fast Fourier transform. Simulations show that, by carefully selecting the parameters of the ANN model and the simulation parameters, segmentation and detection of the cardinal and sparrow images can be achieved with about 98% training accuracy and 96% testing accuracy.
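The frequency-band feature extraction described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `kaiser_band_features`, the radial band layout, the number of bands, and the shape parameter `beta` are all assumptions made here for illustration. The sketch takes the 2-D FFT of a grayscale frame, weights the power spectrum with a Kaiser–Bessel taper centered on each band, and returns one energy value per band as a feature vector.

```python
import numpy as np


def kaiser_band_features(image, n_bands=4, beta=8.0):
    """Return one Kaiser-Bessel-weighted spectral energy per radial band.

    Hypothetical helper: the band layout, n_bands and beta are illustrative
    assumptions, not values taken from the paper.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape

    # Centered 2-D power spectrum of the frame.
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2

    # Normalized radial frequency (cycles per pixel) of every FFT bin.
    fy = np.fft.fftshift(np.fft.fftfreq(rows))
    fx = np.fft.fftshift(np.fft.fftfreq(cols))
    radius = np.hypot(fy[:, None], fx[None, :])

    # Equal-width radial bands between 0 and 0.5 cycles per pixel.
    edges = np.linspace(0.0, 0.5, n_bands + 1)
    features = np.empty(n_bands)
    for k in range(n_bands):
        center = 0.5 * (edges[k] + edges[k + 1])
        half_width = 0.5 * (edges[k + 1] - edges[k])

        # Position inside the band, mapped to [-1, 1].
        t = (radius - center) / half_width
        inside = np.abs(t) <= 1.0

        # Kaiser-Bessel taper I0(beta * sqrt(1 - t^2)) / I0(beta), zero outside the band.
        weight = np.zeros_like(radius)
        weight[inside] = np.i0(beta * np.sqrt(1.0 - t[inside] ** 2)) / np.i0(beta)

        # Weighted band energy, normalized by the number of pixels.
        features[k] = np.sum(weight * power) / image.size
    return features


# Usage: a synthetic 64x64 frame with a single horizontal frequency of
# 20/64 cycles per pixel, which lies at the center of the third band.
x = np.arange(64)
frame = np.tile(np.cos(2.0 * np.pi * 20.0 * x / 64.0), (64, 1))
feature_vector = kaiser_band_features(frame)
```

In such a setup, the feature vectors computed from each frame (or each color channel) would be stacked and passed to an ANN classifier. How the paper actually chooses the bands and window parameters is not stated in the abstract.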
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Shan Suthaharan "A frequency-driven deep learning technique for bird segmentation and detection from RGB video", Proc. SPIE 12675, Applications of Machine Learning 2023, 1267517 (4 October 2023); https://doi.org/10.1117/12.2682041
KEYWORDS
Artificial neural networks

Education and training

Image segmentation

RGB color model

Data modeling

Video

Simulations
