Open Access Paper
Self-supervised learning based on spatial scale learning and category prediction
12 November 2024
Jifeng Sun
Proceedings Volume 13395, International Conference on Optics, Electronics, and Communication Engineering (OECE 2024); 1339537 (2024) https://doi.org/10.1117/12.3049840
Event: International Conference on Optics, Electronics, and Communication Engineering, 2024, Wuhan, China
Abstract
With the rapid development of the Internet, intelligent devices generate massive amounts of image data. Driven by large-scale manually labeled datasets, deep learning has achieved major breakthroughs in computer vision tasks such as image classification, image super-resolution, and object detection. However, manually labeling image data is tedious and time-consuming, whereas unlabeled data are cheap and easy to obtain from the Internet. How to effectively exploit this massive unlabeled data has therefore become a research hotspot in computer vision. Self-supervised representation learning constructs supervision signals by designing pretext tasks and learns rich semantic representations from unlabeled datasets. However, many existing self-supervised models require a large batch size during training to learn good visual representations, and a large batch size in turn demands substantial computing resources. To address these problems, we apply self-supervised representation learning to object detection and propose a self-supervised object detection method based on spatial scale learning and category prediction. Without any additional manual labels, the model learns the spatial scale relationships and category relationships between objects in an image through a spatial scale information learning task and a category prediction task. Moreover, in the feature extraction stage, we fuse a feature pyramid network with an attention mechanism, which helps the model adapt to objects of different sizes in an image, capture richer detail information, and further improve performance. Experimental results show that our method achieves better performance.
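As a rough illustration of the approach described in the abstract, the sketch below combines a small feature pyramid network with channel attention and attaches two pretext heads, one for spatial-scale prediction and one for category prediction. The class and parameter names (ScaleCategoryPretext, num_scale_bins, num_pseudo_classes) and the specific attention and backbone designs are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
from torchvision.ops import FeaturePyramidNetwork

class ScaleCategoryPretext(nn.Module):
    """Hypothetical sketch: FPN features fused with channel attention,
    plus two self-supervised heads (spatial-scale bin and pseudo-category)."""
    def __init__(self, num_scale_bins=4, num_pseudo_classes=10):
        super().__init__()
        # Small convolutional backbone producing three feature scales.
        self.stem  = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.down1 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        # Feature pyramid network maps the three scales to 256 channels each.
        self.fpn = FeaturePyramidNetwork([64, 128, 256], out_channels=256)
        # Simple channel attention applied to each pyramid level (an assumption;
        # the abstract does not specify the attention design).
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(256, 256, 1), nn.Sigmoid())
        # Pretext heads: spatial-scale prediction and category prediction.
        self.scale_head = nn.Linear(256, num_scale_bins)
        self.category_head = nn.Linear(256, num_pseudo_classes)

    def forward(self, x):
        c1 = self.stem(x)
        c2 = self.down1(c1)
        c3 = self.down2(c2)
        feats = self.fpn({"c1": c1, "c2": c2, "c3": c3})
        # Attention-weight each level, then pool and average across levels.
        pooled = [(f * self.attn(f)).mean(dim=(2, 3)) for f in feats.values()]
        fused = torch.stack(pooled).mean(dim=0)
        return self.scale_head(fused), self.category_head(fused)

# The supervision signals (e.g. scale bins derived from crop sizes and
# pseudo-categories derived from clustering) come from the unlabeled images
# themselves, so no manual annotation is required.
model = ScaleCategoryPretext()
scale_logits, category_logits = model(torch.randn(2, 3, 224, 224))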
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jifeng Sun, "Self-supervised learning based on spatial scale learning and category prediction", Proc. SPIE 13395, International Conference on Optics, Electronics, and Communication Engineering (OECE 2024), 1339537 (12 November 2024); https://doi.org/10.1117/12.3049840
KEYWORDS
Spatial learning, Target detection, Education and training, Feature extraction, Image fusion, Internet, Detection and tracking algorithms