3 February 2023 STDFormer: transformer-based network for arbitrary-shaped text detection in natural scenes
Jiale Su, Chongyang Zhang
Proceedings Volume 12511, Third International Conference on Computer Vision and Data Mining (ICCVDM 2022); 125112T (2023) https://doi.org/10.1117/12.2659991
Event: Third International Conference on Computer Vision and Data Mining (ICCVDM 2022), 2022, Hulun Buir, China
Abstract
Natural scene text detection refers to locating and representing the text in natural scene images. Existing methods for natural scene text detection are based on convolutional neural networks (CNNs), which are vulnerable to irrelevant background noise when extracting the features of curved text instances, because CNN convolution kernels are fixed in size and rectangular in shape. To address this problem, this paper proposes a novel Transformer-based Feature Fusion Module (TFFM), which integrates the transformer structure into the feature pyramid network to reduce the influence of background noise during feature fusion. On this basis, combined with a transformer-based backbone and a transformer-based detection head, a fully transformer-based network for natural scene text detection is constructed. The proposed method achieves state-of-the-art results on the CTW1500 and Total-Text datasets, and in theory the TFFM can be easily applied to other object detection frameworks.
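The abstract does not describe the internals of TFFM, so the following is only a minimal sketch of the general idea of attention-based fusion between two feature pyramid levels, not the authors' module. It assumes each spatial location is flattened into a token, that tokens from the finer level act as queries over tokens from the coarser level, and that the attended context is added back as a residual. The names `softmax` and `attention_fuse` are hypothetical, and learned projections, multiple heads, and normalization are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(fine_tokens, coarse_tokens):
    """Fuse two pyramid levels with scaled dot-product attention.

    Assumption (not from the paper): fine-level tokens are queries,
    coarse-level tokens serve as both keys and values, and there are
    no learned projection matrices.
    """
    d = len(fine_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in fine_tokens:
        # Similarity of this fine-level token to every coarse-level token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in coarse_tokens]
        weights = softmax(scores)
        # Content-dependent weighted sum replaces a fixed rectangular kernel.
        context = [sum(w * k[j] for w, k in zip(weights, coarse_tokens))
                   for j in range(d)]
        # Residual connection keeps the original fine-level feature.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


if __name__ == "__main__":
    fine = [[10.0, 0.0], [0.0, 10.0]]    # two fine-level tokens
    coarse = [[1.0, 0.0], [0.0, 1.0]]    # two coarse-level tokens
    for token in attention_fuse(fine, coarse):
        print([round(x, 3) for x in token])
```

In this sketch each fine-level token draws almost all of its context from the coarse-level token it resembles most, which is the sense in which attention weights can down-weight unrelated background locations that a fixed-shape kernel would include.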
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiale Su and Chongyang Zhang "STDFormer: transformer-based network for arbitrary-shaped text detection in natural scenes", Proc. SPIE 12511, Third International Conference on Computer Vision and Data Mining (ICCVDM 2022), 125112T (3 February 2023); https://doi.org/10.1117/12.2659991
KEYWORDS
Transformers
Feature extraction
Head
Convolution
Feature selection
Image fusion
Image processing