Conformer based hybrid ctc/attention end-to-end Tibetan speech recognition
25 May 2023
Jiewen Ning, Yugang Dai, Guanyu Li, Senyan Li, Sirui Li, Guangming Li
Proceedings Volume 12712, International Conference on Cloud Computing, Performance Computing, and Deep Learning (CCPCDL 2023); 127120S (2023) https://doi.org/10.1117/12.2678850
Event: International Conference on Cloud Computing, Performance Computing, and Deep Learning (CCPCDL 2023), 2023, Huzhou, China
Abstract
Deep learning has achieved remarkable success in many research fields, such as machine translation and speech recognition. Considering the characteristics of the Tibetan language and the current state of Tibetan language technology, we use the Transformer and Conformer frameworks to study how the choice of modeling unit and encoder type affects Tibetan speech recognition. The modeling units are either subwords generated with the byte-pair encoding (BPE) algorithm or Tibetan characters obtained by automatic splitting, and the encoder is either a Transformer or a Conformer. We evaluate these configurations on the open-source Tibetan dataset xbmu-amdo31. The experimental results show that using Tibetan characters as the modeling unit together with a Conformer encoder gives the best performance. With characters as the modeling unit, the Conformer encoder achieves a character error rate (CER) of 11.89%, which is 44.51% lower than that of the Transformer model. With subwords as the modeling unit, the Conformer encoder achieves a word error rate (WER) of 17.40%, a relative reduction of 32.27%.
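The abstract states that subword modeling units are generated with the BPE algorithm. As a rough illustration of how BPE builds subwords, here is a minimal Python sketch of the classic merge-learning loop. It is not the authors' tooling (the page does not say which BPE implementation was used); the toy English corpus, the `</w>` end-of-word marker, and the function names `learn_bpe`, `apply_bpe`, and `merge_pair` are illustrative assumptions. The sketch treats each Python character as a starting symbol, which is a simplification for Tibetan script.

```python
# Hypothetical, minimal BPE sketch for illustration; not the paper's implementation.
from collections import Counter

END = "</w>"  # assumed end-of-word marker so merges never cross word boundaries


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of training words."""
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, 3)
    print(merges)                       # [('e', 's'), ('es', 't'), ('est', '</w>')]
    print(apply_bpe("lowest", merges))  # ['l', 'o', 'w', 'est</w>']
```

Each merge fuses the currently most frequent adjacent pair, so frequent fragments such as `est` become single units, and the number of merges controls the size of the subword vocabulary.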
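The results are reported as CER for character units and WER for subword units, and one comparison is described as a relative reduction. As a sketch of how such figures are conventionally computed (the page does not describe the authors' scoring tool), the Python below derives both rates from the Levenshtein edit distance, i.e. (substitutions + deletions + insertions) divided by the reference length, and computes a relative reduction as (baseline - new) / baseline. Assumptions: characters are Python code points, words are whitespace-separated (Tibetan text would need its own word or syllable segmentation, which the abstract does not specify), and all function names are hypothetical. The toy inputs are made up and are not data from the paper.

```python
# Hypothetical helper functions for illustration; not the paper's scoring script.


def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or a free match if equal
            )
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate; ASSUMES words are separated by whitespace."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def relative_reduction(baseline, new):
    """Relative error reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))      # 3
    print(cer("abcd", "abed"))                     # 0.25
    print(wer("the cat sat", "the cat sat down"))  # 0.3333333333333333
    print(relative_reduction(20.0, 15.0))          # 0.25
```

Because both rates share the same edit-distance core, the only difference between CER and WER in this sketch is the unit over which the sequences are compared.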
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiewen Ning, Yugang Dai, Guanyu Li, Senyan Li, Sirui Li, and Guangming Li "Conformer based hybrid ctc/attention end-to-end Tibetan speech recognition", Proc. SPIE 12712, International Conference on Cloud Computing, Performance Computing, and Deep Learning (CCPCDL 2023), 127120S (25 May 2023); https://doi.org/10.1117/12.2678850
KEYWORDS
Education and training
Modeling
Transformers
Speech recognition
Convolution
Associative arrays
Acoustics