Speech recognition based on concatenated acoustic feature and lightGBM model
Published: 20 January 2021
Jiali Yu, Yuanyuan Qu, Zhongkai Zhang, Qidong Lu, Zhiliang Qin, Xiaowei Liu
Proceedings Volume 11719, Twelfth International Conference on Signal Processing Systems; 117190P (2021) https://doi.org/10.1117/12.2581426
Event: Twelfth International Conference on Signal Processing Systems, 2020, Shanghai, China
Abstract
In this paper, we focus on the application of the LightGBM model to audio sound classification. Although convolutional neural networks (CNNs) generally achieve superior performance, the LightGBM model possesses certain notable advantages, such as low computational cost, suitability for parallel implementation, and comparable accuracy on many datasets. To improve the generalization ability of the model, data augmentation operations are performed on the audio clips, including pitch shifting, time stretching, dynamic range compression, and the addition of white noise. The accuracy of speech recognition depends heavily on the reliability of the representative features extracted from the audio signal. The raw audio signal is a one-dimensional time series, in which changes in frequency are difficult to visualize. Hence, it is necessary to extract the discernible components of the audio signal. To improve the representational capacity of our proposed model, we use the Mel spectrum and MFCC (Mel-Frequency Cepstral Coefficients) to select features as two-dimensional input that accurately characterizes the internal information of the signal. The models in this paper are trained mainly on the Google Speech Commands dataset. The experimental results show that the proposed method, an optimized LightGBM model based on the Mel spectrum, can achieve high word classification accuracy.
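The feature pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch: one augmentation step (additive white noise), a log-Mel spectrum, MFCCs derived from it, and the two concatenated into a single fixed-length vector. The abstract does not give the frame length, hop size, number of Mel bands, number of cepstral coefficients, noise level, or the exact way the features are concatenated, so every parameter value and function name below is an assumption chosen for illustration, not the authors' configuration.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise at a target signal-to-noise ratio (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=512, hop=160, n_mels=40):
    """Log-Mel spectrum of shape (n_frames, n_mels). Parameter values are assumed."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """MFCCs as an unnormalized DCT-II of the log-Mel spectrum along the Mel axis."""
    n_mels = log_mel.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

def concatenated_features(signal, sample_rate):
    """Concatenate log-Mel and MFCC per frame, then flatten to one vector.

    This is one plausible reading of "concatenated acoustic feature";
    the abstract does not specify the layout.
    """
    log_mel = log_mel_spectrogram(signal, sample_rate)
    mfcc = mfcc_from_log_mel(log_mel)
    return np.concatenate([log_mel, mfcc], axis=1).ravel()

# Usage on a synthetic one-second tone standing in for a speech clip.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)
noisy_clip = add_white_noise(clip, snr_db=20.0, rng=rng)
features = concatenated_features(noisy_clip, sample_rate)
print(features.shape)
```

Flattening the two-dimensional time-frequency representation gives each clip a fixed-length row, which is the tabular form that a gradient-boosted tree classifier such as `lightgbm.LGBMClassifier` accepts. Pitch shifting, time stretching, and dynamic range compression are omitted from the sketch for brevity.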
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiali Yu, Yuanyuan Qu, Zhongkai Zhang, Qidong Lu, Zhiliang Qin, and Xiaowei Liu "Speech recognition based on concatenated acoustic feature and lightGBM model", Proc. SPIE 11719, Twelfth International Conference on Signal Processing Systems, 117190P (20 January 2021); https://doi.org/10.1117/12.2581426
KEYWORDS
Speech recognition
Data modeling
Performance modeling
Acoustics
Evolutionary algorithms
Detection and tracking algorithms
Optical filters