In text classification tasks, encoding context with traditional character-granularity or word-granularity information alone, mapped into a single parameter space, fails to extract sufficient text features and yields weak representations of textual semantics. To address this problem, this paper adopts an embedding method that fuses grapheme (character) and morpheme (word) information, and builds a text classification model that combines a hybrid self-attention mechanism with an RCNN. The model first segments the text into character sequences and word sequences and fuses them, then uses a Transformer encoder to extract preliminary features; the self-attention mechanism in the Transformer attends effectively to both semantic and positional information. An RCNN then extracts deep features from the fused grapheme and morpheme representations, the resulting feature vector is passed through a softmax function, and the classification of Chinese short texts is obtained. Experimental results show that, compared with a single embedding method or a single neural network, the proposed method improves text classification accuracy to varying degrees.
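The pipeline described above (fused char/word embeddings → self-attention → RCNN → softmax) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: all dimensions, the summation-based fusion, the single-head attention, and the simplified RCNN (shifted-context concatenation plus max-pooling) are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper does not specify these.
CHAR_VOCAB, WORD_VOCAB, D, SEQ = 100, 200, 16, 8

# Separate embedding tables for the two granularities.
char_emb = rng.normal(0, 0.1, (CHAR_VOCAB, D))
word_emb = rng.normal(0, 0.1, (WORD_VOCAB, D))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention
    # (the core of a Transformer encoder layer), with a residual connection.
    Wq, Wk, Wv = (rng.normal(0, 0.1, (D, D)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(D))
    return scores @ v + x

def rcnn(x):
    # Simplified RCNN: left/right context via sequence shifts (the "recurrent"
    # part), concatenation with the current token, then max-pooling over the
    # sequence (standing in for the convolution-and-pooling stage).
    left = np.vstack([np.zeros((1, D)), x[:-1]])
    right = np.vstack([x[1:], np.zeros((1, D))])
    ctx = np.concatenate([left, x, right], axis=1)  # (SEQ, 3*D)
    return np.tanh(ctx).max(axis=0)                 # (3*D,)

def classify(char_ids, word_ids, n_classes=4):
    # 1) Fuse char- and word-granularity embeddings (here, by summation).
    x = char_emb[char_ids] + word_emb[word_ids]
    # 2) Self-attention extracts preliminary features.
    x = self_attention(x)
    # 3) RCNN extracts deep features from the fused representation.
    feat = rcnn(x)
    # 4) Softmax classifier produces class probabilities.
    W = rng.normal(0, 0.1, (feat.size, n_classes))
    return softmax(feat @ W)

probs = classify(rng.integers(0, CHAR_VOCAB, SEQ),
                 rng.integers(0, WORD_VOCAB, SEQ))
print(probs.shape)  # class-probability vector, one entry per label
```

In practice each stage would be a trained layer (learned embeddings, multi-head attention with feed-forward sublayers, a BiLSTM-based RCNN); the sketch only shows how the representations flow between the stages named in the abstract.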