Toxic detection based on RoBERTa and TF-IDF
10 November 2022
Xinmin Liu, Feiyu Zhao
Proceedings Volume 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022); 123481C (2022) https://doi.org/10.1117/12.2641437
Event: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 2022, Zhuhai, China
Abstract
In the information age, toxic content has become a major problem for many online communities, and existing methods are not robust enough to detect it reliably. The demand for a more accurate and efficient toxic-message detection system is therefore high. In this paper, we apply machine learning and deep learning models to this task. Following the intuition that a detector should capture both the words themselves and their relationships with other words, we construct a stacking model as the optimal strategy: the term frequency-inverse document frequency method (TF-IDF) and RoBERTa, a robustly optimized Bidirectional Encoder Representations from Transformers (BERT) pretraining approach, serve as base models, and a neural network serves as the meta-model. Experiments show that the stacking method and K-fold cross-validation are advantageous, and our model achieves a detection accuracy of 0.9023.
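One common way to combine stacking with K-fold cross-validation is to train each base model on K-1 folds and record its predictions on the held-out fold, so that the meta-model only ever sees predictions made on unseen data. The abstract does not say whether the authors built their meta-features this way, so the following is a minimal sketch under that assumption. All names (`kfold_splits`, `out_of_fold`, `stack_features`) are hypothetical, the two toy fitters merely stand in for the TF-IDF and RoBERTa base models, and the neural-network meta-model is not implemented.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[str], float]
Fitter = Callable[[Sequence[str], Sequence[int]], Predictor]


def kfold_splits(n_samples: int, k: int) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k interleaved held-out folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def out_of_fold(fit: Fitter, texts: Sequence[str],
                labels: Sequence[int], k: int) -> List[float]:
    """Score every sample with a model that never saw it during training."""
    oof = [0.0] * len(texts)
    for held_out in kfold_splits(len(texts), k):
        held = set(held_out)
        train_texts = [t for i, t in enumerate(texts) if i not in held]
        train_labels = [y for i, y in enumerate(labels) if i not in held]
        predict = fit(train_texts, train_labels)
        for i in held_out:
            oof[i] = predict(texts[i])
    return oof


def stack_features(fitters: Sequence[Fitter], texts: Sequence[str],
                   labels: Sequence[int], k: int) -> List[List[float]]:
    """One row per sample, one column per base model (meta-model input)."""
    columns = [out_of_fold(fit, texts, labels, k) for fit in fitters]
    return [list(row) for row in zip(*columns)]


def fit_word_overlap(texts: Sequence[str], labels: Sequence[int]) -> Predictor:
    """Toy stand-in for a word-level base model (assumption, not TF-IDF itself):
    the score is the share of words that occur in toxic training texts."""
    toxic_vocab = {w for t, y in zip(texts, labels) if y == 1
                   for w in t.lower().split()}

    def predict(text: str) -> float:
        words = text.lower().split()
        return sum(w in toxic_vocab for w in words) / len(words) if words else 0.0

    return predict


def fit_prior(texts: Sequence[str], labels: Sequence[int]) -> Predictor:
    """Toy stand-in for a second base model (assumption, not RoBERTa):
    always predicts the toxic rate of its training fold."""
    rate = sum(labels) / len(labels)
    return lambda text: rate


# Tiny made-up corpus, for illustration only.
texts = ["you are an idiot", "what an idiot move",
         "have a nice day", "nice work today"]
labels = [1, 1, 0, 0]
meta_X = stack_features([fit_word_overlap, fit_prior], texts, labels, k=2)
```

Each row of `meta_X` would then be paired with its label to train the meta-model. In a realistic setup, the toy fitters would be replaced by a classifier over TF-IDF vectors and a fine-tuned RoBERTa model, each exposing the same fit-then-predict shape.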
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xinmin Liu and Feiyu Zhao "Toxic detection based on RoBERTa and TF-IDF", Proc. SPIE 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 123481C (10 November 2022); https://doi.org/10.1117/12.2641437
KEYWORDS
Neural networks
Machine learning
Computer programming
Transformers
Data modeling
Performance modeling
Toxicity