A Comprehensive Study on Improving Hate Speech Detection through NLP Data Augmentation
This study evaluates how well various data augmentation techniques, from established approaches to recent ones based on Large Language Models (LLMs), improve the performance of supervised machine learning models for hate speech detection. To address the limitations of earlier synonym substitution methods, the authors propose an optimized use of BERT-based encoder models combined with contextual cosine similarity filtering.
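
The filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`toy_embed`, `cosine`, `filter_augmentations`), the threshold value, and the example sentences are all assumptions made for illustration. In particular, `toy_embed` is a bag-of-words stand-in so that the example runs without downloading a model; in practice the `embed` argument would be a function returning sentence vectors from a BERT-based encoder, which is what makes the similarity contextual.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Hypothetical stand-in for a contextual encoder: word-count vector."""
    return {word: float(n) for word, n in Counter(sentence.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_augmentations(
    original: str,
    candidates: List[str],
    embed: Callable[[str], Vector],
    threshold: float,
) -> List[str]:
    """Keep only candidates whose embedding stays close to the original's."""
    reference = embed(original)
    return [c for c in candidates if cosine(reference, embed(c)) >= threshold]


# Assumed example: one candidate swaps a single word, the other rewrites most of the sentence.
original = "they are a bright group"
candidates = ["they are a smart group", "we were a light crowd"]
kept = filter_augmentations(original, candidates, toy_embed, threshold=0.7)
print(kept)
```

With the toy embedder, similarity only reflects word overlap, so this shows the filtering mechanics rather than any semantic judgment. Swapping in encoder-derived sentence vectors leaves `filter_augmentations` unchanged, and the threshold would then need tuning on held-out data.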