In this study, the authors examine misogynistic comments written in code-mixed Hinglish on YouTube videos. They highlight the rise of online hate speech and cyberbullying, particularly against women, and stress that few studies address misogyny detection in under-resourced languages. The paper presents a novel dataset of YouTube comments labeled 'Misogynistic' or 'Non-misogynistic' and applies Exploratory Data Analysis (EDA) techniques to study sentiment scores, word patterns, and other properties of the data. It covers the motivation and hypothesis behind the study, a literature review on misogyny detection and code-mixed languages, the dataset, the EDA findings, Principal Component Analysis (PCA) results in which distinct clusters are identified, and the research questions answered through the EDA insights. It concludes by outlining future work on training machine learning models.
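The word-pattern part of the EDA can be illustrated with a minimal sketch that counts token frequencies separately for each label, so the most frequent words in each class can be compared. This is not the authors' pipeline: the tokenizer, the function names `tokenize` and `top_words_by_label`, and the sample comments are hypothetical, and the sketch assumes comments are written in romanized (Latin-script) Hinglish, so any Devanagari text would be dropped by the tokenizer.

```python
from collections import Counter, defaultdict
import re

# Hypothetical tokenizer: keeps runs of Latin letters, digits, and apostrophes.
# Assumes romanized Hinglish; characters outside this set are discarded.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase a comment and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def top_words_by_label(comments, k=10, stopwords=frozenset()):
    """Count word frequencies separately for each label.

    `comments` is an iterable of (text, label) pairs, for example with labels
    'Misogynistic' and 'Non-misogynistic'. Returns a dict mapping each label
    to its k most frequent non-stopword tokens as (word, count) pairs.
    """
    counts = defaultdict(Counter)
    for text, label in comments:
        counts[label].update(t for t in tokenize(text) if t not in stopwords)
    return {label: c.most_common(k) for label, c in counts.items()}


# Hypothetical placeholder data, not taken from the paper's dataset.
sample = [
    ("bahut accha video hai", "Non-misogynistic"),
    ("accha laga video", "Non-misogynistic"),
    ("placeholder comment text", "Misogynistic"),
]
print(top_words_by_label(sample, k=2))
```

In practice a code-mixed stopword list covering both Hindi and English function words would be passed via `stopwords`, since otherwise the most frequent tokens in both classes tend to be function words rather than content words.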
Key insights extracted from the paper by Sargam Yadav... (arxiv.org, 03-18-2024)
https://arxiv.org/pdf/2403.09709.pdf