Profiling Hate-Speech Spreaders on Twitter
With the rapid growth of the internet and social media, online hate speech has become increasingly prevalent. The anonymity of the internet lets people say what they want without fear of consequences, and bad actors often misuse platforms to incite violence and target disadvantaged groups. Automated hate-speech detection could help prevent such misuse, yet current approaches show a clear need for further research: state-of-the-art classifiers tend to produce false positives and are often biased towards certain terms. For a matter that is becoming increasingly political, these shortcomings are untenable in systems that could potentially restrict speech online. The literature likewise points to open problems, as existing hate-speech detection pipelines are often found to lack context and understanding. This thesis explores this remaining research gap. Findings from other natural language processing tasks are incorporated to improve machine understanding of textual context, and different modern text embedding types are compared to improve results. These insights are applied by implementing an improved hate-speech classifier, which is evaluated by profiling hate-speech spreaders on Twitter as part of an academic research challenge organized by the PAN@CLEF 2021 lab.