Profiling Hate-Speech Spreaders on Twitter
Thesis Type | Master |
Thesis Status | Finished |
Student | Jannik Siebert |
Final | |
Start | |
Thesis Supervisor | |
Contact | |
Research Field | |
With the rapid growth of internet and social media usage, online hate speech has become increasingly prevalent. Internet anonymity allows people to voice their thoughts without fear of consequences, and bad actors often misuse platforms to incite violence and target disadvantaged groups. Automated hate speech detection could help prevent such misuse. However, state-of-the-art classification methods tend to produce false positives and are often biased toward certain terms. These issues are untenable for systems that could restrict speech online, an increasingly political topic. The current literature leaves room for further research in hate speech detection: existing detection pipelines often lack context awareness, a nuanced understanding of text, and insight into feature importance. This thesis explores authorship profiling for hate speech detection, with the goal of making classification methods more accurate and robust. Findings from general natural language processing tasks are incorporated to improve machine understanding of textual context, and several modern text embedding types are compared to improve results. These findings are applied by implementing an improved hate speech classifier, which is evaluated on the task of profiling hate speech spreaders on Twitter, an academic research challenge organized by the PAN@CLEF 2021 lab. Using the methods discussed in this thesis, we achieve competitive results with an accuracy of 75%. We conclude with recommendations for extending the applied methods toward more systematic construction and evaluation of machine learning systems for authorship profiling and related tasks.
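In authorship profiling, the label belongs to an author (here: whether a Twitter user spreads hate speech) rather than to a single tweet, so tweet-level text representations have to be combined into one profile per author before classification. The following is a minimal sketch of that idea, assuming mean pooling of tweet embeddings and a simple nearest-centroid classifier evaluated by accuracy. These are illustrative choices, not the classifier built in this thesis, and the 3-dimensional "embeddings" and all function names are hypothetical placeholders for the output of a real text embedding model.

```python
from statistics import fmean

# Hypothetical placeholder type: a tweet embedding as a list of floats.
Vector = list[float]


def mean_pool(tweet_vectors: list[Vector]) -> Vector:
    """Average tweet-level embeddings dimension-wise into one author profile."""
    return [fmean(dim) for dim in zip(*tweet_vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(authors: list[list[Vector]], labels: list[int]) -> dict[int, Vector]:
    """Build one centroid per class from the mean-pooled author profiles."""
    profiles_by_label: dict[int, list[Vector]] = {}
    for tweets, label in zip(authors, labels):
        profiles_by_label.setdefault(label, []).append(mean_pool(tweets))
    return {label: mean_pool(profiles) for label, profiles in profiles_by_label.items()}


def predict(centroids: dict[int, Vector], tweets: list[Vector]) -> int:
    """Assign the author to the class whose centroid is closest to their profile."""
    profile = mean_pool(tweets)
    return min(centroids, key=lambda label: squared_distance(profile, centroids[label]))


def accuracy(predicted: list[int], gold: list[int]) -> float:
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy data (hypothetical): 1 = hate speech spreader, 0 = not a spreader.
train_authors = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    [[0.7, 0.0, 0.2], [0.9, 0.1, 0.1]],
    [[0.1, 0.8, 0.3], [0.0, 0.9, 0.2]],
    [[0.2, 0.7, 0.1], [0.1, 0.9, 0.3]],
]
train_labels = [1, 1, 0, 0]
test_authors = [
    [[0.8, 0.2, 0.0], [0.9, 0.0, 0.1]],
    [[0.1, 0.9, 0.2], [0.2, 0.6, 0.2]],
]
test_gold = [1, 0]

centroids = train_nearest_centroid(train_authors, train_labels)
predictions = [predict(centroids, tweets) for tweets in test_authors]
acc = accuracy(predictions, test_gold)
print(predictions, acc)  # prints [1, 0] 1.0
```

Mean pooling discards tweet order and weights every tweet equally, which keeps the sketch short; comparing embedding types, as the thesis does, would amount to swapping the source of the tweet vectors while keeping the author-level aggregation and evaluation unchanged.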