Creation of a Multi-Author Analysis Dataset
| Thesis Type | Bachelor | 
| Thesis Status | Finished |
| Student | Andreas Pittl | 
| Init | |
| Final | |
| Start | |
| Thesis Supervisor | |
| Contact | |
| Research Field | |
The goal of multi-author analysis is to investigate methods for analyzing and characterizing the writing style of authors. Multi-author analysis can pave the way for tasks such as detecting the positions at which the author of a text changes, or authorship attribution (determining the author of a given text). Developing and training models for multi-author analysis requires a sufficient amount of training data: texts written by multiple authors, with labels specifying the author of each section. The goal of this thesis is to implement a dataset generator based on the social media platform Reddit. The generator should make it possible to reproducibly create diverse datasets from a set of parameters such as the number of distinct authors, the text length, and the number of switches between authors.
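
To make the idea of a parameterized, reproducible generator more concrete, the following is a minimal sketch in Python. It is not the generator developed in the thesis. The names `GeneratorConfig`, `generate_document`, and `change_positions` are hypothetical, the parameter set is a small illustrative subset, and the input is assumed to be an in-memory mapping from author names to texts rather than data fetched from Reddit.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical parameter set; names are illustrative, not from the thesis."""
    num_authors: int   # size of the author pool drawn for one document
    num_switches: int  # number of author changes between consecutive segments
    seed: int          # fixes all random choices so a run can be reproduced


def generate_document(posts_by_author, config):
    """Assemble one document as a list of (author, text) segments.

    posts_by_author maps an author name to a list of texts. In the thesis
    setting these would come from Reddit; here they are assumed to be
    plain strings that are already loaded.
    """
    if config.num_switches > 0 and config.num_authors < 2:
        raise ValueError("author switches require at least two authors")

    rng = random.Random(config.seed)
    # Sort the names so the sample does not depend on dict insertion order.
    authors = rng.sample(sorted(posts_by_author), config.num_authors)

    segments = []
    current = rng.choice(authors)
    for i in range(config.num_switches + 1):
        if i > 0:
            # Force a real author change at every segment boundary.
            current = rng.choice([a for a in authors if a != current])
        segments.append((current, rng.choice(posts_by_author[current])))
    return segments


def change_positions(segments):
    """Return the segment indices at which the author differs from the previous one."""
    return [
        i for i in range(1, len(segments))
        if segments[i][0] != segments[i - 1][0]
    ]
```

Calling `generate_document` twice with the same input and the same `GeneratorConfig` returns identical segments, because every random choice goes through a `random.Random` instance seeded from the configuration. The author label stored with each segment is what a style change detection or authorship attribution model would be trained against. Note that this sketch does not guarantee that every author in the drawn pool actually appears in the document, and it ignores text length entirely. A real generator would need to handle both.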