Style Change Detection for Author Identification
Thesis Type | Master |
Thesis Status | Currently running |
Student | Pedro Marques Costa |
Start | |
Thesis Supervisor | |
Contact | |
Research Field | |
PAN is a renowned initiative in the field of text mining, covering tasks such as authorship identification, author profiling, and plagiarism detection. Each year it proposes a set of tasks and holds competitions: participants develop algorithms that are evaluated on the same datasets, which makes their results directly comparable. In its 2018 edition, PAN proposed a Style Change Detection task with a simple description: given a previously unseen text document, decide whether it was written by one author or by multiple authors.
In this thesis, one or more algorithms should be developed for the style change detection task and evaluated on the dataset provided by PAN. In addition to state-of-the-art machine learning methods, existing techniques from the related field of text segmentation should be examined and, if they work well, incorporated into the algorithms.
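To make the task concrete, the following is a minimal sketch of one possible baseline that borrows an idea from text segmentation: cut the document into consecutive windows and compare the writing style of neighbouring windows. It is not a method prescribed by PAN or by this thesis. The function names, the use of character trigram frequencies as style features, the window size of 50 words, and the distance threshold of 0.5 are all illustrative assumptions that would have to be tuned and evaluated on the PAN dataset.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (assumed style feature)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0  # an empty profile is treated as maximally different
    return 1.0 - dot / (norm_p * norm_q)


def split_windows(text, window_words=50):
    """Consecutive, non-overlapping word windows (the last may be shorter)."""
    words = text.split()
    return [" ".join(words[i:i + window_words])
            for i in range(0, len(words), window_words)]


def has_style_change(text, window_words=50, threshold=0.5):
    """Guess 'multiple authors' if any two adjacent windows differ strongly.

    window_words and threshold are arbitrary placeholder values.
    """
    profiles = [char_ngram_profile(w) for w in split_windows(text, window_words)]
    distances = [cosine_distance(a, b) for a, b in zip(profiles, profiles[1:])]
    return bool(distances) and max(distances) > threshold
```

Calling `has_style_change(document_text)` returns `True` when the largest distance between adjacent windows exceeds the threshold. A learned classifier over richer stylometric features would likely replace the fixed threshold in practice; the sketch only illustrates how a segmentation-style sliding comparison maps onto the single-author versus multi-author decision.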