Sentence Boundary Disambiguation in Colloquial Texts

Thesis Type	Bachelor
Thesis Status	Finished
Student	Sebastian Hepp
Init	22.03.2022 12:00
Final	28.06.2022 12:00
Start	03.03.2022 12:00
Thesis Supervisor	Manfred Moosleitner, MSc.; Prof. Dr. Günther Specht
Contact	manfred.moosleitner@uibk.ac.at

Sentence boundary disambiguation (SBD) is the task of splitting a text into individual sentences. The task can be approached in a number of ways, using machine learning models such as hidden Markov models, neural networks, and support vector machines. Most existing SBD models are trained on "proper" texts such as newspaper articles or novels. Less work has been done on models for more colloquial texts that do not strictly follow the grammatical or orthographic rules of the language.

The goal of this thesis is to build an SBD model that is optimized for colloquial texts. For this, transcripts of user interactions with a digital assistant are used as the data set.
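
To illustrate why colloquial input is harder than "proper" text, the following is a minimal sketch of a naive punctuation-based splitter. It is not the model built in the thesis. The function name `naive_split` and both example strings are hypothetical, invented purely for illustration, and the splitting rule (sentence-final punctuation followed by whitespace and an uppercase letter) is an assumed baseline.

```python
import re

# Assumed baseline rule: a boundary is whitespace that follows ., ! or ?
# and precedes an uppercase ASCII letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def naive_split(text: str) -> list[str]:
    """Split text into sentences using only punctuation and capitalization."""
    return [part for part in _BOUNDARY.split(text.strip()) if part]


# Invented examples: one well-formed text, one transcript-like utterance
# without punctuation or capitalization.
formal = "The train was late. Passengers waited on the platform."
colloquial = "hey whats the weather like tomorrow also set an alarm for 7"

print(naive_split(formal))      # two sentences are found
print(naive_split(colloquial))  # no boundary is found, the text stays in one piece
```

The second utterance arguably contains two requests, but without punctuation or capitalization a rule of this kind has nothing to anchor on, which is the gap a model trained on colloquial data is meant to address.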