Automatic Geolocation Detection for Historic German Texts

Thesis Type Master
Thesis Status
Student Florian Krull
Thesis Supervisor

The project Semantics of Mountaineering History aims to analyze the annual reports of the Austrian Alpine Association (Alpenverein) from 1872 to 1998. A particular focus lies on detecting the geolocation of places mentioned in the texts, in particular mountain peaks. This information makes it possible, for example, to place articles about first ascents of mountain peaks on a map. The task poses two challenges: mountain names are not unique, so the same name can refer to several peaks, and the amount of training data is limited.

In this master thesis, we take a two-stage approach to these challenges. First, we aim to derive features relevant for detecting the geolocation of parts of a text (e.g., embeddings), based both on the texts themselves and on the available gazetteers, which map place names to geolocations. Second, based on these features and on an integrated representation of articles, we aim to detect the geolocation of new articles (and of parts of these articles) using modern machine learning techniques.
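
To make the ambiguity problem concrete, the following minimal sketch shows how a gazetteer lookup can return several candidate locations for one name, and how the other places mentioned in the same article can help choose between them. It is not the method of the thesis: it swaps the learned features described above for a simple distance heuristic, and the gazetteer entries, place names, coordinates, and function names (`haversine_km`, `resolve_mentions`) are all hypothetical and chosen for illustration only.

```python
import math

# Hypothetical toy gazetteer: place name -> candidate (latitude, longitude) pairs.
# Names and coordinates are invented; a real gazetteer would be loaded from a file.
TOY_GAZETTEER = {
    "Peak A": [(47.0, 13.0)],                 # unambiguous name
    "Peak B": [(47.1, 13.1), (46.0, 10.0)],   # two peaks share this name
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve_mentions(mentions, gazetteer):
    """Assign one location to each place name mentioned in an article.

    Names with a single gazetteer entry act as anchors. For an ambiguous
    name, the candidate with the smallest total distance to all anchors
    is chosen. Names missing from the gazetteer are skipped.
    """
    anchors = [gazetteer[m][0] for m in mentions
               if len(gazetteer.get(m, [])) == 1]
    resolved = {}
    for name in mentions:
        candidates = gazetteer.get(name, [])
        if not candidates:
            continue  # unknown place name: no location can be assigned
        if len(candidates) == 1 or not anchors:
            # Nothing to disambiguate, or no context: take the first entry.
            resolved[name] = candidates[0]
        else:
            resolved[name] = min(
                candidates,
                key=lambda c: sum(haversine_km(c, a) for a in anchors),
            )
    return resolved


if __name__ == "__main__":
    article_mentions = ["Peak A", "Peak B", "Unknown Hut"]
    print(resolve_mentions(article_mentions, TOY_GAZETTEER))
```

In this toy setting, the ambiguous "Peak B" is resolved to the candidate closer to "Peak A". A heuristic of this kind needs no training data, which is one reason it could serve as a baseline; a learned model, as envisioned in the second stage, would replace the distance score with features derived from the text and the gazetteer.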