Analysis of Semistructured Data in Wikipedia

Thesis Type: Master
Thesis Status:
Student: Alexander Larcher
Thesis Supervisor:
Research Field:

Wikipedia has become a widespread and commonly used benchmark in the area of knowledge acquisition in recent years. Its articles consist mostly of free text, which humans read without difficulty but computer systems struggle to process. One approach to this problem is to store the information in a more structured manner. For this purpose, so-called infoboxes have been introduced. These tables are placed at the top of a wiki page and contain key-value pairs. They make it easier for computer programs to locate and retrieve information, and they help readers get an overview of the corresponding article. The idea behind this thesis is to extract and analyse the data from these infoboxes. The behaviour of the Wikipedia community is analysed, emerging patterns are detected, and structural changes are tracked. The information gained forms the basis of a new approach that supports the user in creating infoboxes. This approach uses a recommendation system and is implemented in a tool called SnoopBox, which is presented in this thesis.
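
To illustrate what extracting key-value pairs from an infobox can look like, the following is a minimal Python sketch. It is not the method used in the thesis or in SnoopBox. The function name parse_infobox and the sample wikitext (a made-up "Example Town" infobox) are assumptions for illustration only. The sketch assumes the infobox is written as an {{Infobox ...}} template whose fields are separated by top-level pipe characters, and it ignores many details of real wikitext such as comments and unnamed parameters.

```python
import re


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return the key-value pairs of the first {{Infobox ...}} template.

    Hypothetical helper for illustration; real wikitext needs a full parser.
    """
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return {}
    start = match.start()

    # Find the closing braces that match the opening "{{" of the infobox.
    depth, i, end = 0, start, -1
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                end = i - 2
                break
        else:
            i += 1
    if end == -1:
        return {}  # unbalanced template, give up

    body = wikitext[start + 2:end]

    # Split on pipes that are not nested inside {{...}} or [[...]].
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    # The first part is the template name; the rest are "key = value" fields.
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample data, not taken from any real article.
sample = """{{Infobox settlement
| name = Example Town
| country = [[Exampleland]]
| coordinates = {{coord|1|2|N|3|4|E}}
}}
'''Example Town''' is a fictional place."""

infobox = parse_infobox(sample)
```

Nested templates and links are kept intact as field values because pipes are only treated as separators at nesting depth zero. A dictionary like this could then serve as input to the kind of analysis the abstract describes, for example counting which keys occur together.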