Natural Language Processing with SyntaxNet/Parsey McParseface

Thesis Type	Bachelor
Thesis Status	Finished
Student	Maximilian Mayerl
Init	10.01.2017 12:00
Final	27.06.2017 12:00
Start	01.11.2016 12:00
Thesis Supervisor	Michael Tschuggnall, PhD
Contact	michael.tschuggnall@uibk.ac.at
Research Field	Authorship Analysis and Cross-Language Grammar Features

Natural language processing (NLP) is a diverse discipline concerned with the computational processing and analysis of texts written in natural language.
Recently, Google published SyntaxNet, an NLP framework built on its machine learning library TensorFlow, which makes it easy to implement and train models for basic NLP tasks such as POS tagging (assigning part-of-speech tags to words), dependency parsing (determining how the words in a sentence depend on each other), and sentence compression (determining which words in a sentence are unnecessary and can be dropped).
Google also published Parsey McParseface, a POS tagging and dependency parsing model for SyntaxNet, trained on well-known English corpora such as the Penn Treebank.
The goal of this thesis is threefold: to implement a Java library for using SyntaxNet from Java programs; to evaluate how well SyntaxNet performs on selected corpora and in comparison to Stanford NLP (another well-known and widely used NLP toolkit); and to determine how easy it is to train new SyntaxNet models and how well such models perform.