Natural Language Processing with SyntaxNet/Parsey McParseface

Thesis Type Bachelor
Thesis Status
Student Maximilian Mayerl
Thesis Supervisor
Research Field

Natural language processing (NLP) is a diverse discipline concerned with using computers to process and analyze texts written in natural language.
Recently, Google published SyntaxNet, an NLP framework built on their machine learning framework TensorFlow, which makes it easy to implement and train models for basic NLP tasks such as POS tagging (assigning part-of-speech tags to words), dependency parsing (determining how the words in a sentence depend on each other), and sentence compression (determining which words in a sentence are unnecessary and can be dropped).
Google also published Parsey McParseface, a POS tagging and dependency parsing model for SyntaxNet, trained on well-known English corpora such as the Penn Treebank.
This thesis has three goals: to implement a Java library that makes SyntaxNet usable from Java programs; to evaluate how well SyntaxNet performs on selected corpora, and how it compares to Stanford NLP (another well-known and widely used NLP toolkit); and to determine how easy it is to train new SyntaxNet models and how well such models perform.
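
To make the two central tasks concrete, the following minimal sketch shows what POS tags and dependency arcs look like as data for a three-word sentence. It is purely illustrative: the annotations are written by hand (Penn Treebank style tags, common dependency labels) and are not output from SyntaxNet or Parsey McParseface, and the names `Token`, `SENTENCE`, `root`, `dependents`, and `arcs` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    index: int     # 1-based position in the sentence
    form: str      # the word as written
    pos: str       # part-of-speech tag (Penn Treebank style)
    head: int      # index of the word this token depends on; 0 marks the root
    relation: str  # label of the dependency arc from the head to this token


# Hand-annotated example sentence (assumption: NOT produced by SyntaxNet).
SENTENCE: List[Token] = [
    Token(1, "The", "DT", 2, "det"),
    Token(2, "dog", "NN", 3, "nsubj"),
    Token(3, "barks", "VBZ", 0, "root"),
]


def root(tokens: List[Token]) -> Token:
    """Return the single token whose head is 0."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


def dependents(tokens: List[Token], head_index: int) -> List[Token]:
    """Return the tokens that depend directly on the given head."""
    return [t for t in tokens if t.head == head_index]


def arcs(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """List every non-root arc as (head word, relation, dependent word)."""
    by_index = {t.index: t for t in tokens}
    return [
        (by_index[t.head].form, t.relation, t.form)
        for t in tokens
        if t.head != 0
    ]


if __name__ == "__main__":
    for t in SENTENCE:
        print(t.index, t.form, t.pos, t.head, t.relation)
    print(arcs(SENTENCE))
```

Here POS tagging corresponds to filling in the `pos` field for each word, while dependency parsing corresponds to choosing `head` and `relation` so that the arcs form a tree with a single root.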