Authorship Analysis and Cross-Language Grammar Features

Intrinsic Plagiarism Detection and Authorship Analysis

Capturing an author's writing style is an important research area in natural language processing. It makes it possible to attribute a previously unseen document to its author, perform so-called style change detection (finding the positions at which the author changes within a document), detect plagiarism intrinsically, develop new technology for writing support, and perform forensic analyses.

To date, detecting variations in writing style is among the most difficult and most interesting challenges in authorship analysis. Authorship attribution is particularly challenging in scenarios where ground-truth text is only available in different languages (for instance, for bilingual authors). Moreover, style change detection is the only means of detecting plagiarism in a document if no comparison texts are available.
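The idea of style change detection can be illustrated with a minimal sketch. It assumes a deliberately simple style representation (relative frequencies of a small, hypothetical list of function words) and flags a change wherever two consecutive paragraphs differ strongly. This is only a stand-in for illustration and is not the grammar-based approach used in the publications listed below; the word list, the threshold, and all function names are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small function-word list (an assumption for this sketch).
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "it",
                  "is", "was", "i", "we", "you", "but", "not"]


def profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_distance(u, v):
    """Cosine distance between two profiles (0 = same direction, 1 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 and norm_v == 0:
        return 0.0  # neither paragraph contains any listed word: no evidence of change
    if norm_u == 0 or norm_v == 0:
        return 1.0  # only one paragraph contains listed words: treat as maximally different
    return 1.0 - dot / (norm_u * norm_v)


def style_change_positions(paragraphs, threshold=0.5):
    """Return each index i where paragraph i differs strongly from paragraph i - 1."""
    profiles = [profile(p) for p in paragraphs]
    return [i for i in range(1, len(profiles))
            if cosine_distance(profiles[i - 1], profiles[i]) > threshold]
```

Calling `style_change_positions` on a list of paragraphs returns the indices at which a change is suspected. In practice, the threshold would have to be tuned on labeled data, and richer features would replace the function-word profile.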

In our research, we focus on utilizing grammar features for several of the above-mentioned tasks. In doing so, we have pioneered work in cross-language scenarios, where authors have written documents in multiple languages. Current research in this field also covers the detection of social media bots, which has become a more pressing matter in recent years.

At DBIS, we are part of PAN, an international group of scientists focusing on the writing styles and habits of authors. The PAN initiative organizes shared tasks in which researchers from across the world compete to find the best strategies for tackling problems in authorship attribution, author profiling, and multi-author decomposition. In particular, we are co-organizers of the Style Change Detection task at PAN.

 

Publications

2016

Efstathios Stamatatos, Michael Tschuggnall, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast: Clustering by Authorship Within and Across Documents. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, September 2016. Pages 691-715. CLEF and CEUR-WS.org. ISSN 1613-0073.

Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall, and Benno Stein: Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-genre Profiling, Clustering, Diarization, and Obfuscation. In Norbert Fuhr et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Initiative (CLEF 16), Berlin Heidelberg New York, September 2016. Springer. ISBN 978-3-319-44564-9.

Michael Tschuggnall, Günther Specht, and Christian Riepl: Algorithmisch unterstützte Literarkritik: Eine grammatikalische Analyse zur Bestimmung von Schreibstilen. In In Memoriam Wolfgang Richter, ed.: H. Rechenmacher, pages 415-428. EOS-Verlag, 2016.

Michael Tschuggnall and Günther Specht: From Plagiarism Detection to Bible Analysis: The Potential of Machine Learning for Grammar-Based Text Analysis. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2016), pages 245-248, 2016.

2015

Michael Tschuggnall and Günther Specht: On the Potential of Grammar Features for Automated Author Profiling. In International Journal On Advances in Intelligent Systems, Volume 8, Number 3&4, pages 255-265, 2015.

Michael Tschuggnall: Intrinsische Plagiatserkennung und Autorenerkennung mittels Grammatikanalyse. In Ausgezeichnete Informatikdissertationen 2014, Volume D-15, pages 279-288. Bonner Köllen Druck+Verlag, 2015.

2014

Michael Tschuggnall: Intrinsic Plagiarism Detection and Author Analysis By Utilizing Grammar. PhD thesis, University of Innsbruck, Department of Computer Science, 2014.

Michael Tschuggnall and Günther Specht: Automatic Decomposition of Multi-Author Documents Using Grammar Analysis. In Proceedings of the 26th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken) (GvDB 2014), October 2014, Ritten, Italy. CEUR-WS.org, Volume 1313, pages 17-22, 2014.

Michael Tschuggnall and Günther Specht: What Grammar Tells About Gender and Age of Authors. In Proceedings of the 4th International Conference on Advances in Information Mining and Management (IMMM 2014), July 2014, Paris, France, pages 30-35, 2014.

Michael Tschuggnall and Günther Specht: Enhancing Authorship Attribution By Utilizing Syntax Tree Profiles. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), volume 2: Short Papers, April 2014, ACL, Gothenburg, Sweden, pages 195-199, 2014.

2013

Michael Tschuggnall and Günther Specht: Countering Plagiarism by Exposing Irregularities in Authors Grammars. In Proceedings of the European Intelligence and Security Informatics Conference (EISIC 2013), 12-14 August 2013, Uppsala, Sweden, IEEE, pages 15-22, 2013.

Michael Tschuggnall and Günther Specht: Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents. In Proceedings of the 18th International Conference of Natural Language Processing and Information Systems (NLDB 2013), Manchester, UK, June 2013, Springer, LNCS Volume 7934, pages 297-302, 2013.

Michael Tschuggnall and Günther Specht: Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors. In Proceedings of the 15. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW 2013), 11-15 March 2013, Magdeburg, LNI, pages 241-259, 2013.

Michael Tschuggnall and Günther Specht: Plag-Inn: Uncovering Plagiarism by Examining Author’s Grammar Syntax. In M. Barden, Alexander Ostermann (eds.): Scientific Computing @ uibk, innsbruck university press, pages 151-152, 2013.

2012

Michael Tschuggnall and Günther Specht: Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees. In Proceedings of the 17th International Conference of Natural Language Processing and Information Systems (NLDB 2012), Groningen, The Netherlands, June 2012, Springer, LNCS Volume 7337, pages 284-289, 2012.