Type of publication: Article
Author(s): Šarkutė, Ligita; Utka, Andrius; Kapočiūtė-Dzikienė, Jurgita
Title: The effect of author set size in authorship attribution for Lithuanian
Is part of: NODALIDA 2015: proceedings of the 20th Nordic Conference of Computational Linguistics, May 11–13, 2015, Institute of the Lithuanian Language, Vilnius, 2015, pp. 87–96
Date: 2015
Keywords: Authorship attribution; Parliamentary transcripts
ISBN: 9789175190983
Abstract: This paper reports the first results on how author set size affects authorship attribution for Lithuanian when automatic computational methods are used. The aim is to determine how quickly attribution performance deteriorates as the number of candidate authors gradually increases from 3 to 5, 10, 20, 50, and 100. Using supervised machine learning techniques, we also investigated the influence of different feature types (lexical, character, morphological, etc.) and language types (normative parliamentary speeches and non-normative forum posts). The experiments revealed that the effectiveness of the method and of the feature types depends more on the language type than on the number of candidate authors. Content features based on word lemmas are the most useful type for normative texts, because Lithuanian is a highly inflective language with a rich morphology and vocabulary. Character features are the most accurate type for forum posts, whose texts are too complicated to be processed effectively with external morphological tools.
Appears in Collections: 3. Conference materials

