Type of publication: research article
Type of publication (PDB): Article in conference proceedings in other databases (P1c)
Field of Science: Informatics (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita;Utka, Andrius;Šarkutė, Ligita
Title: Authorship attribution and author profiling of Lithuanian literary texts
Is part of: RANLP 2015 : 10th international conference on recent advances in natural language processing, BSNLP 2015 : 5th workshop on Balto-Slavic natural language processing, 10–11 September 2015, Hissar, Bulgaria : proceedings. Shoumen, Bulgaria : INCOMA Ltd., 2015
Extent: p. 96-105
Date: 2015
Note: Conference web page:
Keywords: Authorship attribution;Literary texts;Lithuanian language
ISBN: 9789544520335
Abstract: In this work we address the authorship attribution and author profiling tasks (the latter focusing on the age and gender dimensions) for the Lithuanian language. This paper reports the first results on literary texts, which we compare with results previously obtained on other functional styles and language types (i.e., parliamentary transcripts and forum posts). Using the Naïve Bayes Multinomial and Support Vector Machine methods, we investigated the impact of various stylistic, character, lexical, and morpho-syntactic features and their combinations; author set sizes of 3, 5, 10, 20, 50, and 100 candidate authors; and dataset sizes of 100, 300, 500, 1,000, 2,000, and 5,000 instances in each class. In the authorship attribution task with the maximum number of candidate authors, the highest accuracy, 89.2%, was achieved with the Naïve Bayes Multinomial method and document-level character tri-grams. In the author profiling task focusing on the age dimension, the highest accuracy, 78.3%, was achieved with the Support Vector Machine method and token lemmas. In the author profiling task focusing on the gender dimension, accuracy reached 100% with the Naïve Bayes Multinomial method and rather small datasets, where various lexical, morpho-syntactic, and character feature types performed very similarly.
Affiliation(s): Kauno technologijos universitetas
Lituanistikos katedra
Taikomosios informatikos katedra
Vytauto Didžiojo universitetas
Appears in Collections: 3. Conference materials
University Research Publications
