Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/30687
Type of publication: Article in conference proceedings indexed in the Clarivate Analytics Web of Science and/or Scopus databases (P1a)
Field of Science: Computer science (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita; Utka, Andrius; Šarkutė, Ligita
Title: Authorship attribution and author profiling of Lithuanian literary texts
Is part of: RANLP 2015 : 10th international conference on recent advances in natural language processing, BSNLP 2015 : 5th workshop on Balto-Slavic natural language processing, 10–11 September 2015, Hissar, Bulgaria : proceedings. Shoumen, Bulgaria : INCOMA Ltd., 2015
Extent: p. 96-105
Date: 2015
Note: Conference websites: http://lml.bas.bg/ranlp2015/cfp2.php; http://bsnlp-2015.cs.helsinki.fi/index.html
Keywords: Authorship attribution; Literary texts; Lithuanian language
ISBN: 9789544520335
Abstract: In this work, we solve authorship attribution and author profiling tasks (focusing on the age and gender dimensions) for the Lithuanian language. This paper reports the first results on literary texts, which we compare with results previously obtained on other functional styles and language types (i.e., parliamentary transcripts and forum posts). Using the Naïve Bayes Multinomial and Support Vector Machine methods, we investigated the impact of various stylistic, character, lexical, and morpho-syntactic features and their combinations; of different author set sizes (3, 5, 10, 20, 50, and 100 candidate authors); and of different dataset sizes (100, 300, 500, 1,000, 2,000, and 5,000 instances per class). In the authorship attribution task with the maximum number of candidate authors, the highest accuracy (89.2%) was achieved with the Naïve Bayes Multinomial method and document-level character tri-grams. In the author profiling task focusing on the age dimension, the highest accuracy (78.3%) was achieved with the Support Vector Machine method and token lemmas. In the author profiling task focusing on the gender dimension, accuracy reached 100% with the Naïve Bayes Multinomial method and rather small datasets, where various lexical, morpho-syntactic, and character feature types performed very similarly.
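The abstract reports that the best authorship attribution result came from combining document-level character tri-grams with the Naïve Bayes Multinomial method. The minimal sketch below illustrates that combination in plain Python, assuming overlapping tri-grams taken over the raw document text, add-one (Laplace) smoothing, and that tri-grams unseen in training are ignored at prediction time. The class TrigramMultinomialNB, the helper char_trigrams, and the toy texts and author labels are hypothetical illustrations; this record does not describe the authors' actual tools, preprocessing, or parameter settings.

```python
import math
from collections import Counter, defaultdict


def char_trigrams(text):
    """Count overlapping character tri-grams over the whole document (no tokenization)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


class TrigramMultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes over character tri-gram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant; assumed value

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per author (for priors)
        self.gram_counts = defaultdict(Counter)  # author -> summed tri-gram counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = char_trigrams(doc)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(g.values()) for c, g in self.gram_counts.items()}
        return self

    def predict(self, doc):
        grams = char_trigrams(doc)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior P(author)
            counts, total = self.gram_counts[label], self.totals[label]
            for gram, freq in grams.items():
                if gram not in self.vocab:
                    continue  # tri-grams never seen in training are ignored
                p = (counts[gram] + self.alpha) / (total + self.alpha * v)
                score += freq * math.log(p)  # multinomial log-likelihood term
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data (hypothetical): two "authors" with clearly different character patterns.
train_docs = [
    "labas rytas, kaip sekasi?",
    "labas vakaras, kaip gyvenate?",
    "the quick brown fox",
    "the lazy brown dog",
]
train_labels = ["author_A", "author_A", "author_B", "author_B"]

model = TrigramMultinomialNB().fit(train_docs, train_labels)
print(model.predict("labas, kaip sekasi"))  # author_A
```

The sketch sums log-probabilities instead of multiplying raw probabilities, which avoids floating-point underflow when a long literary text contributes thousands of tri-gram counts.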
Internet: https://hdl.handle.net/20.500.12259/30687
http://bsnlp-2015.cs.helsinki.fi/bsnlp2015-book.pdf
https://eltalpykla.vdu.lt/1/30687
Affiliation(s): Faculty of Informatics
Kaunas University of Technology
Department of Applied Informatics
Vytautas Magnus University
Appears in Collections: 1. Articles
University Research Publications

Files in This Item:
marc.xml (MARC21 XML metadata, 9.17 kB)

Page views: 204 (checked on Oct 14, 2019)
Downloads: 72 (checked on Oct 14, 2019)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.