Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/36092
Type of publication: Article in Clarivate Analytics Web of Science or Scopus DB conference proceedings (P1a)
Field of Science: Mathematics (N001)
Author(s): Stanikūnas, Daumantas;Mandravickaitė, Justina;Krilavičius, Tomas
Title: Comparison of distance and similarity measures for stylometric analysis of Lithuanian texts
Is part of: CEUR Workshop proceedings [electronic resource]: ICYRIME 2017 : proceedings of the symposium for young researchers in informatics, mathematics and engineering, Kaunas, Lithuania, April 28, 2017. Aachen : CEUR-WS, 2017, Vol. 1852
Extent: p. 1-7
Date: 2017
Keywords: Stylometry;Computational stylistics;Statistical analysis;Data visualization
Abstract: Constant developments in information and computer technologies make it possible to handle constantly increasing amounts of data, thereby expanding research possibilities. In this article, we discuss and compare distance and similarity measures used in stylometric analysis that could be applied to Lithuanian texts. Transcripts of parliamentary debates by two politicians of the Lithuanian Parliament were chosen as the corpus for the analysis. A comparison of distance measures, stylometric analysis, and visualization were performed. The objective of the experiment was to identify which measures perform better in stylometric analysis of Lithuanian texts and to explore where these differences in performance occur. Summarizing the experimental results, the recommendations are as follows: the number of Most Frequent Words used should be at least 1000; Eder's Simple Delta measure can be used in general stylometric analysis of transcripts of parliamentary debates of the Lithuanian Parliament; and when Most Frequent Words are limited to 2000, the Binomial Index shows an increase in performance over Eder's Simple Delta and is thus more suitable.
Internet: https://hdl.handle.net/20.500.12259/36092
https://eltalpykla.vdu.lt/handle/1/36092
http://ceur-ws.org/Vol-1852/p01.pdf
Affiliation(s): Baltic Institute of Advanced Technology, Vilnius
Faculty of Informatics
Department of Mathematics and Statistics
Department of Applied Informatics
Vilnius University
Vytautas Magnus University
Appears in Collections: 3. Conference materials
University Research Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.