Type of publication: Article in Clarivate Analytics Web of Science or Scopus DB conference proceedings (P1a)
Field of Science: Computer science (N009)
Author(s): Briedienė, Monika;Kapočiūtė-Dzikienė, Jurgita
Title: An automatic gender detection from non-normative Lithuanian texts
Is part of: CEUR Workshop proceedings [electronic resource]: SYSTEM 2017: proceedings of the symposium for young scientists in technology, engineering and mathematics, Kaunas, Lithuania, April 28, 2017. Aachen : CEUR-WS, 2017, Vol. 1853
Extent: p. 75-79
Date: 2017
Keywords: Gender detection;Non-normative Lithuanian language;Supervised machine learning
Abstract: This paper describes gender detection research on Lithuanian texts using automatic machine learning methods. The main contribution of our work is that the investigation was carried out on very short (avg. ~39 tokens) non-normative texts. We analyze a fundamental problem: how to choose automatic methods (in particular, classifiers and feature types) that achieve the highest accuracy on our author profiling task, in which the short plain text itself is the only evidence used for determining the author's meta-information. An analysis of related research helped us select methods that demonstrated encouraging results on other languages and apply them to the Lithuanian dataset. Among the experimentally investigated classifiers with lexical or symbolic features, the Naïve Bayes Multinomial method with character n-grams (n = [1, 5]) as the feature type yielded the best performance, reaching an accuracy of 83.6%.
Affiliation(s): Faculty of Informatics (Informatikos fakultetas)
Department of Applied Informatics (Taikomosios informatikos katedra)
Vytautas Magnus University (Vytauto Didžiojo universitetas)
Appears in Collections: University Research Publications
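
The abstract names multinomial Naive Bayes over character n-grams with n from 1 to 5 as the best-performing setup. The following is a minimal sketch of that general technique in plain Python, not the authors' implementation or data. The function `char_ngrams`, the class `MultinomialNB`, the add-one smoothing, and the four toy training strings (invented Lithuanian-like sentences written without diacritics) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=5):
    """Return all character n-grams of length n_min..n_max from text."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing (an assumption here)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.docs = Counter(labels)         # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        scores = {}
        for label, grams in self.counts.items():
            denom = sum(grams.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)  # log prior
            for g in char_ngrams(text):
                if g in self.vocab:  # skip n-grams never seen in training
                    score += math.log((grams[g] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy data, not taken from the paper's dataset.
train_texts = [
    "as buvau labai pavargusi",
    "vakar buvau nuejusi i kina",
    "as buvau labai pavarges",
    "vakar buvau nuejes i kina",
]
train_labels = ["female", "female", "male", "male"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("buvau pamirsusi"))
print(model.predict("buvau pamirses"))
```

Character n-grams need no tokenizer or spelling normalization, which is one plausible reason they suit short non-normative texts; the 83.6% accuracy reported in the abstract refers to the authors' experiments, not to this sketch.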