Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/101578
Type of publication: Straipsnis kitose duomenų bazėse / Article in other databases (S4)
Field of Science: Informatika / Informatics (N009)
Author(s): Mandravickaitė, Justina; Krilavičius, Tomas; Man, Ka Lok
Title: Document classification to functional styles (Domains of use): Lithuanian case
Is part of: International journal of design, analysis and tools for integrated circuits and systems. Hong Kong: Solari (HK) Co., 2019, vol. 8, iss. 1
Extent: p. 38-41
Date: 2019
Note: This volume is comprised of research papers from the International Conference on Recent Advancements in Computing in AI, Internet of Things (IoT) and Computer Engineering Technology (CICET), October 21-23, 2019, Taipei, Taiwan. CICET 2019 is hosted by The Tamkang University amid pleasant surroundings in Taipei, which is a delightful city for the conference and traveling around; and co-hosted
Keywords: Document classification; Functional styles; Quantitative linguistic indicators
Abstract: We report an experiment on classifying Lithuanian texts by their domain (area of use), i.e. functional style. A functional style is a variety of standard language defined by its domain, contents, functions, stylistic devices and linguistic means. We classify documents into three functional styles of the Lithuanian language: administrative, publicist and scientific. We compare the results of five algorithms: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM) with a kernel function, and Naïve Bayes, using eight quantitative linguistic indicators as discriminating features. For the administrative style SVM was the most effective (96.5 % of texts classified correctly), for the publicist style LDA (98.9 %), and for the scientific style QDA (93.1 %). SVM achieved the best F-score: 94.7 % for the administrative style, 98.9 % for the publicist style and 85.9 % for the scientific style.
Internet: https://doi.org/10.15388/LMITT.2019
http://datics.org/ijdatics_/current_issues/CICET2019_Proceedings.pdf
Affiliation(s): Baltijos pažangių technologijų institutas, Vilnius
Baltijos pažangiųjų technologijų institutas
Informatikos fakultetas
Taikomosios informatikos katedra
Vilniaus universitetas
Vytauto Didžiojo universitetas
Appears in Collections: Universiteto mokslo publikacijos / University Research Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.