Document classification to functional styles (Domains of use): Lithuanian case
Affiliation | Country |
---|---|
Baltijos pažangiųjų technologijų institutas | LT |
Vilniaus universitetas | LT |
Date |
---|
2019 |
We report an experiment on classifying Lithuanian texts by domain (area of use), i.e. functional style. A functional style is a variety of the standard language defined by its domain, contents, functions, stylistic devices and linguistic means. We classify documents into three functional styles of the Lithuanian language: administrative, publicist and scientific. We compare the results of five algorithms: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM) with a kernel function, and Naïve Bayes, using eight quantitative linguistic indicators as discriminating features. For the administrative style SVM was the most effective (96.5 % of texts classified correctly), for the publicist style LDA (98.9 %), and for the scientific style QDA (93.1 %). SVM achieved the best F-score: 94.7 % for the administrative style, 98.9 % for the publicist style and 85.9 % for the scientific style.
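The abstract reports two figures per style: the share of texts classified correctly and an F-score. The following is a minimal sketch of how such per-style figures can be computed from predicted labels. It assumes that "texts classified correctly" corresponds to per-class recall and that the F-score is the balanced F1; the abstract does not state either. The function name `per_class_scores` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter

STYLES = ("administrative", "publicist", "scientific")


def per_class_scores(y_true, y_pred, labels=STYLES):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy labels for illustration only (not results from the paper).
y_true = ["administrative"] * 4 + ["publicist"] * 4 + ["scientific"] * 4
y_pred = [
    "administrative", "administrative", "administrative", "scientific",
    "publicist", "publicist", "publicist", "publicist",
    "scientific", "scientific", "administrative", "scientific",
]

for style, (p, r, f) in per_class_scores(y_true, y_pred).items():
    print(f"{style:15s} precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Under these assumptions, a classifier can have the highest recall for one style without having the highest F-score for it, because F1 also penalizes texts of other styles that were wrongly assigned to that style.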
This volume comprises research papers from the International Conference on Recent Advancements in Computing in AI, Internet of Things (IoT) and Computer Engineering Technology (CICET), October 21-23, 2019, Taipei, Taiwan. CICET 2019 is hosted by Tamkang University amid pleasant surroundings in Taipei, a delightful city for the conference and for traveling around; and co-hosted