Sentiment analysis of Lithuanian texts using traditional and deep learning approaches

Kapočiūtė-Dzikienė, Jurgita; Damaševičius, Robertas; Wozniak, Marcin

doi:10.3390/computers8010004

Use this URL to cite the publication: https://hdl.handle.net/20.500.12259/61081

Type of publication

Article in Web of Science and Scopus database (S1)

Author(s)

Author	Affiliation	Country
Kapočiūtė-Dzikienė, Jurgita	Department of Applied Informatics	LT
Damaševičius, Robertas	Kauno technologijos universitetas (Kaunas University of Technology)	LT
Wozniak, Marcin

Title

Sentiment analysis of Lithuanian texts using traditional and deep learning approaches

Is part of

Computers. Basel : MDPI, 2019, Vol. 8, iss. 1

Date Issued

Date
2019

Publisher

Basel : MDPI

Is Referenced by

Emerging Sources Citation Index (Web of Science)

Scopus

Extent

p. 1-16

URI

https://www.vdu.lt/cris/bitstream/20.500.12259/61081/2/ISSN2073-431X_2019_V_8_1.PG_1-16.pdf
https://doi.org/10.3390/computers8010004
https://hdl.handle.net/20.500.12259/61081

DOI

10.3390/computers8010004

Abstract (en)

We describe sentiment analysis experiments performed on the Lithuanian Internet comment dataset using traditional machine learning approaches, namely Naïve Bayes Multinomial (NBM) and Support Vector Machine (SVM), and deep learning approaches, namely Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). The traditional machine learning techniques were used with features based on lexical, morphological, and character information. The deep learning approaches were applied on top of two types of word embeddings: Word2Vec continuous bag-of-words with negative sampling, and FastText. Both the traditional and the deep learning approaches had to solve the positive/negative/neutral sentiment classification task on the balanced and the full versions of the dataset. The best deep learning result (an accuracy of 0.706) was achieved on the full dataset with a CNN applied on top of the FastText embeddings, with emoticons replaced and diacritics eliminated. The traditional machine learning approaches performed best (an accuracy of 0.735) on the full dataset with the NBM method, with emoticons replaced, diacritics restored, and lemma unigrams as features. Although the traditional machine learning approaches outperformed the deep learning methods, deep learning demonstrated good results when applied to the small datasets.
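To make the preprocessing and NBM steps named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' code and does not reproduce any reported configuration. The emoticon placeholders (`EMO_POS`, `EMO_NEG`), the helper names, and the toy comments are hypothetical. Surface word unigrams stand in for the lemma unigrams used in the paper, since lemmatization would require a Lithuanian lemmatizer. Diacritics are eliminated here; diacritic restoration, used in the best NBM run, needs a separate model and is not sketched.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical emoticon map: the record does not specify the actual replacement scheme.
EMOTICONS = {":-)": " EMO_POS ", ":)": " EMO_POS ", ":-(": " EMO_NEG ", ":(": " EMO_NEG "}


def replace_emoticons(text):
    """Swap each emoticon for a placeholder token, longest patterns first."""
    for emo in sorted(EMOTICONS, key=len, reverse=True):
        text = text.replace(emo, EMOTICONS[emo])
    return text


def eliminate_diacritics(text):
    """Strip combining marks after NFD decomposition, e.g. 'ą' -> 'a', 'š' -> 's', 'ė' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Lowercased word unigrams (a stand-in for lemma unigrams)."""
    return re.findall(r"\w+", eliminate_diacritics(replace_emoticons(text)).lower())


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, token_lists, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(token_lists, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens, c):
        v = len(self.vocab)
        score = self.log_prior[c]
        for t in tokens:
            if t in self.vocab:  # words never seen in training are skipped
                score += math.log((self.counts[c][t] + self.alpha) / (self.totals[c] + self.alpha * v))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy comments invented for illustration; they are not from the paper's dataset.
train_texts = ["Puikus straipsnis :)", "Labai gera žinia :-)", "Blogas sprendimas :(",
               "Labai bloga žinia :-(", "Šiandien antradienis", "Posėdis vyks rytoj"]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
test_texts = ["Gera žinia :)", "Bloga diena :(", "Posėdis rytoj"]
test_labels = ["positive", "negative", "neutral"]

model = MultinomialNB().fit([tokenize(t) for t in train_texts], train_labels)
predictions = [model.predict(tokenize(t)) for t in test_texts]
print(predictions, accuracy(test_labels, predictions))
```

On this toy data the script prints `['positive', 'negative', 'neutral'] 1.0`. Emoticons are replaced before diacritics are stripped and text is lowercased, so the placeholder tokens survive as single `\w+` matches (`emo_pos`, `emo_neg`) that the classifier can weight like any other word.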

Type of document

Journal article (research article)

Language

English (en)

Coverage Spatial

Switzerland (CH)

Description

(This article belongs to the Special Issue Selected Papers from the 24th International Conference on Information and Software Technologies (ICIST 2018))

Owning collection

University Research Publications

Mapped collections

1. Articles

ISSN (of the container)

2073-431X

WOS

WOS:000464345200001

Other Identifier(s)

VDU02-000023736

Access Rights

Open Access

Department of Applied Informatics

Faculty of Informatics

Journal	CiteScore	SNIP	SJR	Year	Quartile
Computers	2.5	1.25	0.361	2019	Q3