Sentiment analysis and sarcasm detection in texts using machine learning approaches

Rong, Tianzheng

Use this URL to cite ETD: https://hdl.handle.net/20.500.12259/130179

Type of publication (PDB)

Magistro darbas / Master Thesis

Field of Science

Informatika / Informatics (N009)

Degree Discipline

Taikomoji informatika / Applied Informatics (M)

Type of publication

type::text::thesis::master thesis

Title

Sentiment analysis and sarcasm detection in texts using machine learning approaches

Other Title

Sentimentų analizė ir sarkazmo nustatymas iš tekstų taikant mašininio mokymosi metodikas

Author

Rong, Tianzheng

Advisor

Kapočiūtė-Dzikienė, Jurgita

Extent

49 p.

Thesis Defence Date

2021-05-27

Abstract (lt)

Ironijos ir sarkazmo nustatymas iš teksto – viena įdomiausių kalbos technologijų temų. Socialinė žiniasklaidos plėtra (ypač socialinių tinklų) skatina sarkazmo naudojimą internete. Sarkazmą tekste aptikti galima tik jį tyrinėjant semantiniu, o ne leksiniu lygiu. Ypač svarbi tampa ir tyrinėjamo teksto kalba, nes skirtingos kalbos pasižymi skirtingomis savybėmis. Sarkazmo nustatymo uždavinys – tai tam tikro tipo teksto klasifikavimo uždavinys, kuriam įprastai taikomi tiek tradiciniai mašininio mokymosi metodai (pvz., paprastasis Bejesas, atraminių vektorių mašina), tiek giliojo mokymosi metodai (ilgos trumpalaikės atminties metodas, konvoliuciniai neuroniniai tinklai), tiek transformaciniai modeliai (BERT). Magistrinio darbo metu atlikti eksperimentiniai tyrimai ir gauti rezultatai parodė kiek stipriai tikslumas yra įtakojamas modelio parametrų bei įvesties kalbos.

Abstract (en)

Detecting irony and sarcasm in social media is an interesting topic for NLP researchers. The growth of social media drives the use of sarcasm on the Internet. Sarcasm detection relies on exploring semantics and context rather than the lexical level. The language of the input texts should also be considered: different languages have different attributes, and exploring these differences can help improve accuracy. Low-cost traditional machine learning methods (NBM, SVM), deep learning methods (LSTM, CNN) suited to complex problems, and transfer learning methods (BERT) based on pre-trained models are all worth trying on sarcasm detection tasks. The results show how changes in model parameters and input language affect accuracy.