Application of machine learning for MWE identification

Bumbulienė, Ieva; Mandravickaitė, Justina; Krilavičius, Tomas

Use this url to cite publication: https://hdl.handle.net/20.500.12259/57508

Type of publication

Conference theses in non-peer-reviewed publication (T2)

Author(s)

Author	Affiliation
Bumbulienė, Ieva	Baltijos pažangiųjų technologijų institutas	LT
Mandravickaitė, Justina	Baltijos pažangiųjų technologijų institutas	LT	Taikomosios informatikos katedra / Department of Applied Informatics	LT

Title

Application of machine learning for MWE identification

[en]

Part Of

Data analysis methods for software systems – DAMSS: 9th international workshop, Druskininkai, Lithuania, November 30 - December 2, 2017 / editor Jolita Bernatavičienė

Date Issued

Date
2017

Publisher

Vilnius : Vilnius University Institute of Data Science and Digital Technologies, 2017

Extent

p. 10-10

URI

URI
https://hdl.handle.net/20.500.12259/57508

Abstract (en)

Identification of multiword expressions (MWEs) is an important problem in Natural Language Processing, especially for machine translation and other semantic analysis tasks. Lexical association measures (LAMs), such as pointwise mutual information (PMI), the log-likelihood ratio (LLR) and the Dice coefficient, are often used to identify MWEs. However, LAMs alone are insufficient for MWE detection, especially for the Lithuanian language, but they can be very useful as additional features for Machine Learning (ML) algorithms. Early experiments with Lithuanian and Latvian show that a Random Forest classifier with the Resample filter achieves almost 99% precision, 58% recall and 73% F-score. We discuss experiments with delfi.lt-based corpora and different features, including LAMs, as well as experiments with different ML methods, e.g., Naive Bayes, Random Forests, Support Vector Machines and Artificial Neural Networks.
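
The association measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of PMI, the Dice coefficient and the log-likelihood ratio over a 2x2 bigram contingency table; it is not the authors' implementation. The function names `bigram_counts` and `association_measures` and the toy token list are assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def association_measures(w1, w2, bigrams):
    """Return (PMI, Dice, LLR) for the bigram (w1, w2).

    Hypothetical helper: marginals are taken from bigram positions,
    so the 2x2 contingency table is internally consistent.
    """
    n = sum(bigrams.values())                       # total bigram tokens
    o11 = bigrams[(w1, w2)]                         # joint frequency
    if o11 == 0:
        raise ValueError("bigram does not occur in the corpus")
    r1 = sum(c for (a, _), c in bigrams.items() if a == w1)  # w1 in first slot
    c1 = sum(c for (_, b), c in bigrams.items() if b == w2)  # w2 in second slot

    pmi = math.log2(o11 * n / (r1 * c1))
    dice = 2 * o11 / (r1 + c1)

    # Observed cells paired with their row and column totals.
    cells = [
        (o11, r1, c1),
        (r1 - o11, r1, n - c1),
        (c1 - o11, n - r1, c1),
        (n - r1 - c1 + o11, n - r1, n - c1),
    ]
    llr = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:                            # 0 * ln(0) is treated as 0
            expected = row_total * col_total / n
            llr += observed * math.log(observed / expected)
    llr *= 2
    return pmi, dice, llr


# Toy example; a real experiment would use a large corpus.
tokens = ["a", "b", "a", "b", "c", "d"]
pmi, dice, llr = association_measures("a", "b", bigram_counts(tokens))
print(f"PMI={pmi:.3f} Dice={dice:.3f} LLR={llr:.3f}")
```

In the setting the abstract describes, scores of this kind would serve as numeric features for a classifier rather than as a detection criterion on their own.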

Type of document

type::text::conference output::conference proceedings::conference paper

Language

English (en)

Coverage Spatial

Lithuania (LT)

ISBN (of the container)

9789986680642

Other Identifier(s)

VDU02-000022155