Type of publication: conference paper
Type of publication (PDB): Conference theses in non-peer-reviewed publications (T2)
Field of Science: Informatics (N009)
Author(s): Bumbulienė, Ieva;Mandravickaitė, Justina;Krilavičius, Tomas
Title: Application of machine learning for MWE identification
Is part of: Data analysis methods for software systems – DAMSS: 9th International Workshop, Druskininkai, Lithuania, November 30-December 2, 2017 / editor Jolita Bernatavičienė. Vilnius : Vilnius University Institute of Data Science and Digital Technologies, 2017
Extent: p. 10
Date: 2017
Keywords: Machine learning;Natural language processing;Multiword expressions
ISBN: 9789986680642
Abstract: Identification of multiword expressions (MWEs) is an important problem in natural language processing, especially for machine translation and other semantic analysis tasks. Lexical association measures (LAMs), such as pointwise mutual information (PMI), the log-likelihood ratio (LLR) and the Dice coefficient, are often used to identify MWEs. However, LAMs alone are insufficient for MWE detection, especially for the Lithuanian language, but they can be very useful as additional features for machine learning (ML) algorithms. Early experiments with Lithuanian and Latvian show that a Random Forest classifier with the Resample filter achieves almost 99% precision, 58% recall and a 73% F-score. We discuss experiments with based corpora, different features, including LAMs, as well as experiments with different ML methods, i.e., Naive Bayes, Random Forests, Support Vector Machines, Artificial Neural Networks and others.
Affiliation(s): Baltijos pažangių technologijų institutas, Vilnius
Baltijos pažangiųjų technologijų institutas
Taikomosios informatikos katedra
Vytauto Didžiojo universitetas
Appears in Collections: University Research Publications
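
Note on the measures named in the abstract: the following is a minimal Python sketch, not the authors' implementation, of two of the lexical association measures (PMI and Dice) for a two-word candidate, plus the harmonic-mean F-score that links the reported precision and recall. The function names and the corpus counts are hypothetical, and the probability estimates assume a simplified setting where all counts are divided by the same corpus size.

```python
import math


def pmi(pair_count, first_count, second_count, total):
    """Pointwise mutual information (log base 2) of a two-word candidate.

    Simplifying assumption: unigram and pair probabilities are all
    estimated by dividing by the same corpus size `total`.
    """
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    return math.log2(p_pair / (p_first * p_second))


def dice(pair_count, first_count, second_count):
    """Dice coefficient of a two-word candidate."""
    return 2 * pair_count / (first_count + second_count)


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formulas.
example_pmi = pmi(pair_count=8, first_count=16, second_count=32, total=1024)
example_dice = dice(pair_count=8, first_count=16, second_count=32)

# Precision and recall as reported in the abstract.
reported_f = f_score(0.99, 0.58)

print(example_pmi, example_dice, round(reported_f, 2))
```

With the abstract's figures, the harmonic mean of 0.99 precision and 0.58 recall rounds to 0.73, which agrees with the reported F-score. In a setup like the one the abstract describes, values such as `example_pmi` and `example_dice` would be supplied to a classifier as additional features rather than thresholded on their own.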
