Type of publication: research article
Type of publication (PDB): Article in other databases (S4)
Field of Science: Informatics (N009)
Author(s): Mandravickaitė, Justina; Krilavičius, Tomas; Man, Ka Lok
Title: A Combined approach for automatic identification of multi-word expressions for Latvian and Lithuanian
Is part of: IAENG. International journal of computer science. Hong Kong : International Association of Engineers, 2017, Vol. 44, iss. 4
Extent: p. 598-606
Date: 2017
Note: ISSN: 1819-9224 (online version). Manuscript received October 10, 2017. This research was partly funded by a grant (No. LIP-027/2016) from the Research Council of Lithuania
Keywords: Hybrid approach; Lexical association measures; Machine learning; Multi-word expression
Abstract: We discuss an experiment on the automatic identification of bi-gram multi-word expressions (MWEs) in parallel Latvian and Lithuanian corpora. Because lexical resources and tools (e.g., POS taggers, parsers) for these languages are scarce and of limited quality, we use raw corpora, lexical association measures (LAMs) and supervised machine learning (ML). Combining LAMs with ML works well for other languages, and our experiments show that it performs well for Lithuanian and Latvian too. We analyse and discuss frequency thresholds in terms of potential MWEs and LAM values. Finally, by combining LAMs with ML we achieved 98.8% precision and 57.5% recall for Latvian and 96.9% precision and 61.8% recall for Lithuanian.
Affiliation(s): Baltijos pažangiųjų technologijų institutas
Taikomosios informatikos katedra
Vilniaus universitetas
Vytauto Didžiojo universitetas
Appears in Collections: University Research Publications
