Type of publication: Article in conference proceedings in other databases (P1c)
Field of Science: Informatics (N009)
Author(s): Bumbulienė, Ieva; Mandravickaitė, Justina; Boizou, Loic; Krilavičius, Tomas
Title: An overview of Lithuanian internet media n-gram corpus
Is part of: CEUR workshop proceedings [electronic resource]: SYSTEM 2017: proceedings of the symposium for Young Scientists in Technology, Engineering and Mathematics, Kaunas, Lithuania, April 28, 2017. Aachen : CEUR-WS, 2017, Vol. 1853
Extent: p. 24-28
Date: 2017
Keywords: Internet media; Lithuanian Internet; N-gram corpus
Abstract: This paper describes the construction and properties of an open, 70-million-word Lithuanian Internet media n-gram corpus. Copyright limitations often restrict the availability of resources based on contemporary media, whereas n-gram corpora (e.g., the Google N-gram viewer/corpus) solve this problem. Lithuanian is an under-resourced language, hence the n-gram corpus of Lithuanian media is designed to contribute to publicly available, ready-to-use lexical resources. In this paper we report the corpus construction procedure, preprocessing, corpus statistics, and possible areas of application.
Affiliation(s): Baltijos pažangių technologijų institutas, Vilnius
Baltijos pažangiųjų technologijų institutas
Informatikos fakultetas
Taikomosios informatikos katedra
Vilniaus universitetas
Vytauto Didžiojo universitetas
Appears in Collections: 3. Conference materials
University Research Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.