Type of publication: Article in peer-reviewed foreign international conference proceedings (P1d)
Field of Science: Philology (H004)
Author(s): Rimkutė, Erika; Utka, Andrius; Daudaravičius, Vidas
Title: Morphological annotation of the Lithuanian corpus
Is part of: ACL 2007: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007: Workshop on Balto-Slavonic Natural Language Processing, Special Theme: Information Extraction and Enabling Technologies, June 29, 2007, Prague, Czech Republic. USA: Association for Computational Linguistics, 2007
Extent: p. 94-99
Date: 2007
Keywords: Language; Lithuanian
ISBN: 9781932432862
Abstract: As information technologies develop, large morphologically annotated corpora become a necessity, because they are required for moving on to higher levels of language computerisation (e.g. automatic syntactic and semantic analysis, information extraction, machine translation). The article presents research on morphological disambiguation and the morphological annotation of the 100-million-word Lithuanian corpus. Statistical methods have made it possible to develop an automatic morphological annotation tool for Lithuanian with a disambiguation precision of 94%. Statistical data on the distribution of parts of speech and on the most frequent wordforms and lemmas in the annotated Corpus of the Contemporary Lithuanian Language are also presented.
Affiliation(s): Vytauto Didžiojo universitetas
Appears in Collections: University Research Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.