Type of publication: Article
Author(s): Gelumbeckaitė, Jolanta; Zinkevičius, Vytautas; Šinkūnas, Mindaugas
Title: Senosios lietuvių kalbos tekstynas (SLIEKKAS) – nauja diachroninio tekstyno samprata
Other Title: Old Lithuanian reference corpus (SLIEKKAS). A new concept of a historical corpus
Is part of: Darbai ir dienos, 2012, no. 58, pp. 257–281
Date: 2012
Abstract: The Old Lithuanian Reference Corpus (Lith. Senosios lietuvių kalbos tekstynas, acronym SLIEKKAS; Germ. Referenzcorpus Altlitauisch), a comprehensive, deeply annotated reference corpus of Old Lithuanian, is being developed jointly by the Goethe University Frankfurt am Main (Germany), the Institute of the Lithuanian Language (Vilnius, Lithuania), and the University of Pisa (Italy). Its ultimate goal is to develop the linguistic and text-technological basis for a reference corpus of Old Lithuanian (1500–1800, ca. 10 million text words) and to test this basis on an exemplary corpus of ca. 350,000 Old Lithuanian tokens. Starting with a test corpus is motivated by the need to establish the complex multilayered structures required for a diachronic corpus and to introduce them gradually. The envisaged annotation scheme comprises the following structural features: thorough linguistic and textological annotation, including header information, lemmatisation, grammatical information (part-of-speech tagging, morphological and basic syntactic information), glossing (in Modern Lithuanian, English, and possibly other languages), information about text structure (division of the text into words, sentences, lines, verses, paragraphs, etc.), and palaeographic and textological information; a multi-level architecture of the annotations; and multimodality, achieved by aligning the texts with facsimile reproductions of the originals. Since most Old Lithuanian texts are translations from Latin, German, or Polish sources, the source texts (ca. 190,000 text words for the test corpus) will be collected and annotated in the same way as the Lithuanian ones. This will allow the Old Lithuanian texts to be aligned with their sources.
Furthermore, Old Lithuanian texts of the same genre will be aligned with one another to allow an assessment of possible mutual influences within a single genre as well as across genres. The Old Lithuanian Reference Corpus is being designed as an innovative scientific resource for historical and comparative linguistics as well as for literary, religious, and cultural studies of the Baltic countries, including the controversy between pre-Christian and Christian cultures, the processes of confessional division in the area, and their background. In this way, the corpus is expected to yield essential knowledge about the cultural development of Lithuania and the Baltic countries in this period. With regard to historical linguistics, the Old Lithuanian Reference Corpus is expected to provide a basis for the efficient development and implementation of further research programmes on the diachronic grammar and the lexicon of Lithuanian.
Appears in Collections: Darbai ir dienos / Deeds and Days 2012, no. 58