Use this url to cite researcher: https://hdl.handle.net/20.500.12259/154977
  • review article
    Acta Linguistica Lithuanica = Lietuvių kalbotyros klausimai
      5  1  Scopus© SNIP 0.257
  • Publication
    Kodėl svarbios neasmenuojamosios formos: mokomojo lietuvių kalbos vartosenos leksikono veiksmažodžių tyrimas
    [Why do infinite forms matter: analysis of verbs from the lexical database of Lithuanian language usage]
    research article
    Taikomoji kalbotyra
    Corpus data show that, in real usage, a given verb in morphologically rich Lithuanian does not appear in all theoretically possible finite and infinite forms but only in those forms that are relevant to the verb's patterning. On the one hand, in vocabulary teaching it is important to present lexis in these relevant, frequently used forms; on the other hand, grammar teaching needs to supply learners with appropriate vocabulary, e.g., when teaching infinite forms, to use verbs for which these forms are relevant and frequent.
    In this paper, we provide language teaching practitioners with data about frequently used Lithuanian verbs and show which of them appear in infinite forms (participles in passive and active voice, adverbial participles, half participles) and how often. As research data we use 200 verbs from the Lexical Database of Lithuanian Language Usage, which was developed on the basis of the written subcorpus of the Pedagogic Corpus of Lithuanian. The investigated verbs belong to the frequent vocabulary: in the corpus of approx. 700,000 tokens, each of these verbs is used 100 times or more. First, we analysed which verbs appear in infinite forms; second, we checked whether frequent and typical infinite forms are included in the corpus pattern(s) of these verbs, and whether there is a link between the infinite form and a particular meaning of the verb.
    All verbs (except for three verbs with no infinite forms) were assigned to one of three groups: 1) 11 verbs that occur in infinite forms frequently (more than 50% of all forms, finite and infinite) and for which these forms are accordingly typical; 2) 117 verbs whose infinite forms make up 10 to 50% of all forms; 3) 69 verbs whose infinite forms make up less than 10% of all verb forms. Interestingly, the verbs of the first group usually have only one infinite form, e.g., the participle in passive voice, which makes up more than 50% of all forms of the verb. Such cases are also frequently observed in the second group. Thus, if a verb tends to be used in infinite forms, it is important to know which infinite form is relevant to that particular verb.
    In the Lexical Database of Lithuanian Language Usage, the lexical and grammatical patterning of a word is represented in the form of corpus patterns. In this study, we showed the interrelation between the frequently used infinite forms of a verb and its corpus patterns (including corpus patterns related to a particular meaning of a polysemous verb). The data can be applied in various ways in teaching Lithuanian as a foreign language: the data on verbs that are typical and frequent in infinite forms, and the corpus patterns that include these forms, can be used for building vocabulary training as well as for developing grammar exercises.
      16  5  Scopus© SNIP 0.235
  • book
    Kaunas : Vytauto Didžiojo universitetas, 2022
    This teaching aid consists of an electronic publication in PDF format, "Lietuvių kalbos kolokacijos: vartojimas, mokymas(is) ir vertimas" [Lithuanian collocations: usage, teaching/learning and translation], and an essential companion website where some of its exercises can be completed interactively, answers checked, and comments viewed. The electronic publication has two parts. The first part, „Kolokacijų mokymas(is)“ [Teaching and learning collocations], is intended primarily for teachers and lecturers of Lithuanian as a foreign language, for philology students interested in teaching Lithuanian as a foreign language, and, of course, for non-native learners at the higher levels (B2–C2). The second part, „Kolokacijų vertimas“ [Translating collocations], is aimed at native-speaker students of translation and at lecturers in translation programmes.
      58
  • Publication
    Mokomasis lietuvių kalbos vartosenos leksikonas – nauja tekstyno pagrindu parengta leksinė bazė
    [A new corpus-driven lexical database for Lithuanian as a foreign language]
    research article
    Darnioji daugiakalbystė : periodinis mokslo žurnalas = Sustainable multilingualism : biannual scientific journal.
    In this paper, we describe a new lexicographic resource for advanced learners of Lithuanian, the Lexical Database of Lithuanian Language Usage, which is the first attempt in Lithuanian lexicography to describe vocabulary on the basis of word usage analysis in a particular corpus. The written subpart of the Lithuanian Pedagogic Corpus (approx. 620,000 tokens) was used to develop headword lists and collect word usage information in the form of corpus patterns. The database contains 3,700 lexical items: words and multi-word units (compounds, idioms or sayings). For the approx. 700 most frequent words from a shared vocabulary (they appear in texts assigned to levels A1, A2, B1 and B2, and their frequency in the whole corpus is 100 occurrences or more), we prepared a full-record entry: it includes sense-related corpus patterns with grammatical, semantic and lexical information, and examples illustrating all pattern components. A short-record entry (no patterns, only examples) is prepared for the less frequent words from the shared vocabulary that are derivationally related to the most frequent headwords. Users are provided with 2,542 derivatives, which are linked to 940 headwords. In the database, 28,550 encoding examples were manually selected for all 3,000 headwords and 700 phrases. We discuss the features of the database and, in particular, the adopted semi-automated procedure of Corpus Pattern Analysis, which was used to describe word usage. We evaluate the approach applied, discuss its advantages for users, and offer suggestions for future improvements of the resource, which can be used as an additional resource in the classroom of Lithuanian as a foreign language and, together with the available corpora, fill a gap in usage information in the existing (learner) dictionaries.
      26  Scopus© SNIP 0.229
  • This dictionary presents the Lithuanian collocations collected in the database „Lietuvių kalbos pastoviųjų žodžių junginių duomenų bazė“ (a database of Lithuanian multi-word expressions; see https://resursai.pastovu.vdu.lt/paieska/paprastoji) that are considered arbitrary collocations (hereinafter AC), e.g., skirti dėmesį 'to pay attention', platus akiratis 'broad outlook', as distinct from trivial collocations, e.g., graži moteris 'beautiful woman', saulėta diena 'sunny day'. The database contains 18,790 collocations, 8,861 of which are considered ACs. In the main part of the dictionary, ACs are arranged by noun, because the noun is most often the main word of a collocation (though not necessarily its syntactic head). ACs vary in structure: with adjectives, with nouns, and with verbs (the latter are also grouped by syntactic relation). The first appendix lists all ACs included in the dictionary in alphabetical order. Each collocation is accompanied by its inflectional forms, listed in descending order of frequency (the frequency is given in parentheses). The second appendix lists alphabetically all the words (their lemmas) used in the collocations presented in the dictionary. The ACs comprise 1,202 nouns, 742 verbs, 366 adjectives, 7 prepositions, 6 pronouns, and 4 adverbs.
      37
  • book
    Kaunas : Vytauto Didžiojo universitetas, 2022
    In this user manual, we aim to present three newly developed digital resources that can be used for research purposes and in teaching Lithuanian as a foreign language. These empirical databases include the Pedagogic Corpus of Lithuanian (hereinafter referred to as ‘the pedagogic corpus’), the Lithuanian Learner Corpus (LLC) and the corpus-driven Lexical Database of Lithuanian (henceforth ‘lexical database’). These resources were created in 2017–2019 within the framework of the EU-funded project “Lithuanian Academic Scheme for International Cooperation in Baltic Studies” and are publicly available at https://kalbu.vdu.lt/. The resources primarily target teachers, researchers, and learners of Lithuanian as a foreign language but can also be relevant for anyone else interested in Lithuanian. They were developed to represent authentic use of the Lithuanian language by native speakers (in the pedagogic corpus and the lexical database) and non-native speakers (in the LLC); so far, such resources have not been publicly available. The new data in the two corpora, with integrated automated search possibilities, offers new potential for (learner) language research and language teaching. In the corpus-driven lexical database, users will find the usage information of 3,700 lexical items (words and multi-word units). The pedagogic corpus (https://kalbu.vdu.lt/mokymosi-priemones/mokomasis-tekstynas/) contains authentic Lithuanian language texts, selected according to criteria that are relevant to language learners of different proficiency levels. All the texts are classified into levels A1, A2, B1 and B2 according to the Common European Framework of Reference for Languages (CEFR).
The corpus represents both written data and orthographically transcribed spoken data: 111,000 words for levels A1-A2 (96,000 words in the written component and 15,000 words in the spoken component) and 558,000 words for levels B1-B2 (523,000 words in the written component and 35,000 words in the spoken component). In total, the corpus contains 669,000 words. The spoken part of the corpus consists of natural conversations recorded in different settings, covering different communicative situations and various social roles of the interlocutors. Some of the texts were taken from the Corpus of Spoken Lithuanian (see http://sakytinistekstynas.vdu.lt/). The written component consists of two types of texts: (1) texts collected from coursebooks for learners of Lithuanian as a foreign language (they make up about 17% of the entire written subcorpus), and (2) texts collected from popular scientific and fiction books, news portals, public signs, instructions, announcements, documents, etc. (they amount to about 83% of the entire written subcorpus). Coursebook and non-coursebook texts are classified into 29 genres (dialogues, narratives, instructional texts, etc.) and correspond to four groups according to communication goals (informative, popular scientific, appellative, and imaginative). In the LLC, just as in the pedagogic corpus, the texts of Lithuanian language learners are classified into levels A1, A2, B1 and B2 (based on the CEFR). The texts are divided into these four levels of proficiency on the basis of diagnostic placement tests or the number of Lithuanian language contact hours received in formal education.
The corpus (https://kalbu.vdu.lt/mokymosi-priemones/mokiniu-tekstynas/) contains both written and spoken language: 103,148 words in A1 level texts (81,339 written and 21,809 spoken); 99,359 words in A2 level texts (85,158 written and 14,201 spoken); 64,400 words in B1 level texts (39,558 written and 24,842 spoken); and 51,734 words in B2 level texts (24,211 written and 27,523 spoken). In total, this corpus comprises 318,641 tokens. The LLC represents a large variety of text types (essays, narratives, argumentative texts, letters, emails, postcards, etc.). In addition, it provides information on the linguistic background of the learner, the learning task for which a text was produced, and the learning context. The corpus is normalised and annotated for errors in grammar, lexis, syntax, and pronunciation. The lexical database (https://kalbu.vdu.lt/mokymosi-priemones/leksikonas/) is developed on the basis of the written subcorpus of the pedagogic corpus, consisting of approximately 620,000 words. This small, monolingual, and morphologically annotated corpus was used to develop the list of headwords for the lexicon and to collect the word usage information. The headword list consists of two categories: (1) approximately 700 most frequent words, which are used at least 100 times in all four levels represented in the pedagogic corpus (from level A1 to B2); and (2) derivatives, compounds, and multi-word units related to these 700 most common words; they make up 3,000 lexical items. Thus, in total, the lexicon contains 3,700 lexical items including individual words and multi-word units (such as compound names, fixed expressions, and sayings). The headword list includes lexical items functioning as verbs, nouns, adjectives, and adverbs, and the category of fixed expressions and sayings also includes interjections.
Two types of entries are used in the lexical database to represent information obtained from the corpus data. The entry for words with a frequency of 100 and above is a full-record entry including pronunciation, inflections, corpus patterns, examples of use, and derivatives related to different word meanings. For derivatives and multi-word units related to the most frequent vocabulary, a short-record entry is provided, which contains pronunciation, inflections, examples of use, and derivatives. A large number of word patterns, examples, and lexical relations provide language teachers and learners with valuable information needed to improve language production skills. Corpora and corpus-based reference sources have already become indispensable sources of authentic language use, especially in the research and teaching of such widely used languages as English. Thus, following this relatively recently established tradition, we aim to show how the newly developed resources of Lithuanian can contribute to the development of data-driven (or evidence-based) research as well as teaching materials and curricula. In this manual, we provide some suggestions as to how the new databases can be used to find new answers to some well-known issues and to develop new questions triggered by trends in language use that go unnoticed by intuition but become apparent through data-driven language analysis. We also offer some types of data-driven language teaching activities, which can be further extended or supplemented with other focused activities. In this book, we present the structure, nature, and main features of the corpora and the lexicon; we overview the principles of development, relevance and practical application of these resources.
When using any empirical or lexicographic resource, it is important to know the principles behind it, so this book will explain the methods used to collect and systematize the empirical material, discuss search parameters and their rationale, and provide examples and suggestions as to how the data available in the resources can be researched and used in language teaching. We discuss each resource in detail in the following order: the pedagogic corpus is presented in Chapter 1, the lexical database is introduced in Chapter 2, and the learner corpus is overviewed in Chapter 3.
      104
  • This paper gives an overview of research on Lithuanian multi-word expressions (MWEs), particularly collocations, and presents the resources developed. Starting from the identification, analysis, and lexicographic description of MWEs in general, the research later narrowed its focus to collocations and certain of their features, such as arbitrariness. Arbitrary collocations were seen as having lexically motivated relations and a certain degree of restricted collocability of constituents. Within the framework of two projects, PASTOVU (2016-2018) and ARKA (2020-2022), corpus-driven approaches were applied to extract and document Lithuanian collocations and to develop a number of lexical resources, which are discussed in this paper.
      28  WOS© IF 0.7  WOS© AIF 1.7  Scopus© SNIP 0.519
  • The lexicon (Leksikonas) is an electronic lexical database whose material is intended for teaching and learning Lithuanian as a foreign language. It was developed on the basis of the Pedagogic Corpus (Mokomasis tekstynas), using its written component of about 620,000 words. This small, monolingual, morphologically annotated corpus was compiled for the needs of Lithuanian language teaching and learning, and therefore served as the main source for the lexicon: for compiling the headword list and for investigating regularities in the usage of lexical items. The lexicon contains 3,700 lexical items: words and multi-word units (compound names, idioms, sayings). The headword list consists of: 1) words (verbs, nouns, adjectives, and adverbs) used more than 100 times across all four levels of the Pedagogic Corpus, from A1 to B2, 700 such words in total; 2) derivatives and multi-word units found in the corpus that are related to the 700 most frequent words, 3,000 lexical items in total. The main aim of the lexicon was to collect data for teaching and learning Lithuanian at the higher levels, i.e., to provide as much information as possible about the usage of each lexical item (word or multi-word unit): usage that is authentic, typical of contemporary Lithuanian, and relevant to language learning. In the lexicon you will find information on how the described words and multi-word units are used in contemporary Lithuanian: how they are written and pronounced, in which forms they most often occur, and what lexical and grammatical environment is typical of them. The lexical and grammatical environment is shown not only through examples but also through usage patterns (vartosenos modeliai), which display the grammar and lexis typical of the different meanings of a word. The lexicon does not give definitions of word meanings; the regularities recorded in the usage patterns help to distinguish and understand them. [...]
      41