Use this url to cite researcher: https://hdl.handle.net/20.500.12259/154462
- The paper estimates the number of words in Lithuanian texts indexed by selected global search engines (GSE): Google (Alphabet Inc.), Bing (Microsoft Corporation), and Yandex (Yandex LLC, Russia). For this purpose, a special list of 100 rare Lithuanian words (pivot words) with specific characteristics was compiled. The low frequency of the pivot words is crucial, because it allows the count of document matches reported by a GSE to be treated as an indicator of the word count. Statistical analysis gave the following amounts of Lithuanian words as of April 2022: 56 billion by Google, 29 billion by Bing, and 41 billion by Yandex. Comparative results for the neighbouring languages Belarusian (∼0.31×LT), Estonian (∼1.45×LT), Finnish (∼2.4×LT), Latvian (∼0.95×LT), Polish (∼11×LT), and Russian (∼49×LT) have also been assessed.
WOS© IF 0.7; WOS© AIF 1.7; Scopus© SNIP 0.519
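The pivot-word idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's statistical procedure, which the abstract does not detail. It assumes that each pivot word has a known relative frequency in some reference corpus, that a rare word occurs roughly once per matching document (so the hit count approximates its number of occurrences), and that the per-word estimates are combined with a median. The function name and all numbers are hypothetical.

```python
from statistics import median


def estimate_indexed_words(pivots):
    """Estimate the total number of indexed words from pivot-word data.

    pivots: iterable of (hit_count, relative_frequency) pairs, where
    hit_count is the number of document matches a search engine reports
    and relative_frequency is the word's share of all tokens in a
    reference corpus (both are assumptions of this sketch).
    """
    # For a rare word, hits ~ occurrences, so total ~ hits / relative frequency.
    estimates = [hits / rel_freq for hits, rel_freq in pivots if rel_freq > 0]
    if not estimates:
        raise ValueError("need at least one pivot word with a positive frequency")
    # The median limits the influence of outlying pivot words.
    return median(estimates)


# Hypothetical pivot data: (reported document matches, relative frequency).
pivots = [
    (50, 1e-9),
    (120, 2e-9),
    (30, 1e-9),
]
print(f"{estimate_indexed_words(pivots):.3g} words (illustrative only)")
```

The median is only one possible choice of aggregate; a trimmed mean or a regression over all pivot words would fit the same outline.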
- The CLARIN-LT consortium is one of the leading Lithuanian language research and digital data storage infrastructures. This chapter presents outreach and initiatives performed by or in cooperation with the CLARIN-LT consortium and highlights their most significant outcomes. We first highlight some of the resources stored in the CLARIN-LT repository and present their usage statistics. Next, we show a use case of scientific outreach, followed by a success story involving the cooperation of large-scale national projects and CLARIN-LT in the development of IT services for Lithuanian. Finally, we demonstrate an example of CLARIN content integration in university classes. The initiatives we overview here have different aims and audiences but share one common feature: they all found a home in the CLARIN-LT repository. The use cases and success stories presented, carried out by or in cooperation with the CLARIN-LT consortium in the relatively short period since its establishment in 2015, show that the infrastructure is gaining recognition and is increasingly being used by scientific, educational, public, and private communities.
Scopus© Citations 4
- Tradicinių aiškinamųjų žodynų antraštyno atnaujinimo būdas [A method to update traditional explanatory dictionaries]. The paper presents a method for updating traditional digitised dictionaries by comparing the dictionary lemmas with a large corpus. The Hunspell platform is used to generate all word forms from the dictionary lemmas. The 6th edition of The Dictionary of Modern Lithuanian was chosen for comparison with lexical data from The Joint Corpus of Lithuanian. The comparison produced two lists of non-overlapping lexis: dictionary lemmas that are unused in present-day Lithuanian, and dictionary gaps, i.e., frequently used words and word forms that the dictionary ignores. The latter list is discussed in greater detail to give lexicographers guidance for updates.
Scopus© Citations 1; Scopus© SNIP 0
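The dictionary-versus-corpus comparison described in the last abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the word forms of each lemma have already been generated (the paper uses Hunspell for that step, which is not reproduced here), and the function name, the `min_count` threshold, and the sample data are hypothetical.

```python
from collections import Counter


def compare_dictionary_with_corpus(lemma_forms, corpus_tokens, min_count=2):
    """Return (unused_lemmas, gaps) for a dictionary and a tokenised corpus.

    lemma_forms: dict mapping each dictionary lemma to its word forms
    (assumed to be pre-generated, e.g. by a morphological tool).
    corpus_tokens: iterable of corpus tokens.
    min_count: hypothetical frequency threshold for reporting a gap.
    """
    corpus_counts = Counter(token.lower() for token in corpus_tokens)
    known_forms = {
        form.lower() for forms in lemma_forms.values() for form in forms
    }

    # Lemmas none of whose forms occur anywhere in the corpus.
    unused_lemmas = sorted(
        lemma
        for lemma, forms in lemma_forms.items()
        if not any(corpus_counts[form.lower()] for form in forms)
    )

    # Frequent corpus words that no dictionary lemma generates,
    # most frequent first, ties broken alphabetically.
    gaps = sorted(
        (
            word
            for word, count in corpus_counts.items()
            if count >= min_count and word not in known_forms
        ),
        key=lambda word: (-corpus_counts[word], word),
    )
    return unused_lemmas, gaps


# Hypothetical toy data, for illustration only.
lemma_forms = {
    "namas": ["namas", "namo", "namui"],
    "balnius": ["balnius", "balniaus"],
}
corpus_tokens = ["Namas", "namo", "internetas", "internetas", "internetas", "stogas"]

unused, gaps = compare_dictionary_with_corpus(lemma_forms, corpus_tokens)
print("Unused lemmas:", unused)
print("Dictionary gaps:", gaps)
```

Matching on generated word forms rather than on lemmas alone matters for a highly inflected language such as Lithuanian, where a lemma may appear in the corpus only in inflected forms.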