Digital Research on Text and Voice, Development and Application of Resources and Technologies

Research overview

Research and activities related to Lithuanian language resources started at VMU in 1994 at the Centre of Computational Linguistics (CCL). The results comprise the large „Corpus of the Contemporary Lithuanian Language“ (150 million running words of text, plus a spoken component), supplied with language-specific tools: a corpus query system, a collocation extraction tool, a lemmatiser, and a multifunctional tool called „Morfolema“. The most recent tools comprise a system for morphological annotation and disambiguation, a tool for automatic textual function detection, an online program for automatic accentuation, and a monophonic (single-singer) music transcription tool, „Solo Explorer“. Parallel corpora are available on the CCL website, as is a machine translation (MT) system from English into Lithuanian intended for translating internet texts.

The CCL hosts researchers engaged in both the fundamental and the applied research needed for the computerization of the Lithuanian language. The most prominent research directions are corpus-based and corpus-driven analysis of Lithuanian words and collocations, automated analysis of Lithuanian grammar, computer-aided text analysis and translation, speech annotation and recognition, and analysis of the phonetic and phonotactic characteristics of Lithuanian speech.

Main researchers

PhD candidates