Digital Language Data and Intelligent Technologies

Research overview

The research, projects and publications of this cluster are interdisciplinary: they bring together linguists and computer scientists and cover topics such as computational linguistics, artificial intelligence and digital humanities. Scientists from the Centre for Computational Linguistics and the Faculty of Informatics cooperate in the cluster.

The cluster's research is essential for the digitization and development of Lithuanian language resources and technologies. Major research interests include Lithuanian terminology and multi-word units, Lithuanian morphology and syntax, text analysis and translation, annotation and recognition of spoken Lithuanian, and Lithuanian phonetics and phonotactics.

One of the cluster's priorities is preparing language data for open access, since modern statistical and machine learning methods require large amounts of accurately pre-processed language data. The Centre for Computational Linguistics coordinates Lithuania's representation in CLARIN ERIC, the European Research Infrastructure for Language Resources and Technology. The CLARIN-LT Centre and a consortium of Lithuanian education institutions have been established to ensure that Lithuanian language resources and analysis tools are prepared and made available in open access.

The cluster leads and participates in a number of projects. Its scientists have taken part in projects such as the first English-Lithuanian machine translation (MT) system, “Resilience for Survivability” (ReSIST), “Automatic Identification of Science and Education Terms” (ŠIMTAI), “Automatic Identification of Lithuanian Multi-word Expressions” (PASTOVU), “Lithuanian Academic Scheme for International Cooperation in Baltic Studies”, SEMANTIKA, SEMANTIKA-2 and many others.

Main researchers

PhD candidates