Automatinis lietuvių kalbos žodžių skiemenavimas, kirčiavimas, transkribavimas (Automatic Syllabification, Stress Assignment, and Transcription of Lithuanian Words): [study]
Author | Affiliation | Country |
---|---|---|
 | | LT |
 | | LT |
 | UAB Practice Manager Baltic | LT |
Date |
---|
2010 |
This study describes stress-assignment algorithms for the Lithuanian parts of speech, discusses the principles of building a structural model for syllabification, stress assignment, and transcription, and analyses the main problems that arose when conventional linguistic rules were adapted into algorithms. It also examines the interaction with databases of morphological and lexical information. The description of each algorithm (syllabification, stress assignment, transcription) consists of two parts: the linguistic material and a diagram of the algorithm. The study is intended for both linguists and computer scientists working in language technology.
This report gives a detailed description of algorithms for automatic syllable boundary detection, stress assignment, and phonetic transcription. The algorithms were developed by a group of researchers and students at Vytautas Magnus University (VMU) during 2002–2009.

Syllable boundaries are difficult to perceive in a naturally uttered word: the number of syllables, not their exact boundaries, is the primary factor in speech perception. Exact boundaries nevertheless matter for fundamental research on syllable structure, and an automatic syllabification tool was developed for this purpose. Syllable boundaries are defined according to the functional theory of the syllable: a syllable begins with the longest consonant cluster whose pattern (model) also occurs at the beginning of other words. The report describes problematic cases where the phonological syllable model STRVRTS cannot be applied directly, and proposes solutions for disambiguating the juncture of two adjacent vowels (diphthong, hiatus, frontalised vowel), for accurately detecting word prefixes, and for syllabifying compound (two-root) words. The syllabification algorithm is highly accurate; the few remaining errors are caused by homographs, i.e. words spelled identically but pronounced and syllabified differently. The algorithm is used as a component of other language processing tools created at VMU: the algorithms for automatic stress assignment and automatic phonetic transcription, and computer tools for investigating word phonotactics and text rhythm.

Automatic word stress assignment is the most challenging task, because Lithuanian stress is free, i.e. not fixed to any particular syllable. Word stress can move from one syllable to another within the same accentuation paradigm and often depends on word intonation. [...]
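The functional (longest attested onset) principle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VMU implementation: attested onsets are collected from a tiny hypothetical word list (`LEXICON`), literal letter clusters stand in for the STRVRTS cluster models, only the six pure diphthongs ai, au, ei, ui, ie, uo count as two-letter nuclei, an `i` between a consonant and a vowel other than `e` is treated as a palatalization marker, and prefixes, compounds, and homographs are not handled. All function names are hypothetical.

```python
from typing import List, Set, Tuple

VOWELS = set("aąeęėiįyouųū")
# Assumption: only the six native pure diphthongs count as single nuclei.
DIPHTHONGS = {"ai", "au", "ei", "ui", "ie", "uo"}
# Digraphs that spell a single consonant sound.
CONSONANT_DIGRAPHS = ("dž", "dz", "ch")
# Hypothetical sample lexicon, used only to collect attested word-initial clusters.
LEXICON = ["pienas", "klausti", "sprogti", "trys", "stalas", "knyga", "gražus"]

Segment = Tuple[str, str]  # (text, kind) where kind is "C" or "V"


def letters(word: str) -> List[str]:
    """Split a word into letters, keeping consonant digraphs together."""
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in CONSONANT_DIGRAPHS else 1
        out.append(word[i:i + step])
        i += step
    return out


def segments(word: str) -> List[Segment]:
    """Group letters into consonants ("C") and syllable nuclei ("V").

    Assumption: an "i" after a consonant and before a vowel other than "e" is a
    palatalization marker attached to the consonant ("kiau" -> "ki" + "au").
    """
    ls = letters(word.lower())
    segs: List[Segment] = []
    i = 0
    while i < len(ls):
        ch = ls[i]
        nxt = ls[i + 1] if i + 1 < len(ls) else ""
        after = ls[i + 2] if i + 2 < len(ls) else ""
        if ch not in VOWELS:
            if nxt == "i" and after in VOWELS and after != "e":
                segs.append((ch + "i", "C"))
                i += 2
            else:
                segs.append((ch, "C"))
                i += 1
        elif ch + nxt in DIPHTHONGS:
            segs.append((ch + nxt, "V"))
            i += 2
        else:  # single vowel; two non-diphthong vowels in a row form a hiatus
            segs.append((ch, "V"))
            i += 1
    return segs


def base(text: str) -> str:
    """Drop a palatalization marker so that "li" and "l" compare as equal."""
    return text[:-1] if len(text) > 1 and text.endswith("i") else text


def initial_onsets(lexicon: List[str]) -> Set[Tuple[str, ...]]:
    """Collect the consonant clusters that begin the words of the lexicon."""
    onsets = set()
    for word in lexicon:
        cluster = []
        for text, kind in segments(word):
            if kind == "V":
                break
            cluster.append(base(text))
        onsets.add(tuple(cluster))
    return onsets


def syllabify(word: str, onsets: Set[Tuple[str, ...]]) -> List[str]:
    """Split a word so each syllable starts with the longest attested onset."""
    segs = segments(word)
    nuclei = [k for k, (_, kind) in enumerate(segs) if kind == "V"]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = segs[left + 1:right]
        start = right  # empty cluster (hiatus): cut directly before the vowel
        for n in range(len(cluster), 0, -1):
            onset = tuple(base(text) for text, _ in cluster[-n:])
            if n == 1 or onset in onsets:  # a single consonant is always allowed
                start = right - n
                break
        cuts.append(start)
    sylls, prev = [], 0
    for cut in cuts + [len(segs)]:
        sylls.append("".join(text for text, _ in segs[prev:cut]))
        prev = cut
    return sylls


if __name__ == "__main__":
    onsets = initial_onsets(LEXICON)
    for w in ("arkliui", "kiaušinis", "teatras", "kalnas"):
        print(w, "->", "-".join(syllabify(w, onsets)))
```

With this sample list, `arkliui` is split as `ar-kliui` because `kl` occurs word-initially (in `klausti`) while `rkl` does not; `kalnas` becomes `kal-nas` because `ln` is unattested; and `teatras` becomes `te-a-tras` because `ea` is not a diphthong and is treated as a hiatus. A real system would need a much larger lexicon, cluster models instead of literal clusters, and the prefix and compound handling the report describes.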
Reviewers: prof. habil. dr. A. Pakerys (VPU); dr. P. Kasparaitis (VU); Sapagovas (KMU)