Statistiniai kolokacijų nustatymo metodai ir vertimo atitikmenys lygiagrečiajame grožinės literatūros tekstyne

Karaliūtė, Asta

Use this url to cite ETD: https://hdl.handle.net/20.500.12259/123476

Field of Science

Filologija / Philology (H004)

Type of publication

type::text::thesis::master thesis

Title

Statistiniai kolokacijų nustatymo metodai ir vertimo atitikmenys lygiagrečiajame grožinės literatūros tekstyne

Other Title

Statistical collocation extraction methods and translation equivalents in the parallel corpus of fiction

Author

Karaliūtė, Asta

Advisor

Utka, Andrius

Extent

62 p.

Date Issued

2010-06-01

Keywords

statistiniai metodai

kolokacija

tekstynas

atitikmenys

vertimas

statistical methods

collocation

corpus

equivalents

translation

Abstract

The object of this thesis is collocations and the methods used to study them. The main aim is to analyze and compare lists of collocations identified by statistical methods and to examine the translation equivalents of selected collocations. The work is relevant because collocation analysis will help linguists and other language specialists choose a suitable collocation extraction method for both English and Lithuanian, while an understanding of how collocations are translated matters for translation analysis and for translators' work. The study consists of five parts. Chapter 2 introduces the theoretical concept of collocation. It also sets out the complex problems of collocation translation and the characteristics of the four statistical methods chosen for the analysis: Mutual Information (MI), T-score, Dice and Log-likelihood ratio (LLR). In Chapter 3, collocation lists are extracted from the main source of the analysis, a parallel corpus of fiction. It emerges that the T-score and LLR methods bring out grammatical collocations, whereas the MI and Dice methods bring out lexical ones. Collocation boundaries and the similarity coefficients of the methods are selected and defined. In Chapter 4, a list of the top 200 collocations is selected and the statistical methods are compared for each language. The methods are compared in pairs according to similarity criteria: Dice with MI (lexical collocations) and T-score with LLR (grammatical ones). The collocations are grouped into additional categories according to prominent, frequently occurring combinations. Chapter 5 examines collocation equivalents and attempts to determine whether a collocation remains a collocation after translation. The Dice and MI lists are compared in parallel. Equivalents are found for only three groups of collocations in these methods, so the results are refined using the collocations of the "various phrases" group and their rank numbers in the lists.
The results show that more than 50 percent of the equivalents remained collocations when the analysis used full collocation lists rather than lists of limited size. It is concluded that collocation lists are suitable for comparing languages and for carrying out translation analysis.

The object of this Master's thesis is collocations and collocation extraction methods. The aim of the research is to analyze collocation lists extracted by statistical methods from a parallel corpus of fiction and to determine the collocation equivalents. The thesis is relevant because collocation analysis can help linguists and other language specialists choose the right collocation extraction method for both English and Lithuanian. Moreover, understanding the collocation translation process is important for translation analysis and for translators. The research consists of five parts. Chapter 2 presents the concept of collocation and possible collocation translation problems. The theoretical part also describes the characteristics of the four selected statistical methods: Mutual Information (MI), T-score, Dice and Log-likelihood ratio (LLR). In Chapter 3, collocation lists are extracted for each language, English and Lithuanian. The analysis reveals that the T-score and LLR methods extract grammatical collocations, while MI and Dice extract lexical ones. Further in this chapter, collocation boundaries and the coefficients of each method are defined. Chapter 4 presents a list of the top 200 collocations for each language and method. The methods, with their new collocation lists, are compared in pairs according to similarity criteria: Dice with MI (lexical collocations) and T-score with LLR (grammatical collocations). A further distribution of bigrams according to frequency is identified, and both lexical and grammatical collocations are grouped into additional categories. Chapter 5, the last chapter, deals with the analysis of collocation equivalents. First the Dice lists are compared in parallel across the two languages, then the MI lists. Both MI and Dice yielded equivalents for only three groups of collocations, so further analysis is based only on the group of various phrases that recur in both the MI and Dice lists.
The equivalents are extracted from the concordance lines and verified against their number in the Lithuanian sample lists. The results reveal that half of the collocations remained collocations after translation, and it is concluded that the collocation lists are suitable for further analysis. Statistical methods extract different collocation lists, which might be useful for various kinds of linguistic analysis.
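The abstract names the four association measures but does not give their formulas. As a rough illustration only, the following Python sketch scores adjacent word pairs with the common textbook definitions of MI, T-score, Dice and the log-likelihood ratio. It is an assumption that the thesis used these particular variants and that collocations were treated as adjacent bigrams; the function name `bigram_scores` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair with MI, T-score, Dice and LLR.

    Textbook formulations; assumed here, not taken from the thesis.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)                                   # total number of bigrams
    pair_freq = Counter(bigrams)                       # joint frequencies
    first_freq = Counter(w1 for w1, _ in bigrams)      # w1 in first position
    second_freq = Counter(w2 for _, w2 in bigrams)     # w2 in second position

    scores = {}
    for (w1, w2), o11 in pair_freq.items():
        f1, f2 = first_freq[w1], second_freq[w2]
        e11 = f1 * f2 / n                              # expected joint frequency
        # 2x2 contingency table: observed and expected cell counts
        observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
        expected = [e11, f1 * (n - f2) / n, (n - f1) * f2 / n,
                    (n - f1) * (n - f2) / n]
        # Log-likelihood ratio; empty cells contribute nothing
        llr = 2 * sum(o * math.log(o / e)
                      for o, e in zip(observed, expected) if o > 0)
        scores[(w1, w2)] = {
            "mi": math.log2(o11 / e11),
            "t_score": (o11 - e11) / math.sqrt(o11),
            "dice": 2 * o11 / (f1 + f2),
            "llr": llr,
        }
    return scores


# Toy input, purely illustrative
demo = bigram_scores(["a", "b", "a", "b", "c", "d"])
for pair, measures in sorted(demo.items()):
    print(pair, {name: round(value, 3) for name, value in measures.items()})
```

Ranking the same pairs by each measure in turn gives one list per method, which is the kind of input the pairwise comparisons described in the abstract would start from. In these formulations the T-score and log-likelihood ratio grow with raw frequency, whereas MI and Dice are ratio-based, which fits the abstract's observation that the methods group into two pairs.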

Language

Lietuvių / Lithuanian (lt)

URI

https://hdl.handle.net/20.500.12259/123476

Defended

Taip / Yes

Access Rights

Atviroji prieiga / Open Access

File(s)

asta_karaliute_md.pdf (2.6 MB)
