Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/47102
Type of publication: Article in peer-reviewed foreign international conference proceedings (P1d)
Field of Science: Computer science (N009)
Author(s): Henríquez, Carlos A. Q; Costa-jussà, Marta R; Daudaravičius, Vidas; Banchs, Rafael E; Mariño, José B.
Title: UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system
Is part of: International workshop on spoken language translation, December 2-3, 2010, Paris, France: proceedings [electronic resource]. Paris: Maison de la Chimie, 2010
Extent: p. 189-195
Date: 2010
Keywords: UPC-BMIC-VDU;IWSLT 2010;SMT system
Abstract: This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These segmentations are computed using statistical association measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information and Gravity-Counts. Analysis of the translation results divides the measures into three groups. First, Log-likelihood, Chi-squared and T-score tend to combine high-frequency words, and the resulting collocation segments are very short. They improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and the resulting collocation segments are short. They improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine both high- and low-frequency words, and the resulting collocation segments are long; in this case, however, the SMT system is not improved. Thus, the road map for improving the translation system is to introduce new phrases made of either low-frequency or high-frequency words; introducing new phrases that mix low- and high-frequency words does not readily improve translation quality. Experimental results are reported on the French-to-English IWSLT 2010 evaluation, where our system ranked 3rd out of nine systems.
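For readers unfamiliar with the measures named in the abstract, the following is a minimal Python sketch of how five of them are commonly computed for a word bigram from a 2x2 contingency table of corpus counts. It is an illustration under stated assumptions, not the authors' implementation: the function name association_scores and its count parameters are hypothetical, Gravity-Counts is omitted because this record does not define it, and the paper's exact formulations (log base, smoothing, counting scheme) may differ.

```python
import math


def association_scores(f_xy, f_x, f_y, n):
    """Score one bigram (x, y) with standard association measures.

    Hypothetical helper, not from the paper. Assumes 0 < f_xy,
    f_x < n and f_y < n so that no expected count is zero.
    f_xy: count of the bigram "x y"
    f_x:  count of x as the first word of a bigram
    f_y:  count of y as the second word of a bigram
    n:    total number of bigrams in the corpus
    """
    # Observed 2x2 contingency table (x present/absent by y present/absent).
    o11 = f_xy
    o12 = f_x - f_xy
    o21 = f_y - f_xy
    o22 = n - f_x - f_y + f_xy
    observed = [o11, o12, o21, o22]

    # Expected counts under independence: row total * column total / n.
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    e11 = expected[0]

    # Pearson's chi-squared over all four cells.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Log-likelihood ratio G^2; cells with o == 0 contribute nothing.
    llr = 2 * sum(o * math.log(o / e)
                  for o, e in zip(observed, expected) if o > 0)

    return {
        "log_likelihood": llr,
        "t_score": (o11 - e11) / math.sqrt(o11),
        "chi_squared": chi2,
        "dice": 2 * f_xy / (f_x + f_y),
        # Pointwise mutual information, in bits.
        "mutual_information": math.log2(o11 / e11),
    }
```

For example, association_scores(30, 100, 60, 10000) gives a Dice score of 0.375 and a mutual information of log2(50), about 5.64 bits. Dice and mutual information depend mainly on how exclusively two words co-occur, which is consistent with the abstract's observation that they favour low-frequency pairs, whereas the significance-style scores (Log-likelihood, Chi-squared, T-score) grow with raw counts.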
Internet: https://upcommons.upc.edu/bitstream/handle/2117/102470/iwslt10_ec_upc.pdf
Affiliation(s): Faculty of Informatics
Department of Systems Analysis
Vytautas Magnus University (Vytauto Didžiojo universitetas)
Appears in Collections: University Research Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.