Type of publication: Article in peer-reviewed foreign international conference proceedings (P1d)
Field of Science: Informatics (N009)
Author(s): Henríquez, Carlos A. Q;Costa-jussà, Marta R;Daudaravičius, Vidas;Banchs, Rafael E;Mariño, José B
Title: Using collocation segmentation to augment the phrase table
Is part of: WMT'10 : Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, Uppsala, Sweden, 15-16 July 2010. Morristown, USA : Association for Computational Linguistics
Extent: p. 98-102
Date: 2010
Keywords: Collocation segmentation;Machine translation;Translation system
ISBN: 9781932432718
Abstract: This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation over the source and target corpus before the alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of this phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and also report results obtained with internal and official test sets.
Affiliation(s): Department of Systems Analysis
Vytautas Magnus University
Appears in Collections: University Research Publications
