Bilingual Sentence Alignment of a Parallel Corpus by Using English as a Pivot Language

 

Authors
Aguiar Pontes, Josafá de Jesús
Format
Article
Status
publishedVersion
Description

Training a statistical machine translation model requires a parallel corpus containing a large number of aligned sentence pairs in both languages. However, such a corpus is not easy to obtain when English is neither the source nor the target language. The European Parliament parallel corpus provides sentence alignments only between English and each of 20 other European languages, leaving the remaining 190 language pairs unaligned. A previous method based on sentence length information is not reliable enough to produce alignments for training statistical machine translation models. Hybrid methods that combine sentence length with bilingual dictionary information may produce better results, but dictionaries may not be affordable. We therefore introduce a technique that aligns the non-English corpora of the European Parliament by using English as a pivot language, without a bilingual dictionary. We illustrate the technique with French and Spanish, obtaining performance equivalent to that of the existing alignments in the original English-French and English-Spanish corpora.
Escuela Politécnica Nacional.
http://www.aclweb.org/anthology/W14-6902
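The abstract does not spell out how English is used as the pivot. As a loose illustration only, the sketch below assumes the simplest possible case: the English sides of the English-French and English-Spanish bitexts contain identical sentence strings, so a French sentence and a Spanish sentence are paired whenever both are aligned to the same English sentence, and that English sentence occurs exactly once in each bitext. The function name `pivot_align` and the list-of-pairs data layout are hypothetical. This is not the author's algorithm, only a minimal sketch of the pivot idea under those assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: a bitext is a list of (english_sentence, other_sentence) pairs.
Bitext = List[Tuple[str, str]]


def pivot_align(en_fr: Bitext, en_es: Bitext) -> List[Tuple[str, str]]:
    """Pair French and Spanish sentences that share the same English sentence.

    Assumption (not from the paper): English sentences match by exact string
    after stripping surrounding whitespace.
    """
    # Count how often each English sentence occurs on the English-French side.
    fr_counts = Counter(en.strip() for en, _ in en_fr)

    # Index the English-Spanish bitext by its English sentence.
    es_by_en: Dict[str, List[str]] = {}
    for en, es in en_es:
        es_by_en.setdefault(en.strip(), []).append(es)

    fr_es: List[Tuple[str, str]] = []
    for en, fr in en_fr:
        key = en.strip()
        es_candidates = es_by_en.get(key, [])
        # Keep only unambiguous pivots: the English sentence appears exactly
        # once in each bitext, so it identifies a unique French-Spanish pair.
        if fr_counts[key] == 1 and len(es_candidates) == 1:
            fr_es.append((fr, es_candidates[0]))
    return fr_es


if __name__ == "__main__":
    # Toy data for illustration only.
    en_fr = [
        ("Good morning.", "Bonjour."),
        ("The session is open.", "La séance est ouverte."),
        ("Thank you.", "Merci."),
        ("Thank you.", "Merci bien."),
    ]
    en_es = [
        ("Good morning.", "Buenos días."),
        ("The session is open.", "Se abre la sesión."),
        ("Thank you.", "Gracias."),
    ]
    for fr, es in pivot_align(en_fr, en_es):
        print(fr, "|||", es)
```

In the toy data, the repeated English sentence "Thank you." is skipped because it cannot identify a unique French-Spanish pair. A practical aligner would also have to cope with such repeats and with English sides that are segmented differently in the two bitexts.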

Publication Year
2014
Language
eng
Topic
BILINGUAL
SENTENCE
ALIGNMENT
PARALLEL CORPUS
Repository
Repositorio SENESCYT
Full text
http://repositorio.educacionsuperior.gob.ec/handle/28000/2788
Rights
openAccess
License
openAccess