Discourse in Statistical Machine Translation

Studia Linguistica Upsaliensia No. 15


By: Christian Hardmeier
October 2014
Uppsala University
Distributed by Coronet Books
ISBN: 9789155489632
185 Pages, Illustrated
$67.50 Paper original


This thesis addresses the technical and linguistic aspects of discourse-level processing in phrase-based statistical machine translation (SMT). Connected texts can have complex text-level linguistic dependencies across sentences that must be preserved in translation. However, the models and algorithms of SMT are pervaded by locality assumptions. In a standard SMT setup, no model has more complex dependencies than an n-gram model. The popular stack decoding algorithm exploits this fact to implement efficient search with a dynamic programming technique. These locality assumptions are a serious technical obstacle to discourse-level modelling in SMT.
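
The link between n-gram locality and dynamic programming can be illustrated with a minimal sketch. It is not taken from the thesis: the bigram table, its scores, and the helper names `extend` and `recombine` are hypothetical, and the toy assumes that all competing hypotheses cover the same source words (a real decoder would also compare coverage and other model state). Because a bigram model scores each new word from the previous word alone, two partial translations ending in the same word will receive identical scores for any continuation, so only the better one needs to be kept.

```python
from typing import Dict, List, Tuple

# Toy bigram log-probabilities (invented numbers, for illustration only).
BIGRAM_LOGPROB: Dict[Tuple[str, str], float] = {
    ("<s>", "the"): -0.5,
    ("<s>", "a"): -1.0,
    ("the", "house"): -1.0,
    ("a", "house"): -0.7,
    ("house", "is"): -0.4,
}
UNSEEN = -5.0  # assumed flat penalty for bigrams missing from the table

Hypothesis = Tuple[List[str], float]  # (target words so far, score so far)


def extend(hyp: Hypothesis, word: str) -> Hypothesis:
    """Append one word; the added score depends only on the previous word."""
    words, score = hyp
    step = BIGRAM_LOGPROB.get((words[-1], word), UNSEEN)
    return (words + [word], score + step)


def recombine(hyps: List[Hypothesis]) -> List[Hypothesis]:
    """Keep the best hypothesis per language-model state (the last word)."""
    best: Dict[str, Hypothesis] = {}
    for hyp in hyps:
        state = hyp[0][-1]
        if state not in best or hyp[1] > best[state][1]:
            best[state] = hyp
    return list(best.values())


start: Hypothesis = (["<s>"], 0.0)
candidates = [
    extend(extend(start, "the"), "house"),
    extend(extend(start, "a"), "house"),
]
# Both candidates end in "house", so they share one state and one survives.
survivors = recombine(candidates)
```

A model whose score depended on words in earlier sentences would make the state in `recombine` grow with the whole document history, so hypotheses would rarely share a state and this pruning would no longer be safe. That is the obstacle the abstract describes.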