This thesis addresses the technical and linguistic aspects of discourse-level processing in phrase-based statistical machine translation (SMT). Connected texts exhibit complex linguistic dependencies that span sentence boundaries and must be preserved in translation. However, the models and algorithms of SMT are pervaded by locality assumptions. In a standard SMT setup, no model has dependencies more complex than those of an n-gram model. The popular stack decoding algorithm exploits this fact to implement efficient search with a dynamic programming technique. This is a serious technical obstacle to discourse-level modelling in SMT.
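The dynamic programming trick alluded to above can be illustrated with a minimal sketch of hypothesis recombination in stack decoding. Because an n-gram language model sees only the last n-1 target words, two partial hypotheses with the same source coverage and the same trailing n-1 words are indistinguishable to all models, and only the higher-scoring one needs to be kept. The function and data below are hypothetical and purely illustrative; they do not come from any particular decoder.

```python
def recombine(hypotheses, n=3):
    """Keep only the best-scoring hypothesis per dynamic-programming
    state, where the state is (source coverage, last n-1 target words).
    An n-gram LM cannot distinguish hypotheses within one state, so
    dropping the weaker ones is lossless for this model family.
    (Illustrative sketch; names and structure are hypothetical.)"""
    best = {}
    for hyp in hypotheses:
        coverage, words, score = hyp  # coverage: translated source positions
        state = (coverage, words[-(n - 1):])  # DP state for an n-gram LM
        if state not in best or score > best[state][2]:
            best[state] = hyp
    return list(best.values())

# Toy partial hypotheses: (coverage, target words so far, log score)
hyps = [
    (frozenset({0, 1}), ("the", "house", "is"), -2.1),
    (frozenset({0, 1}), ("a", "house", "is"), -2.5),   # same LM state, worse score
    (frozenset({0, 1}), ("the", "house", "was"), -2.3),  # different LM state
]
pruned = recombine(hyps, n=3)  # the second hypothesis is recombined away
```

Any model with dependencies longer than n-1 words, such as a discourse-level model, breaks this equivalence: the recombined hypotheses are no longer interchangeable, which is precisely the technical obstacle noted above.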