Extraction of Lexical Data from
Parallel Corpora & Their Application
in Natural Language Processing
By Jörg Tiedemann
Uppsala University Press
139 pages, Illustrated, 6 ½" x 9 ½"
$49.50 Paper Original
This Ph.D. dissertation focuses on reusing translations for applications in natural language processing. It presents five parallel corpora of documents and their translations, containing over 35 million words and covering 60 languages. The thesis also proposes an innovative approach to word alignment that uses statistical and linguistic clues and can be applied to the automatic extraction of bilingual lexical data. These data consist of words and phrases linked to their translations and can be used in computational lexicography and machine translation. Four example applications are discussed in the thesis.
About the Author: Jörg Tiedemann is a researcher at the Department of Linguistics at Uppsala University. His main interests include data-driven methods in computational linguistics, multilingual lexicography, terminology, and machine translation.
Studia Linguistica Upsaliensia No. 1