Uni-München
14 March 2017
Seminar: Advanced Topics in Statistical Machine Translation
Phrase-based statistical machine translation (PBSMT) is the state of the art for machine translation of some language pairs. PBSMT is surprisingly free of explicit linguistic knowledge, yet it can be very effective. It does not always work well, however. For instance, when translating into a morphologically rich language, translation quality suffers, particularly when there is also significant syntactic divergence between the two languages. Quality is poor in this case because the translation model makes independence assumptions about morphology and syntax that do not reflect linguistic reality. Domain adaptation and the new neural machine translation paradigm are two further topics that will be covered.
In this course we will read papers that try to address this problem by adding knowledge to the translation process in a wide variety of ways. We will start with an intensive focus on morphology. We will then move on to syntax and semantic roles. Then we will cover domain adaptation, and finally we will take a brief look at the new end-to-end discriminative approach referred to as "neural machine translation". Participants will be encouraged to examine actual translation system output for problems, and we will connect these observations with the work that we discuss.
Philipp Koehn's book "Statistical Machine Translation"
Kevin Knight's tutorial on SMT
See the course web page for further details; additional literature will be presented during the lecture.
Note
Block seminar; dates as above, but on Tuesday only 10:00-14:00.
W3 Professorship for Computational Linguistics (Univ. Prof. Dr. Hinrich Schütze)
LMU München
Summer semester 2015
Lecturer