Neural Lattice Search for Domain Adaptation in Machine Translation
Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn
This talk was presented at IJCNLP 2017.
It is based on this paper: http://aclweb.org/anthology/i17-2004
bib: http://aclweb.org/anthology/i17-2004.bib
Goal: combine the adequacy of phrase-based machine translation (PBMT) with the fluency of neural machine translation (NMT).
Approach: use PBMT to constrain the search space of NMT.
Example
Source: die brötchen sind warm
PBMT lattice: the (bread is | buns are) warm
Neural lattice search target: the buns are warm
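To make the example concrete, here is a minimal sketch of how such a lattice could be represented and what it allows. The node numbering, the arc layout, and the names `TOY_LATTICE` and `enumerate_paths` are assumptions made for illustration; they are not taken from the slides or from the authors' code.

```python
from typing import Dict, List, Tuple

# An arc is (destination node, phrase). Phrases may span several words.
Arc = Tuple[int, Tuple[str, ...]]
Lattice = Dict[int, List[Arc]]

# Hypothetical toy lattice for "die brötchen sind warm".
# The topology is an illustrative guess, not the lattice from the talk.
TOY_LATTICE: Lattice = {
    0: [(1, ("the",))],
    1: [(3, ("bread", "is")), (2, ("buns",))],
    2: [(3, ("are",))],
    3: [(4, ("warm",))],
    4: [],  # final node, no outgoing arcs
}


def enumerate_paths(lattice: Lattice, start: int = 0, final: int = 4) -> List[str]:
    """Return every translation the lattice allows, as plain strings."""
    results: List[str] = []

    def walk(node: int, words: List[str]) -> None:
        if node == final:
            results.append(" ".join(words))
            return
        for next_node, phrase in lattice[node]:
            walk(next_node, words + list(phrase))

    walk(start, [])
    return results


if __name__ == "__main__":
    for sentence in enumerate_paths(TOY_LATTICE):
        print(sentence)
```

The lattice admits only the translations PBMT considers plausible. The NMT model then has to choose among these paths rather than among all possible word sequences.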
[Figure, built up over several slides: neural lattice search walks the PBMT lattice for the source "die brötchen sind warm". The lattice contains the words "the", "bread is", "buns are", and "warm"; positions 0 to 4 are marked below it. As the animation advances, the words "the", then "buns", then "warm" are added at these positions.]
Experiments
Setting: Domain adaptation
  Small in-domain data (IT, Medical, Koran, Subtitles): PBMT outperforms NMT
  Large out-of-domain data (parliamentary proceedings, WMT): NMT outperforms PBMT
[Diagram: NMT and PBMT, each with in-domain and out-of-domain variants]
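IT Results
[Bar chart, BLEU from 0 to 60: PBMT (in), NMT (out), n-best, lattice search. Annotated gain: +5.0 BLEU.]
Results
[Bar chart, BLEU from 0 to 60, for IT, Koran, Medical, and Subtitle: PBMT (in), NMT (out), n-best, lattice search. Annotated gains as they appear in the extracted text: +5.0, +0.2, +0.4, +1.6.]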
Conclusion
  Lattice search > n-best rescoring
  Use in-domain PBMT to constrain the search space
  NMT can be in- or out-of-domain
  Code: github.com/khayrallah/nematus-lattice-search
Thanks!
This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0113. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA).
Contact
Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn
{huda, gkumar, kevinduh, post, phi}@cs.jhu.edu
code: github.com/khayrallah/nematus-lattice-search
Corpus Sizes
Corpus      Words         Sentences    Words/Sentence
Medical     14,301,472    1,104,752    13
IT          3,041,677     337,817      9
Koran       9,848,539     480,421      21
Subtitles   114,371,754   13,873,398   8
EuroParl    113,165,079   4,562,102    25
How much text do we have?
[Plot from Koehn & Knowles 2017: BLEU against corpus size (English words, 10^6 to 10^8) for three systems: Phrase-Based with Big LM, Phrase-Based, and Neural. The corpora IT, Koran, Medical, and Subtitles & WMT are marked on the plot. Individual data points are not recoverable from the extracted text.]
Corpus Sizes
[Bar chart: number of words (0 to 140,000,000) for Medical, IT, Koran, Subtitles, and EuroParl]
IT Baselines
[Bar chart, BLEU from 0 to 50: PBMT (in), PBMT (out), NMT (in), NMT (out)]
Example
Source:    Versionsinformationen ausgeben und beenden
Reference: output version information and exit
PBMT:      Spend version information and end
NMT:       Spend and end versionary information
Lattice:   Print version information and exit
Results
[Bar chart, BLEU from 0 to 60, for IT, Medical, Koran, and Subtitle: PBMT (in), NMT (in), N-best, Hybrid Lattice. Annotated gains as they appear in the extracted text: +4.2, +0.7, +1.1, +0.1.]
Stack Based Decoding
Stacks based on number of target words translated
Keep track of:
  Score
  Current lattice node
  Current neural state
  Incoming arc length
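The bookkeeping above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation (that is in the nematus-lattice-search repository). The lattice topology, the `toy_score` function, and all names here are hypothetical. A hand-written table of log-probabilities stands in for the NMT model, and the word prefix stands in for the neural decoder state.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Arc = Tuple[int, Tuple[str, ...]]  # (destination node, phrase on the arc)
Lattice = Dict[int, List[Arc]]


@dataclass(frozen=True)
class Hypothesis:
    score: float             # cumulative log-probability so far
    node: int                # current lattice node
    words: Tuple[str, ...]   # stand-in for the neural state (assumption)


# Hypothetical lattice and scores, for illustration only.
TOY_LATTICE: Lattice = {
    0: [(1, ("the",))],
    1: [(3, ("bread", "is")), (2, ("buns",))],
    2: [(3, ("are",))],
    3: [(4, ("warm",))],
    4: [],
}
TOY_LOGPROBS = {("the", "buns"): math.log(0.6), ("the", "bread"): math.log(0.4)}


def toy_score(prefix: Tuple[str, ...], word: str) -> float:
    """Stand-in for an NMT step: log P(word | prefix) from a tiny table."""
    prev = prefix[-1] if prefix else "<s>"
    return TOY_LOGPROBS.get((prev, word), math.log(0.9))


def lattice_search(
    lattice: Lattice,
    final: int,
    score_word: Callable[[Tuple[str, ...], str], float],
    beam: int = 2,
    start: int = 0,
) -> Optional[Hypothesis]:
    # One stack per number of target words produced so far.
    stacks: Dict[int, List[Hypothesis]] = {0: [Hypothesis(0.0, start, ())]}
    finished: List[Hypothesis] = []
    length, max_length = 0, 0
    while length <= max_length:
        stack = sorted(stacks.pop(length, []), key=lambda h: h.score, reverse=True)
        for hyp in stack[:beam]:  # prune each stack to the beam size
            if hyp.node == final:
                finished.append(hyp)
                continue
            for next_node, phrase in lattice[hyp.node]:
                score, words = hyp.score, hyp.words
                for word in phrase:  # score a multi-word arc word by word
                    score += score_word(words, word)
                    words = words + (word,)
                # The arc length decides which stack the new hypothesis joins.
                stacks.setdefault(len(words), []).append(
                    Hypothesis(score, next_node, words)
                )
                max_length = max(max_length, len(words))
        length += 1
    return max(finished, key=lambda h: h.score) if finished else None


if __name__ == "__main__":
    best = lattice_search(TOY_LATTICE, final=4, score_word=toy_score)
    print(" ".join(best.words) if best else "no complete path")
```

Grouping hypotheses by number of target words means that pruning compares hypotheses of equal length, even when they reached that length through arcs of different lengths. This sketch compares finished hypotheses by raw score with no length normalization, which is a simplification.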