Neural Lattice Search for Domain Adaptation in Machine Translation

Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn

This talk was presented at IJCNLP 2017.
Paper: http://aclweb.org/anthology/i17-2004
BibTeX: http://aclweb.org/anthology/i17-2004.bib

Combine the adequacy of PBMT (phrase-based machine translation) with the fluency of NMT (neural machine translation).

Use PBMT to constrain the search space of NMT.

[Figure: pipeline overview]
Source: die brötchen sind warm
PBMT produces a lattice over the words: the, bread, is, buns, are, warm
Neural lattice search selects the target from that lattice.
Target: the buns are warm
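To make "constraining the search space" concrete, here is a minimal sketch of a lattice as a small graph whose paths are the only translations the search may produce. The node numbering, the arc layout, and the helper name `paths` are assumptions made for illustration; the slide does not show the exact lattice topology.

```python
# Toy lattice for "die brötchen sind warm".
# Assumption: node ids and arcs are invented for illustration only.
# Each arc is (next_node, phrase); FINAL is the accepting node.
LATTICE = {
    0: [(1, "the")],
    1: [(2, "bread is"), (2, "buns are")],
    2: [(3, "warm")],
    3: [],
}
FINAL = 3


def paths(node=0, prefix=()):
    """Yield every complete translation that the lattice allows."""
    if node == FINAL:
        yield " ".join(prefix)
        return
    for next_node, phrase in LATTICE[node]:
        yield from paths(next_node, prefix + (phrase,))


candidates = list(paths())
print(candidates)
```

In this toy lattice the NMT model only ever has to choose between two complete outputs; anything outside the lattice is never considered.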

[Animation, slides 7 to 18: neural lattice search on "die brötchen sind warm". The frames show the lattice words (the, bread, is, buns, are, warm) above positions numbered 0 to 4. The partial hypothesis grows from "the" to "the buns" and reaches "warm" in the final frames.]

Experiments

Setting: Domain adaptation
Small in-domain data (IT, Medical, Koran, Subtitles): PBMT outperforms NMT
Large out-of-domain data (parliamentary proceedings, WMT): NMT outperforms PBMT

Setting: Domain adaptation
[Diagram: NMT and PBMT systems, each with an in-domain and an out-of-domain variant]

IT Results
[Bar chart: BLEU (axis 0 to 60) for PBMT (in), NMT (out), n-best, and lattice search; annotated gain: +5.0 BLEU]

Results
[Bar chart: BLEU (axis 0 to 60) on IT, Koran, Medical, and Subtitle for PBMT (in), NMT (out), n-best, and lattice search; annotated gains: +5.0, +0.2, +0.4, +1.6]
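The "n-best" baseline in these charts refers to n-best rescoring: a fixed list of complete PBMT outputs is re-ranked by the NMT model. The sketch below shows only the general idea. The n-best list, the scores, and the function `toy_nmt_score` are hypothetical stand-ins, not the authors' systems or data.

```python
# Assumption: a made-up n-best list standing in for PBMT output.
NBEST = ["the bread is warm", "the buns is warm", "the buns are warm"]

# Assumption: hand-picked word log-probabilities standing in for an NMT model.
TOY_LOGPROB = {
    "the": -0.1, "buns": -0.3, "bread": -1.2,
    "are": -0.2, "is": -0.9, "warm": -0.1,
}


def toy_nmt_score(sentence):
    """Sum per-word log-probabilities; unknown words get a low score."""
    return sum(TOY_LOGPROB.get(word, -5.0) for word in sentence.split())


# Rescoring: keep the candidate that the (toy) NMT scorer likes best.
best = max(NBEST, key=toy_nmt_score)
print(best)
```

Rescoring can only choose among the listed candidates, whereas lattice search scores paths through the whole lattice.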

Conclusion
Lattice search > n-best rescoring
Use in-domain PBMT to constrain the search space
NMT can be in- or out-of-domain
Code: github.com/khayrallah/nematus-lattice-search

Thanks!

This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0113. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA).

Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn
{huda, gkumar, kevinduh, post, phi}@cs.jhu.edu
Code: github.com/khayrallah/nematus-lattice-search

Corpus Sizes

Corpus           Words    Sentences   W/S
Medical     14,301,472    1,104,752    13
IT           3,041,677      337,817     9
Koran        9,848,539      480,421    21
Subtitles  114,371,754   13,873,398     8
EuroParl   113,165,079    4,562,102    25
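The W/S column is words divided by sentences. As a quick check, the ratio can be recomputed from the counts in the table; the variable names here are illustrative, and the values are shown to one decimal place because the slide's rounding convention is not stated.

```python
# Word and sentence counts copied from the table above.
corpora = {
    "Medical": (14_301_472, 1_104_752),
    "IT": (3_041_677, 337_817),
    "Koran": (9_848_539, 480_421),
    "Subtitles": (114_371_754, 13_873_398),
    "EuroParl": (113_165_079, 4_562_102),
}

# Average sentence length in words, to one decimal place.
words_per_sentence = {
    name: round(words / sentences, 1)
    for name, (words, sentences) in corpora.items()
}

for name, ratio in words_per_sentence.items():
    print(f"{name:<10} {ratio:>5}")
```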

How much text do we have?
[Figure from Koehn & Knowles 2017: BLEU versus corpus size (English words, log scale from 10^6 to 10^8) for three systems: Phrase-Based with Big LM, Phrase-Based, and Neural. The sizes of the IT, Koran, Medical, and Subtitles & WMT corpora are marked along the curve.]

Corpus Sizes
[Bar chart: number of words (axis 0 to 140,000,000) for Medical, IT, Koran, Subtitles, and EuroParl]

IT Baselines
[Bar chart: BLEU (axis 0 to 50) for PBMT (in), PBMT (out), NMT (in), and NMT (out)]

Source: Versionsinformationen ausgeben und beenden
Reference: output version information and exit
PBMT: Spend version information and end
NMT: Spend and end versionary information
Lattice: Print version information and exit

Results
[Bar chart: BLEU (axis 0 to 60) on IT, Medical, Koran, and Subtitle; legend entries: PBMT (in), NMT (in), N-best, Hybrid Lattice; annotated gains: +4.2, +0.7, +1.1, +0.1]

Stack Based Decoding
Stacks based on number of target words translated
Keep track of:
  Score
  Current lattice node
  Current neural state
  Incoming arc length
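The bookkeeping listed above can be sketched as a small stack decoder. This is a minimal illustration under stated assumptions, not the authors' implementation: the lattice, the word scores, and the names `Hypothesis`, `toy_step`, and `lattice_search` are all hypothetical, and a lookup table stands in for the NMT decoder.

```python
from dataclasses import dataclass

# Assumption: toy lattice, node -> list of (next_node, phrase) arcs.
LATTICE = {
    0: [(1, "the")],
    1: [(2, "bread is"), (2, "buns are")],
    2: [(3, "warm")],
    3: [],
}
FINAL_NODE = 3

# Assumption: hand-picked log-probabilities standing in for the NMT model.
TOY_LOGPROB = {
    "the": -0.1, "buns": -0.3, "bread": -1.2,
    "are": -0.2, "is": -0.9, "warm": -0.1,
}


@dataclass(frozen=True)
class Hypothesis:
    score: float   # accumulated log-probability
    node: int      # current lattice node
    state: tuple   # stand-in for the neural state: words produced so far
    arc_len: int   # length in words of the incoming arc


def toy_step(state, word):
    """Return (log-prob, new state); a real system would run the NMT decoder."""
    return TOY_LOGPROB.get(word, -5.0), state + (word,)


def lattice_search(beam_size=2, max_words=10):
    # stacks[n] holds hypotheses that have produced n target words.
    stacks = [[] for _ in range(max_words + 1)]
    stacks[0].append(Hypothesis(0.0, 0, (), 0))
    finished = []
    for n in range(max_words + 1):
        # Prune the stack to the best beam_size hypotheses before expanding.
        stacks[n] = sorted(stacks[n], key=lambda h: h.score, reverse=True)[:beam_size]
        for hyp in stacks[n]:
            if hyp.node == FINAL_NODE:
                finished.append(hyp)
                continue
            for next_node, phrase in LATTICE[hyp.node]:
                score, state = hyp.score, hyp.state
                words = phrase.split()
                for word in words:  # score the arc one word at a time
                    logp, state = toy_step(state, word)
                    score += logp
                target = n + len(words)
                if target <= max_words:
                    stacks[target].append(
                        Hypothesis(score, next_node, state, len(words))
                    )
    return max(finished, key=lambda h: h.score)


best = lattice_search()
print(" ".join(best.state))
```

Because an arc may carry more than one word, a hypothesis can jump several stacks at once, which is one reason to record the incoming arc length. With these toy scores the search returns "the buns are warm".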