Word Embeddings for NLP in Python. Marco Bonzanini PyCon Italia 2017


Nice to meet you

WORD EMBEDDINGS?

Word Embeddings = Word Vectors = Distributed Representations

Why should you care? Data representation is crucial.

Applications:
- Classification / tagging
- Recommendation Systems
- Search Engines (Query Expansion)
- Machine Translation

One-hot Encoding

V = vocabulary size (huge)

Rome   = [1, 0, 0, 0, 0, 0, ..., 0]
Paris  = [0, 1, 0, 0, 0, 0, ..., 0]
Italy  = [0, 0, 1, 0, 0, 0, ..., 0]
France = [0, 0, 0, 1, 0, 0, ..., 0]
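
A minimal sketch of this representation, assuming a toy four-word vocabulary:

import numpy as np

vocab = ['Rome', 'Paris', 'Italy', 'France']
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))   # length V: huge for a real vocabulary
    v[index[word]] = 1.0
    return v

print(one_hot('Rome'))  # [1. 0. 0. 0.]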

Bag-of-words

doc_1 = [32, 14, 1, 0, ..., 6]
doc_2 = [ 2, 12, 0, 28, ..., 12]
doc_n = [13, 0, 6, 2, ..., 0]
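
The same idea in a few lines of plain Python (toy documents; a real pipeline would use a proper vectorizer):

from collections import Counter

docs = [
    'i enjoyed eating some pizza at the restaurant',
    'i enjoyed eating some fiorentina at the restaurant',
]
vocab = sorted({w for d in docs for w in d.split()})
for d in docs:
    counts = Counter(d.split())
    print([counts[w] for w in vocab])  # one count vector per document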

Word Embeddings

n. dimensions << vocabulary size

Rome   = [0.91, 0.83, 0.17, ..., 0.41]
Paris  = [0.92, 0.82, 0.17, ..., 0.98]
Italy  = [0.32, 0.77, 0.67, ..., 0.42]
France = [0.33, 0.78, 0.66, ..., 0.97]

[Plot: Rome, Paris, Italy and France as points in the embedding space]

Word Embeddings

Paris + Italy - France ≈ Rome
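
With gensim, the analogy is a single most_similar call; a sketch assuming some pre-trained vectors in word2vec text format (hypothetical file name):

from gensim.models import KeyedVectors

word_vectors = KeyedVectors.load_word2vec_format('pretrained_vectors.txt')
print(word_vectors.most_similar(positive=['Paris', 'Italy'],
                                negative=['France'], topn=3))
# 'Rome' should rank near the top with reasonable vectors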

THE MAIN INTUITION

Distributional Hypothesis

"You shall know a word by the company it keeps." (J.R. Firth, 1957)

"Words that occur in similar contexts tend to have similar meanings." (Z. Harris, 1954)

Context → Meaning

I enjoyed eating some pizza at the restaurant

The word: pizza
The company it keeps: I enjoyed eating some ... at the restaurant

I enjoyed eating some pizza at the restaurant
I enjoyed eating some fiorentina at the restaurant

Same context. Pizza = Fiorentina?

A BIT OF THEORY: word2vec

Vector Calculation

Goal: learn vec(word)

1. Choose objective function
2. Init: random vectors
3. Run gradient descent

I enjoyed eating some pizza at the restaurant

Maximise the likelihood of the context given the focus word:

P( I | pizza ) · P( enjoyed | pizza ) · ... · P( restaurant | pizza )
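
Spelled out (a standard way of writing the skip-gram objective; the slides keep it informal): for a corpus of words w_1, ..., w_T and a context window of size c, gradient descent maximises

J = (1/T) Σ_{t=1..T} Σ_{-c ≤ j ≤ c, j ≠ 0} log P( w_{t+j} | w_t )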

Example: I enjoyed eating some pizza at the restaurant

Iterate over context words:

bump P( I | pizza )
bump P( enjoyed | pizza )
bump P( eating | pizza )
bump P( some | pizza )
bump P( at | pizza )
bump P( the | pizza )
bump P( restaurant | pizza )

Move to the next focus word and repeat:

bump P( I | at )
bump P( enjoyed | at )
... you get the picture

P( eating | pizza ) = ???

Output word: eating. Input word: pizza.

P( eating | pizza )
= P( vec(eating) | vec(pizza) )
= P( v_out | v_in )

P( v_out | v_in ) = ?

cosine( v_out, v_in ) is in [-1, 1]

softmax(cosine( v_out, v_in )) is in [0, 1]

P( v_out | v_in ) = exp(cosine(v_out, v_in)) / Σ_{k ∈ V} exp(cosine(v_k, v_in))
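
A numpy sketch of that formula (random toy vectors; real word2vec uses optimised internals rather than this direct computation):

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
V, dim = 1000, 100
vectors = rng.normal(size=(V, dim))    # one vector per vocabulary word

v_in = vectors[42]                     # the input (focus) word
scores = np.array([cosine(v, v_in) for v in vectors])
probs = np.exp(scores) / np.exp(scores).sum()  # softmax: note the O(V) denominator
# probs[k] is P(word_k | word_42); probs sums to 1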

Vector Calculation Recap

Learn vec(word) by gradient descent on the softmax probability

Plot Twist

Paragraph Vector, a.k.a. doc2vec, i.e. P( v_out | v_in, label )
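
In gensim, each document carries its label(s) via TaggedDocument; a minimal sketch with a toy corpus and hypothetical tags:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# In practice each review would be a TaggedDocument whose tags
# carry the label(s), e.g. the beer style
docs = [
    TaggedDocument(words=['roasty', 'coffee', 'dark'], tags=['Stout']),
    TaggedDocument(words=['lemony', 'crisp', 'wheaty'], tags=['Wheat Ale']),
]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
print(model.docvecs['Stout'])  # learned label vector (model.dv in gensim 4+)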

A BIT OF PRACTICE

pip install gensim

Case Study 1: Skills and CVs

Data set of ~300k resumes
Each experience is a sentence
Each experience has 3-15 skills
Approx 15k unique skills

Case Study 1: Skills and CVs

from gensim.models import Word2Vec

fname = 'candidates.jsonl'
corpus = ResumesCorpus(fname)
model = Word2Vec(corpus)
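
ResumesCorpus is the speaker's own class and isn't shown; a minimal sketch of what such an iterable might look like, assuming one JSON object per line with a 'skills' list (hypothetical schema):

import json

class ResumesCorpus:
    # Word2Vec needs a restartable iterable of token lists, so a class
    # with __iter__ (not a one-shot generator) works across epochs
    def __init__(self, fname):
        self.fname = fname

    def __iter__(self):
        with open(self.fname) as f:
            for line in f:
                record = json.loads(line)
                yield record['skills']  # hypothetical field: list of skill tokens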

Case Study 1: Skills and CVs

model.most_similar('chef')

[('cook', 0.94),
 ('bartender', 0.91),
 ('waitress', 0.89),
 ('restaurant', 0.76),
 ...]

Case Study 1: Skills and CVs

model.most_similar('chef', negative=['food'])

[('puppet', 0.93),
 ('devops', 0.92),
 ('ansible', 0.79),
 ('salt', 0.77),
 ...]

Case Study 1: Skills and CVs. Useful for:
- Data exploration
- Query expansion/suggestion
- Recommendations

Case Study 2: Beer!

Data set of ~2.9M beer reviews
89 different beer styles
635k unique tokens
185M total tokens

Case Study 2: Beer!

from gensim.models import Doc2Vec

fname = 'ratebeer_data.csv'
corpus = RateBeerCorpus(fname)
model = Doc2Vec(corpus)

(3.5h on my laptop... remember to pickle!)
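
"Pickle" here just means persisting the trained model; gensim has built-in save/load for this (a sketch, hypothetical file name):

model.save('ratebeer_doc2vec.model')          # persist after the 3.5h training run

from gensim.models import Doc2Vec
model = Doc2Vec.load('ratebeer_doc2vec.model')  # reload instantly next time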

Case Study 2: Beer!

model.docvecs.most_similar('stout')

[('Sweet Stout', 0.9877),
 ('Porter', 0.9620),
 ('Foreign Stout', 0.9595),
 ('Dry Stout', 0.9561),
 ('Imperial/Strong Porter', 0.9028),
 ...]

Case Study 2: Beer!

model.most_similar([model.docvecs['stout']])

[('coffee', 0.6342),
 ('espresso', 0.5931),
 ('charcoal', 0.5904),
 ('char', 0.5631),
 ('bean', 0.5624),
 ...]

Case Study 2: Beer!

model.most_similar([model.docvecs['wheat Ale']])

[('lemon', 0.6103),
 ('lemony', 0.5909),
 ('wheaty', 0.5873),
 ('germ', 0.5684),
 ('lemongrass', 0.5653),
 ('wheat', 0.5649),
 ('lime', 0.55636),
 ('verbena', 0.5491),
 ('coriander', 0.5341),
 ('zesty', 0.5182)]

[PCA projection of the learned style vectors, with visible clusters: dark beers, strong beers, sour beers, lagers, wheat beers]

Case Study 2: Beer! Useful for:
- Understanding the language of beer enthusiasts
- Planning your next pint
- Classification

FINAL REMARKS

But we've been doing this for X years...

Approaches based on co-occurrences are not new (think SVD / LSA / LDA), but they are usually outperformed by word2vec, and they don't scale as well as word2vec.

Efficiency

- There is no co-occurrence matrix (vectors are learned directly)
- Softmax has complexity O(V)
- Hierarchical Softmax: only O(log(V))
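
In gensim the hierarchical softmax is a constructor flag; a sketch, reusing a corpus iterable like the one from Case Study 1:

from gensim.models import Word2Vec

# hs=1 switches on hierarchical softmax; negative=0 turns off the
# default negative sampling so hierarchical softmax is actually used
model = Word2Vec(corpus, hs=1, negative=0)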

Garbage in, garbage out

- Pre-trained vectors are useful... until they're not
- The business domain is important
- The pre-processing steps are important
- > 100K words? Maybe train your own model
- > 1M words? Yep, train your own model

Summary

- Word Embeddings are magic!
- Big victory of unsupervised learning
- Gensim makes your life easy

Credits & Readings

Credits:
- Lev Konstantinovskiy (@gensim_py)
- Chris E. Moody (@chrisemoody), see videos on lda2vec

Readings:
- Deep Learning for NLP (R. Socher): http://cs224d.stanford.edu/
- "word2vec Parameter Learning Explained" by Xin Rong

More readings:
- "GloVe: Global Vectors for Word Representation" by Pennington et al.
- "Dependency-Based Word Embeddings" and "Neural Word Embeddings as Implicit Matrix Factorization" by O. Levy and Y. Goldberg

THANK YOU @MarcoBonzanini GitHub.com/bonzanini marcobonzanini.com