Jure Leskovec (@jure) Stanford University Including joint work with L. Backstrom, D. Huttenlocher, M. Gomez-Rodriguez, J. Kleinberg, J. McAuley, S. Myers

Jure Leskovec, ICDM 2012

Data mining has a rich history and methods for analyzing:
  tabular data
  textual data
  time series & streams
  market baskets
Bag of features
What about relations and dependencies?

Networks allow for modeling dependencies!

Networks are a general language for describing real-world systems

Infrastructure

Economy

Human cell

Brain

Friends & Family

Internet (figure labels: domain1, domain2, domain3, router)

Media & Information

Society

Network!

Networks, why now?

Web: a Social and a Technological network
  Online friendships [Ugander-Karrer-Backstrom-Marlow, 11]
  Corporate e-mail communication [Adamic-Adar, 05]
Profound transformation in:
  How knowledge is produced and shared
  How people interact and communicate
  The scope of CS as a discipline

Network data brings several questions:
  Working with network data is messy
    Not just wiring diagrams but also dynamics and data (features, attributes) on nodes and edges
  Computational challenges
    Large scale network data
  Algorithmic models as vocabulary for expressing complex scientific questions
    Social science, physics, biology

Plan for the talk: Algorithms for network data
  Part 1) How do we make online social networks more useful?
    Finding Friends
    Organizing Friends
  Part 2) Web as a sensor into society
    Understanding Social Media Content

Growing body of research captures dynamics of social network graphs
  [Lattanzi, Sivakumar 08] [Zheleva, Sharara, Getoor 09] [Kumar, Novak, Tomkins 06] [Kossinets, Watts 06] [L., Kleinberg, Faloutsos 05]
What links will occur next? [Liben-Nowell, Kleinberg 03]
Networks + many other features:
  Location, School, Job, Hobbies, Interests, etc.

[WSDM 11]
Learn to recommend potential friends
Facebook link creation [Backstrom, L. 11]
  92% of new friendships on FB are friend-of-a-friend
  Triadic closure [Granovetter, 73]
  More common friends help: Social capital [Coleman, 88]
(Figure: example network with nodes u, v, w, z)

[WSDM 11]
Goal: Given a user s, recommend friends
  Positive: Nodes to which s links in the future
  Negative: Nodes to which s does not link
Supervised ranking problem:
  Assign higher scores to positive nodes than to negative nodes

[WSDM 11]
Q: How to combine network structure with node and edge features?
A: Combine PageRank with supervised learning
  PageRank is great at capturing the importance of nodes based on the network structure
  Supervised learning is great with features
Idea: Use node and edge features to guide the random walk

[WSDM 11]
Network (source node s)
  Set edge strengths (want strong edges to point towards positive nodes)
  Run Random Walk with Restarts (RWR) on the weighted graph
    RWR assigns an importance score (visiting probability) to every node
  Recommend top k nodes with highest score
Q: How to set edge strengths?
Idea: Set edge strengths such that the Supervised Random Walk (SRW) correctly ranks the nodes on the training data
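The random-walk step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the paper: the function names, the restart probability of 0.15, the fixed iteration count, and the toy graph are all hypothetical.

```python
def random_walk_with_restarts(weights, source, alpha=0.15, iters=200):
    """Return visiting probabilities for a walk that restarts at `source`.

    weights: dict mapping node -> {neighbor: non-negative edge strength}.
    alpha:   restart probability (0.15 is an assumed, conventional value).
    """
    nodes = set(weights)
    for nbrs in weights.values():
        nodes.update(nbrs)
    score = {v: 0.0 for v in nodes}
    score[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0.0:
                # Dead end: send all probability mass back to the source.
                nxt[source] += score[u]
                continue
            nxt[source] += alpha * score[u]
            for v, w in out.items():
                # Follow an edge with probability proportional to its strength.
                nxt[v] += (1.0 - alpha) * score[u] * w / total
        score = nxt
    return score


def recommend(weights, source, k):
    """Top-k nodes by score, skipping the source and its current neighbors."""
    score = random_walk_with_restarts(weights, source)
    known = set(weights.get(source, {})) | {source}
    candidates = [v for v in score if v not in known]
    return sorted(candidates, key=lambda v: score[v], reverse=True)[:k]


# Hypothetical toy graph: the strong edge a -> c should pull c ahead of d.
toy = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "c": 5.0},
    "b": {"s": 1.0, "d": 1.0},
    "c": {"a": 1.0},
    "d": {"b": 1.0},
}
top = recommend(toy, "s", 1)
```

In this toy graph the only difference between the two branches is the edge strength, which is the quantity the supervised step is meant to learn.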

[WSDM 11]
Goal: Learn an edge strength function
  $f_\theta(x, y) = \exp\left(\sum_i \theta_i \psi_i(x, y)\right)$
  $\psi(x, y)$ ... features of edge (x, y)
  $\theta$ ... parameter vector we want to learn
Find $f_\theta(u, v)$ based on training data:
  $\arg\min_\theta \sum_{p \in P,\, n \in N} \delta(r_p < r_n) + \lambda \|\theta\|^2$
  $P$ ... positive nodes, $N$ ... negative nodes
  $\delta(r_p < r_n)$ ... penalty for violating the constraint $r_p > r_n$
  $r_x$ ... score of node x on a weighted graph with edge weights $f_\theta(x, y)$
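As a rough illustration of the two formulas on this slide, the sketch below evaluates the edge strength and the regularized ranking objective for given node scores. It treats the penalty as a plain 0/1 indicator, which is an assumption (a smooth penalty would be needed for gradient-based learning); the function names and all numbers are hypothetical.

```python
import math


def edge_strength(theta, psi):
    """f_theta(x, y) = exp(sum_i theta_i * psi_i(x, y))."""
    return math.exp(sum(t * f for t, f in zip(theta, psi)))


def ranking_loss(scores, positives, negatives, theta, lam):
    """Count violated constraints r_p > r_n and add lam * ||theta||^2."""
    violations = sum(
        1 for p in positives for n in negatives if scores[p] < scores[n]
    )
    return violations + lam * sum(t * t for t in theta)


# Hypothetical node scores r_x, e.g. produced by a random walk with restarts.
r = {"p": 0.3, "n1": 0.5, "n2": 0.1}
loss = ranking_loss(r, ["p"], ["n1", "n2"], theta=[1.0, 2.0], lam=0.1)
```

Here one constraint is violated (n1 outranks p), so the loss is one violation plus the regularization term.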

[WSDM 11]
Facebook Iceland network
  174,000 nodes (55% of population)
  Avg. degree 168
  Avg. person added 26 friends/month
Node and edge features:
  Node: Age, Gender, School
  Edge: Age of an edge, Communication, Profile visits, Co-tagged photos

[WSDM 11]
Results on Facebook Iceland:
  Correctly predicts 8 out of 20 (40%) new friends
  2.3x improvement over previous FB-PYMK (People You May Know)
(Figure: fraction of friending based on recommendations, 2.3x)

Supervised Random Walks are a general framework for ranking nodes on a graph
  There is nothing specific to link prediction here
  Can use any features to learn the ranking
Applications: Social recommendations, ranking, filtering
  Friends: Trust, Homophily
  Others: Experts, People like you
  Link sentiment: Positive vs. Negative

[WWW 10]
Not just whether you link to someone but also what you think of them
Start with the intuition [Heider 46]:
  The friend of my friend is my friend
  The enemy of my enemy is my friend
  The enemy of my friend is my enemy
  The friend of my enemy is my enemy
(Figure: balanced and unbalanced signed triads)

[WWW 10]
Model: Count the triads in which edge (u, v) is embedded: 16 features
Train Logistic Regression
Predictive accuracy: >90%
Signs can be modeled from the local network structure alone!
(Figure: signed triads around the edge from u to v)
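One way to arrive at 16 features is to classify each common neighbor w of u and v by the direction and sign of the u-w edge and of the w-v edge (2 x 2 x 2 x 2 combinations). The sketch below counts these under that assumption; the encoding, the function name, and the toy edges are hypothetical, and the logistic-regression training step is left out.

```python
from itertools import product


def triad_features(edges, u, v):
    """Count triad types around the edge (u, v).

    edges: dict mapping a directed pair (a, b) -> sign (+1 or -1).
    A feature key is (dir_uw, sign_uw, dir_wv, sign_wv), where the direction
    is "F" when the edge points along the path u -> w -> v and "B" otherwise.
    """
    counts = {key: 0 for key in product("FB", (1, -1), "FB", (1, -1))}
    others = {n for pair in edges for n in pair} - {u, v}
    for w in others:
        uw = []  # edges between u and w, in either direction
        if (u, w) in edges:
            uw.append(("F", edges[(u, w)]))
        if (w, u) in edges:
            uw.append(("B", edges[(w, u)]))
        wv = []  # edges between w and v, in either direction
        if (w, v) in edges:
            wv.append(("F", edges[(w, v)]))
        if (v, w) in edges:
            wv.append(("B", edges[(v, w)]))
        for d1, s1 in uw:
            for d2, s2 in wv:
                counts[(d1, s1, d2, s2)] += 1
    return counts


# Hypothetical signed, directed toy network.
toy_edges = {
    ("u", "w"): 1, ("w", "v"): 1,    # u trusts w, w trusts v
    ("x", "u"): -1, ("v", "x"): -1,  # x distrusts u, v distrusts x
    ("u", "v"): 1,                   # the edge whose sign we want to model
}
features = triad_features(toy_edges, "u", "v")
```

The resulting 16 counts form the feature vector for the edge (u, v) that a classifier such as logistic regression would consume.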

[NIPS 12]
Discover circles and why they exist

[NIPS 12]
Why is it useful?
  Organize friend lists
  Control privacy and access
  Filter and organize content
(Cartoon caption: "On Facebook 273 people know I am a dog. The rest can only see my limited profile.")
All social networks have this feature:
  Facebook (groups), Twitter (lists), G+ (circles)
But circles have to be created manually!

[NIPS 12]
Connections to graph partitioning & community detection
  [Karypis, Kumar 98] [Girvan, Newman 02] [Dhillon, Guan, Kulis 07] [Yang, Sun, Pandit, Chawla, Han 11]
... but we can also use node profile information!
Q: How to cluster using network as well as node feature information?

[NIPS 12]
Suppose we know all the circles
For a given circle c, model edge prob.:
  $p(x, y) \propto \exp\left(\sum_i \theta_{c,i}\, \psi_i(x, y)\right)$
  $\psi(x, y)$ is the edge feature vector describing (x, y)
    Are x and y from the same school, same town, same age, ...
  $\theta_c$ ... parameters that we aim to estimate
    High $\theta_{c,i}$ means being similar in i is important for circle c
(Figure: example vectors $\psi(x, y)$ and $\theta_c$)

[NIPS 12]
Given graph G and edge features $\psi(x, y)$
Want to discover
  Member nodes of each circle C
  Circle similarity function parameters $\theta_c$
such that we maximize the likelihood of the observed network:
  $P(G; C) = \prod_{(x,y) \in G} p(x, y) \prod_{(x,y) \notin G} \left(1 - p(x, y)\right)$
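The likelihood above is usually evaluated in log form. The sketch below does that for an undirected graph, given any function that returns an edge probability. Turning the unnormalized exponential into a probability with a logistic function is an assumption made here for illustration, as are the function names and the toy data.

```python
import math
from itertools import combinations


def edge_prob(theta_c, psi_xy):
    """Logistic squashing of theta_c . psi(x, y) (an assumed normalization)."""
    z = sum(t * f for t, f in zip(theta_c, psi_xy))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(nodes, edges, prob):
    """log P(G; C): sum log p over present pairs and log(1 - p) over absent ones.

    edges: set of undirected pairs, stored in either order.
    prob:  callable (x, y) -> edge probability strictly between 0 and 1.
    """
    total = 0.0
    for x, y in combinations(sorted(nodes), 2):
        p = prob(x, y)
        present = (x, y) in edges or (y, x) in edges
        total += math.log(p) if present else math.log(1.0 - p)
    return total


# Hypothetical toy graph: three nodes, one edge, and a constant probability.
ll = log_likelihood(["a", "b", "c"], {("a", "b")}, lambda x, y: 0.5)
```

Maximizing this quantity over circle memberships and parameters is the hard part and is not shown here.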

[NIPS 12]
Given only the network (no labels), try to find the circles.
How well are we doing? Ask people to hand-label the circles. Compare.
(Figure: F1 score on Facebook and Google+ for Net+Atts, Atts only, Net only, and our method)

[NIPS 12]
How well do we recover human circles?
Social circles of a particular person:

Beyond graph partitioning
  Overlapping clustering of networks with node/edge attributes [Yoshida 10] [McAuley, L. 12]
Temporal dynamics of circles and groups
  Predict group evolution over time [Kairam, Wang, L. 12] [Ducheneaut, Yee, Nickell, Moore 07]
Modeling circles of non-friends
  Node role discovery in networks [Henderson, Gallagher, Li, Akoglu, Eliassi-Rad, Tong, Faloutsos, 11]

[KDD 11]
What's the relation between human mobility and social networks?
  Location-based online social networks
    Brightkite, Gowalla: 10M check-ins
  Cell phones
    Portugal: 500M calls
In terms of mobility the datasets are indistinguishable!

[KDD 11]
Goal: Model and predict human movement patterns
Observation:
  Low location entropy at night/morning
  Higher entropy over the weekend
3 ingredients of the model: Spatial, Temporal, Social
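For readers unfamiliar with the term, location entropy can be read as the Shannon entropy of the distribution over visited locations in a time slot. The sketch below computes it per hour of day; this reading, the function names, and the check-in data are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def location_entropy(locations):
    """Shannon entropy (in bits) of the empirical distribution of locations."""
    counts = Counter(locations)
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def entropy_by_hour(checkins):
    """checkins: iterable of (hour_of_day, location) pairs."""
    by_hour = defaultdict(list)
    for hour, loc in checkins:
        by_hour[hour].append(loc)
    return {hour: location_entropy(locs) for hour, locs in by_hour.items()}


# Hypothetical check-ins: always home at 3am, two different places at 2pm.
sample = [(3, "home"), (3, "home"), (14, "work"), (14, "cafe")]
per_hour = entropy_by_hour(sample)
```

Low entropy means the location in that slot is highly predictable, which is what makes periodic behavior useful for prediction.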

[KDD 11]
Spatial model: Home vs. Work Location
Temporal model: Mobility Home vs. Work

[KDD 11]
Social network plays a particularly important role on weekends
Include social network into the model
Prob. that user visits location X depends on:
  Distance(X, F)
  Time since a friend was at location F
    F = Friend's last known location
(Figure: mobility similarity)

[KDD 11]
Cellphones: Whenever user receives or makes a call, predict her location
  G: model by Gonzalez & Barabasi
  RW: predict last known location
  MF: predict most frequent location
  PMM: periodic mobility model
  PSMM: periodic social mobility model

Media & Information

Information flows from node to node like an epidemic
How does information transmitted by mainstream media interact with social networks?
(Figure labels: Obscure tech story, Small tech blog, Slashdot, Engadget, Wired, BBC, NYT, CNN)

Since August 2008 we have been collecting 30M articles/day: 6B articles, 20TB of data
Challenge: How to track information as it spreads?

[WWW 13]
Goal: Trace textual phrases that spread through many news articles
Challenge 1: Phrases mutate!
(Figure: mutations of a meme about the Higgs boson particle)

[KDD 09]
Goal: Find mutational variants of a phrase
Objective: In a DAG of approx. phrase inclusion, delete min total edge weight such that each component has a single sink
  Nodes are phrases
  Edges are inclusions
  Edges have weights
(Figure: example phrase DAG with nodes BDXCY, BCD, ABC, ABCD, ABXCE, ABCEFG, ABCDEFGH, CEF, CEFP, CEFPQR, UVCEXF)
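To make the objective concrete, here is a simple greedy sketch: every node keeps only its heaviest outgoing edge, which leaves a forest in which each component has exactly one sink. This is a stand-in heuristic chosen for illustration and is not claimed to be the procedure used in the paper or to find the minimum. The node names come from the slide figure, but the edges and weights are hypothetical.

```python
def greedy_single_sink_clusters(edges):
    """Keep the heaviest out-edge per node and group nodes by the sink reached.

    edges: dict mapping (src, dst) -> weight; the graph must be acyclic.
    Returns (kept_edges, deleted_weight, clusters keyed by sink).
    """
    best_out = {}
    nodes = set()
    for (src, dst), w in edges.items():
        nodes.update((src, dst))
        if src not in best_out or w > best_out[src][1]:
            best_out[src] = (dst, w)
    kept = {(src, dst): w for src, (dst, w) in best_out.items()}
    deleted = sum(edges.values()) - sum(kept.values())
    clusters = {}
    for node in nodes:
        sink = node
        while sink in best_out:  # follow kept edges until a sink is reached
            sink = best_out[sink][0]
        clusters.setdefault(sink, set()).add(node)
    return kept, deleted, clusters


# Hypothetical inclusion edges (shorter phrase -> longer phrase) and weights.
phrase_dag = {
    ("BCD", "ABCD"): 2.0,
    ("BCD", "BDXCY"): 1.0,
    ("ABC", "ABCD"): 3.0,
    ("CEF", "CEFP"): 2.0,
    ("CEF", "ABCEFG"): 1.0,
}
kept, deleted, clusters = greedy_single_sink_clusters(phrase_dag)
```

Each resulting cluster collects the variants that flow into one sink phrase.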

[WWW 13]
Challenge 2: 20TB of data!
Solution: Incremental phrase clustering
  Phrases arrive in a stream
  Simultaneously cluster the graph and attach new phrases to the graph
  Dynamically remove completed clusters
Overall, it takes 1 server, 60GB memory and 4 days to process 6B documents

[WWW 13]
Visualization of 1 month of data from October 2012
Browse all 4 years of data at http://snap.stanford.edu/nifty

[KDD 09]
Do blogs lead mass media in reporting news?
Blogs trail by 2.5h

[KDD 10]
Challenge 3: Information network is hidden
Goal: Infer the information diffusion network
  There is a hidden network, and
  We only see times when nodes get infected
    Yellow info: (a,1), (c,2), (b,3), (e,4)
    Blue info: (c,1), (a,4), (b,5), (d,6)
(Figure: hidden network over nodes a, b, c, d, e)
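To show what the input looks like, the sketch below scores candidate edges from the two cascades listed on the slide: every earlier-infected node is a possible source for every later one, with more weight on short delays. This is a naive baseline assumed here for illustration and is not the inference method from the cited papers; the exponential weighting and the function name are hypothetical.

```python
import math
from collections import defaultdict


def score_candidate_edges(cascades):
    """Sum exp(-(t_v - t_u)) over cascades in which u was infected before v."""
    scores = defaultdict(float)
    for cascade in cascades:
        ordered = sorted(cascade, key=lambda item: item[1])
        for i, (u, tu) in enumerate(ordered):
            for v, tv in ordered[i + 1:]:
                if tv > tu:
                    scores[(u, v)] += math.exp(-(tv - tu))
    return dict(scores)


# The two cascades from the slide, as (node, infection time) pairs.
yellow = [("a", 1), ("c", 2), ("b", 3), ("e", 4)]
blue = [("c", 1), ("a", 4), ("b", 5), ("d", 6)]
edge_scores = score_candidate_edges([yellow, blue])
best_edge = max(edge_scores, key=edge_scores.get)
```

Edges supported by several cascades with short delays accumulate the highest scores, while pairs that never appear in the right order get no score at all.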

[KDD 10]
Process: Virus propagation
  Viruses propagate through the network
  We observe: when people get sick
  It's hidden: who infected them
Process: Word of mouth & viral marketing
  Recommendations and influence propagate
  We observe: when people buy products
  It's hidden: who influenced them
Can we infer the underlying network?
  Yes, convex optimization problem! [Gomez-Rodriguez, L., Krause, 10] [Myers, L., 10]

[KDD 10]
5,000 news sites:
(Figure legend: Blogs, Mainstream media)

[KDD 10]
(Figure legend: Blogs, Mainstream media)

[KDD 12]
Observe times when nodes adopt the information
  Potential node-to-node spread
But where did the first node find the information? How did the information jump?
  External influence: TV, news sites

[KDD 12]
Model the arrival of external exposures using the event profile
Model the prob. of adoption using the adoption curve
(Figure: a user receives exposures from an external source and from neighbors who adopted, and after each exposure asks "Do I adopt?")

[KDD 12]
(Figure: adoption curve P(k), annotated with max P(k) and k at max P(k))
More details: Myers, Zhu, L.: Information diffusion and external influence in networks, KDD 2012.

Can we recognize fundamental patterns of human behavior from raw digital traces?
Can such analysis help identify dynamics of polarization? [Adamic, Glance 05]
Connections to mutation of information:
  How do attitude and sentiment change in different parts of the network?
  How does information change in different parts of the network?

Networks: What's beyond?

Networks are a natural language for reasoning about problems spanning society, technology and information

Only recently has large scale network data become available
  Opportunity for large scale analyses
  Benefits of working with massive data
    Observe invisible patterns
Lots of interesting network questions both in CS as well as in general science
  Need scalable algorithms & models

Social networks, implicit for millennia, are being recorded in our information systems
Software has a complete trace of your activities and increasingly knows more about your behavior than you do
Models based on algorithmic ideas will be crucial in understanding these developments

From models of populations to models of individuals
Distributions over millions of people leave open several possibilities:
  Individuals are highly diverse, and the distribution only appears in aggregate, or
  Each individual personally follows (a version of) the distribution
Recent studies suggest that sometimes the second option may in fact be true [Barabasi 05]

Research on networks is both algorithmic and empirical
Need network data:
  Stanford Large Network Dataset Collection
    Over 60 large online networks with metadata
    http://snap.stanford.edu/data
SNAP: Stanford Network Analysis Platform
  A general purpose, high performance system for dynamic network manipulation and analysis
  Can process 1B nodes, 10B edges
  http://snap.stanford.edu


Supervised Random Walks: Predicting and Recommending Links in Social Networks by L. Backstrom, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), 2011.
Predicting Positive and Negative Links in Online Social Networks by J. Leskovec, D. Huttenlocher, J. Kleinberg. ACM International Conference on World Wide Web (WWW), 2010.
Learning to Discover Social Circles in Ego Networks by J. McAuley, J. Leskovec. Neural Information Processing Systems (NIPS), 2012.
Defining and Evaluating Network Communities based on Ground-truth by J. Yang, J. Leskovec. IEEE International Conference On Data Mining (ICDM), 2012.
The Life and Death of Online Groups: Predicting Group Growth and Longevity by S. Kairam, D. Wang, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), 2012.
Meme-tracking and the Dynamics of the News Cycle by J. Leskovec, L. Backstrom, J. Kleinberg. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009.
Inferring Networks of Diffusion and Influence by M. Gomez-Rodriguez, J. Leskovec, A. Krause. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2010.
On the Convexity of Latent Social Network Inference by S. A. Myers, J. Leskovec. Neural Information Processing Systems (NIPS), 2010.
Structure and Dynamics of Information Pathways in Online Media by M. Gomez-Rodriguez, J. Leskovec, B. Schoelkopf. ACM International Conference on Web Search and Data Mining (WSDM), 2013.
Modeling Information Diffusion in Implicit Networks by J. Yang, J. Leskovec. IEEE International Conference On Data Mining (ICDM), 2010.
Information Diffusion and External Influence in Networks by S. Myers, C. Zhu, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.
Clash of the Contagions: Cooperation and Competition in Information Diffusion by S. Myers, J. Leskovec. IEEE International Conference On Data Mining (ICDM), 2012.