Jure Leskovec Stanford University


Social Transformation of Computing
Technological networks are intertwined with social ones.
Profound transformation in: how knowledge is produced and shared; how people interact and communicate; the scope of CS as a discipline.
Online friendships [Ugander-Karrer-Backstrom-Marlow, 11]
Corporate e-mail communication [Adamic-Adar, 05]
6/28/2012 Jure Leskovec, Stanford University

Two issues for foundations of computing
(1) How do we design in this space?
Combine social models with core ideas from computing:
Complex networks: design, analysis, models
Algorithmic game theory: designing with incentives
Social media: reputation, recommendation, contagion

Two issues for foundations of computing
(2) Science advances when the invisible becomes visible.
Can we recognize fundamental patterns of human behavior from raw digital traces?
Can new computational models address long-standing social-science questions?

We are surrounded by linked objects
Social networks: friendships and informal contacts among people; collaboration in companies and organizations
Information networks: content creation, markets; people seeking information
Traditionally, networks were hard to obtain.

Now: Large on-line systems
Social networks: on-line communities (Facebook, Twitter, ...); e-mail, blogging, electronic markets
Information networks: hypertext, Wikipedia, Web
What have we learned about these networks?

We know a lot about the structure
Network property: Social networks (MSN [Leskovec-Horvitz 08]) | Information networks (Web [Broder et al. 00])
Connectivity (well connected): giant component of 99.9% of nodes | giant component of 90% of nodes
Degrees (heavy-tailed): log-normal | power-law
Diameter (small): 6 degrees of separation | ~20
Model: small-world | bow-tie (In, Core 40%, Out)

We know much less about processes!
What process is common to both? Navigation!
How do people find their way through social networks?
How do people find information on the Web and Wikipedia?

Browsing the Web; literature search; consulting an encyclopedia

Milgram's small-world experiment ['67]
People forward letters via friends to far-away targets they don't know.
Six steps on average: six degrees of separation.
Milgram experiment (Travers-Milgram 70)

You are here. Get there!

Study navigation in social as well as information networks
What is common? What differs?
What are the design implications for computing applications and systems?
Common theme: use large-scale online data as a telescope into these processes.

Why should strangers be able to find short chains of acquaintances linking them together?
Models for decentralized routing in social networks [Kleinberg 00, Watts-Dodds-Newman 02,...]
[Map labels: Sharon, MA; Boston, MA; Omaha, NE; Council Bluffs, IA; Pittsburgh, PA]

[Leskovec-Horvitz, 08] The MSN Messenger network: 180 million people, 1.3 billion edges
Fraction of country's population on MSN:
Iceland: 35%
Spain: 28%
Netherlands, Canada, Sweden, Norway: 26%
France, UK: 18%
USA, Brazil: 8%

[Leskovec-Horvitz, 08] Avg. degree of separation
Avg. degree of separation = 6.6, mode = 6
Long paths (>30) exist in the network.
The network is robust to removal of hubs.

[Leskovec-Horvitz, 12] What are the characteristics of short paths? How hard is it to find them?
Strategy: S-T shortest paths. Pick a random S-T pair, run Dijkstra, examine the paths.
Def: A node is lucrative if it leads closer to T.
[Diagram: source S, target T, other nodes A, B, C, D, E, F, U]
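The "lucrative" definition can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes an undirected, unweighted graph, so breadth-first search stands in for Dijkstra (the two give the same hop distances in that case). The toy edges and the helper names `distances_to` and `lucrative_neighbors` are made up for the example.

```python
from collections import deque

def distances_to(graph, target):
    """BFS hop distance from every reachable node to `target` (undirected, unweighted)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def lucrative_neighbors(graph, node, dist):
    """Neighbors of `node` that are strictly closer to the target than `node` is."""
    return [n for n in graph[node] if dist.get(n, float("inf")) < dist[node]]

# Hypothetical toy graph; the edges are invented for illustration only.
toy_graph = {
    "S": ["A", "B"],
    "A": ["S", "U"],
    "B": ["S", "U"],
    "U": ["A", "B", "C", "E"],
    "C": ["U", "D"],
    "D": ["C"],
    "E": ["U", "T"],
    "T": ["E"],
}
dist_to_t = distances_to(toy_graph, "T")
lucrative_at_u = lucrative_neighbors(toy_graph, "U", dist_to_t)
# P(Lucrative) at U: chance that forwarding to a random neighbor gets closer to T.
p_lucrative_at_u = len(lucrative_at_u) / len(toy_graph["U"])
print(lucrative_at_u, p_lucrative_at_u)
```

In this toy graph only one of U's four neighbors leads closer to T, so forwarding to a random neighbor succeeds a quarter of the time.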

[Leskovec-Horvitz, 12] Many good choices; high-degree nodes.
[Plots: node degree and number of lucrative nodes vs. steps to go to T]

P(Lucrative): probability of success if we forward to a random neighbor.
[Plot: P(Lucrative) vs. steps to go to T]

The path makes its longest strides towards T in steps 4 and 3.
[Plot: geo-distance to T [10^3 km] vs. steps to go to T]

How good are heuristics at navigation?
Heuristics: jump to a node X chosen by:
R: random
G: $\min \mathrm{geo}(X, T)$
D: $\max \deg(X)$
DG: $\min \mathrm{geo}(X, T) / \deg^2(X)$
[Plot: P(Lucrative) vs. steps to go to T]
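A minimal sketch of the four heuristics, under stated assumptions: the DG rule is read as minimizing geo(X, T) / deg(X)^2, geographic distance is computed with the haversine formula, and the graph, coordinates, and the function names `geo_km` and `next_hop` are hypothetical.

```python
import math
import random

def geo_km(a, b):
    """Great-circle (haversine) distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def next_hop(graph, coords, current, target, heuristic, rng=random):
    """Pick a neighbor X of `current` according to heuristic R, G, D, or DG."""
    neighbors = graph[current]
    geo = lambda x: geo_km(coords[x], coords[target])
    deg = lambda x: len(graph[x])
    if heuristic == "R":
        return rng.choice(neighbors)
    if heuristic == "G":
        return min(neighbors, key=geo)
    if heuristic == "D":
        return max(neighbors, key=deg)
    if heuristic == "DG":
        # Assumed reading of the slide's formula: geo(X, T) / deg(X)^2.
        return min(neighbors, key=lambda x: geo(x) / deg(x) ** 2)
    raise ValueError(f"unknown heuristic: {heuristic}")

# Hypothetical toy data: every node sits on the equator, so distance grows with longitude.
toy_graph = {
    "U": ["A", "B", "C"],
    "A": ["U"],
    "B": ["U", "X1", "X2", "X3"],
    "C": ["U", "T"],
    "T": ["C"],
    "X1": ["B"], "X2": ["B"], "X3": ["B"],
}
toy_coords = {"T": (0, 0), "A": (0, 1), "C": (0, 5), "B": (0, 10), "U": (0, 20),
              "X1": (0, 11), "X2": (0, 12), "X3": (0, 13)}

for h in ("G", "D", "DG"):
    print(h, next_hop(toy_graph, toy_coords, "U", "T", h))
```

In this toy setup G picks the geographically closest neighbor even though it is a dead end, which mirrors the slide's point that geography is a useful cue that can fail locally.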

Bottom line:
P(hit T in 10 steps) = 0.001
P(get within 10 km of T in 10 steps) = 1
Geography provides an important cue but fails in local neighborhoods.
[Plot: P_hit(T) and P_10km(T) vs. steps]

How do these translate to navigation in information networks?
Web browsing; encyclopedia navigation

[West-Leskovec, 12] Large-scale study of navigation in Wikipedia
Understand how humans navigate Wikipedia.
Get an idea of how people connect concepts.

[West-Leskovec, 12] Goal-directed navigation of Wikipedia
Optimal solution: DIK-DIK, WATER, GERMANY, EINSTEIN

[West et al., 09]

[West-Leskovec, 12] Graph: Wikipedia Selection for schools; 4,000 articles, 120,000 links
Shortest paths between all pairs: median 3, mean 3.2, max 9
Wikispeedia: 30,000 games since Aug 2009; 9,400 distinct IP addresses
Important: We know the target!

Optimal solutions: mode 3, median 3, mean 2.9
Human paths incl. back-clicks: mode 4, median 5, mean 5.8
Human paths excl. back-clicks: mode 4, median 4, mean 4.9
Larger variance in human than in optimal paths.
Overall, humans are not much worse than optimal.

Only missions of SPL 3.
[Plots: x-axis is distance to go to the target]

For each path position: logistic regression to predict human choice.
Inspect weights for similarity and degree.
[Example diagram: KIMCHI is the target; ORPHEUS is a link not chosen by the human (neg. example); KOREA and MUSIC are the current article and the link chosen by the human (pos. example)]

For each path position: logistic regression to predict human choice.
Inspect weights for content similarity and degree.
[Plot: feature weight (content, degree) vs. step on the path]


Endgame strategy
Path: ..., Water, Germany, Albert Einstein
Map the last 3 articles to categories: Science, Geography, People
Few popular endgame strategies.
(Target category)³ is typically the most popular.
Among non-target categories, Geography is the most popular.

$\text{Overhead} = \dfrac{\text{human game length} - \text{optimal game length}}{\text{optimal game length}}$
[Chart labels: people, technology, geography, most pop., multi-cat., all games, single-cat.]
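As a quick worked example of the overhead measure, the sketch below assumes the formula is the relative excess of the human game length over the optimal one (the minus sign is a reconstruction, since the extracted slide text lost the operator). The function name `overhead` and the game lengths are hypothetical.

```python
def overhead(human_length, optimal_length):
    """Relative excess of the human path over the optimal path (assumed definition)."""
    return (human_length - optimal_length) / optimal_length

# Hypothetical game: the player needed 6 clicks where 3 would have sufficed.
example_overhead = overhead(6, 3)
print(example_overhead)  # 1.0, i.e. the human path was twice the optimal length
```

An overhead of 0 means the player found an optimal path; 1.0 means the path was twice as long as necessary.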

Can we build machines that navigate better than humans?

Machines: no common sense, only low-level knowledge such as word counts.
Humans: common sense and high-level background knowledge.
Who is better?

An agent aims to navigate to target T.
The agent is currently at node U and navigates to the neighbor W such that $W = \arg\max_{W' :\, U \to W'} V(W' \mid U, T)$.
Ideally: $V(E \mid U, T) > V(C \mid U, T)$
What is the value function?
[Diagram: current node U, target T, other nodes A, C, D, E, F]
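The greedy rule above can be sketched as follows. This is an illustration only: the link structure, the per-article scores, and the names `greedy_navigate` and `toy_value` are hypothetical, and the value function here is a lookup table rather than anything learned. A step cap is included because a greedy agent with an imperfect value function can loop.

```python
def greedy_navigate(graph, start, target, value, max_steps=20):
    """Repeatedly move to the out-neighbor W that maximizes value(W, U, T)."""
    path = [start]
    current = start
    while current != target and len(path) - 1 < max_steps:
        current = max(graph[current], key=lambda w: value(w, current, target))
        path.append(current)
    return path

# Hypothetical directed link structure over the node names used on the slide.
toy_links = {
    "U": ["A", "C", "E"],
    "A": ["U", "D"],
    "C": ["U", "F"],
    "E": ["U", "T"],
    "D": ["A"],
    "F": ["C"],
    "T": ["E"],
}
# Hypothetical scores standing in for V(W | U, T); here they ignore U.
toy_scores = {"A": 0.1, "C": 0.4, "D": 0.2, "E": 0.9, "F": 0.3, "T": 1.0, "U": 0.0}

def toy_value(w, current, target):
    return toy_scores[w]

toy_path = greedy_navigate(toy_links, "U", "T", toy_value)
print(toy_path)
```

Because the toy scores rank E above C at node U, the agent takes the route through E, matching the "ideally V(E | U, T) > V(C | U, T)" condition.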

(1) Human
(2) Similarity based (TXT): $V(W \mid U, T) = \text{tf-idf}(W, T)$. Go to the W that is textually most similar to T.
(3) Machine learning agents (ML): use human/shortest paths to learn the value function (Support Vector Machines, Reinforcement Learning).

Features for the machine learning agents
Inspired by analysis of human behavior:
sim(next, target), sim(current, next) (TF-IDF cosine)
deg(next)
taxdist(next, target) (taxonomical distance)
linkcos(next, target) (cosine similarity in outgoing hyperlinks)
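The TF-IDF cosine feature can be illustrated with a small pure-Python sketch. The exact weighting used in the study is not given in the slides, so this assumes a common variant (relative term frequency times log inverse document frequency); the token lists and the names `tfidf_vectors` and `cosine` are made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical, heavily shortened article texts.
toy_docs = {
    "KIMCHI": ["korean", "fermented", "cabbage", "dish"],
    "KOREA": ["korean", "peninsula", "country", "asia"],
    "ORPHEUS": ["greek", "myth", "musician", "poet"],
}
vectors = tfidf_vectors(toy_docs)
sim_korea = cosine(vectors["KOREA"], vectors["KIMCHI"])
sim_orpheus = cosine(vectors["ORPHEUS"], vectors["KIMCHI"])
print(sim_korea, sim_orpheus)
```

With KIMCHI as the target, the KOREA candidate shares a term with it and scores above zero, while ORPHEUS shares none and scores exactly zero, so a TXT-style agent would prefer KOREA.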

Machine beats human!
But machines can get terribly lost.
Humans are sloppy (83% they miss a direct link).
[Charts comparing H, ML, and TXT]

Can we predict where the user is going?

Task: given the first few clicks, predict the target the player is trying to reach.
[Diagram labels: given; to be predicted]

Markov model of human navigation
[Equation with annotated terms: next, current, target, params, features]
Predict the most likely target given a path prefix.
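The slide's equation did not survive extraction, so the sketch below assumes one plausible form rather than the authors' model: a log-linear (softmax) transition probability, P(next | current, target; Θ) proportional to exp(Θ · f(next, current, target)), with candidate targets ranked by the likelihood of the observed path prefix. The link structure, similarity table, Θ value, and function names are all hypothetical.

```python
import math

def log_likelihood(path, target, graph, features, theta):
    """Log-probability of the observed clicks if the player were heading to `target`."""
    total = 0.0
    for current, nxt in zip(path, path[1:]):
        scores = {
            w: sum(t * f for t, f in zip(theta, features(w, current, target)))
            for w in graph[current]
        }
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        total += scores[nxt] - log_z
    return total

def rank_targets(path, candidates, graph, features, theta):
    """Sort candidate targets from most to least likely given the path prefix."""
    return sorted(
        candidates,
        key=lambda t: log_likelihood(path, t, graph, features, theta),
        reverse=True,
    )

# Hypothetical link structure and similarity table, for illustration only.
toy_links = {"Gopher": ["Animal", "Computer"], "Animal": ["Food", "Zoo"]}
toy_sim = {
    ("Animal", "Kimchi"): 0.4, ("Computer", "Kimchi"): 0.1,
    ("Food", "Kimchi"): 0.9, ("Zoo", "Kimchi"): 0.1,
    ("Animal", "Albert Einstein"): 0.1, ("Computer", "Albert Einstein"): 0.5,
    ("Food", "Albert Einstein"): 0.1, ("Zoo", "Albert Einstein"): 0.1,
}

def toy_features(nxt, current, target):
    # Single feature: similarity between the clicked article and the candidate target.
    return [toy_sim.get((nxt, target), 0.0)]

theta = [2.0]  # hypothetical weight, not a fitted parameter
prefix = ["Gopher", "Animal", "Food"]
ranking = rank_targets(prefix, ["Albert Einstein", "Kimchi"], toy_links, toy_features, theta)
print(ranking)
```

The observed clicks drift toward food-related articles, so the food-related candidate receives the higher likelihood and is ranked first.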

Fit Θ in a learning-to-rank setup [Weston et al. 10].
[Diagram: path prefix (given) and target (to be predicted; given for training). Candidate ranking with initial Θ: Kimchi, Gopher, Albert Einstein. After training, with final Θ: Albert Einstein, Football, Orpheus, ...]

Given a choice of 2, choose the true target.
Rank articles such that the true target gets a high rank.
[Plot legend: chance; 1 click observed; 2 clicks observed; 3 clicks observed]

Humans manage to find their way in large networks, despite having only local information.
How do they do it?
Analyze large-scale data from the MSN network and the Wikispeedia game.
Answer: They leverage expectations about network connectivity, based on background knowledge.

Computational ideas play 2 crucial roles: designing systems in this new space; modeling the social processes.
Designing systems: search engines
User click-trails for web search ranking [Bilenko-White, 08]
Web revisitation patterns for crawling [Adar et al. 08]

Designing systems: navigational tools
Is the user lost? Where is she trying to go?
User-facing tools and browsers: ScentTrails [Olston-Chi, 03]
Creating navigable networks: navigable maps, ontologies [Helic-Strohmaier et al., 11]
Social browsing

Models: how we search for information
Information scent [Chi et al., 01]
Information foraging [Pirolli, 99]
Networks facilitate new ways of interacting with information: targeted search vs. casual browsing.
Can all this help us understand ourselves and each other any better?
