CS 322: (Social and Information) Network Analysis Jure Leskovec Stanford University


Progress reports are due on Thursday! What do we expect from you? About half of the work should be done. Milestone/progress report: hand in a short write-up of your current results (what you have accomplished so far) and, very briefly, your further plans. 11/10/2009 Jure Leskovec, Stanford CS322: Network Analysis

Networks of tightly connected groups. Network communities: sets of nodes with lots of connections inside and few to the outside (the rest of the network). Also called communities, clusters, groups, or modules.

How can we automatically find such densely connected groups of nodes? Ideally, the automatically detected clusters would correspond to real groups, for example communities, clusters, groups, or modules.

Find micro-markets by partitioning the query × advertiser graph. [Figure: bipartite graph with queries on one side and advertisers on the other.]

Zachary's karate club network: observe social ties and rivalries in a university karate club. During his observation, conflicts led the group to split. The split could be explained by a minimum cut in the network. Why would we expect such clusters to arise?

[Backstrom et al. KDD 06] In a social network, nodes explicitly declare group membership: Facebook groups, publication venues. We can think of groups as node colors. This gives insights into social dynamics. Does the group recruit friends? Then memberships spread along edges. Doesn't recruit? Then memberships spread randomly. What factors influence a person's decision to join a group?

[Backstrom et al. KDD 06] Analogous to diffusion: group memberships spread over the network. Red circles represent existing group members; yellow squares may join. Question: how does the probability of joining a group depend on the number of friends already in the group?

[Backstrom et al. KDD 06] LiveJournal: 1 million users, 250,000 groups. DBLP: 400,000 papers, 100,000 authors, 2,000 conferences. Diminishing returns: the probability of joining increases with the number of friends in the group, but the increases get smaller and smaller.

[Backstrom et al. KDD 06] Connectedness of friends: x and y each have three friends in the group. x's friends are independent; y's friends are all connected. Who is more likely to join?

[Backstrom et al. KDD 06] Competing sociological theories: the information argument [Granovetter 73] and the social capital argument [Coleman 88]. Information argument: unconnected friends give independent support. Social capital argument: there is a safety/trust advantage in having friends who know each other.

[Backstrom et al. KDD 06] LiveJournal: 1 million users, 250,000 groups. The social capital argument wins! The probability of joining increases with the number of adjacent members.

A person is more likely to join a group if (1) she has more friends who are already in the group, and (2) those friends have more connections among themselves. So, groups form clusters of tightly connected nodes.

How to extract groups? Many methods:
Linear (low-rank) methods: if the data is Gaussian, then a low-rank space is good.
Kernel (non-linear) methods: if the data lies on a low-dimensional manifold, then kernels are good.
Hierarchical methods: top-down and bottom-up, common in the social sciences.
Graph partitioning methods: define an edge-counting metric (conductance, expansion, modularity, etc.) and optimize!

[Onnela et al. 07] Real edge strengths in a mobile call graph.

[Girvan Newman PNAS 02] Divisive hierarchical clustering based on the notion of edge betweenness: the number of shortest paths passing through the edge. Remove edges in order of decreasing betweenness.
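The divisive step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes an undirected, unweighted graph stored as a dictionary of neighbor sets, recomputes betweenness after every removal, and stops at the first split. The function names (`build_adj`, `edge_betweenness`, `components`, `girvan_newman_split`) are hypothetical helpers introduced here.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Shortest-path edge betweenness, summed over all BFS roots."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for root in adj:
        dist, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:  # BFS that also counts shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
        credit = {u: 1.0 for u in order}
        for v in reversed(order):  # work back up the BFS tree
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] * paths[u] / paths[v]
                    score[frozenset((u, v))] += share
                    credit[u] += share
    # every unordered pair of endpoints was counted from both ends
    return {edge: s / 2 for edge, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    work = {u: set(vs) for u, vs in adj.items()}  # leave the input untouched
    start = len(components(work))
    removed = []
    while len(components(work)) == start:
        scores = edge_betweenness(work)
        if not scores:
            break
        edge = max(scores, key=scores.get)
        u, v = tuple(edge)
        work[u].discard(v)
        work[v].discard(u)
        removed.append(edge)
    return removed, components(work)


# Two triangles joined by a single bridge edge (3, 4).
adj = build_adj([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)])
removed, comps = girvan_newman_split(adj)
print(removed, comps)
```

On this toy graph the bridge carries all nine shortest paths between the two triangles, so it is removed first. Calling the function again on each resulting component would continue the hierarchical decomposition.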

[Girvan Newman PNAS 02]

[Newman Girvan PhysRevE 03] Zachary's karate club: hierarchical decomposition.

[Newman Girvan PhysRevE 03] Communities in physics collaborations.

Breadth-first search starting from A: we want to compute the betweenness of paths starting at node A.

Count the number of shortest paths from A to all other nodes of the network.

Compute betweenness by working up the tree: if there are multiple paths, count them fractionally. Repeat the BFS procedure for each node of the network and add up the edge scores. [Figure annotations: 1 path to K, split evenly; 1+1 paths to H, split evenly; 1+0.5 paths to J, split 1:2.]
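The BFS steps above (count shortest paths going down, then split credit fractionally going up) can be traced for a single root with the following sketch. The five-node graph is a made-up example, not the graph from the slide figure, and the function names are hypothetical.

```python
from collections import deque


def bfs_path_counts(adj, root):
    """BFS from root; returns distances, shortest-path counts, and visit order."""
    dist, paths, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # every shortest path to u extends to a shortest path to v
                paths[v] += paths[u]
    return dist, paths, order


def edge_credits_from(adj, root):
    """Work up the BFS tree: each node carries 1 unit plus the credit from below."""
    dist, paths, order = bfs_path_counts(adj, root)
    credit = {u: 1.0 for u in order}
    edge_credit = {}
    for v in reversed(order):
        for u in adj[v]:
            if dist[u] == dist[v] - 1:
                # split v's credit among its parents in proportion to path counts
                share = credit[v] * paths[u] / paths[v]
                edge_credit[(u, v)] = share
                credit[u] += share
    return paths, edge_credit


# Hypothetical example: two shortest paths from A to D (via B and via C).
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
paths, edge_credit = edge_credits_from(adj, "A")
print(paths)        # D and E are each reached by 2 shortest paths
print(edge_credit)  # D's credit of 2 is split evenly between B-D and C-D
```

Summing `edge_credit` over every choice of root, and halving the total because each pair of endpoints is seen from both ends, gives the edge betweenness used by Girvan-Newman.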

[Kumar et al. 99] Searching for small communities in a web graph. (1) The signature of a community/discussion: a dense 2-layer graph. Intuition: a bunch of people all talking about the same things.

(2) A more well-defined problem: enumerate all complete bipartite subgraphs $K_{s,t}$, where $K_{s,t}$ means s nodes that each link to the same t other nodes.

A) From (2), get back to (1) via: any dense enough graph as in (1) contains a smaller $K_{s,t}$ as a subgraph. B) How do we solve (2) in a giant graph? What similar problems have been solved on giant non-graph datasets? (3) Frequent itemset enumeration [Agrawal Srikant 94].

[Agrawal Srikant 94] Example: what items are bought together in a store?
Setting: a universe U of n items; m subsets of U: $S_1, S_2, \dots, S_m \subseteq U$ (each $S_i$ is the set of items one person bought); a frequency threshold f.
Goal: find all subsets T such that $T \subseteq S_i$ for at least f of the sets $S_i$ (the items in T were bought together at least f times).

[Agrawal Srikant 94] Example: U = {1,2,3,4,5}, f = 3, and
S_1 = {1,3,5}, S_2 = {2,3,4}, S_3 = {2,4,5}, S_4 = {3,4,5}, S_5 = {1,3,4,5}, S_6 = {2,3,4,5}.
Algorithm: build up the lists. Insight: for a frequent set of size k, all its subsets are also frequent.

U = {1,2,3,4,5}, S_1 = {1,3,5}, S_2 = {2,3,4}, S_3 = {2,4,5}, S_4 = {3,4,5}, S_5 = {1,3,4,5}, S_6 = {2,3,4,5}, f = 3.

For i = 1, ..., k: find all frequent sets of size i by composing sets of size i-1 that differ in 1 element. Open question: efficiently find only the maximal frequent sets.
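The level-wise build-up can be illustrated on the example baskets from the earlier slide. The sketch below is a simplified Apriori-style loop written for clarity rather than speed (it rescans all baskets for every candidate). The function name `frequent_itemsets` is a hypothetical helper, not part of any library.

```python
from itertools import combinations


def frequent_itemsets(baskets, f):
    """Return {itemset: support} for every itemset contained in at least f baskets."""
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b)

    items = sorted(set().union(*baskets))
    level = [frozenset([x]) for x in items if support(frozenset([x])) >= f]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        current = set(level)
        # compose pairs of frequent size-k sets that differ in one element
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # keep a candidate only if all its size-k subsets are frequent
        # (the insight from the slide) and it meets the threshold itself
        level = [
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
            and support(c) >= f
        ]
        k += 1
    return result


baskets = [{1, 3, 5}, {2, 3, 4}, {2, 4, 5}, {3, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}]
for itemset, count in sorted(frequent_itemsets(baskets, 3).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With f = 3, item 1 is dropped at the first level (it appears in only two baskets), and the only frequent set of size 3 is {3,4,5}, which appears in S_4, S_5, and S_6.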

Claim: (3) (itemsets) solves (2) (bipartite subgraphs). How? View each node i as the set $S_i$ of nodes that i points to. Then a $K_{s,t}$ is a set of size t that occurs in s of the sets $S_i$. To look for $K_{s,t}$: set the frequency threshold to s and look at layer t, that is, at all frequent sets of size t.
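A small sketch of this reduction follows. To keep it short it does not run the level-wise itemset algorithm; instead it counts size-t subsets of each node's out-links directly, which gives the same answer at layer t but is only practical when out-degrees are small. The function name `find_kst` and the toy link data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_kst(out_links, s, t):
    """Map each size-t target set to the nodes linking to all of it, keeping
    only target sets shared by at least s nodes (each one is a K_{s,t})."""
    buckets = defaultdict(set)
    for node, targets in out_links.items():
        # node i is the "basket" S_i; file it under every size-t subset of S_i
        for subset in combinations(sorted(targets), t):
            buckets[subset].add(node)
    return {subset: nodes for subset, nodes in buckets.items() if len(nodes) >= s}


# Hypothetical link data: a, b, and c all point to both x and y.
out_links = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},
    "c": {"x", "y", "z"},
    "d": {"z"},
}
print(find_kst(out_links, s=3, t=2))
print(find_kst(out_links, s=2, t=3))
```

The first call finds the single $K_{3,2}$ formed by {a, b, c} pointing to {x, y}. The second finds the $K_{2,3}$ formed by {a, c} pointing to {x, y, z}.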

(2) → (1): Informally, every dense enough bipartite graph G contains a $K_{s,t}$ subgraph, where s and t depend on the size (number of nodes) and density (average degree) of G [Kovari Sos Turan 53]. Theorem: Let $G=(X,Y,E)$ with $|X|=|Y|=n$ and average degree $\bar{d} = s^{1/t} n^{1-1/t} + t$. Then G contains $K_{s,t}$ as a subgraph.

Proof: Recall that $\binom{a}{b} = \frac{a(a-1)\cdots(a-b+1)}{b!}$. Let $f(x) = x(x-1)(x-2)\cdots(x-k)$. Once $x \ge k$, $f(x)$ curves upward (is convex). Suppose g is convex and we want to minimize $\sum_{i=1}^{n} g(x_i)$ where $\sum_{i=1}^{n} x_i = x$. To minimize $\sum_{i=1}^{n} g(x_i)$, make each $x_i = x/n$.

Consider node i with degree $d_i$. The buckets are the potential right-hand sides of $K_{s,t}$ (i.e., all size-t subsets of Y). Put node i in the buckets for all size-t subsets of its neighbors.

As soon as s people appear in a bucket, we have a $K_{s,t}$. How many buckets does node i contribute to? What is the total size of all buckets?

So the total height of all buckets is

How many buckets are there? What is the average height of the buckets? So, by the pigeonhole principle, there must be a bucket with more than s nodes in it.
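The formulas for these counting steps did not survive in the transcript. The following is a reconstruction of the standard bucket-counting argument under the theorem's assumption $\bar{d} = s^{1/t} n^{1-1/t} + t$, offered as a sketch rather than as the slides' exact derivation. It treats $\binom{x}{t}$ as the convex polynomial from the proof setup, so that the sum is minimized when all degrees equal $\bar{d}$.

```latex
\begin{align*}
\text{buckets that node } i \text{ contributes to} &= \binom{d_i}{t}, \\
\text{total height of all buckets} &= \sum_{i=1}^{n} \binom{d_i}{t}
   \;\ge\; n \binom{\bar{d}}{t} \quad \text{(by convexity)}, \\
\text{number of buckets} &= \binom{n}{t}, \\
\text{average height} &\ge \frac{n \binom{\bar{d}}{t}}{\binom{n}{t}}
   = n \prod_{j=0}^{t-1} \frac{\bar{d} - j}{n - j}
   \;>\; n \left( \frac{\bar{d} - t}{n} \right)^{t}
   = n \, \frac{\left( s^{1/t} n^{1-1/t} \right)^{t}}{n^{t}}
   = s .
\end{align*}
```

Each factor satisfies $\frac{\bar{d}-j}{n-j} > \frac{\bar{d}-t}{n}$ for $0 \le j \le t-1$, because the numerator is larger and the denominator is no larger. Since the average bucket height exceeds s, some bucket must hold more than s nodes.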

Girvan Newman: based on the strength of weak ties. Remove the edge of highest betweenness. Extracting complete bipartite subgraphs: frequent itemsets and dynamic programming; a theorem that complete bipartite subgraphs are embedded in bigger graphs.