Jure Leskovec, Computer Science Dept., Stanford


Includes joint work with Jaewon Yang, Manuel Gomez-Rodriguez, Jon Kleinberg, Lars Backstrom, and Andreas Krause. http://memetracker.org

Jure Leskovec (jure@cs.stanford.edu)

Global vs. local effects: the interaction of global effects from mass media and local effects carried by the social structure (e.g., blogs, Twitter). With the Internet, blogs, and social media, the dichotomy between global and local influence is evaporating. The speed of media reporting and discussion has intensified, and stories progress very rapidly. How does information transmitted by the media interact with social networks?

In principle, we can collect nearly all (online) news media content: 10 million articles/day (50 GB of data). We have been collecting data since August 2008 (~10 TB), so we could study the media ecosystem at large.
Challenges: humans don't scale, so we need to develop automatic computational methods.
What are the basic units of information, i.e., the units that propagate between the nodes?

[w/ Backstrom-Kleinberg, KDD 09]
We would like units that correspond to aggregates of articles, vary on the timescale of days, and can be handled at terabyte scale.
Plan: identify textual fragments (phrases, memes) that travel relatively intact through many articles.
Things that don't work:
- Cascading hyperlinks to articles: too fine-grained
- Topics as probabilistic term mixtures: too coarse-grained
- Named entities: too coarse-grained
- Common sequences of words: too noisy
Idea: quoted phrases, which
- are an integral part of journalistic practice,
- tend to follow iterations of a story as it evolves, and
- are attributed to individuals and have a time and location.
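The quoted-phrase idea can be illustrated with a minimal extraction sketch. It assumes quotes are delimited by straight or curly double quotes, and `min_words` is an illustrative threshold rather than a value from the talk; a real pipeline also has to group the many variants of each quote.

```python
import re

# Matches text between straight (") or curly (\u201c ... \u201d) double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')


def extract_quotes(text: str, min_words: int = 4) -> list[str]:
    """Return quoted phrases with at least `min_words` words,
    lowercased and with whitespace normalized."""
    quotes = []
    for match in QUOTE_RE.finditer(text):
        words = match.group(1).lower().split()
        if len(words) >= min_words:  # very short quotes are too generic to track
            quotes.append(" ".join(words))
    return quotes


article = 'He said "lipstick on a pig" and later simply "yes".'
print(extract_quotes(article))  # ['lipstick on a pig']
```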

[w/ Backstrom-Kleinberg, KDD 09]
Data from Spinn3r covering the 3 months leading up to the 2008 U.S. Presidential Election: 1 million news articles and blog posts per day. This is essentially complete online media coverage: the 20,000 sites that are part of Google News plus 1.6 million blogs. From August 1 to October 31, 2008: 90 million documents from 1.65 million sites (390 GB). We extract 112 million quotes (phrases).

[w/ Backstrom-Kleinberg, KDD 09]
Phrase: "Our opponent is someone who sees America, it seems, as being so imperfect, imperfect enough that he's palling around with terrorists who would target their own country."
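Quotes like this one are often reused in shortened or slightly altered forms. One simple way to link such variants, assumed here for illustration (the slide does not give the KDD 09 clustering criteria), is to connect a shorter phrase to a longer one when both share a run of k consecutive words. The value k = 5 is an arbitrary illustrative choice.

```python
import re


def words(phrase: str) -> list[str]:
    """Lowercase the phrase and drop punctuation (apostrophes are kept)."""
    return re.findall(r"[a-z0-9']+", phrase.lower())


def shingles(phrase: str, k: int) -> set[tuple[str, ...]]:
    """All runs of k consecutive words in the phrase."""
    w = words(phrase)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def is_variant(short: str, long: str, k: int = 5) -> bool:
    """Hypothetical rule: `short` is a variant of `long` if it has fewer
    words and the two share at least one run of k consecutive words."""
    if len(words(short)) >= len(words(long)):
        return False
    return bool(shingles(short, k) & shingles(long, k))


full = ("Our opponent is someone who sees America, it seems, as being so imperfect, "
        "imperfect enough that he's palling around with terrorists who would target "
        "their own country.")
print(is_variant("palling around with terrorists who would target their own country", full))  # True
print(is_variant("lipstick on a pig", full))  # False
```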

… is periodic (weekly), with no trends: the bandwidth of the online media is constant.
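Weekly periodicity of a daily volume series can be checked with its autocorrelation at a lag of 7 days: a value near 1 at lag 7 indicates a strong weekly cycle. This is an illustrative check, not necessarily the analysis behind the slide, and the toy series below is made up.

```python
def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation of `series` at the given lag (in bins)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Toy daily volumes (made up): high on weekdays, low on weekends, 8 weeks.
daily = [10, 10, 10, 10, 10, 4, 4] * 8
print(round(autocorrelation(daily, 7), 3))  # 0.875
```

For an exactly periodic series, the lag-7 value is (n - 7) / n, here 49/56; a lag that does not match the cycle, such as 3, gives a negative value for this toy series.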

[Figure: volume over time (August to October 2008) of the 50 phrases with the largest total volume. Source: http://memetracker.org]


Peak blog intensity comes about 2.5 hours after the news peak. Using Google News, we label sources as:
- Mainstream media: 20,000 sites (44% of volume)
- Blogs (everything else): 1.6 million sites (56% of volume)
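A lag like this can be measured by aligning each phrase at the peak of its total volume and summing news and blog mentions by their offset from that peak. The sketch assumes hourly counts per phrase, already split into mainstream-media and blog mentions (the Google News labeling is not reproduced); the window size and toy inputs are illustrative.

```python
from collections import defaultdict


def aligned_intensity(phrases, window):
    """phrases: list of (news_counts, blog_counts), equal-length hourly lists.
    Returns two dicts mapping offset (hours from each phrase's overall peak)
    to summed counts, for news and for blogs, within +/- window hours."""
    news_sum, blog_sum = defaultdict(float), defaultdict(float)
    for news, blogs in phrases:
        total = [a + b for a, b in zip(news, blogs)]
        peak = max(range(len(total)), key=total.__getitem__)  # first peak hour
        for t in range(len(total)):
            offset = t - peak
            if -window <= offset <= window:
                news_sum[offset] += news[t]
                blog_sum[offset] += blogs[t]
    return news_sum, blog_sum


def peak_offset(intensity):
    """Offset with the largest aggregate count (earliest one on ties)."""
    return max(sorted(intensity), key=intensity.__getitem__)


# Toy hourly counts (made up): (news, blogs) for two phrases.
phrases = [([0, 5, 9, 3, 1, 0], [0, 0, 2, 6, 4, 1]),
           ([9, 2, 0, 0], [1, 7, 3, 0])]
news, blogs = aligned_intensity(phrases, window=3)
print(peak_offset(blogs) - peak_offset(news))  # 1 (blog peak lags news by 1 hour)
```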

We can classify individual sources by their typical timing relative to the peak of the aggregate intensity. [Figure panels: professional blogs; news media]
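Per-source timing can be summarized as the median offset of a source's mentions from the peaks of the phrases it mentions. The function names, the median statistic, and the early/late threshold are assumptions for illustration; the talk does not specify how sources were classified.

```python
from collections import defaultdict
from statistics import median


def source_timing(mentions, phrase_peaks):
    """mentions: iterable of (source, phrase_id, time);
    phrase_peaks: phrase_id -> time of that phrase's aggregate peak.
    Returns source -> median offset of its mentions from the peak
    (negative = typically early, positive = typically late)."""
    offsets = defaultdict(list)
    for source, phrase, t in mentions:
        offsets[source].append(t - phrase_peaks[phrase])
    return {source: median(vals) for source, vals in offsets.items()}


def classify(timing, threshold=0.0):
    """Label each source 'early' or 'late' relative to a hypothetical threshold."""
    return {s: "early" if m < threshold else "late" for s, m in timing.items()}


# Hypothetical sources and times (in hours).
mentions = [("site_a", "p1", 8), ("site_a", "p2", 3),
            ("blog_b", "p1", 13), ("blog_b", "p2", 6), ("blog_b", "p2", 7)]
timing = source_timing(mentions, {"p1": 10, "p2": 4})
print(timing)            # {'site_a': -1.5, 'blog_b': 3}
print(classify(timing))  # {'site_a': 'early', 'blog_b': 'late'}
```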

The oscillation of attention between mainstream and social media

[w/ Yang, ICDM 10]
Question: if the New York Times mentions a meme at time t, how many subsequent mentions of the meme does this generate at times t+1, t+2, ...?
Formulation: we want to predict the volume x(t) of phrase x at time t as a function of the influences of the sites that mentioned the meme before time t.

[w/ Yang, ICDM 10]
LIM model: given the volume over time x(t) of meme x, let
- $I_A(t)$ be the influence curve of site A, and
- $t_A$ be the time when A mentioned x.
Then we model

$$x(t+1) = \sum_{W:\, t_W \le t} I_W(t - t_W)$$

For each site W, we estimate $I_W(t)$; this boils down to a least-squares-like problem.
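The least-squares view can be sketched by treating each influence curve as L unknown values I_W(0), ..., I_W(L-1) and stacking one linear equation per meme and time step. This is a minimal sketch under stated assumptions: discrete time, a fixed window L, plain unconstrained least squares solved via the normal equations (the ICDM 10 paper may impose extra constraints, such as non-negativity), and toy noise-free data generated from the model itself. All function names are hypothetical.

```python
import random


def design_matrix(adoptions, volumes, n_sites, L):
    """One row per (meme, t); column s*L + l is 1 if site s mentioned the
    meme exactly l steps before t (0 <= l < L). The target is x(t+1)."""
    rows, targets = [], []
    for times, x in zip(adoptions, volumes):
        for t in range(len(x) - 1):
            row = [0.0] * (n_sites * L)
            for site, t_site in times.items():
                lag = t - t_site
                if 0 <= lag < L:
                    row[site * L + lag] = 1.0
            rows.append(row)
            targets.append(x[t + 1])
    return rows, targets


def least_squares(A, b):
    """Solve min ||A w - b||^2 via the normal equations (A^T A) w = A^T b,
    using Gauss-Jordan elimination with partial pivoting.
    Assumes A^T A is nonsingular."""
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(A, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_lim(adoptions, volumes, n_sites, L):
    """Estimate the influence curve I_W(0..L-1) of every site W."""
    A, b = design_matrix(adoptions, volumes, n_sites, L)
    w = least_squares(A, b)
    return [w[s * L:(s + 1) * L] for s in range(n_sites)]


def simulate(n_memes, n_sites, L, T, seed=0):
    """Toy, noise-free data generated from the model (illustration only).
    adoptions[m] maps site -> time it mentioned meme m; volumes[m] is x(0..T-1)."""
    rng = random.Random(seed)
    I_true = [[rng.uniform(0.5, 2.0) for _ in range(L)] for _ in range(n_sites)]
    adoptions, volumes = [], []
    for _ in range(n_memes):
        times = {s: rng.randrange(T - L) for s in range(n_sites) if rng.random() < 0.7}
        x = [0.0] * T
        for t in range(T - 1):
            x[t + 1] = sum(I_true[s][t - ts] for s, ts in times.items() if 0 <= t - ts < L)
        adoptions.append(times)
        volumes.append(x)
    return adoptions, volumes, I_true


adoptions, volumes, I_true = simulate(n_memes=60, n_sites=3, L=4, T=20, seed=1)
I_hat = fit_lim(adoptions, volumes, n_sites=3, L=4)
print(max(abs(a - b) for ra, rb in zip(I_hat, I_true) for a, b in zip(ra, rb)))
```

Because the toy data is noise-free, the recovered curves should match the generating ones up to rounding; with real mention data the fit is only approximate.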

[w/ Yang, ICDM 10]
Task: predict the volume x(t+1) of phrase x based on the influences of the sites that have already mentioned x.
Setting: 1,000 phrases and only 20 websites.
[Figure: improvement in L1 error over a 1-time-lag predictor]
By monitoring only 20 sites, we can reliably predict the overall future volume of a phrase (link, hashtag).
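Assuming the 1-time-lag predictor simply repeats the last observed volume, x(t+1) = x(t), the improvement in L1 error can be computed as the relative reduction of the summed absolute errors. The function names and toy numbers are illustrative.

```python
def l1_error(pred, actual):
    """Sum of absolute prediction errors."""
    return sum(abs(p - a) for p, a in zip(pred, actual))


def improvement_over_lag1(model_pred, volumes):
    """volumes: observed x(0..T); model_pred: a model's predictions of x(1..T).
    The assumed baseline repeats the last value, x(t+1) = x(t).
    Returns the relative reduction in L1 error (0.0 = no better than baseline)."""
    actual = volumes[1:]
    baseline = volumes[:-1]
    base_err = l1_error(baseline, actual)
    return (base_err - l1_error(model_pred, actual)) / base_err


# Toy numbers for illustration only.
print(improvement_over_lag1([3, 6, 4, 1], [0, 4, 6, 3, 1]))  # 9/11, about 0.818
```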

[w/ Yang, ICDM 10]
Business and politics are driven by mainstream media. Entertainment (and sports) is driven by blogs and TV. Newspapers and news agencies do not influence the volume.

[w/ Gomez-Krause, KDD 10]
But how does information really spread? We only see the mentions, not the propagation. Can we reconstruct the (hidden) diffusion network?

[w/ Gomez-Krause, KDD 10]
There is a hidden diffusion network [figure: nodes a, b, c, d, e]. We only see the times when nodes get infected:
- Cascade c1: (a,1), (c,2), (b,3), (e,4)
- Cascade c2: (c,1), (a,4), (b,5), (d,6)
We want to infer the who-infects-whom network. The problem is NP-hard, but our algorithm can do it near-optimally in $O(N^2)$.
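The greedy idea behind such inference can be sketched in a simplified, hypothetical form. Each cascade is explained by letting every infected node take its best earlier-infected parent among the edges chosen so far. Transmissions are scored with an assumed exponential model, beta * exp(-delta / alpha), against a baseline weight eps when no chosen edge explains an infection. Edges are then added one at a time by largest score gain. This is not the authors' exact objective or implementation and makes no claim about running time.

```python
import math
from itertools import combinations


def log_weight(delta, beta, alpha):
    """Assumed transmission model: beta * exp(-delta / alpha), in log space."""
    return math.log(beta) - delta / alpha


def greedy_network(cascades, k, beta=0.5, alpha=1.0, eps=1e-3):
    """cascades: list of dicts {node: infection_time}.
    Greedily picks up to k directed edges (u, v) that most improve how well
    the cascades are explained when each infected node takes its best
    earlier-infected parent among the chosen edges (baseline: log eps)."""
    base = math.log(eps)
    best = [{v: base for v in c} for c in cascades]  # best parent score per node
    candidates = set()
    for c in cascades:
        for u, v in combinations(c, 2):
            if c[u] < c[v]:
                candidates.add((u, v))
            elif c[v] < c[u]:
                candidates.add((v, u))

    def gain(edge):
        u, v = edge
        total = 0.0
        for c, b in zip(cascades, best):
            if u in c and v in c and c[u] < c[v]:
                total += max(0.0, log_weight(c[v] - c[u], beta, alpha) - b[v])
        return total

    chosen = []
    while candidates and len(chosen) < k:
        edge = max(sorted(candidates), key=gain)  # sorted() makes ties deterministic
        if gain(edge) <= 0.0:
            break  # no remaining edge explains anything better than the baseline
        chosen.append(edge)
        candidates.remove(edge)
        u, v = edge
        for c, b in zip(cascades, best):
            if u in c and v in c and c[u] < c[v]:
                b[v] = max(b[v], log_weight(c[v] - c[u], beta, alpha))
    return chosen


# The two cascades from the slide.
cascades = [{"a": 1, "c": 2, "b": 3, "e": 4}, {"c": 1, "a": 4, "b": 5, "d": 6}]
print(greedy_network(cascades, k=3))  # [('a', 'b'), ('a', 'c'), ('b', 'd')]
```

Edge a -> b is picked first because it explains b quickly in both cascades; after that, c -> b adds little, since b already has a good parent.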

[w/ Gomez-Krause, KDD 10]
[Figure: diffusion network of 5,000 news sites; legend: blogs, mainstream media]

[w/ Gomez-Krause, KDD 10]
[Figure legend: blogs, mainstream media]

We want to read things before others do. [Figure: one choice of sites detects the blue and yellow stories soon but misses red; another detects all stories, but late.]

Given a budget (e.g., of 3 blogs), select sites that cover as much of the Web as possible.
Bad news: solving this exactly is NP-hard.
Good news (theorem): our algorithm can do it near-optimally in linear time.
[Figure: the blogosphere]
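Coverage objectives of this kind are monotone and submodular. A greedy algorithm that repeatedly picks the site with the largest marginal gain is therefore within a factor (1 - 1/e) of optimal, and lazy evaluation avoids recomputing most gains. The sketch assumes a simplified objective, the number of distinct cascades that the chosen sites mention, rather than the time-aware objectives in the cited outbreak-detection paper.

```python
import heapq


def lazy_greedy(cover, k):
    """cover: site -> set of cascade ids the site mentions.
    Picks up to k sites greedily by marginal coverage gain, re-evaluating a
    site's gain only when it reaches the top of the heap (lazy evaluation)."""
    covered, chosen = set(), []
    # Heap entries: (-gain, site, number of sites chosen when gain was computed).
    heap = [(-len(ids), site, 0) for site, ids in cover.items()]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, site, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # Gain is current; by submodularity no other site can beat it.
            if neg_gain == 0:
                break
            chosen.append(site)
            covered |= cover[site]
        else:
            # Stale upper bound: recompute against the current coverage.
            gain = len(cover[site] - covered)
            heapq.heappush(heap, (-gain, site, len(chosen)))
    return chosen, covered


# Hypothetical sites and the cascades they mention.
cover = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1}}
print(lazy_greedy(cover, k=2))  # (['A', 'C'], {1, 2, 3, 4, 5})
```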

Question: which websites should one read to catch big stories? Idea: each blog covers part of the Web. [Figure: each dot is a blog; proximity is based on the number of common cascades.]
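The proximity in the figure is based on the number of cascades two blogs have in common. Counting them pairwise is straightforward; the input format is an assumption, and the layout step that turns these counts into positions is not shown.

```python
from itertools import combinations


def common_cascades(blog_cascades):
    """blog_cascades: blog -> set of cascade ids it participates in.
    Returns {(blog1, blog2): number of cascades both participate in}."""
    return {(a, b): len(blog_cascades[a] & blog_cascades[b])
            for a, b in combinations(sorted(blog_cascades), 2)}


print(common_cascades({"x": {1, 2, 3}, "y": {2, 3}, "z": {4}}))
# {('x', 'y'): 2, ('x', 'z'): 0, ('y', 'z'): 0}
```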

Which blogs should one read to be most up to date? [Figure: % of stories detected (higher is better) vs. number of selected blogs, comparing our solution with selection by in-links, out-links, number of posts (used by Technorati), and random selection. Source: www.blogcascades.org]
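A curve like the one in the figure can be computed for any selection order, for example a baseline that ranks blogs by number of posts. The sketch assumes each blog maps to the set of stories it mentions; the post counts in the usage lines are hypothetical.

```python
def detection_curve(order, cover, n_stories):
    """order: blogs in the order they are selected;
    cover: blog -> set of story ids the blog mentions.
    Returns the % of stories detected after selecting 1, 2, ... blogs."""
    detected, curve = set(), []
    for blog in order:
        detected |= cover.get(blog, set())
        curve.append(100.0 * len(detected) / n_stories)
    return curve


cover = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
# A baseline ordering by a hypothetical per-blog post count:
posts = {"A": 120, "B": 300, "C": 50}
by_posts = sorted(cover, key=posts.get, reverse=True)
print(detection_curve(by_posts, cover, n_stories=4))  # [50.0, 75.0, 100.0]
```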


- Meme-tracking and the Dynamics of the News Cycle. J. Leskovec, L. Backstrom, J. Kleinberg. KDD 2009. http://cs.stanford.edu/people/jure/pubs/quoteskdd09.pdf
- Modeling Information Diffusion in Implicit Networks. J. Yang, J. Leskovec. ICDM 2010. http://cs.stanford.edu/people/jure/pubs/lim-icdm10.pdf
- Inferring Networks of Diffusion and Influence. M. Gomez-Rodriguez, J. Leskovec, A. Krause. KDD 2010. http://cs.stanford.edu/people/jure/pubs/netinf-kdd2010.pdf
- Covering the Great Recession. Pew Research Center's Project for Excellence in Journalism, 2009. http://www.journalism.org/analysis_report/covering_great_recession
- Cost-effective Outbreak Detection in Networks. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. KDD 2007. http://cs.stanford.edu/people/jure/pubs/detect-kdd07.pdf
- Cascading Behavior in Large Blog Graphs. J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. SDM 2007. http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf