-- CS341 info session is on Tue 3/18 7pm in Gates 104
-- Final exam logistics

CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu

3/10/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2

-- Alternate final: Fri 3/14 7:00-10:00pm in Cubberley Auditorium
-- Final: Mon 3/17 12:15-3:15pm
   -- NVidia (last name starting with A-M)
   -- Gates B01 (last name starting with N-Z)
   -- See http://campus-map.stanford.edu
-- Practice finals + Gradiance quizzes are on Piazza
-- Open book, open computer, no internet
-- SCPD students can take the exam at Stanford!

Exam protocol for SCPD students:
-- On Friday 3/14 your exam proctor will receive the PDF of the final exam from SCPD
-- If you take the exam at Stanford:
   -- Ask the exam monitor to delete the SCPD email
-- If you don't take the exam at Stanford:
   -- Arrange a 3h slot with your exam monitor
   -- You can take the exam anytime but return it in time
   -- Email exam PDF to cs246.mmds@gmail.com by Tuesday 3/15 11:59pm Pacific time

-- Data mining research project on real data
-- Groups of 3 students
-- We provide interesting data, computing resources (Amazon EC2) and mentoring
-- You provide project ideas
-- Class meets once a week + individual group mentoring
-- Information session: Tuesday 3/18 7:00pm in Gates 104 (there will be pizza!)

-- Tue 3/18: Info session
   -- We will introduce datasets, problems, ideas
   -- Students form groups and project proposals
-- Mon 3/24: Project proposals are due
   -- We evaluate the proposals
-- Mon 3/31: Admission results
   -- 10 to 15 groups/projects will be admitted
-- Mon 5/5, Wed 5/7: Midterm presentations
-- Thu 6/10: Presentations, poster session
More info: http://cs341.stanford.edu

-- Redundancy leads to a bad user experience
-- Uncertainty around information need => don't put all eggs in one basket
-- How do we optimize for diversity directly?

[Figure: news front page, Monday, January 14, 2013. Headlines: France intervenes; Chuck for Defense; Argo wins big; Hagel expects fight]

[Figure: news front page, Monday, January 14, 2013. Headlines: France intervenes; Chuck for Defense; Argo wins big; New gun proposals]

Idea: Encode diversity as coverage problem
-- Example: Word cloud of news for a single day
-- Want to select articles so that most words are covered

Q: What is being covered?
A: Concepts (in our case: named entities)
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; document "Hagel expects fight"]
Q: Who is doing the covering?
A: Documents

Suppose we are given a set of documents $V$
-- Each document $d$ covers a set $X_d$ of words/topics/named entities $W$
-- For each set of documents $A \subseteq V$ we define $F(A) = \left| \bigcup_{d \in A} X_d \right|$
Goal: We want to $\max_{|A| \le k} F(A)$
Note: $F(A)$ is a set function: $F(A): \text{Sets} \to \mathbb{N}$
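The coverage function defined on this slide can be written down directly. The following is a minimal sketch, assuming documents are identified by strings and each $X_d$ is a Python set; the names `coverage` and `covers` and the toy data are hypothetical and not part of the lecture.

```python
from typing import Dict, Iterable, Set


def coverage(selected: Iterable[str], covers: Dict[str, Set[str]]) -> int:
    """F(A) = size of the union of X_d over all documents d in A."""
    covered: Set[str] = set()
    for d in selected:
        covered |= covers[d]
    return len(covered)


# Hypothetical toy data: document id -> set of concepts X_d it covers.
covers = {
    "d1": {"France", "Mali"},
    "d2": {"Hagel", "Pentagon", "Obama"},
    "d3": {"Hagel", "Obama"},
}

print(coverage(["d1", "d2"], covers))  # two disjoint documents
print(coverage(["d2", "d3"], covers))  # d3 adds nothing new to d2
```

In this toy data the redundant pair (d2, d3) covers fewer concepts than the diverse pair (d1, d2), which is the behaviour the objective is meant to reward.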

Given universe of elements $W = \{w_1, \dots, w_n\}$ and sets $X_1, \dots, X_m \subseteq W$
[Figure: overlapping sets $X_1$, $X_2$, $X_3$, $X_4$ inside the universe $W$]
-- Goal: Find $k$ sets $X_i$ that cover the most of $W$
-- More precisely: Find $k$ sets $X_i$ whose size of the union is the largest
-- Bad news: A known NP-complete problem

Simple Heuristic: Greedy Algorithm:
-- Start with $A_0 = \{\}$
-- For $i = 1 \dots k$
   -- Take set $d$ that maximizes $F(A_{i-1} \cup \{d\})$
   -- Let $A_i = A_{i-1} \cup \{d\}$
Example:
-- Eval. $F(\{d_1\}), \dots, F(\{d_m\})$, pick best (say $d_1$)
-- Eval. $F(\{d_1\} \cup \{d_2\}), \dots, F(\{d_1\} \cup \{d_m\})$, pick best (say $d_2$)
-- Eval. $F(\{d_1, d_2\} \cup \{d_3\}), \dots, F(\{d_1, d_2\} \cup \{d_m\})$, pick best
-- And so on
Here $F(A) = \left| \bigcup_{d \in A} X_d \right|$
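The greedy heuristic above can be sketched in a few lines. This is an illustrative implementation under the assumption that each $X_d$ is a Python set keyed by a document id; the function name `greedy_max_cover` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_max_cover(covers: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k sets, each time taking the set d that maximizes F(A ∪ {d})."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(covers))):
        # Evaluate F(A_{i-1} ∪ {d}) for every set not chosen yet and keep the best.
        best = max(
            (d for d in covers if d not in chosen),
            key=lambda d: len(covered | covers[d]),
        )
        chosen.append(best)
        covered |= covers[best]
    return chosen


# Hypothetical toy instance.
covers = {"d1": {1, 2, 3, 4}, "d2": {3, 4, 5}, "d3": {5, 6}}
picked = greedy_max_cover(covers, k=2)
print(picked)
```

Each round re-evaluates every remaining set, so selecting k sets costs on the order of k times the number of sets in function evaluations.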

Goal: Maximize the covered area
[Figure: animation of sets being selected one at a time to enlarge the covered area]

[Figure: three overlapping sets A, B, C]
-- Goal: Maximize the size of the covered area
-- Greedy first picks A and then C
-- But the optimal way would be to pick B and C
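The exact sets in the figure are not recoverable from the transcript, so the sketch below uses hypothetical sets A, B, C chosen to reproduce the behaviour described: greedy takes A and then C, while brute force over all pairs finds that B and C cover more.

```python
from itertools import combinations

# Hypothetical sets (not the ones drawn on the slide).
sets = {"A": {1, 2, 3, 4, 5}, "B": {1, 2, 3, 6}, "C": {4, 5, 7, 8}}


def F(names):
    """Size of the area covered by the named sets."""
    return len(set().union(*(sets[n] for n in names)))


def greedy(k):
    chosen = []
    for _ in range(k):
        best = max(
            (n for n in sets if n not in chosen),
            key=lambda n: F(chosen + [n]),
        )
        chosen.append(best)
    return chosen


greedy_pick = greedy(2)
optimal_pick = max(combinations(sets, 2), key=F)  # exhaustive search over pairs

print(greedy_pick, F(greedy_pick))
print(optimal_pick, F(optimal_pick))
```

Greedy commits to the largest single set first and cannot undo that choice, which is why it can end up below the optimum.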

Greedy produces a solution $A$ where: $F(A) \ge (1 - 1/e) \cdot OPT$ (that is, $F(A) > 0.63 \cdot OPT$) [Nemhauser, Fisher, Wolsey '78]
Claim holds for functions $F(\cdot)$ with 2 properties:
-- $F$ is monotone (adding more docs doesn't decrease coverage): if $A \subseteq B$ then $F(A) \le F(B)$, and $F(\{\}) = 0$
-- $F$ is submodular: adding an element to a set gives less improvement than adding it to one of its subsets

Definition: Set function $F(\cdot)$ is called submodular if:
For all $A, B \subseteq W$: $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
[Figure: areas of $A$ and $B$ compared with areas of $A \cup B$ and $A \cap B$]

Diminishing returns characterization
Equivalent definition: Set function $F(\cdot)$ is called submodular if:
For all $A \subseteq B$, $d \notin B$: $F(A \cup \{d\}) - F(A) \ge F(B \cup \{d\}) - F(B)$
-- Left-hand side: gain of adding $d$ to a small set
-- Right-hand side: gain of adding $d$ to a large set
[Figure: adding $d$ to the small set $A$ gives a large improvement; adding $d$ to the large set $B$ gives a small improvement]

$F(\cdot)$ is submodular: for $A \subseteq B$, $F(A \cup \{d\}) - F(A) \ge F(B \cup \{d\}) - F(B)$
-- Left-hand side: gain of adding $X_d$ to a small set
-- Right-hand side: gain of adding $X_d$ to a large set
Natural example:
-- Sets $X_1, \dots, X_m$
-- $F(A) = \left| \bigcup_{d \in A} X_d \right|$ (size of the covered area)
-- Claim: $F(A)$ is submodular!
[Figure: the set $X_d$ added to a small covered area $A$ and to a larger covered area $B$]
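The claim can be sanity-checked by brute force on a small instance. The sketch below enumerates every pair $A \subseteq B$ and every $d \notin B$ and tests the diminishing-returns inequality for the coverage function; the toy sets and the helper names are hypothetical, and a passing check on one instance illustrates the claim rather than proving it.

```python
from itertools import chain, combinations

# Hypothetical toy instance: document id -> covered elements X_d.
covers = {"d1": {1, 2, 3}, "d2": {3, 4}, "d3": {4, 5, 6}, "d4": {1, 6}}


def F(A):
    """Coverage: size of the union of X_d for d in A."""
    return len(set().union(*(covers[d] for d in A)))


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def is_submodular():
    docs = set(covers)
    for B in map(set, subsets(docs)):
        for A in map(set, subsets(B)):
            for d in docs - B:
                gain_small = F(A | {d}) - F(A)  # gain of adding d to the small set
                gain_large = F(B | {d}) - F(B)  # gain of adding d to the large set
                if gain_small < gain_large:
                    return False
    return True


print(is_submodular())
```

The intuition behind the general claim is that the new elements $X_d$ contributes to $B$ are a subset of the new elements it contributes to $A$, because $B$ already covers everything $A$ covers.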

Submodularity is discrete analogue of concavity
[Figure: $F(\cdot)$ plotted against solution size $|A|$; for $A \subseteq B$ the step from $F(A)$ to $F(A \cup \{d\})$ is larger than the step from $F(B)$ to $F(B \cup \{d\})$]
Adding $d$ to $B$ helps less than adding it to $A$!
$F(A \cup \{d\}) - F(A) \ge F(B \cup \{d\}) - F(B)$
-- Left-hand side: gain of adding $X_d$ to a small set
-- Right-hand side: gain of adding $X_d$ to a large set

Marginal gain: $\Delta_F(d \mid A) = F(A \cup \{d\}) - F(A)$
Submodular: $F(A \cup \{d\}) - F(A) \ge F(B \cup \{d\}) - F(B)$ for $A \subseteq B$
Concavity: $f(a + d) - f(a) \ge f(b + d) - f(b)$ for $a \le b$
[Figure: concave curve of $F(A)$ against $|A|$]
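A quick numeric instance of the concavity inequality, using $f(x) = \sqrt{x}$ as an assumed example of a concave function (the slide does not name one) with illustrative values $a = 1$, $b = 4$, $d = 5$:

```latex
f(x) = \sqrt{x}, \quad a = 1,\ b = 4,\ d = 5:
\qquad
f(a + d) - f(a) = \sqrt{6} - 1 \approx 1.449
\;\ge\;
f(b + d) - f(b) = \sqrt{9} - \sqrt{4} = 1 .
```

The same increment $d$ is worth less when added at the larger starting point, which mirrors the set-function statement with $A \subseteq B$.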

Let $F_1, \dots, F_m$ be submodular and $\lambda_1, \dots, \lambda_m > 0$; then $F(A) = \sum_{i=1}^{m} \lambda_i F_i(A)$ is submodular
-- Submodularity is closed under non-negative linear combinations!
This is an extremely useful fact:
-- Average of submodular functions is submodular: $F(A) = \sum_i P(i) \cdot F_i(A)$
-- Multicriterion optimization: $F(A) = \sum_i \lambda_i F_i(A)$
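The closure property follows in one step from the diminishing-returns definition. A short derivation sketch, assuming $A \subseteq B$ and $d \notin B$ and applying the inequality to each $F_i$ separately:

```latex
\begin{aligned}
F(A \cup \{d\}) - F(A)
  &= \sum_{i=1}^{m} \lambda_i \bigl( F_i(A \cup \{d\}) - F_i(A) \bigr) \\
  &\ge \sum_{i=1}^{m} \lambda_i \bigl( F_i(B \cup \{d\}) - F_i(B) \bigr) \\
  &= F(B \cup \{d\}) - F(B).
\end{aligned}
```

The middle step needs each $\lambda_i$ to be non-negative; a negative coefficient would flip the corresponding inequality.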

Q: What is being covered?
A: Concepts (in our case: named entities)
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; document "Hagel expects fight"]
Q: Who is doing the covering?
A: Documents

Objective: pick $k$ docs that cover most concepts
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; documents "Enthusiasm for Inauguration wanes", "Inauguration weekend"]
-- $F(A)$: the number of concepts covered by $A$
-- Elements: concepts. Sets: concepts in docs
-- $F(A)$ is submodular and monotone!
-- We can use greedy to optimize $F$

Objective: pick $k$ docs that cover most concepts
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; documents "Enthusiasm for Inauguration wanes", "Inauguration weekend"]
The good:
-- Penalizes redundancy
-- Submodular
The bad:
-- Concept importance?
-- All-or-nothing too harsh

Objective: pick $k$ docs that cover most concepts
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; documents "Enthusiasm for Inauguration wanes", "Inauguration weekend"]
-- Each concept $c$ has importance weight $w_c$

Document coverage function $\text{cover}_d(c)$: probability document $d$ covers concept $c$ [e.g., how strongly $d$ covers $c$]
[Figure: concepts Obama and Romney; document "Enthusiasm for Inauguration wanes"]

-- Document coverage function $\text{cover}_d(c)$: probability document $d$ covers concept $c$
   -- $\text{cover}_d(c)$ can model how relevant concept $c$ is for user $u$
-- Set coverage function $\text{cover}_A(c)$: prob. that at least one document in $A$ covers $c$
-- Objective: $F(A)$ combines the set coverage of every concept $c$ using the concept weights $w_c$
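The formulas on this slide did not survive extraction, so the sketch below is one plausible reading rather than the lecture's definition. It assumes documents cover a concept independently, giving $\text{cover}_A(c) = 1 - \prod_{d \in A} (1 - \text{cover}_d(c))$, and it assumes the objective is the weighted sum $F(A) = \sum_c w_c \, \text{cover}_A(c)$. All names and numbers are hypothetical.

```python
from math import prod
from typing import Dict, Iterable

Cover = Dict[str, Dict[str, float]]  # document id -> {concept: cover_d(c)}


def set_cover_prob(selected: Iterable[str], cover: Cover, concept: str) -> float:
    """Probability that at least one selected document covers the concept,
    ASSUMING documents cover concepts independently."""
    return 1.0 - prod(1.0 - cover[d].get(concept, 0.0) for d in selected)


def objective(selected: Iterable[str], cover: Cover, weights: Dict[str, float]) -> float:
    """ASSUMED objective: concept-weighted sum of set coverage probabilities."""
    selected = list(selected)
    return sum(w * set_cover_prob(selected, cover, c) for c, w in weights.items())


# Hypothetical toy data.
cover = {"d1": {"Obama": 0.5, "Romney": 0.2}, "d2": {"Obama": 0.5}}
weights = {"Obama": 0.6, "Romney": 0.4}

print(objective(["d1"], cover, weights))
print(objective(["d1", "d2"], cover, weights))
```

Under these assumptions, adding d2 to {d1} raises the objective by less than adding d2 to the empty set, which matches the diminishing-returns behaviour expected of a soft coverage function.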

-- The objective function is also submodular
   -- Intuitive diminishing returns property
   -- Greedy algorithm leads to a $(1 - 1/e) \approx 63\%$ approximation, i.e., a near-optimal solution

Objective: pick $k$ docs that cover most concepts
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; documents "Enthusiasm for Inauguration wanes", "Inauguration weekend"]
-- Each concept $c$ has importance weight $w_c$
-- Documents partially cover concepts: $\text{cover}_d(c)$

Greedy: add the document with the highest marginal gain $F(A \cup \{x\}) - F(A)$
[Figure: bar chart of marginal gains for documents a, b, c, d, e]
-- Greedy algorithm is slow!
-- At each iteration we need to re-evaluate the marginal gains of all remaining documents
-- Runtime $O(|V| \cdot K)$ for selecting $K$ documents

[Leskovec et al., KDD '07]
In round $i$:
-- So far we have $A_{i-1} = \{d_1, \dots, d_{i-1}\}$
-- Now we pick $d_i = \arg\max_{d \in V} F(A_{i-1} \cup \{d\}) - F(A_{i-1})$
-- Greedy algorithm maximizes the marginal benefit $\Delta_i(d) = F(A_{i-1} \cup \{d\}) - F(A_{i-1})$
By submodularity property: $F(A_i \cup \{d\}) - F(A_i) \ge F(A_j \cup \{d\}) - F(A_j)$ for $i < j$
Observation: By submodularity, for every $d \in V$: $\Delta_i(d) \ge \Delta_j(d)$ for $i < j$, since $A_i \subseteq A_j$
-- Marginal benefits $\Delta_i(d)$ only shrink! (as $i$ grows)
-- $\Delta_i(d) \ge \Delta_j(d)$: selecting document $d$ in step $i$ covers at least as many new words as selecting $d$ at step $j$ ($j > i$)

[Leskovec et al., KDD '07]
Idea: Use $\Delta_i$ as upper-bound on $\Delta_j$ ($j > i$)
Lazy Greedy:
-- Keep an ordered list of marginal benefits $\Delta_i$ from previous iteration
-- Re-evaluate $\Delta_i$ only for top node
-- Re-sort and prune
Recall, for $A \subseteq B$: $F(A \cup \{d\}) - F(A) \ge F(B \cup \{d\}) - F(B)$
[Figure: marginal gains of documents a, b, c, d, e, sorted in decreasing order; $A_1 = \{a\}$]

[Figure: lazy greedy, later animation frame. Marginal gains listed in the order a, d, b, e, c; $A_1 = \{a\}$, $A_2 = \{a, b\}$]
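The lazy evaluation idea can be sketched with a priority queue. This is an illustrative version for the plain coverage function, not the implementation from the cited paper: stale marginal gains act as upper bounds, and only the entry at the top of the heap is re-evaluated. The function name, the evaluation counter, and the toy data are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def lazy_greedy(covers: Dict[str, Set[int]], k: int) -> Tuple[List[str], int]:
    """Return (chosen documents, number of marginal-gain evaluations)."""
    chosen: List[str] = []
    covered: Set[int] = set()
    # heapq is a min-heap, so gains are stored negated. The third field records
    # how many documents were already chosen when the gain was computed.
    heap = [(-len(x), d, 0) for d, x in covers.items()]
    evaluations = len(heap)
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, d, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # The top gain is up to date, and every other entry is an upper
            # bound on its true gain, so d has the highest marginal gain.
            chosen.append(d)
            covered |= covers[d]
        else:
            # Stale entry: re-evaluate only this one and put it back.
            gain = len(covers[d] - covered)
            evaluations += 1
            heapq.heappush(heap, (-gain, d, len(chosen)))
    return chosen, evaluations


# Hypothetical toy instance.
covers = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1}, "e": {2}}
chosen, evaluations = lazy_greedy(covers, k=2)
print(chosen, evaluations)
```

On this instance the lazy version re-evaluates two documents in the second round instead of all four remaining ones. The saving relies on submodularity: without it, a stale gain would not be a valid upper bound.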

Summary so far:
-- Diversity can be formulated as a set cover
-- Set cover is submodular optimization problem
-- Can be (approximately) solved using greedy algorithm
-- Lazy-greedy gives significant speedup
[Figure: running time in seconds (lower is better) against number of blogs selected, comparing exhaustive search (all subsets), naive greedy, and lazy greedy]

But what about personalization?
[Figure: a model producing recommendations; example headlines "Election trouble", "Songs of Syria", "Sandy delays"]

We assumed same concept weighting for all users
[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; documents "France intervenes", "Chuck for Defense", "Argo wins big"]

Each user has different preferences over concepts
[Figure: weights over the concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL for two users, a "politico" and a "movie buff"]

-- Assume each user has different preference vector over concepts
-- Goal: Learn personal concept weights from user feedback

[Figure: concepts France, Mali, Hagel, Pentagon, Obama, Romney, Zero Dark Thirty, Argo, NFL; documents "France intervenes", "Chuck for Defense", "Argo wins big"]

Multiplicative Weights algorithm
-- Assume each concept $c$ has weight $w_c$
-- We recommend document $d$ and receive feedback, say $r = +1$ or $-1$
-- Update the weights:
   -- If $c \in X_d$ then $w_c = \beta^{r} w_c$
   -- If $c \notin X_d$ then $w_c = \beta^{-r} w_c$
   -- If concept $c$ appears in $X_d$ and we received positive feedback $r = +1$, then we increase the weight $w_c$ by multiplying it by $\beta$ ($\beta > 1$); otherwise we decrease the weight (divide by $\beta$)
-- Normalize weights so that $\sum_c w_c = 1$
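One update step of this scheme can be sketched as follows. The exponent for concepts that do not appear in the document is garbled in the transcript, so treating it as $\beta^{-r}$ is an assumption, as are the function name `update_weights`, the default $\beta = 2$, and the toy data.

```python
from typing import Dict, Set


def update_weights(
    weights: Dict[str, float],
    doc_concepts: Set[str],
    r: int,
    beta: float = 2.0,  # illustrative value; the slide only requires beta > 1
) -> Dict[str, float]:
    """Apply one multiplicative update for feedback r in {+1, -1}, then normalize."""
    updated = {}
    for c, w in weights.items():
        # Concepts in X_d move with the feedback; the opposite move for the
        # remaining concepts is an ASSUMPTION about the garbled second rule.
        exponent = r if c in doc_concepts else -r
        updated[c] = w * beta ** exponent
    total = sum(updated.values())
    return {c: w / total for c, w in updated.items()}


# Hypothetical user who likes a recommended document about Obama.
weights = {"Obama": 0.5, "Argo": 0.5}
liked = update_weights(weights, doc_concepts={"Obama"}, r=+1)
print(liked)
```

Because the weights are renormalized after every step, only their ratios matter, and repeated positive feedback on documents containing a concept steadily shifts weight toward it.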

Steps of the algorithm:
1. Identify items to recommend from
2. Identify concepts [what makes items redundant?]
3. Weigh concepts by general importance
4. Define item-concept coverage function
5. Select items using probabilistic set cover
6. Obtain feedback, update weights
