THE STATISTICAL SOMMELIER

THE STATISTICAL SOMMELIER: An Introduction to Linear Regression
15.071 The Analytics Edge

Bordeaux Wine
- Large differences in price and quality between years, although wine is produced in a similar way
- Meant to be aged, so hard to tell if wine will be good when it is on the market
- Expert tasters predict which ones will be good
- Can analytics be used to come up with a different system for judging wine?
15.071x The Statistical Sommelier: An Introduction to Linear Regression

Predicting the Quality of Wine
- March 1990: Orley Ashenfelter, a Princeton economics professor, claims he can predict wine quality without tasting the wine

Building a Model
- Ashenfelter used a method called linear regression
  - Predicts an outcome variable, or dependent variable
  - Predicts using a set of independent variables
- Dependent variable: typical price in 1990-1991 wine auctions (approximates quality)
- Independent variables:
  - Age (older wines are more expensive)
  - Weather
    - Average Growing Season Temperature
    - Harvest Rain
    - Winter Rain

The Data (1952-1978)
[Figure: four scatter plots of (Logarithm of) Price against Age of Wine (Years), Avg Growing Season Temp (Celsius), Harvest Rain (mm), and Winter Rain (mm)]

The Expert's Reaction
Robert Parker, the world's most influential wine expert:
- "Ashenfelter is an absolute total sham"
- "rather like a movie critic who never goes to see the movie but tells you how good it is based on the actors and the director"

One-Variable Linear Regression
[Figure: scatter plot of (Logarithm of) Price against Avg Growing Season Temp (Celsius)]

The Regression Model
- One-variable regression model:
  $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
  - $y_i$ = dependent variable (wine price) for the $i$th observation
  - $x_i$ = independent variable (temperature) for the $i$th observation
  - $\epsilon_i$ = error term for the $i$th observation
  - $\beta_0$ = intercept coefficient
  - $\beta_1$ = regression coefficient for the independent variable
- The best model (choice of coefficients) has the smallest error terms
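As a rough illustration of how the coefficients of a one-variable model might be chosen, the sketch below uses ordinary least squares, which picks the intercept and slope that minimize the sum of squared error terms. The function name `fit_line` and the four data points are made up for this example; they are not Ashenfelter's data.

```python
import numpy as np

def fit_line(x, y):
    """Return (beta0, beta1) that minimize the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # Closed-form least-squares slope, then the matching intercept.
    beta1 = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

# Hypothetical (temperature, log price) pairs; NOT the Bordeaux data.
temps = [15.0, 16.0, 17.0, 18.0]
log_prices = [6.6, 6.9, 7.6, 7.9]

beta0, beta1 = fit_line(temps, log_prices)
# Error terms: actual value minus the value predicted by the fitted line.
residuals = np.asarray(log_prices) - (beta0 + beta1 * np.asarray(temps))
print(f"intercept={beta0:.2f}, slope={beta1:.2f}")
```

With an intercept in the model, the least-squares error terms sum to zero, which is a handy sanity check on any fitted line.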

Selecting the Best Model
[Figure: scatter plot of (Logarithm of) Price against Avg Growing Season Temp (Celsius), annotated with SSE = 10.15, SSE = 6.03, and SSE = 5.73]

Other Error Measures
- SSE can be hard to interpret
  - Depends on N
  - Units are hard to understand
- Root-Mean-Square Error (RMSE):
  $\mathrm{RMSE} = \sqrt{\mathrm{SSE} / N}$
  - Normalized by N, units of dependent variable
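The two error measures can be sketched in a few lines of Python. The helper names `sse` and `rmse` and the toy numbers are assumptions made for illustration only.

```python
import math

def sse(actual, predicted):
    """Sum of squared errors between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root-mean-square error: SSE divided by N, then square-rooted."""
    return math.sqrt(sse(actual, predicted) / len(actual))

# Hypothetical values, chosen only to make the arithmetic easy to follow.
actual = [7.0, 8.0, 7.5, 6.5]
predicted = [7.5, 7.5, 7.5, 7.0]

total_error = sse(actual, predicted)    # grows as more observations are added
typical_error = rmse(actual, predicted) # same units as the dependent variable
print(total_error, typical_error)
```

Because RMSE divides by N before taking the square root, it stays comparable across data sets of different sizes, whereas SSE does not.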

$R^2$
[Figure: scatter plot of (Logarithm of) Price against Avg Growing Season Temp (Celsius)]
- Compares the best model to a baseline model
- The baseline model does not use any variables
  - Predicts same outcome (price) regardless of the independent variable (temperature)

$R^2$
[Figure: scatter plot of (Logarithm of) Price against Avg Growing Season Temp (Celsius), annotated with SSE = 5.73 and SST = 10.15]

Interpreting $R^2$
  $R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$
- $R^2$ captures value added from using a model
  - $R^2 = 0$ means no improvement over baseline
  - $R^2 = 1$ means a perfect predictive model
- Unitless and universally interpretable
  - Can still be hard to compare between problems
  - Good models for easy problems will have $R^2 \approx 1$
  - Good models for hard problems can still have $R^2 \approx 0$
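As a quick check of the formula, the SSE and SST values shown on the preceding slide (5.73 and 10.15) can be plugged in directly. The function name `r_squared` is only an illustrative choice.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE / SST, where SST is the SSE of the baseline model."""
    return 1.0 - sse / sst

# Values taken from the slide: SSE = 5.73 for the fitted line, SST = 10.15.
r2_temperature = r_squared(5.73, 10.15)
print(round(r2_temperature, 2))

# The two reference points from the slide:
no_improvement = r_squared(10.15, 10.15)  # model no better than baseline
perfect_model = r_squared(0.0, 10.15)     # model with zero error
```

The result rounds to 0.44, the value reported for the model that uses only Average Growing Season Temperature.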

Available Independent Variables
- So far, we have only used the Average Growing Season Temperature to predict wine prices
- Many different independent variables could be used
  - Average Growing Season Temperature
  - Harvest Rain
  - Winter Rain
  - Age of Wine (in 1990)
  - Population of France

Multiple Linear Regression
- Using each variable on its own:
  - $R^2 = 0.44$ using Average Growing Season Temperature
  - $R^2 = 0.32$ using Harvest Rain
  - $R^2 = 0.22$ using France Population
  - $R^2 = 0.20$ using Age
  - $R^2 = 0.02$ using Winter Rain
- Multiple linear regression allows us to use all of these variables to improve our predictive ability

The Regression Model
- Multiple linear regression model with k variables:
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i$
  - $y_i$ = dependent variable (wine price) for the $i$th observation
  - $x_{ij}$ = $j$th independent variable for the $i$th observation
  - $\epsilon_i$ = error term for the $i$th observation
  - $\beta_0$ = intercept coefficient
  - $\beta_j$ = regression coefficient for the $j$th independent variable
- Best model coefficients selected to minimize SSE
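A minimal sketch of fitting a model with several independent variables follows, assuming NumPy is available. It minimizes SSE with `numpy.linalg.lstsq`; the function name `fit_multiple` and the toy data, generated from y = 1 + 2*x1 - 3*x2 with no noise, are hypothetical and unrelated to the wine data.

```python
import numpy as np

def fit_multiple(X, y):
    """Return [beta0, beta1, ..., betak] that minimize SSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Hypothetical data: each row is one observation with two variables (x1, x2).
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [3, -2, 0, 2, -3]  # exactly 1 + 2*x1 - 3*x2

coeffs = fit_multiple(X, y)
print(coeffs)
```

Since the toy outcomes lie exactly on the generating plane, the fit recovers the intercept and both coefficients up to floating-point error. Real data would leave nonzero error terms.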

Adding Variables

Variables                                           $R^2$
Average Growing Season Temperature (AGST)           0.44
AGST, Harvest Rain                                  0.71
AGST, Harvest Rain, Age                             0.79
AGST, Harvest Rain, Age, Winter Rain                0.83
AGST, Harvest Rain, Age, Winter Rain, Population    0.83

- Adding more variables can improve the model
- Diminishing returns as more variables are added

Selecting Variables
- Not all available variables should be used
  - Each new variable requires more data
  - Causes overfitting: high $R^2$ on data used to create model, but bad performance on unseen data
- We will see later how to appropriately choose variables to remove

Understanding the Model and Coefficients

Correlation
- A measure of the linear relationship between variables
  - +1 = perfect positive linear relationship
  - 0 = no linear relationship
  - -1 = perfect negative linear relationship
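The three reference values can be reproduced with a small sketch of the Pearson correlation coefficient. The function name `correlation` and the short lists are invented for illustration.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Co-movement of the two variables around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical examples of the three reference cases.
perfect_positive = correlation([1, 2, 3], [2, 4, 6])
perfect_negative = correlation([1, 2, 3], [6, 4, 2])
no_linear = correlation([1, 2, 3, 4], [1, -1, -1, 1])
print(perfect_positive, perfect_negative, no_linear)
```

The last pair has a clear U-shaped pattern yet a correlation of zero, which is a reminder that the measure only captures linear relationships.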

Examples of Correlation
[Figure: scatter plot of (Logarithm of) Price against Winter Rain (mm)]
[Figure: scatter plot of Avg Growing Season Temp (Celsius) against Harvest Rain (mm)]
[Figure: scatter plot of Population of France (thousands) against Age of Wine (Years)]

Predictive Ability
- Our wine model had a value of $R^2 = 0.83$
- Tells us our accuracy on the data that we used to build the model
- But how well does the model perform on new data?
  - Bordeaux wine buyers profit from being able to predict the quality of a wine years before it matures

Out-of-Sample $R^2$

Variables                                           Model $R^2$   Test $R^2$
AGST                                                0.44          0.79
AGST, Harvest Rain                                  0.71          -0.08
AGST, Harvest Rain, Age                             0.79          0.53
AGST, Harvest Rain, Age, Winter Rain                0.83          0.79
AGST, Harvest Rain, Age, Winter Rain, Population    0.83          0.76

- Better model $R^2$ does not necessarily mean better test set $R^2$
- Need more data to be conclusive
- Out-of-sample $R^2$ can be negative!
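To see how a negative value can arise, here is a small sketch of an out-of-sample $R^2$ calculation. It assumes one common convention that the slide does not spell out: the baseline for the test set is the mean outcome of the training data. The function name and all numbers are hypothetical.

```python
def out_of_sample_r2(train_y, test_y, test_predictions):
    """Test-set R^2, with the training mean as the baseline (an assumption)."""
    baseline = sum(train_y) / len(train_y)
    sse = sum((a - p) ** 2 for a, p in zip(test_y, test_predictions))
    sst = sum((a - baseline) ** 2 for a in test_y)
    return 1.0 - sse / sst

# Hypothetical outcomes; not the wine data.
train_y = [7.0, 8.0]  # baseline prediction is their mean, 7.5
test_y = [7.0, 8.0]

close_predictions = [7.1, 7.9]  # near the true test values
poor_predictions = [8.0, 7.0]   # worse than always predicting 7.5

good_r2 = out_of_sample_r2(train_y, test_y, close_predictions)
bad_r2 = out_of_sample_r2(train_y, test_y, poor_predictions)
print(good_r2, bad_r2)
```

Whenever the model's test-set SSE exceeds the baseline's SST, the ratio is greater than 1 and the resulting $R^2$ drops below zero, as it does for the poor predictions here.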

The Results
- Parker:
  - 1986 is very good to sometimes exceptional
- Ashenfelter:
  - 1986 is mediocre
  - 1989 will be the wine of the century and 1990 will be even better!
- In wine auctions:
  - 1989 sold for more than twice the price of 1986
  - 1990 sold for even higher prices!
- Later, Ashenfelter predicted 2000 and 2003 would be great
- Parker has stated that 2000 is the greatest vintage Bordeaux has ever produced

The Analytics Edge
- A linear regression model with only a few variables can predict wine prices well
- In many cases, outperforms wine experts' opinions
- A quantitative approach to a traditionally qualitative problem