

1. A trendy wine bar set up an experiment to evaluate the quality of 3 different wines. Five fine connoisseurs of wine were asked to taste each of the wines and give it a rating between 0 and 10. The order of tasting was randomized and the judges did not know which wine they were drinking. The following table displays the collected data:

              wine 1  wine 2  wine 3
    person 1       1       7       5
    person 2       0       4       0
    person 3       1       6       4
    person 4       1       5       2
    person 5       1       8      10

We use the following model:

    $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$

where $Y_{ij}$ are the ratings and $\alpha_i$, $\beta_j$ the (fixed) effects of wine type and person, respectively. We use the sum-to-zero constraint

    $\sum_{i=1}^{3} \alpha_i = \sum_{j=1}^{5} \beta_j = 0$

and the standard assumptions for the errors $\epsilon_{ij}$. The following R output is available:

    > options(contrasts = c("contr.sum", "contr.sum"))
    > fit <- aov(y ~ wine + person, data = wine_tasting)
    > summary(fit)
                Df Sum Sq Mean Sq F value Pr(>F)
    wine         2   69.7    34.9   10.90 0.0052 **
    person       4   42.0    10.5    3.28 0.0717 .
    Residuals    8   25.6     3.2
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    > coef(fit)
    (Intercept)       wine1       wine2     person1     person2
        3.7e+00    -2.9e+00     2.3e+00     6.7e-01    -2.3e+00
        person3     person4
       -5.6e-17    -1.0e+00
    > dummy.coef(fit)
    Full coefficients are

    (Intercept):      3.7
    wine:          wine 1   wine 2   wine 3
                    -2.87     2.33     0.53
    person:             1        2        3        4        5
                  6.7e-01 -2.3e+00 -5.6e-17 -1.0e+00  2.7e+00

a) What design do we have here? What is the role of the different factors?

b) Does wine type have an effect on rating? Use the global test. State the null hypothesis with respect to the corresponding parameters, the p-value and the test result.

c) What is the estimated rating difference between wine 1 and wine 2?

d) Should we really include the effect of the person ($\beta_j$) in the model or not? Motivate your answer.

e) Your colleague wants to run the following code in R:

    > fit2 <- aov(y ~ wine * person, data = wine_tasting)
    > summary(fit2)

What model is he using? Will he be able to perform statistical tests?

f) If we assume that raters were randomly selected, we could model them as random effects. Determine a 95% confidence interval for the standard deviation of this random effect using the outputs below.

    > library(lmerTest)
    > fit3 <- lmer(y ~ wine + (1 | person), data = wine_tasting)
    > fit4 <- lmer(y ~ person + (1 | wine), data = wine_tasting)
    > confint(fit3, oldnames = FALSE)
                          2.5 % 97.5 %
    sd_(Intercept)|person   0.0    3.5
    sigma                   1.1    2.7
    (Intercept)             1.9    5.5
    wine1                  -4.1   -1.6
    wine2                   1.7    3.2
    > confint(fit4, oldnames = FALSE)
                        2.5 % 97.5 %
    sd_(Intercept)|wine  0.91   6.33
    sigma                1.03   2.32
    (Intercept)          0.19   7.14
    person1             -0.94   2.27
    person2             -3.94  -0.73
    person3             -1.60   1.60
    person4             -2.60   0.60
    > rand(fit3)
    Analysis of Random effects Table:
           Chi.sq Chi.DF p.value
    person   2.03      1     0.2
    > rand(fit4)
    Analysis of Random effects Table:
         Chi.sq Chi.DF p.value
    wine   6.14      1    0.01 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
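The `aov` output in question 1 can be reproduced from the ratings table. The following is a minimal sketch, assuming the data frame is laid out in long format with one row per rating; the level coding (1 to 3 for `wine`, 1 to 5 for `person`) is an assumption, since the exam does not show how `wine_tasting` was built.

```r
# Ratings from the table in question 1, entered person by person
# (each person's ratings for wine 1, wine 2, wine 3).
# Assumption: factor levels are coded 1..3 (wine) and 1..5 (person).
wine_tasting <- data.frame(
  y      = c(1, 7, 5,  0, 4, 0,  1, 6, 4,  1, 5, 2,  1, 8, 10),
  wine   = factor(rep(1:3, times = 5)),
  person = factor(rep(1:5, each = 3))
)

# Sum-to-zero contrasts, as in the exam's R session.
old_opts <- options(contrasts = c("contr.sum", "contr.sum"))

fit <- aov(y ~ wine + person, data = wine_tasting)
print(summary(fit))     # ANOVA table for wine, person and residuals
print(coef(fit))        # estimated effects under the sum-to-zero coding
print(dummy.coef(fit))  # full set of effects, including the last levels

options(old_opts)       # restore the previous contrast settings
```

Under sum-to-zero coding the intercept is the grand mean of the 15 ratings, and each reported effect is the corresponding row or column mean minus that grand mean.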

2. A pharmaceutical company wants to study the effect of alcohol consumption in conjunction with two types of drugs, A and B. They consider the following treatments:

    1. Control
    2. Drug A alone
    3. Drug A and alcohol consumption 1 hour before
    4. Drug A and alcohol consumption 1 hour after
    5. Drug B alone
    6. Drug B and alcohol consumption 1 hour before
    7. Drug B and alcohol consumption 1 hour after

A completely randomized design was used. Every individual in the experiment received one treatment only, and the effect of the drug was measured on some scale and recorded as variable Y.

a) We want to ask precise questions about the data and use contrasts to do so. Propose contrasts to test the following questions:

    L1: The difference between drug A and drug B
    L2: The effect of alcohol on drug A
    L3: The difference between taking alcohol before and after

b) Are the previous 3 contrasts orthogonal to each other? Justify your answer.

c) What do the two following contrasts test? Explain in words.

    L4: (6, -1, -1, -1, -1, -1, -1)
    L5: (0, 0, +1, -1, 0, -1, +1)

d) We fit the following model in R:

    > fit <- aov(y ~ treatment, data = drug)
    > summary(fit)
                 Df  Sum Sq Mean Sq F value Pr(>F)
    treatment     6 4738965  789828   238.5 <2e-16 ***
    Residuals   833 2758944    3312
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

How many people took part in this study?

e) What procedure would you suggest if someone asks you to perform all possible pairwise comparisons between the different treatments?

f) Say you want to test all 5 contrasts from above. Is it necessary to adjust the corresponding p-values and, if yes, how?
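Whether a coefficient vector is a contrast, and whether two contrasts are orthogonal, can be checked numerically. The sketch below uses L4 and L5 from part c), with the signs taken such that each vector sums to zero (an assumption about the original exam sheet). The helper names `is_contrast` and `are_orthogonal` are hypothetical, and the dot-product criterion for orthogonality assumes equal group sizes.

```r
# Coefficients over the seven treatments, in the order listed in question 2:
# control, A, A + alcohol before, A + alcohol after,
# B, B + alcohol before, B + alcohol after.
L4 <- c(6, -1, -1, -1, -1, -1, -1)
L5 <- c(0,  0,  1, -1,  0, -1,  1)

# Hypothetical helpers (not part of base R).
# A contrast has coefficients that sum to zero.
is_contrast <- function(cvec) isTRUE(all.equal(sum(cvec), 0))

# With equal group sizes, two contrasts are orthogonal
# if the sum of the products of their coefficients is zero.
are_orthogonal <- function(c1, c2) isTRUE(all.equal(sum(c1 * c2), 0))

print(is_contrast(L4))
print(is_contrast(L5))
print(are_orthogonal(L4, L5))
```

The same two helpers could be applied to any contrasts proposed for L1 to L3 in part a).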

3. A pizzeria wants to optimize its least sold pizza, the Margherita, to guarantee a maximum taste experience. To find the best combination of baking temperature and baking time, they perform an experiment with two factors (temp = 180 °C, 210 °C and 240 °C; time = 10 min and 15 min). For every combination, 6 pizzas are baked and judged on a scale of 1 to 100, where 100 corresponds to maximum taste experience. A two-way ANOVA is performed, with the following output:

                Df Sum Sq Mean Sq F value Pr(>F)
    temp         2    268   134.0    4.25  0.023
    time         1      2     2.2    0.07  0.791
    Residuals   32   1009    31.5

a) From the R output above, what is the estimated error variance?

b) What are the standard assumptions about the errors? Are these assumptions fulfilled? Motivate your answer with the help of the following residual plots.

    [Figure: two residual plots. Left panel: "Residuals vs Fitted" (residuals against fitted values). Right panel: "Normal Q-Q" (standardized residuals against theoretical quantiles).]

c) Have a look at the following plot. What does the plot show? Explain why the model above can lead to wrong conclusions.

    [Figure: interaction plot of mean quality against baking temperature (180, 210, 240), with one line per baking time (10 and 15).]

d) Have a look at the following output. Complete the missing parts in the first 3 rows.

                Df Sum Sq Mean Sq F value  Pr(>F)
    temp         2   ____    ____    8.13  0.0015
    time         1      2    2.25    ____  0.7145
    temp:time ____    514    ____   15.60 2.3e-05
    Residuals   30    495   16.49

e) From the output above, which is the final model you would choose and why? Write down your selected model in the form $Y_{ij} = \mu + \alpha_i + \ldots$

f) Assume you want to repeat the analysis after a year. However, some data got lost and the number of judgements for each pizza is therefore not the same anymore. Does the loss of data have an influence on the calculation of the ANOVA table? What R function would you use? Motivate your answers.
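The columns of an ANOVA table are linked by simple relations: each mean square is the sum of squares divided by its degrees of freedom, the F value is the factor's mean square divided by the residual mean square, and the p-value is the upper tail of the corresponding F distribution. As a rough sketch, these relations can be checked on the `temp` row of the additive model's table in question 3 (the variable names are illustrative only):

```r
# Values copied from the additive two-way ANOVA table in question 3.
df_temp <- 2
ss_temp <- 268
df_res  <- 32
ss_res  <- 1009

# Mean square = sum of squares / degrees of freedom.
ms_temp <- ss_temp / df_temp
ms_res  <- ss_res / df_res

# F value = factor mean square / residual mean square.
f_temp <- ms_temp / ms_res

# p-value = upper tail probability of the F distribution.
p_temp <- pf(f_temp, df1 = df_temp, df2 = df_res, lower.tail = FALSE)

print(round(c(ms_temp = ms_temp, ms_res = ms_res,
              f_temp = f_temp, p_temp = p_temp), 3))
```

The results agree with the printed table up to rounding. The same relations apply row by row to the table with the interaction term.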

4. Misc

1) You are given the following experimental design with two block factors X and Y, each having 4 levels, and a treatment factor A with 4 levels: A1, A2, A3, A4.

                     Blocks X
                  1    2    3    4
    Blocks Y  1   A3   A2   A4   A1
              2   A1   A3   A2   A4
              3   A4   A1   A3   A2
              4   A2   A4   A1   A3

Which type of design is this?
    a) Split-plot design
    b) $2^3$ design
    c) Latin square design
    d) Balanced incomplete block design
    e) Completely randomized design
    f) None of the previous designs

2) A toothpaste company is testing 10 new toothpaste types. 15 participants (= blocks) have been selected. You wish to run a BIBD with block size 3 such that each toothpaste type is tested a total of 6 times. Is this possible?
    a) Yes
    b) No
    c) Not enough information to make a statement

3) The toothpaste factory has selected 4 out of 10 types from the previous test. They are now considering these 4 types of toothpaste and 3 types of packaging. 60 participants have been selected for the experiment, and each participant is supposed to test and rate every packaging of exactly one toothpaste type on a 1-5 scale. Which type of design is this?
    a) Split-plot design with participants as whole-plots
    b) Split-plot design with toothpaste as whole-plots
    c) Split-plot design with packagings as whole-plots
    d) None of the previous designs

4) Consider a (balanced) one-way ANOVA model. What happens to the 95% quantile of the F-distribution of the global test if we increase the number of observations (but keep the design fixed otherwise)?
    a) The quantile gets larger
    b) The quantile gets smaller
    c) The quantile stays the same
    d) No statement possible

5) We have an unbalanced data set with two factors A and B and fit the model aov(y ~ A * B) in R.
    a) Type II and type III sums of squares coincide for factor B.
    b) Type I and type II sums of squares coincide for factor A.
    c) The sums of squares of the interaction are the same for all types (I - III).

6) The following table contains mean values of an experiment with two crossed factors A and B:

    Level   B1   B2   B3
    A1       4    8    6
    A2      10    ?   12

What value would you put in the missing cell if you assume an additive model (that is, no interaction between A and B)?
    a) 11
    b) 13
    c) 14
    d) 16
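For questions about balanced incomplete block designs such as 4.2), two standard necessary conditions can be checked by counting. With v treatments, b blocks, block size k and r replicates per treatment, the total number of observations must satisfy bk = vr, and λ = r(k - 1)/(v - 1) must be an integer. The sketch below illustrates this on the well-known design with v = b = 7 and k = r = 3. The function name `bibd_necessary` is hypothetical, and the conditions are necessary but not sufficient.

```r
# Hypothetical helper: checks two necessary (not sufficient) conditions
# for a BIBD with v treatments, b blocks, block size k, r replicates.
bibd_necessary <- function(v, b, k, r) {
  # lambda = number of blocks in which any given pair of treatments co-occurs
  lambda <- r * (k - 1) / (v - 1)
  c(
    counts_match   = (b * k == v * r),                          # bk = vr
    lambda_integer = isTRUE(all.equal(lambda, round(lambda)))   # lambda whole
  )
}

# Classical example: 7 treatments in 7 blocks of size 3, 3 replicates each.
print(bibd_necessary(v = 7, b = 7, k = 3, r = 3))
```

A design that fails either check cannot be a BIBD. Passing both checks does not by itself guarantee that a design exists.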