Comparing R print-outs from LM, GLM, LMM and GLMM


3. Inference: interpretation of results, plotting results, confidence intervals, hypothesis tests (Wald, LRT).
4. Asymptotic distribution of maximum likelihood estimators and tests.
5. Checking the adequacy of the model (deviance, AIC), choosing between models (nested: LRT or AIC; not nested: AIC), and how well the model fits the data (residuals, QQ-plots, but very little focus in our course).

(Writing this out in more detail in class.)

Comparing R print-outs from LM, GLM, LMM and GLMM

Below we have fit a model to a data set, and then printed the summary of the model. For each of the print-outs you need to know (be able to identify and explain) every entry. In particular, identify and explain:

- which model: model requirements
- how the model is fitted (versions of maximum likelihood)
- parameter estimates for β
- inference about the β: how to find CIs and test hypotheses (which hypothesis the reported test statistic, and possibly p-value, refers to)
- model fit (deviance, AIC, R-squared, F)

In addition, further inference can be made using anova(fit1,fit2), confint, residuals, fitted, AIC and other functions.

MLR - multiple linear regression

library(gamlss.data)
fitlm=lm(rent~area+location+bath+kitchen+cheating,data=rent99)
summary(fitlm)
fitglm=glm(rent~area+location+bath+kitchen+cheating,data=rent99)
summary(fitglm)

Call:
lm(formula = rent ~ area + location + bath + kitchen + cheating,
    data = rent99)

Residuals:
    Min      1Q  Median      3Q     Max
-633.41  -89.17   -6.26   82.96 1000.76

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -21.9733    11.6549  -1.885   0.0595 .
area           4.5788     0.1143  40.055  < 2e-16 ***
location2     39.2602     5.4471   7.208 7.14e-13 ***
location3    126.0575    16.8747   7.470 1.04e-13 ***
bath1         74.0538    11.2087   6.607 4.61e-11 ***
kitchen1     120.4349    13.0192   9.251  < 2e-16 ***
cheating1    161.4138     8.6632  18.632  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 145.2 on 3075 degrees of freedom

Multiple R-squared:  0.4504,  Adjusted R-squared:  0.4494
F-statistic:   420 on 6 and 3075 DF,  p-value: < 2.2e-16

Call:
glm(formula = rent ~ area + location + bath + kitchen + cheating,
    data = rent99)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-633.41  -89.17   -6.26   82.96 1000.76

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -21.9733    11.6549  -1.885   0.0595 .
area           4.5788     0.1143  40.055  < 2e-16 ***
location2     39.2602     5.4471   7.208 7.14e-13 ***
location3    126.0575    16.8747   7.470 1.04e-13 ***
bath1         74.0538    11.2087   6.607 4.61e-11 ***
kitchen1     120.4349    13.0192   9.251  < 2e-16 ***
cheating1    161.4138     8.6632  18.632  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 21079.53)

    Null deviance: 117945363  on 3081  degrees of freedom
Residual deviance:  64819547  on 3075  degrees of freedom
AIC: 39440

Number of Fisher Scoring iterations: 2

GLM - Binomial regression with logit-link

library(investr)
fitgrouped=glm(cbind(y, n-y) ~ ldose, family = "binomial", data = investr::beetle)
summary(fitgrouped)

Call:
glm(formula = cbind(y, n - y) ~ ldose, family = "binomial",
    data = investr::beetle)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5941  -0.3944   0.8329   1.2592   1.5940

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -60.717      5.181  -11.72   <2e-16 ***
ldose         34.270      2.912   11.77   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 284.202  on 7  degrees of freedom
Residual deviance:  11.232  on 6  degrees of freedom
AIC: 41.43

Number of Fisher Scoring iterations: 4

GLM - Poisson regression with log-link

crab=read.table("https://www.math.ntnu.no/emner/tma4315/2017h/crab.txt")
colnames(crab)=c("Obs","C","S","W","Wt","Sa")
crab=crab[,-1] #remove column with Obs
crab$C=as.factor(crab$C)
model3=glm(Sa~W+C,family=poisson(link=log),data=crab,contrasts=list(C="contr.sum"))
summary(model3)

Call:
glm(formula = Sa ~ W + C, family = poisson(link = log), data = crab,
    contrasts = list(C = "contr.sum"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0415  -1.9581  -0.5575   0.9830   4.7523

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.92089    0.56010  -5.215 1.84e-07 ***
W            0.14934    0.02084   7.166 7.73e-13 ***
C1           0.27085    0.11784   2.298   0.0215 *
C2           0.07117    0.07296   0.975   0.3294
C3          -0.16551    0.09316  -1.777   0.0756 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 632.79  on 172  degrees of freedom
Residual deviance: 559.34  on 168  degrees of freedom
AIC: 924.64

Number of Fisher Scoring iterations: 6

LMM - random intercept and slope

library(lme4)

Warning: package 'lme4' was built under R version 3.4.2
Loading required package: Matrix

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
summary(fm1)

Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.6

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.9536 -0.4634  0.0231  0.4634  5.1793

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.09   24.740
          Days         35.07    5.922   0.07
 Residual             654.94   25.592
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825   36.84
Days          10.467      1.546    6.77

Correlation of Fixed Effects:
     (Intr)
Days -0.138

GLMM - random intercept Poisson

library("AED")
data(RIKZ)
library(lme4)
fitri=glmer(Richness~NAP+(1|Beach),data=RIKZ,family=poisson(link=log))
summary(fitri)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) ['glmerMod']
 Family: poisson  ( log )
Formula: Richness ~ NAP + (1 | Beach)
   Data: RIKZ

     AIC      BIC   logLik deviance df.resid
   220.8    226.2   -107.4    214.8       42

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.9648 -0.6155 -0.2243  0.2236  3.1869

Random effects:
 Groups Name        Variance Std.Dev.
 Beach  (Intercept) 0.2249   0.4743

Number of obs: 45, groups:  Beach, 9

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.66233    0.17373   9.569  < 2e-16 ***
NAP         -0.50389    0.07535  -6.687 2.28e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
NAP 0.013

Exam and exam preparation

We take a look at the information posted at Blackboard (Exam at Blackboard). The relevant exams are found at the bottom of each module page. Dates for supervision are also found on the exam page on Bb.

After TMA4315 - what is next?

For the 4th year student

- TMA4250 Spatial statistics
- TMA4268 Statistical learning
- TMA4275 Survival analysis
- TMA4300 Computational statistics
- KLMED8005 Analysis of repeated measurements
- SMED8002 Epidemiology 2
- TDT4300 Datavarehus og datagruvedrift
- TDT4173 Maskinlæring og case-based reasoning (big overlap with TMA4268)
- NEVR3004 Nevrale nettverk

For the 5th year student

- Computational statistics 2
- PhD course

Course evaluation in TMA4315

Please answer the course evaluation (anonymous): https://kvass.svt.ntnu.no/takesurvey.aspx?surveyid=tma4315h2017
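
Returning to the MLR print-outs: lm and glm fit the same Gaussian model to rent99, so several entries in the two summaries are linked. The following is a minimal sketch of those links, using only numbers copied from the print-outs (the variable names are chosen here for illustration and are not part of the course code). It also computes a t-based 95% interval for the area coefficient by hand, which is one way to read a CI off the coefficient table.

```r
# Numbers copied from the lm/glm print-outs for the rent99 model
null_dev  <- 117945363   # null deviance (total sum of squares), 3081 df
resid_dev <- 64819547    # residual deviance (residual sum of squares), 3075 df
df_resid  <- 3075        # residual degrees of freedom
df_model  <- 6           # number of coefficients excluding the intercept

# Gaussian dispersion estimate = residual deviance / residual df
dispersion <- resid_dev / df_resid

# The residual standard error reported by lm is its square root
rse <- sqrt(dispersion)

# Multiple R-squared and the overall F-statistic from the two deviances
r_squared <- 1 - resid_dev / null_dev
f_stat <- ((null_dev - resid_dev) / df_model) / dispersion

# 95% interval for the area coefficient: estimate +/- t-quantile * std. error
est <- 4.5788
se  <- 0.1143
ci_area <- est + c(-1, 1) * qt(0.975, df_resid) * se

print(round(c(dispersion = dispersion, rse = rse,
              r_squared = r_squared, f_stat = f_stat), 4))
print(round(ci_area, 3))
```

Up to rounding in the printed inputs, the computed values should agree with the dispersion parameter, residual standard error, multiple R-squared and F-statistic shown in the print-outs. With the fitted object available, confint(fitlm) would be the usual way to get the interval.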