PSYC 6140 November 16, 2005 ANOVA output in R

Type I, Type II and Type III Sums of Squares are displayed in ANOVA tables in a number of packages. The car library makes all three available in R. This handout discusses how to use and interpret anova output for regression models with interaction. The following script shows output for a model with interaction using all three types of sum of squares.

In summary:

1. Type I and Type II SS compare models that obey the principle of marginality and, thus, these tables remain the same if the categorical variable is recoded or if the continuous variable is subject to an affine transformation.
2. Type III SS depend on the coding of the categorical variable and on the location of 0 for the continuous variable.

> library(car)
> source("http://www.math.yorku.ca/~georges/r/coursefun.r")

  coursefun.r
  Last update: Oct. 27, 2005

  An R script containing functions and some datasets for the
  courses PSYC6140 and MATH6630 in 2005-06.

  Help on the following functions is available by typing the
  name of the function. This text is available by typing
      coursefun

  A current version of this file can be sourced or downloaded from
      http://www.math.yorku.ca/~georges/r/coursefun.r
  A copy is kept at:
      http://wiki.math.yorku.ca/r:_coursefun.r
  Please make corrections, changes and additions to the version
  on the wiki. They will be periodically transferred to the
  downloadable version.

  Functions:
    Tables:
      atotal: border an array with sums

      abind : glue two arrays together on a selected dimension
    Graphics:
      td    : easy front end to trellis.device and related functions
      xqplot: extended quantile plots
    3D graphics by John Fox:
      scatter3d
      identify3d
      ellipsoid
    Inference
      cell - a modified version of car::confidence.ellipse.lm
             that can add to a plot
    Graphics for linear algebra
      vplot - plots column vectors adding to current plot
      vell - ellipse as a 2 x n matrix
      vbox - box around unit circle
      orthog - 2 x 2 rotation
      orthog.prog 2 x 2 matrix of orthog projection

  To add functions, modify
      http://wiki.math.yorku.ca/r:_coursefun.r

> data( Prestige )
> scatterplot.matrix( Prestige )
> ## Let's just use those with non-missing type
> dd <- na.omit( Prestige )
> dim(dd)
[1] 98  6
> dim( Prestige )
[1] 102   6
> attach(dd)

        The following object(s) are masked from package:datasets :

         women

> table(type)
type
  bc prof   wc
  44   31   23
> income.log <- log(income + 8000)
> scatterplot( income.log, prestige, groups = type )

[Figure: scatterplot of prestige (20 to 80) against income.log (9.2 to 10.4), grouped by type (bc, prof, wc).]

Models referred to below. X is the continuous variable and G is the three-level factor, coded by the indicator variables D1 and D2. Models marked N do not obey the principle of marginality.

  Model  E( Y ) =                                      R Syntax      P. of Marg.  Comments
  1      α + β X + γ1 D1 + γ2 D2 + δ1 D1 X + δ2 D2 X   Y ~ X*G       Yes          Full model
  2N     α + γ1 D1 + γ2 D2 + δ1 D1 X + δ2 D2 X         Y ~ G + X:G   No           Y ~ X*G - X
  3N     α + β X + δ1 D1 X + δ2 D2 X                   Y ~ X + X:G   No           Y ~ X*G - G
  4N     α + δ1 D1 X + δ2 D2 X                         Y ~ X:G       No
  5      α + β X + γ1 D1 + γ2 D2                       Y ~ X + G     Yes          Additive model
  6      α + γ1 D1 + γ2 D2                             Y ~ G         Yes
  7      α + β X                                       Y ~ X         Yes
  8      α                                             Y ~ 1         Yes          Intercept only model

> fit.add <- lm( prestige ~ income.log + type )    # Additive model
> fit.int <- lm( prestige ~ income.log * type )    # Full interaction model
> summary(fit.int)

Call:
lm(formula = prestige ~ income.log * type)

Residuals:
     Min       1Q   Median       3Q      Max
-12.5078  -5.3855   0.2769   3.7780  24.7486

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         -469.714     69.833  -6.726 1.45e-09 ***
income.log            53.239      7.358   7.236 1.35e-10 ***
typeprof             364.246     85.907   4.240 5.31e-05 ***
typewc               272.334    122.399   2.225 0.028528 *
income.log:typeprof  -35.542      8.956  -3.968 0.000143 ***
income.log:typewc    -27.926     12.918  -2.162 0.033227 *

Residual standard error: 7.273 on 92 degrees of freedom
Multiple R-Squared: 0.8283,     Adjusted R-squared: 0.819
F-statistic: 88.79 on 5 and 92 DF,  p-value: < 2.2e-16

EXERCISE: Draw a sketch of the fitted model and label it with each of the estimated values in this table.

ANOVA for full model using anova: Type I SS (sequential)

Each row is annotated with the two models it compares, numbered as in the table of models, followed by the model that supplies the error term. For example, (8) vs (7) / (1) compares models (8) and (7) and uses the residual mean square of model (1). M:ok marks a comparison that respects marginality.

> anova(fit.int)
Analysis of Variance Table

                Df  Sum Sq Mean Sq  F value    Pr(>F)
income.log       1 15774.7 15774.7 298.2377 < 2.2e-16 ***   (8) vs (7) / (1)  M:ok
type             2  6866.8  3433.4  64.9117 < 2.2e-16 ***   (7) vs (5) / (1)  M:ok
income.log:type  2   839.3   419.6   7.9336 0.0006627 ***   (5) vs (1) / (1)  M:ok
Residuals       92  4866.2    52.9

EXERCISE: What would you get if you specified the model as prestige ~ income.log * type?

> anova( fit.add, fit.int )
Analysis of Variance Table

Model 1: prestige ~ income.log + type
Model 2: prestige ~ income.log * type
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     94 5705.4
2     92 4866.2  2     839.3 7.9336 0.0006627 ***   (5) vs (1) / (1)

> ##
> ## Interpreting coefficients
> ##
> contrasts(type)    # bc is reference level
     prof wc
bc      0  0
prof    1  0
wc      0  1
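The sequential (Type I) sums of squares can be reproduced as differences in residual sums of squares between successive models in the list of models. The following is a minimal sketch on simulated data, not the Prestige data: the names d, x, g, y and the helper rss are made up for illustration, with x standing in for income.log, g for type and y for prestige.

```r
# Simulated stand-in for the Prestige example (an assumption, not the course data)
set.seed(1)
d <- data.frame(
  g = factor(rep(c("bc", "prof", "wc"), times = c(40, 30, 20))),
  x = rnorm(90, mean = 9.5, sd = 0.3)
)
d$y <- c(5, 2, 3)[as.integer(d$g)] * d$x +
  c(0, 30, 20)[as.integer(d$g)] + rnorm(90, sd = 2)

# Hypothetical helper: residual sum of squares of a model fitted to d
rss <- function(formula) deviance(lm(formula, data = d))

rss8 <- rss(y ~ 1)       # model (8): intercept only
rss7 <- rss(y ~ x)       # model (7)
rss5 <- rss(y ~ x + g)   # model (5): additive
rss1 <- rss(y ~ x * g)   # model (1): full

# Sequential SS: (8) vs (7), (7) vs (5), (5) vs (1)
by.hand <- c(rss8 - rss7, rss7 - rss5, rss5 - rss1)

fit <- lm(y ~ x * g, data = d)
tab <- anova(fit)
type1 <- tab[["Sum Sq"]][1:3]

# F for x: numerator from (8) vs (7), error term from model (1)
F.x <- (rss8 - rss7) / (rss1 / df.residual(fit))

print(cbind(by.hand, type1))
```

The two columns printed at the end should agree row by row, and F.x should match the first F value in the anova table.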

> fit.int$contrasts
$type
[1] "contr.treatment"

> anova( fit.int )
Analysis of Variance Table

                Df  Sum Sq Mean Sq  F value    Pr(>F)
income.log       1 15774.7 15774.7 298.2377 < 2.2e-16 ***
type             2  6866.8  3433.4  64.9117 < 2.2e-16 ***
income.log:type  2   839.3   419.6   7.9336 0.0006627 ***
Residuals       92  4866.2    52.9

ANOVA using Anova in library(car): Type II
NOTE: All satisfy PoM

> Anova( fit.int, type = "II" )
Anova Table (Type II tests)

                Sum Sq Df F value    Pr(>F)
income.log      2865.9  1 54.1822 7.516e-11 ***   (6) vs (5) / (1)   [Different]
type            6866.8  2 64.9117 < 2.2e-16 ***   (7) vs (5) / (1)   [Same as Type I]
income.log:type  839.3  2  7.9336 0.0006627 ***   (5) vs (1) / (1)   [Same as Type I]
Residuals       4866.2 92

ANOVA using Anova in library(car): Type III

> Anova( fit.int, type = "III" )
Anova Table (Type III tests)

                Sum Sq Df F value    Pr(>F)
(Intercept)     2393.0  1 45.2422 1.448e-09 ***
income.log      2769.4  1 52.3578 1.353e-10 ***   (2N) vs (1) / (1)  [Depends on coding of G]
type             955.4  2  9.0315 0.0002623 ***   (3N) vs (1) / (1)  [Depends on 0 of X]
income.log:type  839.3  2  7.9336 0.0006627 ***   (5) vs (1) / (1)   [Same as I and II]
Residuals       4866.2 92
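The Type II entry for the continuous variable is the one that differs from Type I: it compares model (6) with model (5) rather than (8) with (7), while keeping the error term of model (1). The following sketch checks this by hand on simulated data. The data and the names d, x, g, y and rss are assumptions made for illustration, and the call to car::Anova is only an optional cross-check that runs if car is installed.

```r
# Simulated stand-in for the Prestige example (an assumption, not the course data)
set.seed(1)
d <- data.frame(
  g = factor(rep(c("bc", "prof", "wc"), times = c(40, 30, 20))),
  x = rnorm(90, mean = 9.5, sd = 0.3)
)
d$y <- c(5, 2, 3)[as.integer(d$g)] * d$x +
  c(0, 30, 20)[as.integer(d$g)] + rnorm(90, sd = 2)

# Hypothetical helper: residual sum of squares of a model fitted to d
rss <- function(formula) deviance(lm(formula, data = d))

full <- lm(y ~ x * g, data = d)
mse  <- deviance(full) / df.residual(full)   # error term from model (1)

ss.x.I  <- rss(y ~ 1) - rss(y ~ x)       # Type I:  (8) vs (7)
ss.x.II <- rss(y ~ g) - rss(y ~ x + g)   # Type II: (6) vs (5)
F.x.II  <- ss.x.II / mse

# g is entered after x, so its Type I and Type II SS are both (7) vs (5)
ss.g <- rss(y ~ x) - rss(y ~ x + g)

print(c(TypeI = ss.x.I, TypeII = ss.x.II, F.TypeII = F.x.II))

# Optional cross-check, assuming the car package is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(full, type = "II"))
}
```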

Sum to 0 coding

> fit.int.s <- lm( prestige ~ income.log * type, contrasts = list( type = contr.sum ) )
> fit.int.s$contrasts
$type
     [,1] [,2]
bc      1    0
prof    0    1
wc     -1   -1

Note: No type is reference level. Reference level is at the mean of the 3 types.

> summary(fit.int.s)

Call:
lm(formula = prestige ~ income.log * type, contrasts = list(type = contr.sum))

Residuals:
     Min       1Q   Median       3Q      Max
-12.5078  -5.3855   0.2769   3.7780  24.7486

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      -257.521     44.077  -5.843 7.70e-08 ***
income.log         32.083      4.630   6.929 5.67e-10 ***
type1            -212.193     59.735  -3.552 0.000605 ***
type2             152.053     52.699   2.885 0.004871 **
income.log:type1   21.156      6.284   3.367 0.001111 **
income.log:type2  -14.386      5.489  -2.621 0.010265 *

Residual standard error: 7.273 on 92 degrees of freedom
Multiple R-Squared: 0.8283,     Adjusted R-squared: 0.819
F-statistic: 88.79 on 5 and 92 DF,  p-value: < 2.2e-16

EXERCISE: Sketch the fitted model and label with values of fitted coefficients.
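With sum to 0 coding the income.log coefficient is the unweighted average of the three within-type slopes. In the output above, the treatment-coded slopes are 53.239 for bc, 53.239 - 35.542 = 17.697 for prof and 53.239 - 27.926 = 25.313 for wc, and their mean is 32.083. The sketch below checks the same relationship on simulated data. The data and all names in it are assumptions made for illustration.

```r
# Simulated stand-in for the Prestige example (an assumption, not the course data)
set.seed(1)
d <- data.frame(
  g = factor(rep(c("bc", "prof", "wc"), times = c(40, 30, 20))),
  x = rnorm(90, mean = 9.5, sd = 0.3)
)
d$y <- c(5, 2, 3)[as.integer(d$g)] * d$x +
  c(0, 30, 20)[as.integer(d$g)] + rnorm(90, sd = 2)

fit.t <- lm(y ~ x * g, data = d)                                    # treatment coding
fit.s <- lm(y ~ x * g, data = d, contrasts = list(g = contr.sum))   # sum to 0 coding

# Separate straight-line fit within each level of g:
# a 2 x 3 matrix with rows "(Intercept)" and "x", one column per level
per.group <- sapply(levels(d$g), function(k) {
  coef(lm(y ~ x, data = d[d$g == k, ]))
})

slopes     <- per.group["x", ]
intercepts <- per.group["(Intercept)", ]

# Treatment coding: coefficient of x is the slope in the reference level bc
print(c(coef(fit.t)[["x"]], slopes[["bc"]]))

# Sum to 0 coding: coefficient of x is the unweighted mean of the slopes,
# and the intercept is the unweighted mean of the intercepts
print(c(coef(fit.s)[["x"]], mean(slopes)))
print(c(coef(fit.s)[["(Intercept)"]], mean(intercepts)))
```

Both codings describe the same fitted lines; only the meaning of the individual coefficients changes.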

> anova( fit.int.s )
Analysis of Variance Table

                Df  Sum Sq Mean Sq  F value    Pr(>F)
income.log       1 15774.7 15774.7 298.2377 < 2.2e-16 ***
type             2  6866.8  3433.4  64.9117 < 2.2e-16 ***
income.log:type  2   839.3   419.6   7.9336 0.0006627 ***
Residuals       92  4866.2    52.9

> anova( fit.add, fit.int.s )
Analysis of Variance Table

Model 1: prestige ~ income.log + type
Model 2: prestige ~ income.log * type
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     94 5705.4                                    EXERCISE: Fill in models
2     92 4866.2  2     839.3 7.9336 0.0006627 ***  EXERCISE: Fill in models

EXERCISE: What happens if we use fit.add.s instead?

> anova( fit.int.s )
Analysis of Variance Table

                Df  Sum Sq Mean Sq  F value    Pr(>F)
income.log       1 15774.7 15774.7 298.2377 < 2.2e-16 ***
type             2  6866.8  3433.4  64.9117 < 2.2e-16 ***
income.log:type  2   839.3   419.6   7.9336 0.0006627 ***
Residuals       92  4866.2    52.9

Anova Type II SS

> Anova( fit.int.s, type = "II" )
Anova Table (Type II tests)

                Sum Sq Df F value    Pr(>F)
income.log      2865.9  1 54.1822 7.516e-11 ***
type            6866.8  2 64.9117 < 2.2e-16 ***
income.log:type  839.3  2  7.9336 0.0006627 ***
Residuals       4866.2 92

EXERCISE: Fill in models. Do you get the same output as before?

Anova Type III SS

> Anova( fit.int.s, type = "III" )
Anova Table (Type III tests)

                Sum Sq Df F value    Pr(>F)
(Intercept)     1805.5  1 34.1355 7.704e-08 ***
income.log      2539.5  1 48.0114 5.670e-10 ***
type             955.4  2  9.0315 0.0002623 ***
income.log:type  839.3  2  7.9336 0.0006627 ***
Residuals       4866.2 92

EXERCISE: Draw a sketch showing what is tested in this table.

Note: Type III SS using sum to 0 coding are quite popular in Psychology and Sociology. They are readily produced in SAS output. Two cautions:

1. Is the hypothesis meaningful in the presence of interaction? i.e. is the hypothesis that the average slope, averaging over levels of the factor, equals 0 of real interest?
2. If the number of observations in each category of the factor is quite variable, then the Type III SS will have low power in comparison with a hypothesis that uses a weighted average of slopes, as the Type II SS does.
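A Type III test of the continuous variable amounts to deleting its column from the full model matrix, so the hypothesis tested changes with the coding of the factor: the slope in the reference level under treatment coding, the unweighted mean slope under sum to 0 coding. The sketch below illustrates this on simulated data. The data, the helper drop.ss and all other names are assumptions made for illustration. The column is deleted from the model matrix directly because, in R, the formula y ~ g + x:g does not fit the reduced model: with no x main effect, R gives x:g one slope column per level, and the fit coincides with the full model.

```r
# Simulated stand-in for the Prestige example (an assumption, not the course data)
set.seed(1)
d <- data.frame(
  g = factor(rep(c("bc", "prof", "wc"), times = c(40, 30, 20))),
  x = rnorm(90, mean = 9.5, sd = 0.3)
)
d$y <- c(5, 2, 3)[as.integer(d$g)] * d$x +
  c(0, 30, 20)[as.integer(d$g)] + rnorm(90, sd = 2)

# Hypothetical helper: increase in residual SS when the named columns are
# deleted from the full model matrix built with contrast coding `contr`
drop.ss <- function(contr, cols) {
  X <- model.matrix(~ x * g, data = d, contrasts.arg = list(g = contr))
  rss.of <- function(M) sum(lm.fit(M, d$y)$residuals^2)
  rss.of(X[, !(colnames(X) %in% cols), drop = FALSE]) - rss.of(X)
}

# SS for x: differs between codings
ss.x.treat <- drop.ss("contr.treatment", "x")   # slope in reference level bc = 0
ss.x.sum   <- drop.ss("contr.sum", "x")         # unweighted mean slope = 0

# SS for the interaction: the same under both codings
ss.int.treat <- drop.ss("contr.treatment", c("x:gprof", "x:gwc"))
ss.int.sum   <- drop.ss("contr.sum", c("x:g1", "x:g2"))

print(c(x.treatment = ss.x.treat, x.sum = ss.x.sum))
print(c(int.treatment = ss.int.treat, int.sum = ss.int.sum))

# For a 1 df term the Type III F is the squared t value of the coefficient
full <- lm(y ~ x * g, data = d)
mse  <- deviance(full) / df.residual(full)
t.x  <- summary(full)$coefficients["x", "t value"]
print(c(F = ss.x.treat / mse, t.squared = t.x^2))

# The formula y ~ g + x:g reproduces the full model fit
same.fit <- all.equal(fitted(lm(y ~ g + x:g, data = d)), fitted(full))
```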

> ##
> ## 3 way interaction
> ##
> fit3 <- lm( prestige ~ income.log * type * women )
> summary( fit3 )

Call:
lm(formula = prestige ~ income.log * type * women)

Residuals:
      Min        1Q    Median        3Q       Max
-12.46374  -4.43942   0.06338   3.65335  14.62401

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)               -614.9801    89.8299  -6.846 1.07e-09 ***
income.log                  68.2368     9.4060   7.255 1.66e-10 ***
typeprof                   505.6793   108.2235   4.673 1.09e-05 ***
typewc                       0.3159   244.6589   0.001    0.999
women                       -3.1066     4.6481  -0.668    0.506
income.log:typeprof        -50.1349    11.2250  -4.466 2.41e-05 ***
income.log:typewc           -0.5021    25.4992  -0.020    0.984
income.log:women             0.3486     0.4980   0.700    0.486
typeprof:women               3.6411     5.7679   0.631    0.530
typewc:women                -1.0002     6.0410  -0.166    0.869
income.log:typeprof:women   -0.4047     0.6135  -0.660    0.511
income.log:typewc:women      0.1202     0.6446   0.187    0.852

Residual standard error: 6.604 on 86 degrees of freedom
Multiple R-Squared: 0.8677,     Adjusted R-squared: 0.8507
F-statistic: 51.26 on 11 and 86 DF,  p-value: < 2.2e-16

EXERCISE: Draw 2 sketches, one for women = 0 and another for women = 100. Label sketches appropriately.

> fit3.s <- lm( prestige ~ income.log * type * women, contrasts = list( type = contr.sum ))    # Sum to 0 coding

ANOVA tables with treatment coding and with sum to 0 coding

> anova(fit3)
Analysis of Variance Table

                      Df  Sum Sq Mean Sq  F value    Pr(>F)
income.log             1 15774.7 15774.7 361.6649 < 2.2e-16 ***
type                   2  6866.8  3433.4  78.7166 < 2.2e-16 ***
women                  1   265.5   265.5   6.0861  0.015608 *
income.log:type        2  1105.4   552.7  12.6716 1.503e-05 ***
income.log:women       1     5.5     5.5   0.1268  0.722623
type:women             2   533.1   266.6   6.1112  0.003299 **
income.log:type:women  2    44.9    22.4   0.5145  0.599639
Residuals             86  3751.1    43.6

> Anova(fit3, type = "II")
Anova Table (Type II tests)

                      Sum Sq Df F value    Pr(>F)
income.log            2670.5  1 61.2252 1.195e-11 ***
type                  3213.6  2 36.8384 2.780e-12 ***
women                  531.6  1 12.1877 0.0007616 ***
income.log:type       1317.9  2 15.1081 2.382e-06 ***
income.log:women        34.8  1  0.7985 0.3740302
type:women             533.1  2  6.1112 0.0032989 **
income.log:type:women   44.9  2  0.5145 0.5996390
Residuals             3751.1 86

> Anova(fit3, type = "III")
Anova Table (Type III tests)

                      Sum Sq Df F value    Pr(>F)
(Intercept)           2044.3  1 46.8684 1.065e-09 ***
income.log            2295.5  1 52.6297 1.658e-10 ***
type                  1049.6  2 12.0320 2.470e-05 ***
women                   19.5  1  0.4467    0.5057
income.log:type        959.4  2 10.9979 5.584e-05 ***
income.log:women        21.4  1  0.4901    0.4858
type:women              39.4  2  0.4512    0.6383
income.log:type:women   44.9  2  0.5145    0.5996
Residuals             3751.1 86

> anova(fit3.s)
Analysis of Variance Table

                      Df  Sum Sq Mean Sq  F value    Pr(>F)
income.log             1 15774.7 15774.7 361.6649 < 2.2e-16 ***
type                   2  6866.8  3433.4  78.7166 < 2.2e-16 ***
women                  1   265.5   265.5   6.0861  0.015608 *
income.log:type        2  1105.4   552.7  12.6716 1.503e-05 ***
income.log:women       1     5.5     5.5   0.1268  0.722623
type:women             2   533.1   266.6   6.1112  0.003299 **
income.log:type:women  2    44.9    22.4   0.5145  0.599639
Residuals             86  3751.1    43.6

> Anova(fit3.s, type = "II")
Anova Table (Type II tests)

                      Sum Sq Df F value    Pr(>F)
income.log            2670.5  1 61.2252 1.195e-11 ***
type                  3213.6  2 36.8384 2.780e-12 ***
women                  531.6  1 12.1877 0.0007616 ***
income.log:type       1317.9  2 15.1081 2.382e-06 ***
income.log:women        34.8  1  0.7985 0.3740302
type:women             533.1  2  6.1112 0.0032989 **
income.log:type:women   44.9  2  0.5145 0.5996390
Residuals             3751.1 86

> Anova(fit3.s, type = "III")
Anova Table (Type III tests)

                      Sum Sq Df F value    Pr(>F)
(Intercept)           1231.4  1 28.2323 8.347e-07 ***
income.log            1505.5  1 34.5170 7.794e-08 ***
type                  1049.6  2 12.0320 2.470e-05 ***
women                   40.4  1  0.9262    0.3385
income.log:type        959.4  2 10.9979 5.584e-05 ***
income.log:women        46.5  1  1.0663    0.3047
type:women              39.4  2  0.4512    0.6383
income.log:type:women   44.9  2  0.5145    0.5996
Residuals             3751.1 86

> ## Refit without 3-way interaction:
> fit3.2 <- lm( prestige ~ (income.log + type + women)^2 )
> summary(fit3.2)

Call:
lm(formula = prestige ~ (income.log + type + women)^2)

Residuals:
    Min      1Q  Median      3Q     Max
-12.635  -4.618  -0.150   4.363  15.078

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         -622.38667   86.16402  -7.223 1.73e-10 ***
income.log            69.02334    9.01244   7.659 2.29e-11 ***
typeprof             529.79127  101.85280   5.202 1.28e-06 ***
typewc               -84.43562  205.76943  -0.410   0.6826
women                 -1.82992    2.20077  -0.831   0.4079
income.log:typeprof  -52.69919   10.53341  -5.003 2.86e-06 ***
income.log:typewc      8.39597   21.36586   0.393   0.6953
income.log:women       0.21184    0.23575   0.899   0.3713
typeprof:women        -0.18859    0.08645  -2.181   0.0318 *
typewc:women           0.14656    0.09683   1.514   0.1337

Residual standard error: 6.568 on 88 degrees of freedom
Multiple R-Squared: 0.8661,     Adjusted R-squared: 0.8524
F-statistic: 63.24 on 9 and 88 DF,  p-value: < 2.2e-16

> Anova( fit3.2, type = "II")
Anova Table (Type II tests)

                 Sum Sq Df F value    Pr(>F)
income.log       2670.5  1 61.9083 8.606e-12 ***
type             3213.6  2 37.2494 1.905e-12 ***
women             531.6  1 12.3237 0.0007076 ***
income.log:type  1317.9  2 15.2767 2.019e-06 ***
income.log:women   34.8  1  0.8074 0.3713334
type:women        533.1  2  6.1794 0.0030817 **
Residuals        3795.9 88

EXERCISE: Why would you prefer Type II here?

Drop income.log:women interaction

> fit3.22 <- lm( prestige ~ (income.log + women) * type )
> summary(fit3.22)

Call:
lm(formula = prestige ~ (income.log + women) * type)

Residuals:
      Min        1Q    Median        3Q       Max
-12.49950  -4.62355   0.09904   3.77958  16.25782

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         -633.85651   85.12105  -7.447 5.85e-11 ***
income.log            70.24140    8.90027   7.892 7.23e-12 ***
women                  0.14709    0.05130   2.867  0.00517 **
typeprof             528.04967  101.72412   5.191 1.31e-06 ***
typewc              -148.91725  192.64231  -0.773  0.44156
income.log:typeprof  -52.51131   10.51994  -4.992 2.95e-06 ***
income.log:typewc     15.15943   19.97450   0.759  0.44989
women:typeprof        -0.14656    0.07262  -2.018  0.04660 *
women:typewc           0.16675    0.09409   1.772  0.07976 .

Residual standard error: 6.561 on 89 degrees of freedom
Multiple R-Squared: 0.8649,     Adjusted R-squared: 0.8527
F-statistic: 71.2 on 8 and 89 DF,  p-value: < 2.2e-16

> Anova( fit3.22, type = "II")
Anova Table (Type II tests)

                Sum Sq Df F value    Pr(>F)
income.log      2670.5  1 62.0425 7.773e-12 ***
women            531.6  1 12.3504 0.0006955 ***
type            4237.8  2 49.2283 3.997e-15 ***
income.log:type 1422.3  2 16.5222 7.904e-07 ***
women:type       503.8  2  5.8525 0.0040934 **
Residuals       3830.8 89
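The two reduced models above are specified with formula shorthand. One way to see which terms each formula expands to, without fitting anything, is to ask R for the term labels. The helper name term.labels is made up for this sketch; terms() is the base R function that does the work.

```r
# Hypothetical helper: the terms a model formula expands to
term.labels <- function(f) attr(terms(f), "term.labels")

# All main effects and all two-way interactions, no three-way term
two.way <- term.labels(prestige ~ (income.log + type + women)^2)
print(two.way)

# income.log and women each interact with type, but not with each other
reduced <- term.labels(prestige ~ (income.log + women) * type)
print(reduced)

# The term removed in going from the first model to the second
print(setdiff(two.way, c(reduced, "type:women")))
```

The order of the labels is the order of the rows in the corresponding Anova tables. Note that the same interaction is labelled type:women in the first model and women:type in the second, because the variables appear in a different order in the two formulas.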

Graphical presentation of models with higher-order interactions

> summary(income.log)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  9.175   9.413   9.549   9.581   9.694  10.430
> pred <- expand.grid( income.log = seq(9,10.5,.1),
+       type = levels(type),
+       women = c(0,100))
> pred$prestige <- predict( fit3.22, newdata = pred)
> library(lattice)
> td(new=T)
> xyplot( prestige ~ income.log | women, pred, groups = type, type = 'l',
+       auto.key = T)

[Figure: predicted prestige (0 to 150) against income.log (9.0 to 10.5) in two panels, both labelled "women", with one line for each type (bc, prof, wc).]
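The steps used for this display (build a grid of predictor values, predict on it, plot one panel per value of the conditioning variable) can be sketched on simulated data as follows. The data and the names d, x, w, g, y and panel are assumptions made for illustration, and the model only mirrors the structure of fit3.22. When a numeric variable is used for conditioning, the strips show the variable name rather than its values, which is presumably why both panels in the figure are labelled "women"; converting the variable to a labelled factor gives informative strips.

```r
# Simulated stand-in for the Prestige example (an assumption, not the course data)
set.seed(1)
n <- 90
d <- data.frame(
  g = factor(rep(c("bc", "prof", "wc"), times = c(40, 30, 20))),
  x = rnorm(n, mean = 9.5, sd = 0.3),
  w = runif(n, min = 0, max = 100)
)
d$y <- c(5, 2, 3)[as.integer(d$g)] * d$x +
  c(0, 30, 20)[as.integer(d$g)] + 0.05 * d$w + rnorm(n, sd = 2)

# Same structure as fit3.22: x and w each interact with g
fit <- lm(y ~ (x + w) * g, data = d)

# Every combination of 16 x values, 3 levels of g and 2 values of w
pred <- expand.grid(x = seq(9, 10.5, by = 0.1),
                    g = levels(d$g),
                    w = c(0, 100))
pred$y <- predict(fit, newdata = pred)

# A labelled factor makes the panel strips show the value of w
pred$panel <- factor(paste("w =", pred$w, "%"))

if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::xyplot(y ~ x | panel, data = pred, groups = g, type = "l",
                       auto.key = list(columns = 3, lines = TRUE, points = FALSE))
  # print(p) draws the plot on the current device
}
```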

> td( col = c('red','blue','black'), lwd = 1.5)
> xyplot( prestige ~ income.log | factor(paste("women =",women,"%")), pred,
+       groups = type, type = 'l',
+       auto.key = list(columns= 3, lines=T, points = F))

[Figure: predicted prestige (0 to 150) against income.log (9.0 to 10.5) in two panels, "women = 0 %" and "women = 100 %", with one line for each type (bc, prof, wc).]

> xyplot( prestige ~ income.log | type,
+       pred,
+       groups = factor(paste("women =",women,"%")), type = 'l',
+       auto.key = list(columns= 2, lines=T, points = F))

[Figure: predicted prestige (0 to 150) against income.log (9.0 to 10.5) in three panels, one for each type (bc, prof, wc), with one line for women = 0 % and one for women = 100 %.]