Lecture 15: Effect modification and confounding in logistic regression
Sandy Eckel
seckel@jhsph.edu
16 May 2008

Today's logistic regression topics
- Including categorical predictors
  - create dummy/indicator variables, just like for linear regression
- Comparing nested models that differ by two or more variables for logistic regression
  - Chi-square ($X^2$) Test of Deviance
    - i.e., likelihood ratio test
    - analogous to the F-test for nested models in linear regression
- Effect Modification and Confounding

Example
- Mean SAT scores were compared for the 50 US states. The goal of the study was to compare overall SAT scores using state-wide predictors such as
  - per-pupil expenditures
  - average teachers' salary

Variables
- Outcome
  - Total SAT score [sat_low]: 1 = low, 0 = high
- Primary predictor
  - Average expenditures per pupil [expen], in thousands
    - Continuous, range: 3.65-9.77, mean: 5.9
    - Doesn't include 0: center at $5,000 per pupil
- Secondary predictor
  - Mean teacher salary in thousands, in quartiles
    - salary1: lowest quartile
    - salary2: 2nd quartile
    - salary3: 3rd quartile
    - salary4: highest quartile
  - Four dummy variables for four categories; must exclude one category to create a reference group

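As a rough illustration of the coding described above, the following R sketch builds quartile indicators for salary and a centered expenditure variable. The data are simulated placeholders, not the lecture's SAT data, and the object names (`salary`, `salary_q`, `dummies`, `expenc_demo`) are hypothetical.

```r
set.seed(1)

# Simulated stand-ins for 50 states (NOT the lecture's data)
salary <- runif(50, min = 25, max = 50)      # mean teacher salary, thousands
expen  <- runif(50, min = 3.65, max = 9.77)  # per-pupil expenditure, thousands

# Cut salary at its sample quartiles; include.lowest keeps the minimum value
salary_q <- cut(salary,
                breaks = quantile(salary, probs = seq(0, 1, by = 0.25)),
                include.lowest = TRUE,
                labels = c("salary1", "salary2", "salary3", "salary4"))

# model.matrix() uses the first level (salary1) as the reference group,
# so only three indicator columns remain after dropping the intercept
dummies <- model.matrix(~ salary_q)[, -1]
head(dummies)

# Center expenditure at $5,000 per pupil so the intercept is interpretable
expenc_demo <- expen - 5
```

In practice `glm()` builds these indicators automatically when a factor such as `salary_q` appears in the model formula; the explicit matrix is shown only to make the reference-group idea visible.
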
Analysis Plan
- Assess primary relationship (parent model)
- Add secondary predictor in separate model (extended model)
- Determine if secondary predictor is statistically significant
- How?
  - Use the Chi-square test of deviance

Models and Results (note that only exponentiated slopes are shown)

Model 1 (Parent): Only primary predictor

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Expenditure} - 5) \]

------------------------------------------------------------------------------
     sat_low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      expenc |   2.484706   .8246782     2.74   0.006     1.296462     4.76201
------------------------------------------------------------------------------

Model 2 (Extended): Primary Predictor and Secondary Predictor

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Expenditure} - 5) + \beta_2 I(\text{Salary} = 2) + \beta_3 I(\text{Salary} = 3) + \beta_4 I(\text{Salary} = 4) \]

------------------------------------------------------------------------------
     sat_low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      expenc |   1.796861   .7982988     1.32   0.187     .7522251    4.292213
     salary2 |   2.783137   2.815949     1.01   0.312     .3830872    20.21955
     salary3 |   2.923654     3.2716     0.96   0.338      .326154    26.20773
     salary4 |   4.362678   6.147015     1.05   0.296     .2756828    69.03933
------------------------------------------------------------------------------

The $X^2$ Test of Deviance
- We want to compare the parent model to an extended model, which differs by the three dummy variables for the four salary quartiles.
- The $X^2$ test of deviance compares nested logistic regression models
- We use it for nested models that differ by two or more variables because the single-coefficient Wald (z) test cannot test several coefficients at once

Performing the Chi-square test of deviance for nested logistic regression
1. Get the log likelihood (LL) from both models
   - Parent model: LL = -28.94
   - Extended model: LL = -28.25
2. Find the deviance for both models
   - Deviance = -2(log likelihood)
   - Parent model: Deviance = -2(-28.94) = 57.88
   - Extended model: Deviance = -2(-28.25) = 56.50
- Deviance is analogous to residual sums of squares (RSS) in linear regression; it measures the deviation still available in the model
- A saturated model is one in which every Y is perfectly predicted

Performing the Chi-square test of deviance for nested logistic regression, cont.
3. Find the change in deviance between the nested models
   = deviance(parent) - deviance(extended)
   = 57.88 - 56.50
   = 1.38 = Test Statistic ($X^2$)
4. Evaluate the change in deviance
   - The change in deviance is an observed Chi-square statistic
   - df = # of variables added
   - $H_0$: all new β's are 0 in the population, i.e., $H_0$: the parent model is better

The Chi-square test of deviance for our nested logistic regression example
- $H_0$: After adjusting for per-pupil expenditures, all the slopes on salary indicators are 0 ($\beta_2 = \beta_3 = \beta_4 = 0$)
- $X^2_{obs}$ = 1.38
- df = 3
- With 3 df and α = 0.05, $X^2_{cr}$ is 7.81
- $X^2_{obs} < X^2_{cr}$
- Fail to reject $H_0$
- Conclude: After adjusting for per-pupil expenditure, teachers' salary is not a statistically significant predictor of low SAT scores

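The arithmetic of the test above can be checked in a few lines of R. This is only a sketch: the two log likelihoods are the values quoted on the slides, while the variable names are made up for illustration.

```r
# Log likelihoods quoted on the slides
ll_parent   <- -28.94
ll_extended <- -28.25

# Deviance = -2 * log likelihood
dev_parent   <- -2 * ll_parent     # 57.88
dev_extended <- -2 * ll_extended   # 56.50

# Test statistic: drop in deviance when the three salary dummies are added
x2_obs <- dev_parent - dev_extended
df     <- 3

# Critical value at alpha = 0.05 and the corresponding p-value
x2_crit <- qchisq(0.95, df = df)
p_value <- pchisq(x2_obs, df = df, lower.tail = FALSE)

c(x2_obs = x2_obs, x2_crit = x2_crit, p_value = p_value)
```

The p-value comes out at roughly 0.71, which leads to the same conclusion as comparing 1.38 with the critical value 7.81.
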
Notes about Chi-square deviance test
- The deviance test gives us a framework in which to add several predictors to a model simultaneously
- Can only handle nested models
- Analogous to F-test for linear regression
- Also known as "likelihood ratio test"

How can I do the Chi-square deviance test in R?
1. Fit parent model

    fit.parent <- glm(y ~ x1, family = binomial())

2. Fit the extended model (parent model is nested within the extended model)

    fit.extended <- glm(y ~ x1 + x2 + x3, family = binomial())

3. Perform the Chi-square deviance test

    anova(fit.parent, fit.extended, test = "Chi")

Example output:

    Analysis of Deviance Table

    Model 1: y ~ x1
    Model 2: y ~ x1 + x2 + x3
      Resid. Df Resid. Dev Df Deviance P(>|Chi|)
    1        48     64.250
    2        46     48.821  2   15.429 0.0004464

In the second row, Deviance (15.429) is the Chi-square test statistic, Df (2) is the degrees of freedom, and P(>|Chi|) (0.0004464) is the p-value.

Effect Modification and Confounding in Logistic Regression
Heart Disease, Smoking and Coffee Example

Effect modification in logistic regression
- Just like with linear regression, we may want to allow different relationships between the primary predictor and outcome across levels of another covariate
- We can model such relationships by fitting interaction terms in logistic regressions
- Modelling effect modification will require dealing with two or more covariates

Logistic models with two covariates

\[ \text{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

Then:

\[ \text{logit}(p \mid X_1 = X_1 + 1, X_2 = X_2) = \beta_0 + \beta_1 (X_1 + 1) + \beta_2 X_2 \]
\[ \text{logit}(p \mid X_1 = X_1, X_2 = X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]
\[ \Delta \text{ in log-odds} = \beta_1 \]

$\beta_1$ is the change in log-odds for a 1 unit change in $X_1$ provided $X_2$ is held constant.

Interpretation in General
- Also:

\[ \log\left(\frac{\text{odds}(Y = 1 \mid X_1 + 1, X_2)}{\text{odds}(Y = 1 \mid X_1, X_2)}\right) = \beta_1 \]

- And: OR = $\exp(\beta_1)$!!
- $\exp(\beta_1)$ is the multiplicative change in odds for a 1 unit increase in $X_1$ provided $X_2$ is held constant.
- The result is similar for $X_2$
- What if the effects of each of $X_1$ and $X_2$ depend on the presence of the other?
  - Effect modification!

Data: Coronary Heart Disease (CHD), Smoking and Coffee
n = 151

Study Information
- Study Facts:
  - Case-Control study (disease = CHD)
  - 40-50 year-old males previously in good health
- Study questions:
  - Is smoking and/or coffee related to an increased odds of CHD?
  - Is the association of coffee with CHD higher among smokers? That is, is smoking an effect modifier of the coffee-CHD association?

Fraction with CHD by smoking and coffee
- Number in each cell is the proportion of the total number of individuals with that smoking/coffee combination that have CHD

Pooled data (ignoring smoking)
- Odds ratio of CHD comparing coffee to non-coffee drinkers:

\[ \frac{0.53/(1 - 0.53)}{0.34/(1 - 0.34)} = 2.2 \]

- 95% CI = (1.14, 4.24)

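Among Non-Smokers
- P(CHD | Coffee drinker) = 15/(15+21) = 0.42
- P(CHD | Not Coffee drinker) = 15/(15+42) = 0.26
- Odds ratio of CHD comparing coffee to non-coffee drinkers:

\[ \frac{0.42/(1 - 0.42)}{0.26/(1 - 0.26)} = 2.06 \]

  (The probabilities here are rounded; the exact counts give (15 × 42)/(21 × 15) = 2.0.)
- 95% CI = (0.82, 4.9)
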
Among Smokers
- P(CHD | Coffee drinker) = 25/(25+14) = 0.64
- P(CHD | Not Coffee drinker) = 11/(11+8) = 0.58
- Odds ratio of CHD comparing coffee to non-coffee drinkers:

\[ \frac{0.64/(1 - 0.64)}{0.58/(1 - 0.58)} = 1.29 \]

- 95% CI = (0.42, 4.0)

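The stratum-specific and pooled odds ratios can be recomputed directly from the cell counts on these slides. The sketch below assumes the intervals are the usual log-scale Wald (Woolf) intervals; the slides do not say which method was used, although this one reproduces the quoted numbers closely. The function name `or_ci` and its argument names are hypothetical.

```r
# Odds ratio and Wald CI on the log scale from a 2x2 table.
# Assumption: Woolf-type interval; the slides do not state the method.
or_ci <- function(case_exp, ctrl_exp, case_unexp, ctrl_unexp, level = 0.95) {
  log_or <- log((case_exp * ctrl_unexp) / (ctrl_exp * case_unexp))
  se     <- sqrt(1 / case_exp + 1 / ctrl_exp + 1 / case_unexp + 1 / ctrl_unexp)
  z      <- qnorm(1 - (1 - level) / 2)
  c(OR = exp(log_or), lower = exp(log_or - z * se), upper = exp(log_or + z * se))
}

# Exposure = coffee drinking; counts are (cases, controls) from the slides
non_smokers <- or_ci(15, 21, 15, 42)
smokers     <- or_ci(25, 14, 11, 8)
pooled      <- or_ci(15 + 25, 21 + 14, 15 + 11, 42 + 8)

round(rbind(non_smokers, smokers, pooled), 2)
```

Working from counts rather than rounded probabilities avoids small rounding differences such as 2.06 versus 2.0 for the non-smokers.
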
Plot Odds Ratios and 95% CIs

Define Variables
- Y = 1 if CHD case, 0 if control
- coffee = 1 if Coffee Drinker, 0 if not
- smoke = 1 if Smoker, 0 if not
- p = Pr(Y = 1)
- n = Number observed at each pattern of Xs

Logistic Regression Model
- The observations Y are independent
- Random part
  - Y are from a Binomial(n, p) distribution
- Systematic part
  - log odds(Y = 1) (or logit(Y = 1)) is a function of
    - Coffee
    - Smoking
    - and coffee-smoking interaction

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

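Interpretations: stratify by smoking status

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

If smoke = 0:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} \]

If smoke = 1:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \cdot 1 + \beta_3 \text{coffee} \cdot 1 = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{coffee} \]

- $\exp(\beta_1)$: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among non-smokers
- $\exp(\beta_1 + \beta_3)$: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among smokers
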
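Interpretations: stratify by coffee drinking

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

If coffee = 0:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_2 \text{smoke} \]

If coffee = 1:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot 1 + \beta_2 \text{smoke} + \beta_3 \cdot 1 \cdot \text{smoke} = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,\text{smoke} \]

- $\exp(\beta_2)$: odds ratio of being a CHD case for smokers -vs- non-smokers among non-coffee drinkers
- $\exp(\beta_2 + \beta_3)$: odds ratio of being a CHD case for smokers -vs- non-smokers among coffee drinkers
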
Interpretations

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

- $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$: probability of CHD if all X's are zero
  - i.e., fraction of cases among non-smoking, non-coffee drinking individuals in the sample (determined by sampling plan)
- $\exp(\beta_3)$: ratio of odds ratios
  - What do we mean by this?

$\exp(\beta_3)$ Interpretations

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

- $\exp(\beta_3)$: factor by which the odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers is multiplied for smokers as compared to non-smokers
- or
- $\exp(\beta_3)$: factor by which the odds ratio of being a CHD case for smokers -vs- non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers
- COMMON IDEA: Additional multiplicative change in the odds ratio beyond the smoking or coffee drinking effect alone when you have both of these risk factors present

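To make the "ratio of odds ratios" reading concrete, the short derivation below divides the coffee odds ratio among smokers by the coffee odds ratio among non-smokers, and repeats the step for smoking. The subscripted OR symbols are notation introduced here for illustration. The numerical check uses the stratum-specific coffee odds ratios from the smoker and non-smoker slides (about 1.30 and 2.0) and the interaction estimate of about -0.43 that this lecture reports for the interaction model.

```latex
\[
\frac{\mathrm{OR}_{\text{coffee} \mid \text{smoke}=1}}{\mathrm{OR}_{\text{coffee} \mid \text{smoke}=0}}
  = \frac{\exp(\beta_1 + \beta_3)}{\exp(\beta_1)}
  = \exp(\beta_3),
\qquad
\frac{\mathrm{OR}_{\text{smoke} \mid \text{coffee}=1}}{\mathrm{OR}_{\text{smoke} \mid \text{coffee}=0}}
  = \frac{\exp(\beta_2 + \beta_3)}{\exp(\beta_2)}
  = \exp(\beta_3)
\]
\[
\text{Check with the CHD data: } \frac{1.30}{2.0} \approx 0.65 \approx \exp(-0.43)
\]
```

Both ratios reduce to the same quantity, which is why either wording of the interpretation is valid.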
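Some Special Cases: No smoking or coffee drinking effects
- Given

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

- If $\beta_1 = \beta_2 = \beta_3 = 0$
  - Neither smoking nor coffee drinking is associated with increased risk of CHD
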
Some Special Cases: Only one effect
- Given

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

- If $\beta_2 = \beta_3 = 0$
  - Coffee drinking, but not smoking, is associated with increased risk of CHD
- If $\beta_1 = \beta_3 = 0$
  - Smoking, but not coffee drinking, is associated with increased risk of CHD

Some Special Cases

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

- If $\beta_3 = 0$
  - Smoking and coffee drinking are both associated with risk of CHD, but the odds ratio of CHD-smoking is the same at both levels of coffee
  - Smoking and coffee drinking are both associated with risk of CHD, but the odds ratio of CHD-coffee is the same at both levels of smoking
  - Common idea: the effects of these risk factors are purely additive (on the log-odds scale); there is no interaction

Model 1: main effect of coffee

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} \]

Logit estimates                                   Number of obs   =        151
                                                  LR chi2(1)      =       5.65
                                                  Prob > chi2     =     0.0175
Log likelihood = -100.64332                       Pseudo R2       =     0.0273
------------------------------------------------------------------------------
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      coffee |   .7874579   .3347123     2.35   0.019     .1314338    1.443482
 (Intercept) |  -.6539265   .2417869    -2.70   0.007     -1.12782   -.1800329
------------------------------------------------------------------------------

Model 2: main effects of coffee and smoke

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} \]

Logit estimates                                   Number of obs   =        151
                                                  LR chi2(2)      =      15.19
                                                  Prob > chi2     =     0.0005
Log likelihood = -95.869718                       Pseudo R2       =     0.0734
------------------------------------------------------------------------------
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      coffee |   .5269764   .3541932     1.49   0.137    -.1672295    1.221182
       smoke |   1.101978   .3609954     3.05   0.002     .3944404    1.809516
 (Intercept) |  -.9572328   .2703086    -3.54   0.000    -1.487028   -.4274377
------------------------------------------------------------------------------

Model 3: main effects of coffee and smoke AND their interaction

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \text{coffee} + \beta_2 \text{smoke} + \beta_3 \text{coffee} \times \text{smoke} \]

Logit estimates                                   Number of obs   =        151
                                                  LR chi2(3)      =      15.55
                                                  Prob > chi2     =     0.0014
Log likelihood = -95.694169                       Pseudo R2       =     0.0751
------------------------------------------------------------------------------
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      coffee |   .6931472   .4525062     1.53   0.126    -.1937487    1.580043
       smoke |   1.348073   .5535208     2.44   0.015     .2631923    2.432954
coffee_smoke |  -.4317824   .7294515    -0.59   0.554    -1.861481    .9979163
 (Intercept) |  -1.029619   .3007926    -3.42   0.001    -1.619162   -.4400768
------------------------------------------------------------------------------

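The three models above can be refit in R from the cell counts given on the stratified slides (cases and controls for each smoking/coffee combination). This is a sketch rather than the original analysis: the data frame and model names (`cells`, `subjects`, `m1`, `m2`, `m3`) are hypothetical, and the counts are expanded to one row per subject so that the log likelihoods are on the same individual-level scale as the output shown above.

```r
# Cases and controls per stratum, as read off the stratified slides
cells <- data.frame(
  smoke  = c(0, 0, 1, 1),
  coffee = c(0, 1, 0, 1),
  chd    = c(15, 15, 11, 25),  # CHD cases
  no_chd = c(42, 21, 8, 14)    # controls
)

# Expand to one row per subject (151 rows in total)
idx_case <- rep(seq_len(4), times = cells$chd)
idx_ctrl <- rep(seq_len(4), times = cells$no_chd)
subjects <- data.frame(
  smoke  = cells$smoke[c(idx_case, idx_ctrl)],
  coffee = cells$coffee[c(idx_case, idx_ctrl)],
  chd    = rep(c(1, 0), times = c(length(idx_case), length(idx_ctrl)))
)

# Model 1: coffee only; Model 2: main effects; Model 3: adds the interaction
m1 <- glm(chd ~ coffee,         family = binomial(), data = subjects)
m2 <- glm(chd ~ coffee + smoke, family = binomial(), data = subjects)
m3 <- glm(chd ~ coffee * smoke, family = binomial(), data = subjects)

round(coef(summary(m3)), 3)       # estimates, SEs, z values, p-values
sapply(list(m1, m2, m3), logLik)  # log likelihoods of the three models
```

R labels the interaction coefficient `coffee:smoke` where the output above shows `coffee_smoke`; the estimates should agree with the tables up to rounding.
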
Comparing Models 1 & 2
Question: Is smoking a confounder?

Model     Variable     Est     se      z
Model 1   Intercept   -.65    .24   -2.7
          Coffee       .79    .33    2.4
Model 2   Intercept   -.96    .27   -3.5
          Coffee       .53    .35    1.5
          Smoking     1.10    .36    3.1

Look at Confidence Intervals
- Without Smoking
  - OR = $e^{0.79}$ = 2.2
  - 95% CI for log(OR): 0.79 ± 1.96(0.33) = (0.13, 1.44)
  - 95% CI for OR: ($e^{0.13}$, $e^{1.44}$) = (1.14, 4.24)
- With Smoking (adjusting for smoking)
  - OR = $e^{0.53}$ = 1.7
- Smoking does not confound the relationship between coffee drinking and CHD since 1.7 is in the 95% CI from the model without smoking

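The interval calculation above is easy to repeat for any coefficient. The sketch below uses the unrounded coffee estimates and standard errors from the Model 1 and Model 2 output in this lecture; the helper name `wald_or_ci` is hypothetical.

```r
# Exponentiate a log-odds estimate and its Wald limits to get an OR and CI
wald_or_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(c(OR = est, lower = est - z * se, upper = est + z * se))
}

ci_unadjusted <- wald_or_ci(0.7874579, 0.3347123)  # coffee, Model 1
ci_adjusted   <- wald_or_ci(0.5269764, 0.3541932)  # coffee, Model 2

round(rbind(ci_unadjusted, ci_adjusted), 2)

# The check used on the slide: is the adjusted OR inside the unadjusted CI?
ci_adjusted[["OR"]] > ci_unadjusted[["lower"]] &&
  ci_adjusted[["OR"]] < ci_unadjusted[["upper"]]
```

Using the unrounded estimates reproduces the (1.14, 4.24) interval; plugging in the rounded 0.79 and 0.33 gives slightly different limits.
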
Conclusion regarding confounding
- So, ignoring smoking, the CHD and coffee OR is 2.2 (95% CI: 1.14-4.24)
- Adjusting for smoking gives more modest evidence for a coffee effect
- However, smoking does not appear to be an important confounder

Interaction Model
Question: Is smoking an effect modifier of the CHD-coffee association?

Model     Variable           Est     se      z
Model 3   Intercept         -1.0    .30   -3.4
          Coffee             .69    .45    1.5
          Smoking            1.3    .55    2.4
          Coffee*Smoking    -.43    .73   -.59

Testing Interaction Term
- Z = -0.59, p-value = 0.554
- We fail to reject $H_0$: interaction slope = 0
- And we conclude there is little evidence that smoking is an effect modifier!

Question: Model selection
- What model should we choose to describe the relationship of coffee and smoking with CHD?

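Fitted Values
- We can back-transform to get fitted probabilities and compare with observed proportions using each of the three models

Model 1:

\[ \hat{p} = \frac{e^{-.65 + .79\,\text{Coffee}}}{1 + e^{-.65 + .79\,\text{Coffee}}} \]

Model 2:

\[ \hat{p} = \frac{e^{-.96 + .53\,\text{Coffee} + 1.1\,\text{Smoking}}}{1 + e^{-.96 + .53\,\text{Coffee} + 1.1\,\text{Smoking}}} \]

Model 3:

\[ \hat{p} = \frac{e^{-1.03 + .69\,\text{Coffee} + 1.3\,\text{Smoking} - .43\,(\text{Coffee} \times \text{Smoking})}}{1 + e^{-1.03 + .69\,\text{Coffee} + 1.3\,\text{Smoking} - .43\,(\text{Coffee} \times \text{Smoking})}} \]
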
Observed vs Fitted Values

Saturated Model
- Note that fitted values from Model 3 exactly match the observed values, indicating a saturated model that gives perfect predictions
- Although the saturated model will always result in a perfect fit, it is usually not the best model (e.g., when there are continuous covariates or many covariates)

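The observed-versus-fitted comparison can be sketched in R from the four cell counts given on the stratified slides. Here the models are fit to grouped binomial data, so the residual deviance of the interaction model should be essentially zero, which is what "saturated" means in practice. All object names (`cells`, `g1`, `g2`, `g3`, `comparison`) are hypothetical.

```r
# Cases and controls per smoking/coffee stratum, from the stratified slides
cells <- data.frame(
  smoke  = c(0, 0, 1, 1),
  coffee = c(0, 1, 0, 1),
  chd    = c(15, 15, 11, 25),
  no_chd = c(42, 21, 8, 14)
)
cells$observed <- cells$chd / (cells$chd + cells$no_chd)

# Grouped binomial fits of the three models
g1 <- glm(cbind(chd, no_chd) ~ coffee,         family = binomial(), data = cells)
g2 <- glm(cbind(chd, no_chd) ~ coffee + smoke, family = binomial(), data = cells)
g3 <- glm(cbind(chd, no_chd) ~ coffee * smoke, family = binomial(), data = cells)

# fitted() returns the inverse-logit of the linear predictor for each stratum
comparison <- data.frame(
  cells[, c("smoke", "coffee", "observed")],
  model1 = fitted(g1),
  model2 = fitted(g2),
  model3 = fitted(g3)
)
round(comparison, 3)

deviance(g3)  # essentially zero: four parameters for four cells
```

With four strata and four parameters, Model 3 has no residual degrees of freedom, so its fitted probabilities can only reproduce the observed proportions.
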
Likelihood Ratio Test
- The Likelihood Ratio Test will help decide whether or not additional term(s) significantly improve the model fit
- Likelihood Ratio Test (LRT) statistic for comparing nested models is
  - -2 times the difference between the log likelihoods (LLs) for the Null -vs- Extended models
- We've already done this earlier in today's lecture!!
  - Chi-square ($X^2$) Test of Deviance is the same thing as the Likelihood Ratio Test
- Used to compare any pair of nested logistic regression models and get a p-value associated with the $H_0$: the new β's all = 0

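As a sketch of how the likelihood ratio test applies to the CHD example, the code below refits the three models to the grouped cell counts from the stratified slides and compares them pairwise. The helper `lrt` and the other object names are hypothetical; `anova()` with `test = "Chisq"` is shown alongside as the built-in route.

```r
# Cases and controls per smoking/coffee stratum, from the stratified slides
cells <- data.frame(
  smoke  = c(0, 0, 1, 1),
  coffee = c(0, 1, 0, 1),
  chd    = c(15, 15, 11, 25),
  no_chd = c(42, 21, 8, 14)
)

g1 <- glm(cbind(chd, no_chd) ~ coffee,         family = binomial(), data = cells)
g2 <- glm(cbind(chd, no_chd) ~ coffee + smoke, family = binomial(), data = cells)
g3 <- glm(cbind(chd, no_chd) ~ coffee * smoke, family = binomial(), data = cells)

# LRT statistic = drop in deviance; df = number of parameters added
lrt <- function(small, big) {
  stat <- deviance(small) - deviance(big)
  df   <- df.residual(small) - df.residual(big)
  c(X2 = stat, df = df, p = pchisq(stat, df = df, lower.tail = FALSE))
}

add_smoke       <- lrt(g1, g2)  # does smoking improve on coffee alone?
add_interaction <- lrt(g2, g3)  # does the interaction improve on main effects?
round(rbind(add_smoke, add_interaction), 4)

# Built-in equivalent for the second comparison
anova(g2, g3, test = "Chisq")
```

The statistics equal -2 times the differences of the log likelihoods reported for Models 1 to 3, because the grouped and individual-level likelihoods differ only by a constant that cancels.
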
Example summary write-up
- A case-control study was conducted with 151 subjects, 66 (44%) of whom had CHD, to assess the relative importance of smoking and coffee drinking as risk factors. The observed fractions of CHD cases by smoking, coffee strata are

Example Summary: Unadjusted ORs
- The odds of CHD were estimated to be 3.4 times higher among smokers compared to non-smokers
  - 95% CI: (1.7, 7.9)
- The odds of CHD were estimated to be 2.2 times higher among coffee drinkers compared to non-coffee drinkers
  - 95% CI: (1.1, 4.3)

Example Summary: Adjusted ORs
- Controlling for the potential confounding of smoking, the coffee odds ratio was estimated to be 1.7 with 95% CI: (.85, 3.4). Hence, the evidence in these data is insufficient to conclude coffee has an independent effect on CHD beyond that of smoking.

Example Summary: effect modification
- Finally, we estimated the coffee odds ratio separately for smokers and non-smokers to assess whether smoking is an effect modifier of the coffee-CHD relationship. For the smokers and non-smokers, the coffee odds ratio was estimated to be 1.3 (95% CI: .42, 4.0) and 2.0 (95% CI: .82, 4.9) respectively. There is little evidence of effect modification in these data.

Summary of Lecture 15
- Including categorical predictors in logistic regression
  - create dummy/indicator variables, just like for linear regression
- Comparing nested models that differ by two or more variables for logistic regression
  - Chi-square ($X^2$) Test of Deviance
    - i.e., likelihood ratio test
    - analogous to the F-test for nested models in linear regression
- Effect Modification and Confounding in logistic regression