Poisson GLM, Cox PH, & degrees of freedom

Similar documents
Comparing R print-outs from LM, GLM, LMM and GLMM

R Analysis Example Replication C10

INSTITUTE AND FACULTY OF ACTUARIES CURRICULUM 2019 SPECIMEN SOLUTIONS. Subject CS1B Actuarial Statistics

Faculty of Science FINAL EXAMINATION MATH-523B Generalized Linear Models

> Y=degre=="deces" > table(y) Y FALSE TRUE

Summary of Main Points

The R survey package used in these examples is version 3.22 and was run under R v2.7 on a PC.

Missing Data Treatments

> library(sem) > cor.mat<-read.moments(names=c("ten1", "ten2", "ten3", "wor1", "wor2", + "wor3", "irthk1", "irthk2", "irthk3", "body1", "body2",

Gail E. Potter, Timo Smieszek, and Kerstin Sailer. April 24, 2015

To: Professor Roger Bohn & Hyeonsu Kang Subject: Big Data, Assignment April 13th. From: xxxx (anonymized) Date: 4/11/2016

Missing Data Methods (Part I): Multiple Imputation. Advanced Multivariate Statistical Methods Workshop

Final Exam Financial Data Analysis (6 Credit points/imp Students) March 2, 2006

Model Log-Linear (Bagian 2) Dr. Kusman Sadik, M.Si Program Studi Pascasarjana Departemen Statistika IPB, 2018/2019

The multivariate piecewise linear growth model for ZHeight and zbmi can be expressed as:

Table 1: Number of patients by ICU hospital level and geographical locality.

Imputation of multivariate continuous data with non-ignorable missingness

wine 1 wine 2 wine 3 person person person person person

Comparative Analysis of Dispersion Parameter Estimates in Loglinear Modeling

PSYC 6140 November 16, 2005 ANOVA output in R

Multiple Imputation for Missing Data in KLoSA

AJAE Appendix: Testing Household-Specific Explanations for the Inverse Productivity Relationship

Online Appendix to The Effect of Liquidity on Governance

From VOC to IPA: This Beer s For You!

Internet Appendix to. The Price of Street Friends: Social Networks, Informed Trading, and Shareholder Costs. Jie Cai Ralph A.

November K. J. Martijn Cremers Lubomir P. Litov Simone M. Sepe

The Financing and Growth of Firms in China and India: Evidence from Capital Markets

Method for the imputation of the earnings variable in the Belgian LFS

Feeding habits of range-shifting herbivores: tropical surgeonfishes in a temperate environment

Credit Supply and Monetary Policy: Identifying the Bank Balance-Sheet Channel with Loan Applications. Web Appendix

STAT 5302 Applied Regression Analysis. Hawkins

Eestimated coefficient. t-value

Missing value imputation in SAS: an intro to Proc MI and MIANALYZE

Wine-Tasting by Numbers: Using Binary Logistic Regression to Reveal the Preferences of Experts

Missing Data: Part 2 Implementing Multiple Imputation in STATA and SPSS. Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 4/24/13

Figure S2. Measurement locations for meteorological stations. (data made available by KMI:

The SAS System 09:38 Wednesday, December 2, The CANDISC Procedure

Flexible Imputation of Missing Data

February 26, The results below are generated from an R script.

Guatemala. 1. Guatemala: Change in food prices

MAIN FACTORS THAT DETERMINE CONSUMER BEHAVIOR FOR WINE IN THE REGION OF PRIZREN, KOSOVO

Protest Campaigns and Movement Success: Desegregating the U.S. South in the Early 1960s

Panel A: Treated firm matched to one control firm. t + 1 t + 2 t + 3 Total CFO Compensation 5.03% 0.84% 10.27% [0.384] [0.892] [0.

Appendix A. Table A1: Marginal effects and elasticities on the export probability

Business Statistics /82 Spring 2011 Booth School of Business The University of Chicago Final Exam

Preferred citation style

Flexible Working Arrangements, Collaboration, ICT and Innovation

Not to be published - available as an online Appendix only! 1.1 Discussion of Effects of Control Variables

Climate change may alter human physical activity patterns

BORDEAUX WINE VINTAGE QUALITY AND THE WEATHER ECONOMETRIC ANALYSIS

Online Appendix to. Are Two heads Better Than One: Team versus Individual Play in Signaling Games. David C. Cooper and John H.

Valuation in the Life Settlements Market

Return to wine: A comparison of the hedonic, repeat sales, and hybrid approaches

Wine Rating Prediction

Influence of Service Quality, Corporate Image and Perceived Value on Customer Behavioral Responses: CFA and Measurement Model

J. Best 1 A. Tepley 2

A brief history of Cactoblastis cactorum and its effects on Florida native Opuntia

Rheological and physicochemical studies on emulsions formulated with chitosan previously dispersed in aqueous solutions of lactic acid

Homework 1 - Solutions. Problem 2

Appendix A. Table A.1: Logit Estimates for Elasticities

Online Appendix for. Inattention and Inertia in Household Finance: Evidence from the Danish Mortgage Market,

A latent class approach for estimating energy demands and efficiency in transport:

*p <.05. **p <.01. ***p <.001.

Learning Connectivity Networks from High-Dimensional Point Processes

IT 403 Project Beer Advocate Analysis

Appendix Table A1 Number of years since deregulation

Supplementary Table 1. Glycemic load (GL) and glycemic index (GI) of individual fruits. Carbohydrate (g/serving)

Selection bias in innovation studies: A simple test

Imputation Procedures for Missing Data in Clinical Research

Missing Data Imputation Method Comparison in Ohio University Student Retention. Database. A thesis presented to. the faculty of

Supplementary Material Home range size and resource use of breeding and non-breeding white storks along a land use gradient

RELATIVE EFFICIENCY OF ESTIMATES BASED ON PERCENTAGES OF MISSINGNESS USING THREE IMPUTATION NUMBERS IN MULTIPLE IMPUTATION ANALYSIS ABSTRACT

Relation between Grape Wine Quality and Related Physicochemical Indexes

Handling Missing Data. Ashley Parker EDU 7312

"Primary agricultural commodity trade and labour market outcome

2 nd Midterm Exam-Solution

On-line Appendix for the paper: Sticky Wages. Evidence from Quarterly Microeconomic Data. Appendix A. Weights used to compute aggregate indicators

Predicting Wine Quality

Valuing Health Risk Reductions from Air Quality Improvement: Evidence from a New Discrete Choice Experiment (DCE) in China

Integrating the Land Use and Travel Model in San Francisco County Lisa Zorn

Growth in early yyears: statistical and clinical insights

Lack of Credibility, Inflation Persistence and Disinflation in Colombia

Lesson 23: Newton s Law of Cooling

Loire Valley vs South Africa : which marketing practices regarding Chenin?

The Role of Calorie Content, Menu Items, and Health Beliefs on the School Lunch Perceived Health Rating

tutorial_archetypes_prototypes_siqd_ensembles.r michael Sat Oct 29 21:38:

Forecasting the Value of Fine Wines

Bags not: avoiding the undesirable Laurie and Winifred Bauer

Understanding the impact hopping rate has on the aroma quality and intensity of beer dry hopped with Cascade

Structural Reforms and Agricultural Export Performance An Empirical Analysis

Northern Region Central Region Southern Region No. % of total No. % of total No. % of total Schools Da bomb

Catherine A. Durham, Oregon State University Iain Pardoe, University of Oregon Esteban Vega, Oregon State University. August 27,

Measuring economic value of whale conservation

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Dietary Diversity in Urban and Rural China: An Endogenous Variety Approach

Supporing Information. Modelling the Atomic Arrangement of Amorphous 2D Silica: Analysis

Survival of the Fittest: The Impact of Eco-certification on the Performance of German Wineries. Patrizia Fanasch University of Paderborn, Germany

Relationships Among Wine Prices, Ratings, Advertising, and Production: Examining a Giffen Good

Why PAM Works. An In-Depth Look at Scoring Matrices and Algorithms. Michael Darling Nazareth College. The Origin: Sequence Alignment

Evaluation and Analysis Model of Wine Quality Based on Mathematical Model

Transcription:

Poisson GLM, Cox PH, & degrees of freedom Michael C. Donohue Alzheimer s Therapeutic Research Institute Keck School of Medicine University of Southern California December 13, 2017 1 Introduction We discuss connections between the Cox proportional hazards model and Poisson generalized linear models as described in Whitehead (1980). We fit a sample dataset using coxph() and glm() and show that the model degrees of freedom differ by the number of events. 2 A simple Cox PH example 2.1 Generate data We generate proportional hazards mixed model data. options(width=75) library(phmm) Loading required package: survival Loading required package: lattice Loading required package: Matrix n <- 50 # total sample size nclust <- 5 # number of clusters clusters <- rep(1:nclust,each=n/nclust) beta0 <- c(1,2) set.seed(13) Z <-cbind(z1=sample(0:1,n,replace=true), Z2=sample(0:1,n,replace=TRUE), Z3=sample(0:1,n,replace=TRUE)) b <- cbind(rep(rnorm(nclust), each=n/nclust), rep(rnorm(nclust), each=n/nclust)) Wb <- matrix(0,n,2) for( j in 1:2) Wb[,j] <- Z[,j]*b[,j] Wb <- apply(wb,1,sum) T <- -log(runif(n,0,1))*exp(-z[,c('z1','z2')]%*%beta0-wb) C <- runif(n,0,1) time <- ifelse(t<c,t,c) event <- ifelse(t <= C,1,0) sum(event) 1

[1] 31 phmmd <- data.frame(z) phmmd$cluster <- clusters phmmd$time <- time phmmd$event <- event 2.2 Fit the Cox PH model fit.ph <- coxph(surv(time, event) ~ Z1 + Z2, phmmd, method="breslow", x=true, y=true) summary(fit.ph) Call: coxph(formula = Surv(time, event) ~ Z1 + Z2, data = phmmd, x = TRUE, y = TRUE, method = "breslow") n= 50, number of events= 31 coef exp(coef) se(coef) z Pr(> z ) Z1 0.8549 2.3513 0.3918 2.182 0.02909 * Z2 1.0888 2.9708 0.3684 2.955 0.00312 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 exp(coef) exp(-coef) lower.95 upper.95 Z1 2.351 0.4253 1.091 5.067 Z2 2.971 0.3366 1.443 6.116 Concordance= 0.71 (se = 0.055 ) Rsquare= 0.237 (max possible= 0.984 ) Likelihood ratio test= 13.55 on 2 df, p=0.001141 Wald test = 13.52 on 2 df, p=0.001158 Score (logrank) test = 14.63 on 2 df, p=0.0006671 fit.ph$loglik[2] [1] -95.97131 Next we create data to fit an auxilary Poisson model as described in Whitehead (1980) using the pseudopoisphmm() function provided in the phmm package. This function also extracts the linear predictors as estimated from the Cox PH model so that we can calculate likelihoods and degrees of freedom. 2.3 Likelihood and degrees of freedom for Poisson GLM from Cox PH parameters ppd <- as.data.frame(as.matrix(pseudopoisphmm(fit.ph))) # pois likelihood poisl <- c() eventtimes <- sort(phmmd$time[phmmd$event == 1]) 2

for(h in 1:length(eventtimes)){ js <- ppd$time == eventtimes[h] & ppd$m >= 1 # j star j <- ppd$time == eventtimes[h] if(sum(js) > 1) stop("tied event times") poisl <- c(poisl, ppd[js, "N"]*exp(-1)*exp(ppd[js, "linear.predictors"])/ sum(ppd[j, "N"]*exp(ppd[j, "linear.predictors"]))) } Poisson likelihood: sum(log(poisl)) [1] -66.5633 sum(log(poisl)) - fit.ph$loglik[2] [1] 29.40801 Poisson degrees of freedom length(fit.ph$coef) + sum(phmmd$event) [1] 33 2.4 Fit auxiliary Poisson GLM We fit an auxiliary Poisson GLM and note that the parameter estimates for z1 and z2 are identical to the coxph() fit, and the likelihood and degrees of freedom are as expected. ppd$t <- as.factor(ppd$time) fit.glm <- glm(m~-1+t+z1+z2+offset(log(n)), ppd, family=poisson) summary(fit.glm) Call: glm(formula = m ~ -1 + t + z1 + z2 + offset(log(n)), family = poisson, data = ppd) Deviance Residuals: Min 1Q Median 3Q Max -0.9685-0.7531-0.5553 0.4293 1.6823 Coefficients: Estimate Std. Error z value Pr(> z ) t0.000277233256778163-5.0494 1.0704-4.717 2.39e-06 *** t0.000285092717793308-5.0035 1.0679-4.685 2.79e-06 *** t0.000382448373472765-4.9876 1.0683-4.669 3.03e-06 *** t0.00559427171447325-4.9388 1.0655-4.635 3.57e-06 *** t0.00764335258097282-4.8875 1.0625-4.600 4.22e-06 *** t0.00808285780728387-4.8648 1.0635-4.574 4.78e-06 *** t0.0216256697018544-4.8013 1.0609-4.526 6.02e-06 *** 3

t0.0219649983261458-4.7930 1.0622-4.512 6.41e-06 *** t0.0233956453029104-4.7681 1.0634-4.484 7.34e-06 *** t0.0235837855332384-4.7069 1.0598-4.441 8.95e-06 *** t0.0237625311885084-4.6797 1.0612-4.410 1.03e-05 *** t0.027482795605763-4.6127 1.0572-4.363 1.28e-05 *** t0.0278642961804028-4.5890 1.0573-4.340 1.42e-05 *** t0.0316525538364514-4.5401 1.0576-4.293 1.76e-05 *** t0.0357745779481545-4.5147 1.0578-4.268 1.97e-05 *** t0.0366185731334857-4.4351 1.0529-4.212 2.53e-05 *** t0.066999301944422-4.3869 1.0556-4.156 3.24e-05 *** t0.0742904888064418-4.3572 1.0557-4.127 3.67e-05 *** t0.09491415021304-4.2493 1.0513-4.042 5.30e-05 *** t0.125132209250348-4.2151 1.0513-4.010 6.08e-05 *** t0.132722661166308-4.1798 1.0513-3.976 7.01e-05 *** t0.140357744467437-4.0667 1.0439-3.896 9.79e-05 *** t0.163527928343998-3.9258 1.0448-3.757 0.000172 *** t0.193971448733795-3.7760 1.0443-3.616 0.000299 *** t0.204887967162952-3.7054 1.0458-3.543 0.000396 *** t0.227852125295401-3.6459 1.0457-3.486 0.000490 *** t0.266238317485871-3.5253 1.0513-3.353 0.000799 *** t0.276177426334698-3.2951 1.0356-3.182 0.001464 ** t0.360993505812205-3.2039 1.0353-3.095 0.001970 ** t0.426697507683412-2.7934 1.0367-2.694 0.007051 ** t0.511995413073629-1.8487 1.0105-1.830 0.067323. z1 0.8549 0.3918 2.182 0.029092 * z2 1.0888 0.3684 2.955 0.003123 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1743.184 on 121 degrees of freedom Residual deviance: 71.127 on 88 degrees of freedom AIC: 199.13 Number of Fisher Scoring iterations: 6 fit.ph$coef Z1 Z2 0.8549497 1.0888337 loglik(fit.glm) 'log Lik.' -66.5633 (df=33) loglik(fit.glm)[1] - sum(log(poisl)) [1] -1.421085e-14 The additional parameter estimates correspond to the estimated log baseline hazard, which we verify using the basehaz() function. bh <- basehaz(fit.ph, centered = FALSE) log(bh$hazard - c(0,bh$hazard[1:(length(bh$hazard)-1)]))[1:10] 4

[1] -5.049378-5.003546-4.987633-4.938810-4.887479-4.864823 -Inf [8] -4.801254-4.793001-4.768072 3 Extending to PHMM 3.1 Fit PHMM fit.phmm <- phmm(surv(time, event) ~ Z1 + Z2 + (Z1 + Z2 cluster), phmmd, Gbs = 100, Gbsvar = 1000, VARSTART = 1, NINIT = 10, MAXSTEP = 100, CONVERG=90) alpha: alpha=0.0000 alpha=0.0000 alpha=0.0000 alpha=0.0000 alpha=0.0000 alpha=0.0000 alpha=0.0000 alpha=0.00 b: b=0.0000 b=0.0000 b=0.0000 Lambexp: Lambexp=0.0281 Lambexp=0.0132 Lambexp=0.0926 Lambexp=0.1284 Lambexp=0.0743 Lambexp=0.2084 Lambexp=0.0 ww: w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 5

w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 w1=1.0000 w2=0.0000 omega: omega=4.1303 omega=3.3159 omega=2.7789 omega=2.7789 omega=0.5359 omega=4.1303 omega=0.9924 omega=0.9924 omega=2.3823 omega=4.1303 omega=2.3823 omega=0.4882 omega=1.7323 omega=3.3159 omega=0.4834 omega=3.3159 omega=2.7789 omega=2.3823 omega=0.9398 omega=0.8638 omega=0.3697 omega=0.7356 omega=1.7323 omega=0.7118 omega=0.8317 omega=3.3159 omega=0.5359 omega=0.8638 omega=3.3159 omega=1.7323 6

omega=0.7118 omega=0.9384 omega=0.8638 omega=1.4173 omega=0.4834 omega=0.9398 omega=0.4882 omega=1.7323 omega=0.5359 omega=0.8638 omega=0.5359 omega=0.9398 omega=0.7356 omega=0.7356 omega=0.3697 omega=0.7356 omega=0.9398 omega=1.4173 omega=0.4834 omega=0.7356 a: 5.0000 3.0000 9.0000 8.0000 3.0000 2.0000 5.0000 2.0000 9.0000 6.0000 betahat: 0.8171 1.5158 summary(fit.phmm) Proportional Hazards Mixed-Effects Model fit by MCMC-EM Model: Surv(time, event) ~ Z1 + Z2 + (Z1 + Z2 cluster) Data: phmmd Log-likelihood: Conditional Laplace RIS -83.43-122.61-122.53 Fixed effects: Surv(time, event) ~ Z1 + Z2 Estimate Std.Error Z1 0.8171 0.3888 Z2 1.5158 0.2903 Random effects: (Z1 + Z2 cluster) Estimated variance-covariance matrix: (Intercept) Z1 Z2 (Intercept) 0.1951 0.0000 0.0000 Z1 0.0000 0.4559 0.0000 Z2 0.0000 0.0000 0.3465 7

Number of Observations: 50 Number of Groups: 5 3.2 Likelihood and degrees of freedom for Poisson GLMM from PHMM parameters ppd <- as.data.frame(as.matrix(pseudopoisphmm(fit.phmm))) poisl <- c() eventtimes <- sort(phmmd$time[phmmd$event == 1]) for(h in 1:length(eventtimes)){ js <- ppd$time == eventtimes[h] & ppd$m >= 1 # j star j <- ppd$time == eventtimes[h] if(sum(js) > 1) stop("tied event times") poisl <- c(poisl, ppd[js, "N"]*exp(-1)*exp(ppd[js, "linear.predictors"])/ sum(ppd[j, "N"]*exp(ppd[j, "linear.predictors"]))) } Poisson likelihood: sum(log(poisl)) [1] -93.46472 sum(log(poisl)) - fit.phmm$loglik[1] Conditional -10.03456 Poisson degrees of freedom # Poisson GLMM degrees of freedom length(unique(x$cluster)) * x$nrandom + x$nfixed tracehat(fit.phmm, "pseudopois") # + 2*sum(phmmd$event) [1] 6.505417 3.3 Fit auxiliary Poisson GLMM We fit an auxiliary Poisson GLMM, although with a general variance-covariance matrix for the random effects (phmm() only fits models with diagonal variance-covariance matrix). library(lme4) ppd$t <- as.factor(ppd$time) fit.lmer <- glmer(m~-1+t+z1+z2+ (z1+z2 cluster)+offset(log(n)), data=ppd, family=poisson, nagq=0) summary(fit.lmer)$coef Estimate Std. Error z value Pr(> z ) t0.000277233256778163-5.958010 1.1564660-5.151911 2.578456e-07 8

t0.000285092717793308-5.812091 1.1439927-5.080531 3.763810e-07 t0.000382448373472765-5.793228 1.1454929-5.057411 4.249876e-07 t0.00559427171447325-5.695820 1.1403300-4.994887 5.887020e-07 t0.00764335258097282-5.587889 1.1353871-4.921572 8.585185e-07 t0.00808285780728387-5.579025 1.1366552-4.908283 9.187736e-07 t0.0216256697018544-5.354631 1.1140436-4.806482 1.536091e-06 t0.0219649983261458-5.351292 1.1147513-4.800436 1.583209e-06 t0.0233956453029104-5.305904 1.1172975-4.748873 2.045536e-06 t0.0235837855332384-5.003952 1.0894229-4.593214 4.364714e-06 t0.0237625311885084-4.939091 1.0914764-4.525147 6.035353e-06 t0.027482795605763-4.909025 1.0906328-4.501080 6.760915e-06 t0.0278642961804028-4.876374 1.0907983-4.470464 7.805026e-06 t0.0316525538364514-4.818245 1.0926035-4.409875 1.034302e-05 t0.0357745779481545-4.766444 1.0943791-4.355386 1.328324e-05 t0.0366185731334857-4.466495 1.0662061-4.189148 2.800043e-05 t0.066999301944422-4.341048 1.0720559-4.049274 5.137683e-05 t0.0742904888064418-4.317349 1.0714179-4.029566 5.588003e-05 t0.09491415021304-4.260289 1.0698365-3.982187 6.828400e-05 t0.125132209250348-4.196863 1.0699310-3.922555 8.761487e-05 t0.132722661166308-4.181387 1.0700511-3.907652 9.319747e-05 t0.140357744467437-4.050659 1.0570046-3.832206 1.269992e-04 t0.163527928343998-3.850829 1.0550657-3.649848 2.623955e-04 t0.193971448733795-3.567027 1.0528487-3.387976 7.041032e-04 t0.204887967162952-3.445926 1.0538681-3.269789 1.076278e-03 t0.227852125295401-3.386892 1.0531723-3.215896 1.300380e-03 t0.266238317485871-3.255781 1.0602680-3.070715 2.135471e-03 t0.276177426334698-3.082084 1.0473420-2.942767 3.252930e-03 t0.360993505812205-2.859386 1.0472010-2.730504 6.323763e-03 t0.426697507683412-2.392130 1.0413226-2.297203 2.160716e-02 t0.511995413073629-1.634348 1.0219183-1.599294 1.097553e-01 z1 0.811712 0.4574939 1.774257 7.602058e-02 z2 1.609640 0.4527416 3.555317 3.775234e-04 fit.phmm$coef Z1 Z2 0.8170856 1.5157523 loglik(fit.lmer) 'log Lik.' -100.9984 (df=39) sum(log(poisl)) - loglik(fit.lmer)[1] [1] 7.533638 log(fit.phmm$lambda)[1:10] [1] -5.903387-5.779829-5.756366-5.661413-5.556490-5.547022 -Inf [8] -5.360862-5.356190-5.303652 9