A Comparison of Approximate Bayesian Bootstrap and Weighted Sequential Hot Deck for Multiple Imputation

Darryl V. Creel
RTI International
RTI International is a trade name of Research Triangle Institute. www.rti.org

Why do this presentation?
- Better Understand the Performance of the Weighted Sequential Hot Deck, Alone and in Comparison with the Approximate Bayesian Bootstrap, Using the Multiple Imputation Variance Estimator (Rubin 1987)
- Address the Andridge and Little (2010) Comment: The weighted sequential hot deck does not appear to have been widely implemented.
- Graphical Presentation of Monte Carlo Results (which may be more easily interpreted than a numeric table)

Outline
- Missing Data
- Approximate Bayesian Bootstrap
- Weighted Sequential Hot Deck
- Monte Carlo Simulation
- Results

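Missing Data
What to do about missing data?
- Ignore (generally a bad idea)
- Weight (unit nonresponse)
- Impute (item nonresponse)
Multiple Imputation (Rubin, 1987): $V = W + \frac{M + 1}{M} B$, where M is the number of imputations, W is the within-imputation variance, and B is the between-imputation variance.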
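As a worked illustration of the multiple imputation variance estimator V = W + ((M + 1)/M) B, the sketch below combines M completed-data estimates. The function name `mi_variance` and its argument names are hypothetical and chosen only for this example; it assumes W is the average of the M completed-data variance estimates and B is the sample variance of the M point estimates.

```python
import numpy as np

def mi_variance(estimates, variances):
    """Combine M completed-data results into V = W + (M + 1)/M * B.

    estimates: the M point estimates, one per imputed data set.
    variances: the M completed-data variance estimates.
    (Hypothetical helper for illustration only.)
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    M = len(q)
    W = u.mean()        # within-imputation variance
    B = q.var(ddof=1)   # between-imputation variance
    return W + (M + 1) / M * B

# Three imputations with point estimates 1, 2, 3 and variance 0.5 each:
# W = 0.5, B = 1.0, so V = 0.5 + (4/3) * 1.0
print(mi_variance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```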

Approximate Bayesian Bootstrap
Approximate Bayesian Bootstrap (Rubin and Schenker, 1986)
- Let r be the Number of Respondents
- Let m be the Number of Nonrespondents
- Procedure within an imputation class
  - Select r Units With Replacement from the Respondents to Create the Donor Pool (Potential Donors)
  - Select m Units With Replacement from the Donor Pool to be Actual Donors
  - Repeat b number of times
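The two-stage resampling within one imputation class can be sketched as follows. This is a minimal illustration, not the code used for the presentation; the function name `abb_impute` and its parameters are assumptions made for the example.

```python
import numpy as np

def abb_impute(respondent_values, m, b, rng):
    """Approximate Bayesian bootstrap within one imputation class.

    respondent_values: observed values of the r respondents.
    m: number of nonrespondents to impute.
    b: number of imputations.
    Returns an array of shape (b, m) of imputed values.
    (Hypothetical helper for illustration only.)
    """
    y = np.asarray(respondent_values, dtype=float)
    r = len(y)
    imputations = np.empty((b, m))
    for k in range(b):
        # Stage 1: draw r respondents with replacement to form the donor pool.
        donor_pool = rng.choice(y, size=r, replace=True)
        # Stage 2: draw m actual donors with replacement from the donor pool.
        imputations[k] = rng.choice(donor_pool, size=m, replace=True)
    return imputations

rng = np.random.default_rng(2010)
imputed = abb_impute([4.1, 5.3, 4.8, 6.0, 5.5], m=3, b=10, rng=rng)
print(imputed.shape)
```

Each row of the result is one set of imputed values; every imputed value is one of the observed respondent values.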

Kim's Adjustment
Kim (2002) investigates ABB and shows that the multiple imputation variance estimator has a downward bias that is not negligible for moderate sample sizes. He proposes reducing the size of the donor pool to minimize the bias.

Parzen, Lipsitz, and Fitzmaurice's Adjustment
Parzen, Lipsitz, and Fitzmaurice (2005) reviewed Kim's (2002) paper and suggested an alternative way to reduce the bias: a simple correction factor applied to the standard multiple imputation variance estimate.
- More Easily Implemented
- More Efficient (less variability in the variance estimates)

Weighted Sequential Hot Deck
Weighted Sequential Hot Deck (Cox 1980, Iannacchione 1982, RTI International 2008)
- n_r is the number of item respondents (5 in the example)
- w_h is the sample weight for the h-th respondent
- n_m is the number of item nonrespondents (3 in the example)
- s_i is the scaled weight for the i-th nonrespondent
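To make the roles of w_h and s_i concrete, here is a simplified sketch of weighted donor selection. It rests on assumptions that the slide does not spell out: nonrespondent weights are rescaled so that they sum to the total respondent weight, respondents and nonrespondents are laid end to end along the same line in file order, and each nonrespondent draws a donor with probability proportional to the overlap between its zone and each respondent's segment. It is not the SUDAAN implementation, and the function name `wshd_select` is hypothetical.

```python
import numpy as np

def wshd_select(resp_weights, nonresp_weights, rng):
    """Simplified weighted sequential donor selection (illustrative only).

    Returns the scaled nonrespondent weights s_i and, for each
    nonrespondent, the index of the selected respondent (donor).
    """
    w = np.asarray(resp_weights, dtype=float)
    v = np.asarray(nonresp_weights, dtype=float)
    # Assumed scaling: s_i sums to the total respondent weight.
    s = v * w.sum() / v.sum()
    resp_edges = np.concatenate(([0.0], np.cumsum(w)))
    zone_edges = np.concatenate(([0.0], np.cumsum(s)))
    donors = np.empty(len(s), dtype=int)
    for i in range(len(s)):
        lo, hi = zone_edges[i], zone_edges[i + 1]
        # Length of each respondent's segment that falls inside zone i.
        overlap = np.minimum(resp_edges[1:], hi) - np.maximum(resp_edges[:-1], lo)
        overlap = np.clip(overlap, 0.0, None)
        donors[i] = rng.choice(len(w), p=overlap / overlap.sum())
    return s, donors

# Five respondents and three nonrespondents, all with weight 1.
rng = np.random.default_rng(1980)
s, donors = wshd_select([1, 1, 1, 1, 1], [1, 1, 1], rng)
print(s, donors)
```

With equal weights each scaled weight is 5/3, so the first nonrespondent can only receive a value from the first two respondents and the last nonrespondent only from the last two.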

Weighted Sequential Hot Deck
- Same Donor Pool
  - Repeat WSHD b Times on Respondents
  - Not Proper Multiple Imputation
- Bootstrap Donor Pool
  - Essentially the First With Replacement Sample in the ABB Process
  - ABB: With Replacement Sample to Create Donor Pool, With Replacement Sample to Select Donors, Repeat b Times
  - WSHD: With Replacement Sample to Create Donor Pool, WSHD to Select Donors, Repeat b Times
- WSHD Implemented using SUDAAN (RTI International 2008)

Monte Carlo Simulation
- Extends Kim's (2002) simulation (ignorable nonresponse within one imputation class), which was also used by PLF (Parzen, Lipsitz, and Fitzmaurice 2005) and by Demirtas, Arguelles, Chung, and Hedeker (2007)
- 2 Sample Sizes: 20, 100
- 2 Distributions of the Analytic Variable
  - Normal with mean 5 and variance 1
  - Chi-Square with 5 degrees of freedom
- 3 Response Rates: 40%, 60%, 80%
- 2 Values for the Number of Multiple Imputations: 3, 10

Monte Carlo Simulation
- Each Combination of Factors: 10,000 Replications
- 5 Imputation Methods
  - ABB
  - Kim: modifies donor pool size
  - PLF: variance correction factor
  - WSHD: same donor pool
  - WSHDB: bootstrap to create donor pool
- Comparison: Relative Bias of the Variance Estimators
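One cell of such a simulation, for plain ABB only, might look like the sketch below. Several details are assumptions made for illustration and are not stated on the slides: the number of respondents is fixed at round(n * response rate), the estimand is the mean, the within-imputation variance is the completed-data sample variance divided by n, and S in 100*(V_hat - S)/S is taken to be the Monte Carlo variance of the multiple imputation point estimate. The function name and the default of 2,000 replications (rather than 10,000) are also hypothetical.

```python
import numpy as np

def abb_relative_bias(n=20, rr=0.4, M=3, reps=2000, seed=0):
    """Relative bias (%) of the MI variance estimator under ABB,
    for a Normal(5, 1) analytic variable. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    r = int(round(n * rr))   # assumed fixed number of respondents
    m = n - r                # number of nonrespondents
    point = np.empty(reps)
    v_hat = np.empty(reps)
    for rep in range(reps):
        # Ignorable nonresponse: only the respondents' values are needed.
        y_resp = rng.normal(5.0, 1.0, size=r)
        means = np.empty(M)
        within = np.empty(M)
        for k in range(M):
            pool = rng.choice(y_resp, size=r, replace=True)   # donor pool
            imputed = rng.choice(pool, size=m, replace=True)  # actual donors
            completed = np.concatenate((y_resp, imputed))
            means[k] = completed.mean()
            within[k] = completed.var(ddof=1) / n
        point[rep] = means.mean()
        # V = W + (M + 1)/M * B
        v_hat[rep] = within.mean() + (M + 1) / M * means.var(ddof=1)
    s = point.var(ddof=1)    # assumed meaning of S: Monte Carlo variance
    return 100.0 * (v_hat.mean() - s) / s

print(abb_relative_bias())
```

The same loop could be repeated over the sample sizes, distributions, response rates, and numbers of imputations listed above, swapping in each imputation method in turn.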

[Figure: Relative Bias of the Variance Estimates. Four panels: Normal, n = 20, RR 40%; Normal, n = 20, RR 80%; Normal, n = 100, RR 40%; Normal, n = 100, RR 80%. Vertical axis: Relative Bias, 100*(V_hat - S)/S, from -70 to 10. Horizontal axis: Number of Imputations (3, 10). Legend: ABB, Kim, PLF, WSHDB, WSHD.]

[Figure: Relative Bias of the Variance Estimates. Four panels: ChiSq, n = 20, RR 40%; ChiSq, n = 20, RR 80%; ChiSq, n = 100, RR 40%; ChiSq, n = 100, RR 80%. Vertical axis: Relative Bias, 100*(V_hat - S)/S, from -70 to 10. Horizontal axis: Number of Imputations (3, 10). Legend: ABB, Kim, PLF, WSHDB, WSHD.]

Future Work
- Additional Empirical Investigations
- More Complex Ignorable Nonresponse
- WSHD Multiple Imputation

References
Andridge, Rebecca and Roderick Little (2010). A Review of Hot Deck Imputation for Survey Nonresponse. International Statistical Review. Vol. 78, No. 1, Pp. 40-64.
Cox, Brenda (1980). The Weighted Sequential Hot Deck Imputation Procedure. Proceedings of the Survey Research Methods Section of the American Statistical Association. Pp. 721-726.
Demirtas, Hakan, Lester Arguelles, Hwan Chung, and Donald Hedeker (2007). On the Performance of Bias-reduction Techniques for Variance Estimation in Approximate Bayesian Bootstrap Imputation. Computational Statistics and Data Analysis. Vol. 51, Pp. 4064-4068.
Iannacchione, Vincent (1982). Weighted Sequential Hot Deck Imputation Macros. Seventh Annual SAS User's Group International Conference.
Kim, J. (2002). A Note on Approximate Bayesian Bootstrap Imputation. Biometrika. Vol. 89, No. 2, Pp. 470-477.
Parzen, Michael, Stuart Lipsitz, and Garrett Fitzmaurice (2005). A Note on Reducing the Bias of the Approximate Bayesian Bootstrap Imputation Variance Estimator. Biometrika. Vol. 92, No. 4, Pp. 971-974.
RTI International (2008). SUDAAN Language Manual, Release 10. Research Triangle Park: RTI International.
Rubin, Donald (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
Rubin, Donald and Nathaniel Schenker (1986). Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse. Journal of the American Statistical Association. Vol. 81, No. 394, Pp. 366-374.

More Information
Darryl V. Creel
Senior Statistician
301.770.8229
dcreel@rti.org