A Comparison of Approximate Bayesian Bootstrap and Weighted Sequential Hot Deck for Multiple Imputation


Darryl V. Creel, RTI International (RTI International is a trade name of Research Triangle Institute; www.rti.org)

Why Do This Presentation?
- To better understand the performance of the weighted sequential hot deck (WSHD), alone and in comparison with the approximate Bayesian bootstrap (ABB), using the multiple imputation variance estimator (Rubin 1987).
- To address a comment by Andridge and Little (2010): "The weighted sequential hot deck does not appear to have been widely implemented."
- To present the Monte Carlo results graphically, which may be easier to interpret than a numeric table.

Outline
- Missing data
- Approximate Bayesian bootstrap
- Weighted sequential hot deck
- Monte Carlo simulation
- Results

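Missing Data
What to do about missing data?
- Ignore it (generally a bad idea).
- Weight (for unit nonresponse).
- Impute (for item nonresponse).
- Multiple imputation (Rubin 1987), with total variance $V = \bar{W} + \left(1 + \frac{1}{M}\right) B$, where $\bar{W}$ is the average within-imputation variance, $B$ is the between-imputation variance, and $M$ is the number of imputations.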
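Rubin's combining rule can be sketched in a few lines of Python. This is a minimal illustration that assumes each completed data set yields one scalar point estimate and its variance. The function name `rubin_combine` is hypothetical and does not come from the presentation. The between-imputation variance uses the usual M - 1 divisor.

```python
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Combine M completed-data results with Rubin's (1987) rules.

    estimates        -- point estimate from each of the M completed data sets
    within_variances -- variance estimate from each completed data set
    Returns (combined point estimate, total variance V).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)            # combined point estimate
    w_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor M - 1)
    total = w_bar + (1 + 1 / m) * b    # V = W_bar + (1 + 1/M) B
    return q_bar, total


q, v = rubin_combine([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(q, round(v, 4))  # 2.0 2.3333
```

Here the within-imputation average is 1 and B is 1, so V = 1 + (4/3)(1).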

Approximate Bayesian Bootstrap
Approximate Bayesian bootstrap (Rubin and Schenker 1986). Let r be the number of respondents and m the number of nonrespondents. The procedure within an imputation class is:
1. Select r units with replacement from the respondents to create the donor pool (potential donors).
2. Select m units with replacement from the donor pool to be the actual donors.
3. Repeat b times.
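The ABB steps above can be sketched in Python for a single imputation class. This sketch uses hypothetical function names.

The optional `donor_pool_size` argument is a hypothetical hook for a Kim (2002)-style smaller donor pool. The slides do not give Kim's rule for choosing that size, so none is implemented.

```python
import random


def abb_impute(values, responded, rng, donor_pool_size=None):
    """One Approximate Bayesian Bootstrap draw (Rubin and Schenker 1986).

    values          -- analysis values (entries for nonrespondents are ignored)
    responded       -- booleans, True where the value was observed
    rng             -- a random.Random instance
    donor_pool_size -- hypothetical parameter: size of the bootstrap donor
                       pool; defaults to r, the number of respondents
    """
    respondents = [v for v, ok in zip(values, responded) if ok]
    r = len(respondents)
    m = len(values) - r                    # number of nonrespondents
    if r == 0:
        raise ValueError("no respondents to serve as donors")
    pool_size = r if donor_pool_size is None else donor_pool_size
    # Step 1: draw the donor pool with replacement from the respondents.
    pool = rng.choices(respondents, k=pool_size)
    # Step 2: draw the m actual donors with replacement from the donor pool.
    donors = iter(rng.choices(pool, k=m))
    # Respondents keep their own values; each nonrespondent gets the next donor value.
    return [v if ok else next(donors) for v, ok in zip(values, responded)]


def abb_multiple(values, responded, b, seed=0):
    """Step 3: repeat the ABB draw b times to get b completed data sets."""
    rng = random.Random(seed)
    return [abb_impute(values, responded, rng) for _ in range(b)]


y = [4.8, 5.3, None, 6.1, None, 4.4, None, 5.0]
obs = [v is not None for v in y]
completed = abb_multiple(y, obs, b=3)
for data in completed:
    print(data)
```

The two independent with-replacement draws are what make ABB "proper" in Rubin's sense. The first draw propagates uncertainty about the respondent distribution across the b imputations.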

Kim's Adjustment
Kim (2002) investigates ABB and shows that the multiple imputation variance estimator has a downward bias that is not negligible for moderate sample sizes. He proposes reducing the size of the donor pool to reduce this bias.

Parzen, Lipsitz, and Fitzmaurice's Adjustment
Parzen, Lipsitz, and Fitzmaurice (2005) reviewed Kim's (2002) paper and suggested an alternative way to reduce the bias: a simple correction factor applied to the standard multiple imputation variance estimate. Compared with Kim's approach, it is:
- more easily implemented;
- more efficient (less variability in the variance estimates).

Weighted Sequential Hot Deck
Weighted sequential hot deck (Cox 1980; Iannacchione 1982; RTI International 2008). In the slide's illustration:
- $n_r$ is the number of item respondents (5 in the illustration);
- $w_h$ is the sample weight for the $h$th respondent;
- $n_m$ is the number of item nonrespondents (3 in the illustration);
- $s_i$ is the scaled weight for the $i$th nonrespondent.
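The slide names the scaled weight but does not define it. The following follows the usual description of Cox's (1980) procedure, which is an assumption here. Each nonrespondent's sample weight $w_i$ is rescaled so that the nonrespondents' scaled weights add up to the respondents' total weight:

$$
s_i = w_i \, \frac{\sum_{h=1}^{n_r} w_h}{\sum_{j=1}^{n_m} w_j}, \quad i = 1, \dots, n_m,
\qquad\text{so that}\qquad
\sum_{i=1}^{n_m} s_i = \sum_{h=1}^{n_r} w_h .
$$

For example, if the respondent weights sum to 8 and the nonrespondent weights sum to 4, every nonrespondent weight is doubled.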

Weighted Sequential Hot Deck (continued)
- Same donor pool (WSHD): repeat WSHD b times on the respondents. This is not proper multiple imputation.
- Bootstrap donor pool (WSHDB): the donor pool is essentially the first with-replacement sample in the ABB process.
  - ABB: with-replacement sample to create the donor pool, with-replacement sample to select donors, repeat b times.
  - WSHDB: with-replacement sample to create the donor pool, WSHD to select donors, repeat b times.
- WSHD is implemented using SUDAAN (RTI International 2008).
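The contrast between the same and bootstrap donor pools can be sketched in Python. This is a simplified stand-in for WSHD, not Cox's (1980) procedure or the SUDAAN implementation.

The sketch works as follows:
- Respondent weights and nonrespondent scaled weights (rescaled to the same total, an assumption based on Cox 1980) are laid end to end in file order.
- Each nonrespondent independently takes a donor whose interval overlaps its own, with probability proportional to the overlap.

This keeps the property that the expected scaled weight a respondent donates to equals its own weight $w_h$. However, it does not restrict how often each respondent serves as a donor, which Cox's procedure is designed to do.

Further assumptions:
- All function names are hypothetical.
- The lists are assumed to be already in the file's sort order.
- In the bootstrap variant, resampled respondents keep their original weights and are kept in file order.

```python
import random


def overlap_matrix(resp_w, nonresp_w):
    """Lay respondent weights and rescaled nonrespondent weights end to end on
    [0, total respondent weight]; return the scaled weights s_i and, for each
    nonrespondent i, the overlap of its interval with each respondent's interval."""
    total = sum(resp_w)
    factor = total / sum(nonresp_w)
    scaled = [w * factor for w in nonresp_w]   # s_i; these sum to the respondent total
    overlaps = []
    lo_i = 0.0
    for s in scaled:
        hi_i = lo_i + s
        row, lo_h = [], 0.0
        for w in resp_w:
            hi_h = lo_h + w
            row.append(max(0.0, min(hi_i, hi_h) - max(lo_i, lo_h)))
            lo_h = hi_h
        overlaps.append(row)
        lo_i = hi_i
    return scaled, overlaps


def wshd_pass(resp_vals, resp_w, nonresp_w, rng):
    """One simplified WSHD pass: nonrespondent i takes respondent h's value with
    probability overlap_ih / s_i, independently across nonrespondents."""
    _, overlaps = overlap_matrix(resp_w, nonresp_w)
    return [rng.choices(resp_vals, weights=row, k=1)[0] for row in overlaps]


def wshd_multiple(resp_vals, resp_w, nonresp_w, b, bootstrap, seed=0):
    """Return b sets of imputed values for the nonrespondents.

    bootstrap=False: WSHD on the same respondents every time (not proper MI).
    bootstrap=True:  WSHDB; first redraw the donor pool with replacement, as in
                     the first ABB step, keeping resampled units in file order.
    """
    rng = random.Random(seed)
    index = list(range(len(resp_vals)))
    imputations = []
    for _ in range(b):
        pool = sorted(rng.choices(index, k=len(index))) if bootstrap else index
        vals = [resp_vals[k] for k in pool]
        wts = [resp_w[k] for k in pool]
        imputations.append(wshd_pass(vals, wts, nonresp_w, rng))
    return imputations


# Five respondents and three nonrespondents, as in the slide; the weights are made up.
resp_vals = [10, 20, 30, 40, 50]
resp_w = [1.0, 2.0, 1.0, 3.0, 1.0]
nonresp_w = [2.0, 1.0, 1.0]
print(wshd_multiple(resp_vals, resp_w, nonresp_w, b=3, bootstrap=False))
print(wshd_multiple(resp_vals, resp_w, nonresp_w, b=3, bootstrap=True))
```

With the same donor pool, the only variation across the b passes comes from the donor draws. The respondent distribution itself is never perturbed, which is why that variant is not proper multiple imputation.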

Monte Carlo Simulation
The simulation extends the design of Kim (2002), which was also used by PLF (Parzen, Lipsitz, and Fitzmaurice 2005) and by Demirtas, Arguelles, Chung, and Hedeker (2007): ignorable nonresponse within one imputation class. Factors:
- 2 sample sizes: 20, 100
- 2 distributions of the analytic variable: normal with mean 5 and variance 1; chi-square with 5 degrees of freedom
- 3 response rates: 40%, 60%, 80%
- 2 values for the number of multiple imputations: 3, 10

Monte Carlo Simulation (continued)
- 10,000 replications for each combination of factors
- 5 imputation methods:
  - ABB
  - Kim: modifies the donor pool size
  - PLF: variance correction factor
  - WSHD: same donor pool
  - WSHDB: bootstrap to create the donor pool
- Comparison: relative bias of the variance estimators
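The relative-bias comparison can be sketched in Python for the ABB arm alone. The sketch rests on the following assumptions:
- The data are normal with mean 5 and variance 1.
- A fixed number r = round(n x response rate) of respondents is missing completely at random.
- The estimate is the sample mean.
- S is the Monte Carlo variance of the multiply imputed means. The slides do not define S, so this is an assumption.

The other four methods are not reproduced, and the function names are hypothetical.

```python
import random
from statistics import mean, variance


def abb_draw(resp, m, rng):
    """ABB: bootstrap a donor pool of size r, then draw m donors from it."""
    pool = rng.choices(resp, k=len(resp))
    return rng.choices(pool, k=m)


def mi_mean_and_variance(resp, m, n_imp, rng):
    """Multiply impute the mean with ABB and combine with Rubin's rules."""
    n = len(resp) + m
    ests, wins = [], []
    for _ in range(n_imp):
        data = resp + abb_draw(resp, m, rng)
        ests.append(mean(data))
        wins.append(variance(data) / n)     # completed-data variance of the mean
    b = variance(ests)                      # between-imputation variance
    return mean(ests), mean(wins) + (1 + 1 / n_imp) * b


def relative_bias(n, resp_rate, n_imp, reps, seed=0):
    """100 * (mean variance estimate - S) / S for the ABB estimator of the mean."""
    rng = random.Random(seed)
    r = round(n * resp_rate)
    ests, var_hats = [], []
    for _ in range(reps):
        y = [rng.gauss(5, 1) for _ in range(n)]
        resp = y[:r]    # values are i.i.d., so the first r form a random respondent set
        est, v = mi_mean_and_variance(resp, n - r, n_imp, rng)
        ests.append(est)
        var_hats.append(v)
    s = variance(ests)  # Monte Carlo variance of the MI point estimate
    return 100 * (mean(var_hats) - s) / s


print(relative_bias(n=20, resp_rate=0.4, n_imp=3, reps=10_000))
```

Swapping `rng.gauss(5, 1)` for a chi-square draw with 5 degrees of freedom, such as `rng.gammavariate(2.5, 2)`, would cover the second distribution in the design.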

Figure: Relative bias of the variance estimates, 100*(V_hat - S)/S, plotted against the number of imputations (3, 10) for ABB, Kim, PLF, WSHDB, and WSHD. Panels: Normal, n = 20, response rate 40%; Normal, n = 20, response rate 80%; Normal, n = 100, response rate 40%; Normal, n = 100, response rate 80%. Vertical axis from -70 to 10.

Figure: Relative bias of the variance estimates, 100*(V_hat - S)/S, plotted against the number of imputations (3, 10) for ABB, Kim, PLF, WSHDB, and WSHD. Panels: ChiSq, n = 20, response rate 40%; ChiSq, n = 20, response rate 80%; ChiSq, n = 100, response rate 40%; ChiSq, n = 100, response rate 80%. Vertical axis from -70 to 10.

Future Work
- Additional empirical investigations
- More complex ignorable nonresponse
- WSHD multiple imputation

References
Andridge, Rebecca, and Roderick Little (2010). A Review of Hot Deck Imputation for Survey Nonresponse. International Statistical Review, Vol. 78, No. 1, pp. 40-64.
Cox, Brenda (1980). The Weighted Sequential Hot Deck Imputation Procedure. Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 721-726.
Demirtas, Hakan, Lester Arguelles, Hwan Chung, and Donald Hedeker (2007). On the Performance of Bias-Reduction Techniques for Variance Estimation in Approximate Bayesian Bootstrap Imputation. Computational Statistics and Data Analysis, Vol. 51, pp. 4064-4068.
Iannacchione, Vincent (1982). Weighted Sequential Hot Deck Imputation Macros. Seventh Annual SAS User's Group International Conference.
Kim, J. (2002). A Note on Approximate Bayesian Bootstrap Imputation. Biometrika, Vol. 89, No. 2, pp. 470-477.
Parzen, Michael, Stuart Lipsitz, and Garrett Fitzmaurice (2005). A Note on Reducing the Bias of the Approximate Bayesian Bootstrap Imputation Variance Estimator. Biometrika, Vol. 92, No. 4, pp. 971-974.
RTI International (2008). SUDAAN Language Manual, Release 10. Research Triangle Park: RTI International.
Rubin, Donald (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
Rubin, Donald, and Nathaniel Schenker (1986). Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse. Journal of the American Statistical Association, Vol. 81, No. 394, pp. 366-374.

More Information
Darryl V. Creel, Senior Statistician
301.770.8229
dcreel@rti.org