Can Survey Bootstrap Replicates Be Used for Cross-Validation?

Section on Survey Research Methods, JSM 2008

Geoff Rowe¹ and David Binder²
¹ Geoff Rowe, Statistics Canada, Tunney's Pasture, Ottawa, ON, K1A 0T6, Canada; geoff.rowe@statcan.gc.ca
² David Binder, Statistics Canada, Tunney's Pasture, Ottawa, ON, K1A 0T6, Canada; dbinder49@hotmail.com

Abstract

We propose an extension to bootstrap methods for evaluating regression models estimated with data from surveys with complex design. Such methods involve selection of replicate samples formed from simple random samples of sampled clusters within strata. Selection is carried out with replacement, so that about one third of clusters are typically left out of a given replicate sample. Our evaluation method exploits the excluded clusters, using them as cross-validation samples for assessment of a model's prediction error, and at the same time using the bootstrap samples to estimate the variance of regression coefficients. We also consider the use of a sample of the replicates as a cross-validation sample.

Key Words: Complex surveys, .632+ Bootstrap, Health Utility Index

1. Introduction

As is well known, regression residuals will give an overly optimistic view of the predictive value of an equation (Efron, 1986). It is also known that model-specification searches that consist simply of eliminating all of the non-significant terms from a trial specification can result in a selected equation with inferior predictive value (Hastie, Tibshirani and Friedman, 2001). Simply retaining all terms that have an intuitive appeal (whether significant or not) can also result in an equation with inferior predictive value. Cross-validation methods attempt to directly facilitate the search for specifications that will produce accurate predictions. In this paper, we extend the scope of cross-validation methods to data from surveys with complex design.

The paper is in two parts. Following the introduction, Section 2 outlines design-based properties of the bootstrap/cross-validation and establishes the validity of methods utilizing replicate samples when those methods depend only on first and second moments.
Section 3 illustrates our method with a comparative assessment of selected models of health dynamics using Statistics Canada's longitudinal National Population Health Survey (1992-2004). Section 4 provides concluding comments.

2. Cross-validation applied to survey samples

The term cross-validation generally refers to techniques that directly assess prediction error of a fitted equation by splitting the available sample and using one part to fit the equation (model construction) and reserving the other part for an assessment of predictions (model validation) (Picard and Cook, 1984). Model selection by cross-validation consists of proposing and fitting alternative models, assessing the out-of-sample prediction error of each, and choosing the one with the smallest prediction error.

In practice, some care needs to be exercised in applying the cross-validation method. This is because the size of the sample used in model construction will affect the bias in predictions in one way and affect the variance of prediction error assessments in the opposite direction. The larger the model-construction sample, the smaller the bias in predictions; but, the smaller the model-validation sample, the larger the variance of the assessment. Both Shao (1993) and Efron and Tibshirani (1997) have considered improvements on naïve cross-validation, most of which have some of the features of the bootstrap.
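
The split, fit, assess and choose cycle described above can be sketched in a few lines. This is a minimal illustration for an ordinary (non-survey) sample, not the authors' procedure: the two candidate models, the 70/30 split, the simulated data and all function names are assumptions made for the example.

```python
import random


def fit_mean(train):
    """Intercept-only model: always predict the training mean of y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Least-squares straight line y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def validation_mse(predict, validation):
    """Out-of-sample mean squared prediction error."""
    return sum((y - predict(x)) ** 2 for x, y in validation) / len(validation)


def select_model(data, candidates, train_fraction=0.7, seed=0):
    """Fit each candidate on the construction part, score it on the rest."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    train, validation = shuffled[:cut], shuffled[cut:]
    errors = {name: validation_mse(fit(train), validation)
              for name, fit in candidates.items()}
    best = min(errors, key=errors.get)  # smallest out-of-sample error wins
    return best, errors


# Hypothetical data: y depends linearly on x, plus noise.
noise = random.Random(1)
data = [(float(x), 2.0 + 0.5 * x + noise.gauss(0.0, 1.0)) for x in range(100)]
best, errors = select_model(data, {"mean": fit_mean, "line": fit_line})
print(best, errors)
```

Raising `train_fraction` gives each candidate more data to fit but leaves fewer observations for the error assessment, which is the trade-off noted in the paragraph above.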

A typical K-fold cross-validation for samples assumed to have been generated directly from a model is obtained by partitioning the original sample into K subsamples, retaining one of the subsamples for validating the estimated model. The remaining $K-1$ subsamples are used as model-construction or training data. Normally the training and validation steps occur K times, with each of the K subsamples making a contribution to the validation average. In a Leave-one-out cross-validation only a single observation from the original sample is used to validate the model, and the remaining observations are the training data. This is repeated such that each observation in the sample is used once as the validation data.

The usual assumption made for the validation sets is that they are independent of the training sets. With complex survey data, however, without making strong assumptions about the non-informativeness of the sample design, the observations are not independent, so it would seem that cross-validation techniques that have been developed for non-survey data cannot be applied in a complex survey setting. However, an interesting property of the Rao-Wu-Yue bootstrap (see Rao et al., 1992) is that the bootstrap replicates can be uncorrelated. Samples that are uncorrelated can be used for cross-validation purposes when the methods depend on only the first and second moments.

2.1 Cross-validation using Rao-Wu-Yue Bootstrap Replicates

The Rao-Wu-Yue bootstrap (RWYB) is now used by many survey producers, including Statistics Canada, as a useful way to obtain design-based variance estimates for a large number of descriptive statistics that estimate finite population quantities. To obtain the RWYB for a multi-stage survey, where it can be assumed that the primary sampling units (psu's) are selected with replacement, at least approximately, the survey producer selects bootstrap replicates by selecting within each of the $H$ strata a sample of $m_h$ psu's with replacement from the $n_h$ psu's in the original sample.
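
The replicate selection just described can be sketched as follows. This is an illustration under stated assumptions, not the survey producer's code: the design (50 strata with 10 sampled psu's each), the choice of $m_h = n_h - 1$ draws per stratum, and all names are hypothetical. The sketch also counts the psu's left out of each replicate, whose expected share in a stratum is $(1 - 1/n_h)^{m_h}$.

```python
import random
from collections import Counter


def draw_replicate(strata, rng):
    """Draw m_h = n_h - 1 psu's with replacement within each stratum.

    strata maps a stratum label to the list of its sampled psu ids.
    Returns, per stratum, a Counter of how often each psu was drawn.
    """
    counts = {}
    for h, psus in strata.items():
        m_h = len(psus) - 1
        counts[h] = Counter(rng.choice(psus) for _ in range(m_h))
    return counts


def excluded_psus(strata, counts):
    """Psu's never drawn in this replicate (a Counter returns 0 if absent)."""
    return {h: [p for p in psus if counts[h][p] == 0]
            for h, psus in strata.items()}


# Hypothetical design: 50 strata, 10 sampled psu's in each.
rng = random.Random(2008)
strata = {h: [f"h{h}-psu{i}" for i in range(10)] for h in range(50)}
n_total = sum(len(psus) for psus in strata.values())

n_reps = 200
shares = []
for _ in range(n_reps):
    counts = draw_replicate(strata, rng)
    left_out = excluded_psus(strata, counts)
    shares.append(sum(len(v) for v in left_out.values()) / n_total)

mean_share = sum(shares) / n_reps
expected_share = (1 - 1 / 10) ** 9  # (1 - 1/n_h)^(m_h) with n_h = 10
print(round(mean_share, 3), round(expected_share, 3))
```

With ten psu's per stratum the expected left-out share is a little under 0.4, in line with the rough "one third" quoted in the abstract.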
Letting $z_{hij}^{(b)}$ be an indicator variable taking the value one when the $i$th psu of the $h$th stratum is selected on the $j$th draw for the $b$th replicate, we define

$$\hat{\theta}^{(b)} = \frac{1}{N} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \left[ 1 - \left( \frac{m_h}{n_h - 1} \right)^{1/2} + \left( \frac{m_h}{n_h - 1} \right)^{1/2} \frac{n_h}{m_h} \sum_{j=1}^{m_h} z_{hij}^{(b)} \right] \hat{Y}_{hi}$$

to be the $b$th bootstrap replicate estimating the finite population mean $\theta$, where $\hat{Y}_{hi}$ denotes the weighted total for the $i$th psu of the $h$th stratum. When $m_h = n_h - 1$, this simplifies to

$$\hat{\theta}^{(b)} = \frac{1}{N} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \frac{n_h}{n_h - 1} \sum_{j=1}^{n_h - 1} z_{hij}^{(b)} \, \hat{Y}_{hi}.$$

If we produce estimates given by $U^{(b)} = \hat{\theta}^{(b)} - \hat{\theta}$, where $\hat{\theta}$ is the full-sample estimate, it turns out that under the design-based randomization these replicates have means equal to zero, and that they are uncorrelated; details are available from the authors. Under a model-design-based randomization framework, these replicates also have means equal to zero and are uncorrelated; see Binder and Roberts (2006) for details of the model-design-based randomization framework. Therefore, many methods in the standard literature for cross-validation are applicable to bootstrap replicates when the methods depend on only the first and second moments. A key to this technique is to define replicate estimates that have mean zero.

2.2 An Alternative Cross-validation Method Based on Unsampled PSU's

In each bootstrap replicate, there will be some psu's that are not included in the replicate sample. This is similar to the .632+ bootstrap used in non-survey settings. We consider the properties of estimates based on these unsampled psu's. We let

$$\tilde{\theta}^{(b)} = \frac{1}{N} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \left( 1 - \frac{1}{n_h} \right)^{-m_h} \tilde{z}_{hi}^{(b)} \, \hat{Y}_{hi}$$

where $\tilde{z}_{hi}^{(b)}$ is the indicator variable for whether the $i$th psu in the $h$th stratum is not in the $b$th bootstrap replicate. In this case, $\tilde{\theta}^{(b)}$ is design-unbiased for $\theta$; details are available from the authors. We refer to the factor $(1 - 1/n_h)^{-m_h}$ on the right-hand side of the above expression as the adjustment factor for the full sample weights. Properties of this new cross-validation sample need to be studied; however, based on the example given below, the use of such samples for cross-validation purposes appears to hold much promise. The advantage of this method is that larger samples can be used as training sets. This concern in the non-survey setting is one that led to the Leave-one-out cross-validation rather than the K-fold cross-validation, where a single subsample used for one validation step can be quite small, the sample size being only (1/K) of the original sample size (hence K is often limited to 5 or 10).

3. Illustrating Cross-validation Techniques

In order to illustrate our techniques, we present details of an analysis of longitudinal health data drawn from Statistics Canada's National Population Health Survey (NPHS) (Statistics Canada, 1999). The NPHS is a panel survey of self-reported health based on interviews conducted biennially over more than a decade. The initial sample comprised over 17,000 respondents, with more than 11,000 providing a full response in all of the six cycles available to us. NPHS data files are disseminated with 500 sets of bootstrap weights (Yeo et al., 1999). Our analysis focuses on the health dynamics of individuals as measured by the Health Utility Index or HUI (Feeny et al., 2002; see also www.healthutilities.com/hui.htm). The HUI provides a description of an individual's overall functional health using eight attributes: vision, hearing, speech, mobility, dexterity, cognition, emotion, and pain. Based on a standard set of questions, the HUI provides a summary health score between -0.360 and 1.000. For instance, an individual who is nearsighted, yet fully healthy on the other seven attributes, receives a score of 0.973.
On that scale, the most preferred health level (perfect health) is rated 1.000 and death is rated 0.000, while negative scores reflect health states considered worse than death.

Health dynamics can be complex: periods of stability might be followed by abrupt temporary changes in state (e.g., accidents) or by spells of gradual change. In this illustration, we will be concerned only with the conditions under which a change may or may not occur and do not consider the subsequent magnitudes of change. A key scientific question is whether accounting for observations from earlier time periods would reveal persistence or momentum/inertial effects on health change.

The analysis was conducted in two phases. The first phase focused on model selection by inclusion or exclusion of subsets of candidate predictors. Here, cross-validation serves as a means of ranking models in order of their predictive accuracy. The second phase focused on non-linearities in the association between predictors. In this case, cross-validation facilitates comparison of non-nested models that differ in the form of non-linear associations.

3.1 Population Health by Age Group

Figure 1: Empirical HUI Distribution Functions by Age Group (cumulative probability versus HUI): 10-year groups ordered youngest (black) to oldest (light grey), based on six cycles of NPHS data.

The health of a majority of children, as assessed by the HUI, is characterized by perfect or near perfect health. At succeeding ages, the proportion at or near perfect health declines and the range of HUI over which the remainder of the population is distributed increases. These basic facts can be seen in the empirical distribution functions in Figure 1, which display empirical cumulative probability curves versus corresponding HUI values for each of ten 10-year age groups. In this chart, HUI appears to provide a plausible description of the effect of aging on population health.

3.2 Modeling Health Change of Individuals

We were interested in modelling whether or not the HUI changes in a two-year period. If $HUI_{i,t}$ is referred to as Current Health, a change was observed for individual $i$ if $HUI_{i,t+2} \neq HUI_{i,t}$. Our model of health change was expressed as a logistic regression:

$$\mathrm{pr}\left( HUI_{i,t+2} \neq HUI_{i,t} \right) = \left[ 1 + \exp\left( -X_{i,t}\,\theta \right) \right]^{-1}$$

The original NPHS household sample that was in-scope for longitudinal follow-up comprised about 17,000 respondents. We divided the sample into overlapping sets of responses from each combination of three consecutive cycles. Including attrition, there were under 50,000 such sets of triplets. Reasoning that the transition from perfect to less-than-perfect health would require a special model on its own, we chose to exclude observations for response sets in which $HUI_{i,t}$ equaled 1.0. Similarly, working from the assumption that the health dynamics of men and women might differ in special ways, we chose to focus here exclusively on men (in anticipation of observing more changes occurring earlier in life). These two additional selections reduced our working sample to just over 14,000 sets of three consecutive cycles.

The matrix of candidate predictors ($X_{i,t}$) included terms representing Immigrant Status, Presence of a Spouse, and Broad Education Attainment; as well as (natural) cubic spline basis functions (Hastie et al., 2001) representing nonlinear effects of Age at period $t$, Current Health, $HUI_{i,t}$, and Lagged Health (given by $HUI_{i,t-2}$).
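
A natural cubic spline with three knots can be represented by two regression terms per covariate. The sketch below uses the truncated-power form of the natural cubic spline basis given in Hastie et al. (2001); it is an assumption that this matches the parametrisation used in the analysis, and the function names are illustrative. The knots 25, 50 and 75 mirror the age knots described in this section.

```python
def truncated_cube(x, knot):
    """(x - knot)_+ cubed: zero to the left of the knot."""
    return max(x - knot, 0.0) ** 3


def natural_spline_terms(x, knots):
    """Two regression terms for a natural cubic spline with three knots.

    knots must be given in increasing order (k1 < k2 < k3). The first term
    is x itself; the second is d_1(x) - d_2(x), where
    d_k(x) = [(x - k_k)_+^3 - (x - k_3)_+^3] / (k_3 - k_k).
    The intercept is left to the regression model.
    """
    k1, k2, k3 = knots
    d1 = (truncated_cube(x, k1) - truncated_cube(x, k3)) / (k3 - k1)
    d2 = (truncated_cube(x, k2) - truncated_cube(x, k3)) / (k3 - k2)
    return (x, d1 - d2)


age_knots = (25.0, 50.0, 75.0)
for age in (20.0, 30.0, 60.0, 80.0, 90.0, 100.0):
    print(age, natural_spline_terms(age, age_knots))
```

Below the first knot the curvature term is zero, and beyond the last knot it grows linearly, so the fitted effect is a straight line outside the boundary knots, which is the defining property of a natural spline.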
The cubic splines involve two regression parameters each, and each pair of basis functions requires that three knot locations be specified. In Phase 1 of our analysis, spline knot locations for the age variable were chosen to broadly group responses into younger, mid and older age groups: positioning knots at ages 25, 50, and 75. For the HUI variables, the two upper knot locations (0.9 and 0.5) were those that have been used in the past to represent dividing lines between good/fair health and fair/poor health, respectively. The third HUI knot was set at 0.0, the dividing line between "worse than dead" and "better than dead".

3.3 Regression Estimates and Prediction Error

Our logistic regression equations were estimated using the SAS GENMOD procedure. Given that our data contained as many as four observations on each respondent, we chose to estimate an odds ratio, assumed to be constant over time, to account for the association between observations from the same respondent and adopted the Alternating Logistic Regressions variant of GEE estimation (Carey et al., 1993). However, since we had no intention of using the resulting estimates of coefficient standard errors, GEE estimation was not critical.

The cross-validation set-up employed here uses the 500 sets of bootstrap weights that are disseminated with NPHS data. Each model to be estimated and evaluated makes use of one set of bootstrap weights at a time. Our first step is estimation of a logistic regression using those responses with non-zero bootstrap weights. Our second step uses the estimated equation and responses from unsampled PSUs to perform an out-of-sample assessment of prediction error (for cross-validation purposes, the weights used were the full sample weights multiplied by the adjustment factor described in section 2.2).

We have used two measures of predictive accuracy: Deviance and mean-squared error (MSE). These two measures are defined in terms of the survey weights $W$, the binary dependent variable $Y$, and the probability $\mathrm{pr}(X \mid \theta)$, which is predicted on the basis of covariate information $X$ and estimates of the parameters $\theta$.
The terms $Y$, $X$, and $W$ are based on the cross-validation replicate sample. $\theta^*$ identifies a parameter estimate based on the $b$-th bootstrap replicate sample. The subscript $t$ denoting time period and the subscript $b$ denoting the bootstrap replicate have been suppressed for simplicity.

$$\text{Deviance} = 2 \sum_i W_i \left[ Y_i \ln \frac{1}{\mathrm{pr}(X_i \mid \theta^*)} + (1 - Y_i) \ln \frac{1}{1 - \mathrm{pr}(X_i \mid \theta^*)} \right]$$

$$\text{MSE} = \sum_i W_i \left[ Y_i - \mathrm{pr}(X_i \mid \theta^*) \right]^2$$

Efron (1978) demonstrates that both of these are appropriate measures of the distance of observations from predictions. In addition, he shows that Deviance and MSE will be roughly proportional (Deviance ≈ 6 × MSE). Thus, MSE, being the simpler measure, is likely sufficient for our purposes. However, Deviance provides a useful conceptual link to likelihood methods. Another link to more conventional methods is provided by Akaike's Information Criterion (AIC), which has the following definition:

$$\text{AIC} = 2 \sum_i W_i \left[ Y_i \ln \frac{1}{\mathrm{pr}(X_i \mid \theta)} + (1 - Y_i) \ln \frac{1}{1 - \mathrm{pr}(X_i \mid \theta)} \right] + 2 \times \text{Model df}$$

where the first term is twice the negative weighted (pseudo) log-likelihood and the second term is a penalty varying with the number of parameters in the model. (Note that AIC is estimated using the full sample without bootstrapping or cross-validation.) Efron (1986) shows that, for logistic regression, the AIC penalty term will approximate the negative bias in the full-sample estimate of Deviance. Thus, expressed as error per observation, values of AIC and cross-validated Deviance should be of similar magnitude.

3.4 Phase 1 Results: Preliminary Model Selection

Using the candidate predictors identified in section 3.2, 33 models were estimated (using the full-sample and each of the 500 bootstrap replicate samples). These 33 models correspond to most of the interesting sub-sets of the candidate predictors. Figure 2 displays for each of the models the AIC, and the average value of each of the cross-validated Deviance and MSE statistics over the 500 replicates. These models are ordered in decreasing order of the AIC. The out-of-sample criteria (Deviance and MSE) are in close agreement with the preference ordering of models provided by AIC. In only five cases would re-ordering by cross-validated Deviance result in an exchange of positions between adjacent models. A corresponding re-ordering of seven neighbours would result if re-ordering were based on cross-validated MSE.
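
The accuracy measures of section 3.3 can be sketched as below. This is an illustration only: it assumes the weighted forms $2 \sum W [Y \ln(1/\mathrm{pr}) + (1 - Y) \ln(1/(1 - \mathrm{pr}))]$ for Deviance, $\sum W (Y - \mathrm{pr})^2$ for MSE, and Deviance plus twice the model degrees of freedom for AIC. The function names and the small data set are hypothetical, and predicted probabilities must lie strictly between 0 and 1.

```python
import math


def weighted_deviance(y, p, w):
    """Twice the negative weighted (pseudo) log-likelihood."""
    total = 0.0
    for yi, pi, wi in zip(y, p, w):
        total += wi * (yi * math.log(1.0 / pi)
                       + (1 - yi) * math.log(1.0 / (1.0 - pi)))
    return 2.0 * total


def weighted_mse(y, p, w):
    """Weighted sum of squared differences between outcome and prediction."""
    return sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, p, w))


def aic(y, p, w, model_df):
    """Full-sample deviance plus a penalty of two per model parameter."""
    return weighted_deviance(y, p, w) + 2.0 * model_df


def per_observation(total, w):
    """Express a weighted total as an average error per (weighted) unit."""
    return total / sum(w)


# Hypothetical validation units: outcomes, predicted probabilities, weights.
y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.5]
w = [2.0, 1.0, 1.0, 2.0]
print(per_observation(weighted_deviance(y, p, w), w))
print(per_observation(weighted_mse(y, p, w), w))
```

In a cross-validation run, `y`, `p` and `w` would come from the psu's left out of a replicate, with `p` computed from coefficients estimated on that replicate, and the per-observation values would then be averaged over replicates.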
Thus, the criteria appear to be largely mutually consistent.

Figure 2: Phase 1 Cross-Validation of 33 Selected Models (average error; Akaike's Information Criterion, Deviance, and 6 × MSE), with the 33 models ordered by decreasing AIC. Labelled models: Intercept Only; Current Health Only; Current Health, Age, Spouse, & Education.

There is only one point where a marked reduction in prediction error is evident. The large jump seen in the chart distinguishes models that do not contain Current Health terms (to the left of the jump) from models that contain Current Health terms (to the right of the jump). Evidently, Current Health terms are crucial to a good model. Differences among succeeding models all containing Current Health appear small. The reduction in MSE obtained by adding Current Health is an order of magnitude greater than the incremental reduction in MSE provided by the over-all best fitting model. Given the relatively small improvement in predictive power, it is appropriate to question whether the best fitting model has been established on a robust basis.

3.5 Confirming Phase 1 Model Selection

Another common model selection strategy is Backward Elimination. Here we estimated the model parameters using the full sample, and we estimated the standard errors from the first 250 bootstrap samples. At each stage of the elimination we dropped the least significant estimated regression coefficient, and we continued until all remaining terms had a p-value less than 0.06. This resulted in eliminating the Lagged Health terms and the Immigrant indicator; all of the remaining terms were judged significant. To confirm the Backward Elimination results, the joint significance of the three terms that had been eliminated was assessed. An independent bootstrap estimate of the covariance matrix of the terms in the full model was obtained using the second 250 bootstrap samples. A Wald test was performed on the two Lagged Health spline coefficients and on the Immigrant/Non-Immigrant Indicator (Figure 3). Again, the three terms appeared jointly insignificant.

Figure 3: Model Selection by Backward Elimination (coefficient p-values from bootstrap backward elimination, shown for all variables and after successively dropping Lagged Health 1, Lagged Health 2, and Immigrant). Coefficients shown: Current Health 2, Spouse, Education 2, Age 2, Education 1, Age 1, Current Health 1, Immigrant, Lagged Health 2, Lagged Health 1. Wald test on the 3 dropped variables: chi-square 1.57, p-value 0.67.

We see, therefore, that Cross-validation and Backward Elimination identify the same best model among those examined in Phase 1: among the 33 models considered, the best predictions were obtained when the two Lagged Health terms and the Immigrant/Non-Immigrant Indicator were dropped, while Age, Current Health, Spouse and Education terms were retained.

3.6 Phase 2 Results: Additional Variables and Empirical Placement of Spline Knots

The presence or absence of the Lagged Health terms in the preferred equation specification has scientific significance.
The elimination of Lagged Health from the preferred model implies little or no inertia in the process of health change. Our concern that the role of Lagged Health had not been adequately assessed led to the 2nd Phase of the analysis.

Exploratory work involving graphical displays of residuals led to the observation that those who had been in perfect health in the previous period ($HUI_{i,t-2}$ equal to 1.0) seemed to have, all else being equal, markedly different chances of HUI change than others (recall that no men in this sample are currently in perfect health). Correspondingly, an indicator of perfect lagged health was added to the set of candidate predictors. There were, additionally, concerns about the specification of immigrant effects, because immigrants are generally selected for good health at the time of immigration. These concerns were addressed by adding age-at-immigration terms in the form of 2-parameter splines. As a final step, Phase 2 provided an opportunity to explore alternative knot placements and hence a more flexible nonlinear response specification (recall that knot placements used in Phase 1 were taken as given and were identified by an appeal to intuition).

In Figure 4 we display results for eight models: two Phase 1 models (i.e., the one including Current Health only and the best among Phase 1 models, which included only Age, Current Health, Spouse and Education terms), and six Phase 2 models which include various combinations involving extended immigrant (I+) and Lagged Health (L+) specifications. Estimation of each Phase 2 model also involved a random search for improved knot placements. (The random search was performed with an ad hoc SAS macro that randomly perturbed knot locations followed by repeated calls to PROC GENMOD.) The first two points on the x-axis give the results for the two Phase 1 models; the remaining six points are for the Phase 2 models sorted in decreasing order of the AIC (Current-Health-only with improved knot placement being the one with highest AIC of the Phase 2 models).

Figure 4: Phase 2 Cross-Validation of 8 Selected Models (average error; Akaike's Information Criterion, Deviance, and 6 × MSE): two Phase 1 models compared with six Phase 2 models. Labelled models: Phase 1 Minimum; Current Health Only; Current + Lagged Health, Age, Immigrant, Spouse, & Education.

The best fitting Phase 2 model based on the AIC criterion contains all terms, with Immigrant and Lagged Health terms included; however, the best fitting Phase 2 model based on both the MSE and Deviance criteria still excludes Immigrant terms, but not Lagged Health terms. The addition of new variables and the search for more appropriate knot locations led to definite, but modest improvements in the accuracy of the best model. Prediction error is still not that much smaller than that of a model with Current Health only and with knot placement based on intuition.

When we compared the functional form of predictions produced in Phase 1 and the best of the Phase 2 models, marked differences were revealed. Figure 5 shows the estimated contribution of Current and Lagged Health to the odds-ratio for change in health, based on the full Phase 1 model and on the best of the Phase 2 models.

Figure 5: Comparing Phase 1 and 2 Fitted Values: fitted health change odds ratios, exp(Health Spline), by health status (HUI score). Series: Current Health: Phase 1; Current Health: Phase 2; Lagged Health: Phase 1; Lagged Health: Phase 2.

The Phase 2 knot placement search has uncovered dramatically greater curvature in the fitted odds ratios by HUI (current and lagged) than was found in the corresponding model from Phase 1. Rather than being insignificant, Lagged Health plays a key role in accurately representing the health dynamics of men in less than perfect (current) health.
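
A random knot search of the kind mentioned in this section could look roughly like the sketch below. The details of the authors' SAS macro are not given, so everything here is an assumption: the uniform perturbation, the accept-only-if-better rule, and in particular the toy criterion, which stands in for refitting the model and computing its prediction error at each candidate set of knots.

```python
import random


def random_knot_search(knots, criterion, step, n_iter, rng):
    """Randomly perturb knot locations, keeping a candidate only if it
    lowers the criterion (lower is better)."""
    best_knots = tuple(sorted(knots))
    best_value = criterion(best_knots)
    for _ in range(n_iter):
        candidate = tuple(sorted(k + rng.uniform(-step, step)
                                 for k in best_knots))
        value = criterion(candidate)
        if value < best_value:
            best_knots, best_value = candidate, value
    return best_knots, best_value


def toy_criterion(knots):
    """Hypothetical stand-in for a model refit: squared distance of the
    knots from an arbitrary 'best' placement."""
    target = (0.2, 0.6, 0.95)
    return sum((k - t) ** 2 for k, t in zip(knots, target))


rng = random.Random(42)
start = (0.0, 0.5, 0.9)  # HUI knots used in Phase 1
found, value = random_knot_search(start, toy_criterion,
                                  step=0.05, n_iter=2000, rng=rng)
print(found, value)
```

Because each accepted move is judged on the same data used to compare models, a search like this adds to the risk of over-fitting, so the selected knots are best assessed on data not used in the search.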
Further digging showed that the high degree of non-linearity represented in the Lagged Health terms may result from two different types of health dynamics being confounded in this model. One type is characterized by progressive change in health status with moderate inertia and the other type is characterized by onset-recovery sequences that apply only to those at or near perfect health. The latter could arise from accidental injuries that lead to a complete recovery; a phenomenon that was demonstrated by the observation that about 40% of those with perfect health at t-2 had perfect health at t+2 regardless of what their health status was at t. We had assumed that transitions directly from perfect health were special. It also appears that transitions from recently perfect health are special. Perhaps separate models would be more appropriate in order to differentiate transition odds for those with perfect lagged health from those with less than perfect lagged health. And so, we should probably conclude that our model is still inadequate and that no single model is likely to be able to encompass both types of health dynamic simultaneously.

4. Conclusions

Our illustration of bootstrap/cross-validation methods represents an exploratory approach to data analysis. This is an approach that can be highly effective in uncovering inadvertent effects of simplistic modeling; but that, without care, also runs a high risk of over-fitting and over-optimistic evaluation. Moreover, the greater the extent of interaction with the data during the model selection phase of analysis, the less valid conventional (unconditional) significance tests will be.

Cross-validation is a useful tool in the assessment of alternative exploratory models and deserves wider use by the analytical community. In our illustration of the techniques, cross-validation made a deep exploration of a key scientific question involving the dynamics of health relatively easy, where conventional approaches would have required technical virtuosity and/or greater expenditure of time and effort.

We have demonstrated that cross-validation techniques may be put in a design-based setting. In that setting, we can expect, using cross-validatory techniques, to identify preferred models that are similar, but not necessarily identical, to those that might be identified using conventional inferential procedures.

Given the increasing availability of sets of bootstrap weights to aid users accounting for complex survey designs, we would encourage further research into use of combined bootstrap/cross-validation techniques. In our view, a promising direction for further research may be the use of some bootstrap replicate samples for Training (trial model estimation) with simultaneous use of the unsampled PSU's for Validation (model selection), while reserving some bootstrap replicate samples for Testing (final model assessment).

References

Binder, David and Roberts, Georgia (2006). Approaches for Analyzing Survey Data: a Discussion, 2006 Joint Statistical Meetings Section on Survey Research Methods, 2771-2778.
Carey, V., Zeger, S.L., and Diggle, P. (1993). Modelling Multivariate Binary Data with Alternating Logistic Regressions, Biometrika, 80, 517-526.
Efron, Bradley (1978). Regression and ANOVA with Zero-One Data: Measures of Residual Variation, Journal of the American Statistical Association, 73, 113-121.
Efron, Bradley (1986). How Biased Is the Apparent Error Rate of a Prediction Rule? Journal of the American Statistical Association, 81, 461-470.
Efron, Bradley and Tibshirani, Robert (1997). Improvements on Cross-Validation: The .632+ Bootstrap Method, Journal of the American Statistical Association, 92, 548-560.
Feeny, D., Furlong, W., Torrance, G., Goldsmith, C.H., Zhu, Z., DePauw, S., Denton, M., and Boyle, M. (2002). Multi-attribute and single-attribute utility functions for the Health Utilities Index Mark 3 system. Medical Care, 40(2), 113-128.
Hastie, Trevor, Tibshirani, Robert and Friedman, Jerome (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.
Picard, R.R. and Cook, R.D. (1984). Cross-Validation of Regression Models, Journal of the American Statistical Association, 79, 575-583.
Rao, J.N.K., Wu, C.F.J. and Yue, K. (1992). Some resampling methods for complex surveys, Survey Methodology, 18, 209-217.
Shao, Jun (1993). Linear Model Selection by Cross-Validation, Journal of the American Statistical Association, 88, 486-494.
Statistics Canada (1999). Information about the National Population Health Survey, Catalogue No. 82F0068XIE, www.statcan.ca/english/concepts/nphs/index.htm
Yeo, Douglas, Mantel, Harold and Liu, Tzen-Ping (1999). Bootstrap Variance Estimation for the National Population Health Survey, 1999 Joint Statistical Meetings Section on Survey Research Methods, 778-783.