Can Survey Bootstrap Replicates Be Used for Cross-Validation?

Section on Survey Research Methods - JSM 2008

Geoff Rowe (1) and David Binder (2)
(1) Geoff Rowe, Statistics Canada, Tunney's Pasture, Ottawa, ON, K1A 0T6, Canada; geoff.rowe@statcan.gc.ca
(2) David Binder, Statistics Canada, Tunney's Pasture, Ottawa, ON, K1A 0T6, Canada; dbinder49@hotmail.com

Abstract

We propose an extension to bootstrap methods for evaluating regression models estimated with data from surveys with complex design. Such methods involve selection of replicate samples formed from simple random samples of sampled clusters within strata. Selection is carried out with replacement, so that about one third of clusters are typically left out of a given replicate sample. Our evaluation method exploits the excluded clusters, using them as cross-validation samples for assessment of a model's prediction error, and at the same time using the bootstrap samples to estimate the variance of regression coefficients. We also consider the use of a sample of the replicates as a cross-validation sample.

Key Words: Complex surveys, .632+ Bootstrap, Health Utility Index

1. Introduction

As is well known, regression residuals will give an overly optimistic view of the predictive value of an equation (Efron, 1986). It is also known that model-specification searches that consist simply of eliminating all of the non-significant terms from a trial specification can result in a selected equation with inferior predictive value (Hastie, Tibshirani and Friedman, 2001). Simply retaining all terms that have an intuitive appeal (whether significant or not) can also result in an equation with inferior predictive value. Cross-validation methods attempt to directly facilitate the search for specifications that will produce accurate predictions. In this paper, we extend the scope of cross-validation methods to data from surveys with complex design.

The paper is in two parts. Following the introduction, Section 2 outlines design-based properties of the bootstrap/cross-validation and establishes the validity of methods utilizing replicate samples when those methods depend only on first and second moments.
Section 3 illustrates our method with a comparative assessment of selected models of health dynamics using Statistics Canada's longitudinal National Population Health Survey (1992-2004). Section 4 provides concluding comments.

2. Cross-validation applied to survey samples

The term cross-validation generally refers to techniques that directly assess the prediction error of a fitted equation by splitting the available sample, using one part to fit the equation (model construction) and reserving the other part for an assessment of predictions (model validation) (Picard and Cook, 1984). Model selection by cross-validation consists of proposing and fitting alternative models, assessing the out-of-sample prediction error of each, and choosing the one with the smallest prediction error.

In practice, some care needs to be exercised in applying the cross-validation method, because the size of the sample used in model construction affects the bias in predictions in one way and the variance of prediction error assessments in the opposite direction. The larger the model-construction sample, the smaller the bias in predictions; but the smaller the model-validation sample, the larger the variance of the assessment. Both Shao (1993) and Efron and Tibshirani (1997) have considered improvements on naïve cross-validation, most of which have some of the features of the bootstrap.
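As a rough illustration of the split-sample idea described above, the following Python sketch fits two candidate models on a model-construction part and ranks them by mean squared prediction error on a model-validation part. It is the naive, non-survey version: the split is a simple random one that ignores any sample design. The function names (`holdout_select`, `fit_mean`, `fit_line`), the 70/30 split and the toy data are all assumptions made for illustration.

```python
import random


def holdout_select(data, fitters, train_frac=0.7, seed=0):
    """Split the data once, fit each candidate on the construction part,
    and rank candidates by mean squared error on the validation part."""
    rng = random.Random(seed)
    rows = data[:]
    rng.shuffle(rows)
    cut = int(len(rows) * train_frac)
    train, valid = rows[:cut], rows[cut:]
    errors = {}
    for name, fit in fitters.items():
        predict = fit(train)  # each fitter returns a prediction function
        errors[name] = sum((y - predict(x)) ** 2 for x, y in valid) / len(valid)
    best = min(errors, key=errors.get)
    return best, errors


def fit_mean(train):
    """Intercept-only candidate: always predicts the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Simple least-squares line as a second candidate."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


# Toy data (hypothetical): a linear signal plus a little noise.
noise = random.Random(1)
data = [(i / 50, 2.0 * (i / 50) + noise.gauss(0, 0.1)) for i in range(100)]
best, errors = holdout_select(data, {"mean": fit_mean, "line": fit_line})
print(best, errors)
```

Changing `train_frac` shows the trade-off noted above: a larger construction sample leaves fewer validation observations, so the error assessment becomes noisier.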

A typical K-fold cross-validation for samples assumed to have been generated directly from a model is obtained by partitioning the original sample into K subsamples, retaining one of the subsamples for validating the estimated model. The remaining $K - 1$ subsamples are used as model-construction or training data. Normally the training and validation steps occur K times, with each of the K subsamples making a contribution to the validation average. In a leave-one-out cross-validation, only a single observation from the original sample is used to validate the model, and the remaining observations are the training data. This is repeated such that each observation in the sample is used once as the validation data.

The usual assumption made for the validation sets is that they are independent of the training sets. With complex survey data, however, without making strong assumptions about the non-informativeness of the sample design, the observations are not independent, so it would seem that cross-validation techniques that have been developed for non-survey data cannot be applied in a complex survey setting. However, an interesting property of the Rao-Wu-Yue bootstrap (see Rao et al., 1992) is that the bootstrap replicates can be uncorrelated. Samples that are uncorrelated can be used for cross-validation purposes when the methods depend on only the first and second moments.

2.1 Cross-validation using Rao-Wu-Yue Bootstrap Replicates

The Rao-Wu-Yue bootstrap (RWB) is now used by many survey producers, including Statistics Canada, as a useful way to obtain design-based variance estimates for a large number of descriptive statistics that estimate finite population quantities. To obtain the RWB for a multi-stage survey in which it can be assumed, at least approximately, that the primary sampling units (psus) are selected with replacement, the survey producer selects bootstrap replicates by selecting within each of the H strata a sample of $m_h$ psus with replacement from the $n_h$ psus in the original sample.
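The selection step just described can be sketched in a few lines of Python. This is a minimal illustration, not the production procedure of any survey organization. The function name `rwb_replicate`, the stratum labels and the psu counts are hypothetical. The weight factor uses the usual Rao-Wu-Yue rescaling, and the default of $m_h = n_h - 1$ draws is an assumption.

```python
import math
import random
from collections import Counter


def rwb_replicate(strata, rng, m=None):
    """Draw one bootstrap replicate.

    strata: dict mapping stratum label -> list of psu ids (each n_h >= 2).
    m:      optional dict of draws per stratum; defaults to n_h - 1.
    Returns (factors, left_out): factors[(h, psu)] multiplies the full-sample
    weight of every unit in that psu; left_out holds psus never drawn.
    """
    factors, left_out = {}, set()
    for h, psus in strata.items():
        n_h = len(psus)
        m_h = n_h - 1 if m is None else m[h]
        draws = Counter(rng.choice(psus) for _ in range(m_h))
        scale = math.sqrt(m_h / (n_h - 1))
        for psu in psus:
            count = draws.get(psu, 0)  # times this psu was selected
            factors[(h, psu)] = 1 - scale + scale * (n_h / m_h) * count
            if count == 0:
                left_out.add((h, psu))
    return factors, left_out


rng = random.Random(2008)
strata = {"A": list(range(20)), "B": list(range(8))}  # hypothetical design
factors, left_out = rwb_replicate(strata, rng)
print(len(left_out), "psus left out of this replicate")
```

With $m_h = n_h - 1$, a given psu is missed by all draws with probability $(1 - 1/n_h)^{n_h - 1}$, which is about 0.38 for a stratum of 20 psus and tends to $e^{-1} \approx 0.37$ as $n_h$ grows. This is the source of the "about one third" of clusters that each replicate leaves out.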
Letting $z_{hij}^{(b)}$ be an indicator variable taking the value one when the $i$-th psu of the $h$-th stratum is selected on the $j$-th draw for the $b$-th replicate, we define

$$\hat{\bar{Y}}^{(b)} = \frac{1}{N} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \left[ 1 - \left( \frac{m_h}{n_h - 1} \right)^{1/2} + \left( \frac{m_h}{n_h - 1} \right)^{1/2} \frac{n_h}{m_h} \sum_{j=1}^{m_h} z_{hij}^{(b)} \right] \hat{Y}_{hi}$$

to be the $b$-th bootstrap replicate estimating the finite population mean, where $\hat{Y}_{hi}$ is the weighted total for the $i$-th psu of the $h$-th stratum, computed with the full sample weights. When $m_h = n_h - 1$, this simplifies to

$$\hat{\bar{Y}}^{(b)} = \frac{1}{N} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \frac{n_h}{n_h - 1} \sum_{j=1}^{n_h - 1} z_{hij}^{(b)} \, \hat{Y}_{hi}.$$

If we produce estimates given by $U^{(b)}$, it turns out that, under the design-based randomization, these replicates have means equal to zero and are uncorrelated; details are available from the authors. Under a model-design-based randomization framework, these replicates also have means equal to zero and are uncorrelated; see Binder and Roberts (2006) for details of the model-design-based randomization framework. Therefore, many methods in the standard literature for cross-validation are applicable to bootstrap replicates when the methods depend on only the first and second moments. A key to this technique is to define replicate estimates that have mean zero.

2.2 An Alternative Cross-validation Method Based on Unsampled PSUs

In each bootstrap replicate, there will be some psus that are not included in the replicate sample. This is similar to the .632+ bootstrap used in non-survey settings. We consider the properties of estimates based on these unsampled psus. We let $m_h = n_h - 1$ and define

$$\tilde{\bar{Y}}^{(b)} = \frac{1}{N} \sum_{h=1}^{H} \sum_{i=1}^{n_h} \left( 1 - \frac{1}{n_h} \right)^{-(n_h - 1)} \tilde{z}_{hi}^{(b)} \, \hat{Y}_{hi},$$

where $\tilde{z}_{hi}^{(b)}$ is the indicator variable for whether the $i$-th psu in the $h$-th stratum is not in the $b$-th bootstrap replicate. In this case, $\tilde{\bar{Y}}^{(b)}$ is design-unbiased for $\bar{Y}$; details are available from the authors. We refer to the first factor inside the summation on the right hand side of the above expression as the adjustment factor for the full sample weights. Properties of this new cross-validation sample need to be studied; however, based on the example given below, the use of such samples for cross-validation purposes appears to hold much promise.

The advantage of this method is that larger samples can be used as training sets. This concern in the non-survey setting is one that led to the leave-one-out cross-validation rather than the K-fold cross-validation, where a single subsample used for one validation step can be quite small, the sample size being only (1/K) of the original sample size (hence K is often limited to 5 or 10).

3. Illustrating Cross-validation Techniques

In order to illustrate our techniques, we present details of an analysis of longitudinal health data drawn from Statistics Canada's National Population Health Survey (NPHS) (Statistics Canada, 1999). The NPHS is a panel survey of self-reported health based on interviews conducted biennially over more than a decade. The initial sample comprised over 17,000 respondents, with more than 11,000 providing a full response in all of the six cycles available to us. NPHS data files are disseminated with 500 sets of bootstrap weights (Yeo et al., 1999).

Our analysis focuses on the health dynamics of individuals as measured by the Health Utility Index or HUI (Feeny et al., 2002; see also www.healthutilities.com/hui.htm). The HUI provides a description of an individual's overall functional health using eight attributes: vision, hearing, speech, mobility, dexterity, cognition, emotion, and pain. Based on a standard set of questions, the HUI provides a summary health score between -0.360 and 1.000. For instance, an individual who is nearsighted, yet fully healthy on the other seven attributes, receives a score of 0.973.
On that scale, the most preferred health level (perfect health) is rated 1.000 and death is rated 0.000, while negative scores reflect health states considered worse than death.

Health dynamics can be complex: periods of stability might be followed by abrupt temporary changes in state (e.g., accidents) or by spells of gradual change. In this illustration, we will be concerned only with the conditions under which a change may or may not occur and do not consider the subsequent magnitudes of change. A key scientific question is whether accounting for observations from earlier time periods would reveal persistence or momentum/inertial effects on health change.

The analysis was conducted in two phases. The first phase focused on model selection by inclusion or exclusion of subsets of candidate predictors. Here, cross-validation serves as a means of ranking models in order of their predictive accuracy. The second phase focused on non-linearities in the association between predictors. In this case, cross-validation facilitates comparison of non-nested models that differ in the form of non-linear associations.

3.1 Population Health by Age Group

Figure 1: Empirical HUI Distribution Functions by Age Group: 10-year groups ordered youngest (black) to oldest (light grey), based on six cycles of NPHS data. (Axes: Cumulative Probability versus HUI.)

The health of a majority of children, as assessed by the HUI, is characterized by perfect or near perfect health. At succeeding ages, the proportion at or near perfect health declines and the range of HUI over which the remainder of the population is distributed increases. These basic facts can be seen in the empirical distribution functions in Figure 1, which display empirical cumulative probability curves versus corresponding HUI values for each of ten 10-year age groups. In this chart, HUI appears to provide a plausible description of the effect of aging on population health.

3.2 Modeling Health Change of Individuals

We were interested in modelling whether or not the HUI changes in a two-year period. If $HUI_{i,t}$ is referred to as Current Health, a change was observed for individual $i$ if $HUI_{i,t+2} \neq HUI_{i,t}$. Our model of health change was expressed as a logistic regression:

$$\mathrm{pr}\left( HUI_{i,t+2} \neq HUI_{i,t} \right) = \left[ 1 + \exp\left( -X_{i,t}\,\theta \right) \right]^{-1}$$

The original NPHS household sample that was in-scope for longitudinal follow-up comprised about 17,000 respondents. We divided the sample into overlapping sets of responses from each combination of three consecutive cycles. Including attrition, there were under 50,000 such sets of triplets. Reasoning that the transition from perfect to less-than-perfect health would require a special model on its own, we chose to exclude observations for response sets in which $HUI_{i,t}$ equaled 1.0. Similarly, working from the assumption that the health dynamics of men and women might differ in special ways, we chose to focus here exclusively on men (in anticipation of observing more changes occurring earlier in life). These two additional selections reduced our working sample to just over 14,000 sets of three consecutive cycles.

The matrix of candidate predictors ($X_{i,t}$) included terms representing Immigrant Status, Presence of a Spouse, and Broad Education Attainment; as well as (natural) cubic spline basis functions (Hastie et al., 2001) representing nonlinear effects of Age at period $t$, Current Health ($HUI_{i,t}$), and Lagged Health (given by $HUI_{i,t-2}$).
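To make the spline terms concrete, here is a small Python sketch of a natural cubic spline basis with three knots, using the truncated-power construction given by Hastie et al. (2001), together with the logistic link. With three knots the basis has two non-constant functions, so each spline contributes two regression parameters. The function names, the example knots (ages 25, 50 and 75) and the coefficient values are assumptions for illustration only; they are not estimates from the paper.

```python
import math


def natural_cubic_basis(x, knots):
    """Return the two non-constant basis functions of a natural cubic
    spline with three knots: the linear term and one curvature term."""
    k1, k2, k3 = sorted(knots)

    def cube_plus(u):
        return max(u, 0.0) ** 3

    def d(k):
        return (cube_plus(x - k) - cube_plus(x - k3)) / (k3 - k)

    return x, d(k1) - d(k2)


def change_probability(linear_predictor):
    """Logistic link: probability of a change in HUI."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients, chosen only to exercise the functions.
theta = {"intercept": -1.0, "age_linear": 0.01, "age_curve": -1e-5}


def predict_change(age, knots=(25, 50, 75)):
    lin, curve = natural_cubic_basis(age, knots)
    eta = theta["intercept"] + theta["age_linear"] * lin + theta["age_curve"] * curve
    return change_probability(eta)


print(round(predict_change(40.0), 4), round(predict_change(80.0), 4))
```

The curvature term is zero below the first knot and linear above the last one, which is what makes the spline "natural": the fit is constrained to be linear in both tails.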
The cubic splines involve two regression parameters each, and each pair of basis functions requires that three knot locations be specified. In Phase 1 of our analysis, spline knot locations for the age variable were chosen to broadly group responses into younger, mid and older age groups, positioning knots at ages 25, 50, and 75. For the HUI variables, the two upper knot locations (0.9 and 0.5) were those that have been used in the past to represent dividing lines between good/fair health and fair/poor health, respectively. The third HUI knot was set at 0.0, the dividing line between worse than dead and better than dead.

3.3 Regression Estimates and Prediction Error

Our logistic regression equations were estimated using the SAS GENMOD procedure. Given that our data contained as many as four observations on each respondent, we chose to estimate an odds ratio, assumed to be constant over time, to account for the association between observations from the same respondent, and adopted the Alternating Logistic Regressions variant of GEE estimation (Carey et al., 1993). However, since we had no intention of using the resulting estimates of coefficient standard errors, GEE estimation was not critical.

The cross-validation set-up employed here uses the 500 sets of bootstrap weights that are disseminated with NPHS data. Each model to be estimated and evaluated makes use of one set of bootstrap weights at a time. Our first step is estimation of a logistic regression using those responses with non-zero bootstrap weights. Our second step uses the estimated equation and responses from unsampled PSUs to perform an out-of-sample assessment of prediction error (for cross-validation purposes, the weights used were the full sample weights multiplied by the adjustment factor described in Section 2.2).

We have used two measures of predictive accuracy: Deviance and mean-squared error (MSE). These two measures are defined in terms of the survey weights $W$, the binary dependent variable $Y$, and the probability $\mathrm{pr}(X \mid \theta)$, which is predicted on the basis of covariate information $X$ and estimates of the parameters $\theta$.
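The two-step set-up described above amounts to splitting the respondents, for each set of bootstrap weights, into a training part and a validation part. The Python sketch below shows one way this could look. The record layout, the function name `split_replicate` and the field names are hypothetical. The form of the adjustment factor is an assumption: it is taken here as the inverse of the probability that a psu is missed by all $n_h - 1$ draws, which should be checked against Section 2.2.

```python
def split_replicate(records, b, psus_per_stratum):
    """Split records for bootstrap replicate b.

    Training: records with a non-zero bootstrap weight (weight = that weight).
    Validation: records whose psu was not sampled (bootstrap weight of zero),
    weighted by the full-sample weight times an adjustment factor.
    """
    train, valid = [], []
    for rec in records:
        boot_w = rec["boot_weights"][b]
        if boot_w > 0:
            train.append((rec, boot_w))
        else:
            n_h = psus_per_stratum[rec["stratum"]]
            # Assumed adjustment: 1 / P(psu left out of n_h - 1 draws).
            p_left_out = (1.0 - 1.0 / n_h) ** (n_h - 1)
            valid.append((rec, rec["full_weight"] / p_left_out))
    return train, valid


# Tiny hypothetical file: one stratum with two psus, two replicates.
records = [
    {"stratum": "A", "full_weight": 10.0, "boot_weights": [20.0, 0.0]},
    {"stratum": "A", "full_weight": 12.0, "boot_weights": [0.0, 24.0]},
]
psus_per_stratum = {"A": 2}
train, valid = split_replicate(records, 0, psus_per_stratum)
print(len(train), "training record(s);", len(valid), "validation record(s)")
```

Identifying unsampled psus by a zero bootstrap weight relies on $m_h = n_h - 1$, under which the Rao-Wu-Yue factor is zero exactly when a psu is never drawn.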
The terms $Y$, $X$, and $W$ are based on the cross-validation replicate sample. $\theta^*$ identifies a parameter estimate based on the $b$-th bootstrap replicate sample. The subscript $t$ denoting time period and the subscript $b$ denoting the bootstrap replicate have been suppressed for simplicity.

$$\text{Deviance} = 2 \sum W \left[ Y \ln \frac{1}{\mathrm{pr}(X \mid \theta^*)} + (1 - Y) \ln \frac{1}{1 - \mathrm{pr}(X \mid \theta^*)} \right]$$

$$\text{MSE} = \sum W \left[ Y - \mathrm{pr}(X \mid \theta^*) \right]^2$$

Efron (1978) demonstrates that both of these are appropriate measures of the distance of observations from predictions. In addition, he shows that Deviance and MSE will be roughly proportional (Deviance ≈ 6 × MSE). Thus MSE, being the simpler measure, is likely sufficient for our purposes. However, Deviance provides a useful conceptual link to likelihood methods. Another link to more conventional methods is provided by Akaike's Information Criterion (AIC), which has the following definition:

$$\text{AIC} = 2 \sum W \left[ Y \ln \frac{1}{\mathrm{pr}(X \mid \theta)} + (1 - Y) \ln \frac{1}{1 - \mathrm{pr}(X \mid \theta)} \right] + 2 \times \text{Model df}$$

where the first term is twice the negative weighted (pseudo) log-likelihood and the second term is a penalty varying with the number of parameters in the model. (Note that AIC is estimated using the full sample without bootstrapping or cross-validation.) Efron (1986) shows that, for logistic regression, the AIC penalty term will approximate the negative bias in the full-sample estimate of Deviance. Thus, expressed as error per observation, values of AIC and cross-validated Deviance should be of similar magnitude.

3.4 Phase 1 Results: Preliminary Model Selection

Using the candidate predictors identified in Section 3.2, 33 models were estimated (using the full sample and each of the 500 bootstrap replicate samples). These 33 models correspond to most of the interesting subsets of the candidate predictors. Figure 2 displays, for each of the models, the AIC and the average value of each of the cross-validated Deviance and MSE statistics over the 500 replicates. These models are ordered in decreasing order of the AIC.

The out-of-sample criteria (Deviance and MSE) are in close agreement with the preference ordering of models provided by AIC. In only five cases would re-ordering by cross-validated Deviance result in an exchange of positions between adjacent models. A corresponding re-ordering of seven neighbours would result if re-ordering were based on cross-validated MSE.
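The three criteria compared in this section are simple weighted sums, and a short Python sketch may help fix ideas. The function names are hypothetical. Dividing by the sum of the weights to obtain an error per observation is an assumption about how the per-observation values were scaled. Predicted probabilities must lie strictly between 0 and 1 for the logarithms to be defined.

```python
import math


def weighted_deviance(y, p, w):
    """Twice the negative weighted (pseudo) log-likelihood for binary y."""
    return 2.0 * sum(
        wi * (yi * math.log(1.0 / pi) + (1 - yi) * math.log(1.0 / (1.0 - pi)))
        for yi, pi, wi in zip(y, p, w)
    )


def weighted_mse(y, p, w):
    """Weighted sum of squared differences between outcome and prediction."""
    return sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, p, w))


def aic(y, p, w, model_df):
    """Deviance on the full sample plus a penalty of 2 per parameter."""
    return weighted_deviance(y, p, w) + 2 * model_df


def per_observation(total, w):
    """Assumed scaling: divide a weighted sum by the sum of the weights."""
    return total / sum(w)


# Toy check with uninformative predictions of 0.5 and unit weights.
y = [1, 0, 1, 0]
p = [0.5, 0.5, 0.5, 0.5]
w = [1.0, 1.0, 1.0, 1.0]
dev = weighted_deviance(y, p, w)
mse = weighted_mse(y, p, w)
print(per_observation(dev, w), per_observation(mse, w), dev / mse)
```

In this toy case the ratio of Deviance to MSE is $8 \ln 2 \approx 5.5$, in line with the rough proportionality of about 6 quoted from Efron (1978).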
Thus, the criteria appear to be largely mutually consistent.

Figure 2: Phase 1 Cross-Validation: 33 models ordered by decreasing AIC. (Chart title: Phase 1 Cross-Validation of 33 Selected Models: Akaike's Information Criterion, Deviance, & MSE. Vertical axis: Average Error. Series: AIC, Deviance, 6 x MSE. Labelled models: Intercept Only; Current Health Only; Current Health, Age, Spouse, & Education.)

There is only one point where a marked reduction in prediction error is evident. The large jump seen in the chart distinguishes models that do not contain Current Health terms (to the left of the jump) from models that contain Current Health terms (to the right of the jump). Evidently, Current Health terms are crucial to a good model. Differences among succeeding models all containing Current Health appear small. The reduction in MSE obtained by adding Current Health is an order of magnitude greater than the incremental reduction in MSE provided

by the overall best-fitting model. Given the relatively small improvement in predictive power, it is appropriate to question whether the best-fitting model has been established on a robust basis.

3.5 Confirming Phase 1 Model Selection

Another common model selection strategy is Backward Elimination. Here we estimated the model parameters using the full sample, and we estimated the standard errors from the first 250 bootstrap samples. At each stage of the elimination we dropped the least significant estimated regression coefficient, and we continued until all remaining terms had a p-value less than 0.06. This resulted in eliminating the Lagged Health terms and the Immigrant indicator; all of the remaining terms were judged significant.

To confirm the Backward Elimination results, the joint significance of the three terms that had been eliminated was assessed. An independent bootstrap estimate of the covariance matrix of the terms in the full model was obtained using the second 250 bootstrap samples. A Wald test was performed on the two Lagged Health spline coefficients and on the Immigrant/Non-Immigrant Indicator (Figure 3). Again, the three terms appeared jointly insignificant.

Figure 3: Model Selection by Backward Elimination. (Chart title: Coefficient P-Values, Bootstrap Backward Elimination. Vertical axis: P-Value. Coefficients shown: Current Health 2, Spouse, Education 2, Age 2, Education 1, Age 1, Current Health 1, Immigrant, Lagged Health 2, Lagged Health 1. Model stages: All Vars; less Lagged Health 1; less Lagged Health 2; less Immigrant. Wald test on 3 dropped variables: Chi-square 1.57, p-value 0.67.)

We see, therefore, that Cross-validation and Backward Elimination identify the same best model among those examined in Phase 1: among the 33 models considered, the best predictions were obtained when the two Lagged Health terms and the Immigrant/Non-Immigrant Indicator were dropped, while Age, Current Health, Spouse and Education terms were retained.

3.6 Phase 2 Results: Additional Variables and Empirical Placement of Spline Knots

The presence or absence of the Lagged Health terms in the preferred equation specification has scientific significance.
The elimination of Lagged Health from the preferred model implies little or no inertia in the process of health change. Our concern that the role of Lagged Health had not been adequately assessed led to the second phase of the analysis.

Exploratory work involving graphical displays of residuals led to the observation that those who had been in perfect health in the previous period ($HUI_{i,t-2}$ equal to 1.0) seemed to have, all else being equal, markedly different chances of HUI change than others (recall that no men in this sample are currently in perfect health). Correspondingly, an indicator of perfect lagged health was added to the set of candidate predictors. There were, additionally, concerns about the specification of immigrant effects, because immigrants are generally selected for good health at the time of immigration. These concerns were addressed by adding age-at-immigration terms in the form of 2-parameter splines. As a final step, Phase 2 provided an opportunity to explore alternative knot placements and hence a more flexible nonlinear response specification (recall that knot placements used in Phase 1 were taken as given and were identified by an appeal to intuition).

In Figure 4 we display results for eight models: two Phase 1 models (i.e., that including one Current Health term only, and the best among Phase 1 models, which included only Age, Current Health, Spouse and Education terms), and six Phase 2 models which include various combinations involving extended immigrant (I+) and Lagged Health (L+) specifications. Estimation of each Phase 2 model also involved a random search for improved knot placements. (The random search was performed with an ad hoc SAS macro that randomly perturbed knot locations followed by repeated

calls to PROC GENMOD.) The first two points on the x-axis give the results for the two Phase 1 models; the remaining six points are for the Phase 2 models, sorted in decreasing order of the AIC (Current-Health-only with improved knot placement being the one with the highest AIC of the Phase 2 models).

Figure 4: Phase 2 Cross-Validation: two Phase 1 models compared with six Phase 2 models. (Chart title: Phase 2 Cross-Validation of 8 Selected Models: Akaike's Information Criterion, Deviance, & MSE. Vertical axis: Average Error. Series: AIC, Deviance, 6 x MSE. Labelled models: Phase 1 Minimum; Current Health Only; Current + Lagged Health, Age, Immigrant, Spouse, & Education.)

The best-fitting Phase 2 model based on the AIC criterion contains all terms, with Immigrant and Lagged Health terms included; however, the best-fitting Phase 2 model based on both the MSE and Deviance criteria still excludes Immigrant terms, but not Lagged Health terms. The addition of new variables and the search for more appropriate knot locations led to definite, but modest, improvements in the accuracy of the best model. Prediction error is still not that much smaller than that of a model with Current Health only and with knot placement based on intuition.

When we compared the functional form of predictions produced in Phase 1 and the best of the Phase 2 models, marked differences were revealed. Figure 5 shows the estimated contribution of Current and Lagged Health to the odds ratio for change in health, based on the full Phase 1 model and on the best of the Phase 2 models.

Figure 5: Comparing Phase 1 and 2 Fitted Values. (Chart title: Fitted Health Change Odds Ratios by Health Status. Vertical axis: exp(Health Spline). Horizontal axis: HUI Score. Series: Current Health: Phase 1; Current Health: Phase 2; Lagged Health: Phase 1; Lagged Health: Phase 2.)

The Phase 2 knot placement search has uncovered dramatically greater curvature in the fitted odds ratios by HUI (current and lagged) than was found in the corresponding model from Phase 1. Rather than being insignificant, Lagged Health plays a key role in accurately representing the health dynamics of men in less than perfect (current) health.
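The random knot search used in Phase 2 was carried out with a SAS macro that is not reproduced in the paper. The Python sketch below only illustrates the general idea of such a search under simple assumptions: perturb the current knots at random, re-evaluate a criterion, and keep the perturbation when the criterion improves. The function `random_knot_search`, the stand-in criterion `toy_score` and its target values are hypothetical. In a real analysis the criterion would be a model refit followed by, for example, the cross-validated MSE.

```python
import random


def random_knot_search(score, start_knots, lower, upper, step, n_iter, seed=0):
    """Random-perturbation search: keep a trial set of knots only when it
    lowers the criterion returned by score(knots)."""
    rng = random.Random(seed)
    best = sorted(start_knots)
    best_score = score(best)
    for _ in range(n_iter):
        trial = sorted(
            min(max(k + rng.uniform(-step, step), lower), upper) for k in best
        )
        trial_score = score(trial)
        if trial_score < best_score:
            best, best_score = trial, trial_score
    return best, best_score


# Stand-in criterion (hypothetical): distance from an arbitrary target.
target = [0.2, 0.6, 0.95]


def toy_score(knots):
    return sum((k - t) ** 2 for k, t in zip(sorted(knots), target))


start = [0.0, 0.5, 0.9]  # the Phase 1 HUI knot locations
best, best_score = random_knot_search(
    toy_score, start, lower=-0.36, upper=1.0, step=0.05, n_iter=200
)
print(best, best_score)
```

Because each accepted move is chosen by looking at the criterion, a search of this kind can over-fit the knot locations, which is one reason an out-of-sample criterion is a natural choice for `score`.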
Further digging showed that the high degree of non-linearity represented in the Lagged Health terms may result from two different types of health dynamics being confounded in this model. One type is characterized by progressive change in health status with moderate inertia, and the other type is characterized by onset-recovery sequences that apply only to those at or near perfect health. The latter could arise from accidental injuries that lead to a complete recovery, a phenomenon that was demonstrated by the observation that about 40% of those with perfect health at t-2 had perfect health at t+2 regardless of what their health status was at t.

We had assumed that transitions directly from perfect health were special. It also appears that transitions from recently perfect health are special. Perhaps separate models would be more appropriate in order to differentiate transition odds for those with perfect lagged health from those with less than perfect lagged health. And so, we should probably conclude that our model is still inadequate and that no single model is likely to be able to encompass both types of health dynamic simultaneously.

4. Conclusions

Our illustration of bootstrap/cross-validation methods represents an exploratory approach to data analysis. This is an approach that can be highly effective in uncovering inadvertent effects of simplistic modeling, but that, without care, also runs a high risk of over-fitting and over-optimistic evaluation. Moreover, the greater the extent of interaction with the data during the model selection phase of analysis, the less valid conventional (unconditional) significance tests will be. Cross-validation is a useful tool in the assessment of alternative exploratory models and deserves wider use by the analytical community. In our illustration of the techniques, cross-validation made a deep exploration of a key scientific question involving the dynamics of health relatively easy, where conventional approaches would have required technical virtuosity and/or greater expenditure of time and effort.

We have demonstrated that cross-validation techniques may be put in a design-based setting. In that setting, we can expect, using cross-validatory techniques, to identify preferred models that are similar, but not necessarily identical, to those that might be identified using conventional inferential procedures.

Given the increasing availability of sets of bootstrap weights to aid users in accounting for complex survey designs, we would encourage further research into the use of combined bootstrap/cross-validation techniques. In our view, a promising direction for further research may be the use of some bootstrap replicate samples for Training (trial model estimation), with simultaneous use of the unsampled PSUs for Validation (model selection), while reserving some bootstrap replicate samples for Testing (final model assessment).

References

Binder, David and Roberts, Georgia (2006). Approaches for Analyzing Survey Data: a Discussion, 2006 Joint Statistical Meetings, Section on Survey Research Methods, 2771-2778.

Carey, V., Zeger, S.L., and Diggle, P. (1993). Modelling Multivariate Binary Data with Alternating Logistic Regressions, Biometrika, 80, 517-526.

Efron, Bradley (1978). Regression and ANOVA with Zero-One Data: Measures of Residual Variation, Journal of the American Statistical Association, 73, 113-121.

Efron, Bradley (1986). How Biased Is the Apparent Error Rate of a Prediction Rule? Journal of the American Statistical Association, 81, 461-470.

Efron, Bradley and Tibshirani, Robert (1997). Improvements on Cross-Validation: The .632+ Bootstrap Method, Journal of the American Statistical Association, 92, 548-560.

Feeny, D., Furlong, W., Torrance, G., Goldsmith, C.H., Zhu, Z., DePauw, S., Denton, M., and Boyle, M. (2002). Multi-attribute and single-attribute utility functions for the Health Utilities Index Mark 3 system, Medical Care, 40(2), 113-128.

Hastie, Trevor, Tibshirani, Robert and Friedman, Jerome (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.

Picard, R.R. and Cook, R.D. (1984). Cross-Validation of Regression Models, Journal of the American Statistical Association, 79, 575-583.

Rao, J.N.K., Wu, C.F.J. and Yue, K. (1992). Some resampling methods for complex surveys, Survey Methodology, 18, 209-217.

Shao, Jun (1993). Linear Model Selection by Cross-Validation, Journal of the American Statistical Association, 88, 486-494.

Statistics Canada (1999). Information about the National Population Health Survey, Catalogue No. 82F0068XIE, www.statcan.ca/english/concepts/nphs/index.htm

Yeo, Douglas, Mantel, Harold and Liu, Tzen-Ping (1999). Bootstrap Variance Estimation for the National Population Health Survey, 1999 Joint Statistical Meetings, Section on Survey Research Methods, 778-783.