Proceedings of the 8th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena

Testing significance of peaks in kernel density estimator by SiZer map

Aleksandra Baszczyńska

Abstract
In kernel density estimation the researcher needs two parameters of the kernel method: the kernel function and the smoothing parameter, called the bandwidth. Special care is required in choosing the latter. Too small a value of the bandwidth results in spurious peaks in the density estimator. Too large a value makes it oversmoothed. In the paper, a useful technique known as the SiZer map is presented. This technique helps in determining whether peaks in the density estimator are significant or not. The kernel density estimator is viewed through different levels of smoothing. The SiZer map can be used by non-experts and speeds up the procedure of deciding which features are signals and which are noise. The procedure of testing the hypothesis about significance of this type is described. The application of the SiZer map is illustrated by an analysis of carbon dioxide emission in countries, made by density function estimation.

Keywords: kernel density estimation, SiZer map, testing hypothesis
JEL Classification: C, C3

1. Introduction
Density estimation is one of the most frequently used ways of identifying and describing the structure of data on the basis of a random sample. Nonparametric methods, especially kernel density estimation, are becoming more and more popular in the analysis of, among others, economic variables (Li and Racine, 2007). In the process of density function estimation by the kernel method, the researcher has to determine two parameters of the method: the kernel function and the smoothing parameter. Several kernel functions are presented in the literature, but the influence of this parameter on the results of density estimation is regarded as not significant. The smoothing parameter, known as the bandwidth, which determines the level of smoothing in the process of estimation, plays an important role in the resulting estimator. Ways of choosing the appropriate value of the smoothing parameter in the process of estimation are therefore considered in, for example, Silverman (1996).
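The role of the bandwidth described above can be illustrated with a minimal Python sketch. It assumes the usual Gaussian kernel density estimator; the six-point sample, the evaluation grid and the helper names `gaussian_kernel_estimate` and `count_peaks` are hypothetical and serve only as an illustration, not as the procedure used in the paper.

```python
import numpy as np

def gaussian_kernel_estimate(x_grid, data, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on x_grid."""
    u = (x_grid[:, None] - data[None, :]) / h        # shape (grid, n)
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * np.sqrt(2 * np.pi))

def count_peaks(density):
    """Count grid points that are local maxima of the estimated density."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid >= density[2:])))

# Hypothetical sample made of two groups of observations (illustration only).
data = np.array([0.8, 1.0, 1.2, 4.7, 5.0, 5.3])
x_grid = np.linspace(-15.0, 20.0, 7001)

# Number of peaks of the estimate for a small, a moderate and a large bandwidth.
peaks = {h: count_peaks(gaussian_kernel_estimate(x_grid, data, h))
         for h in (0.05, 0.5, 5.0)}
print(peaks)
```

With the smallest bandwidth every observation produces its own peak, the moderate bandwidth shows the two groups, and the largest bandwidth merges everything into one peak. The estimate alone does not say which of these peaks are real.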
The classical approach to kernel density estimation uses one value of the smoothing parameter, which results in a single estimated function. Even when a good choice of the smoothing parameter is made, a misleading impression can be created by the bumps of the estimator. The problem of assessing whether these bumps are really there, and of avoiding spurious noise, should be considered in the data structure analysis. In technical analysis this problem means determining which structure is signal and which is noise.

(Author affiliation: University of Łódź, Department of Statistical Methods, 90-214 Łódź, Rewolucji 1905 r. 41, Poland, albasz@uni.lodz.pl.)

The SiZer map is a graphical tool used in analyzing the visible features representing important underlying structures through different levels of smoothing, which means that the estimation of the kernel density function is made and analyzed for different values of the bandwidth. The idea of considering a family of smooths can be found in scale space theory in computer science. Chaudhuri and Marron (2000) explored this problem from a statistical point of view.

A bump in the structure of a curve such as a density function is characterized by going up on one side and going down on the other. The bump is a zero crossing of the derivative, and it is statistically significant when the derivative estimate is significantly positive to the left and significantly negative to the right. The name of the SiZer map stems from assessing the SIgnificant ZERo crossings of the derivative.

Compared with the classical approach there are two main differences. Firstly, SiZer studies a very wide range of bandwidths instead of looking at just one. Secondly, instead of focusing on the true underlying curve, as in the classical approach, SiZer looks at the true curve viewed at varying bandwidths, which can lead to recovering the significant aspects of the underlying function for different levels of smoothing. The benefits are evident: it speeds up the process of deciding which features are really there and makes this type of inference readily doable by non-experts.

2. Kernel method
The kernel method can be applied in different areas: in density estimation, regression estimation, classification and pattern recognition. In density function estimation, the kernel method, known as the Parzen-Rosenblatt method, is one of the most frequently used procedures in assessing the characteristic features of a random variable. A comprehensive review of kernel density methods can be found in Silverman (1986) and Li and Racine (2007).
The kernel density estimator is defined in the following way (Rosenblatt 1956; Parzen 1962):

\[ \hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{1} \]

where $X_1, X_2, \ldots, X_n$ is the random $n$-element sample, $h$ is the smoothing parameter and $K(\cdot)$ is the kernel function.

Kernel functions, which are in most cases density functions, are presented, among others, in Domański and Pruska (2000). The most widely used is the Gaussian kernel, which is the density function of the standard normal distribution. When this kernel is used in kernel density estimation, the number of zero crossings of the derivative estimate is always a decreasing function of the smoothing parameter. Because of this feature, only the Gaussian kernel is used in the SiZer map.

In the classical approach to kernel density estimation the researcher has to decide which value of the smoothing parameter is appropriate in a particular estimation. The smoothing parameter controls, as in other nonparametric curve estimators (for example the histogram), the level of smoothness. A small value of $h$ leads to a jagged estimate, while a big value tends to produce an oversmoothed estimator. In the literature some procedures indicating this value are presented, such as Silverman's rule of thumb, cross-validation, the plug-in method and their modifications. In the SiZer map a range of the smoothing parameter, instead of one value as in the classical approach, is taken into consideration.

3. Testing hypotheses in SiZer map
In the SiZer map we have the possibility of regarding not only one kernel density estimator constructed for a particular kernel function and a particular value of the smoothing parameter, but the family of density estimators with the Gaussian kernel function and a range of the smoothing parameter. The family of smooth curves is the following:

\[ \left\{ \hat f_h(x) : h \in [h_{\min}, h_{\max}] \right\}, \tag{2} \]

where $h_{\min} = 2B$, $B$ is the binwidth, and $h_{\max} = x_{\max} - x_{\min}$. The case of $h_{\min} = 0$, $h_{\max} = \infty$ is also regarded.

The family (2) represents different structures of the curve under different levels of smoothing and can be called the scale space surface, while $E\hat f_h(x)$ is the true curve viewed at different scales of resolution.

When a peak is observed, the derivative is positive before the peak, equal to 0 at the point of maximum, and negative after the peak. When a valley is observed, the derivative is negative before the valley, equal to 0 at the point of minimum, and positive after the valley. Hence, peaks and valleys are determined by zero crossings of the derivative. In the SiZer map the Gaussian kernel function is used:

\[ K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, \]

because in kernel density estimation with this kernel function the number of zero crossings of the derivative (the number of peaks) decreases monotonically with the increase of the bandwidth (Silverman 1986). Chaudhuri and Marron (2000) show that in kernel regression with the Gaussian kernel function the number of zero crossings of a derivative of any fixed order decreases monotonically with the increase of the bandwidth.

In SiZer the following hypotheses are considered, where $\hat f'_h(x) = \partial \hat f_h(x) / \partial x$ denotes the derivative estimate:

\[ H_0 : \frac{\partial}{\partial x} E\hat f_h(x) = 0, \tag{3} \]

\[ H_1 : \frac{\partial}{\partial x} E\hat f_h(x) \neq 0. \tag{4} \]

If $H_0$ is rejected, there is evidence that $\frac{\partial}{\partial x} E\hat f_h(x)$ is positive or negative, according to the sign of $\hat f'_h(x)$ (Chaudhuri and Marron, 2000). The test is done independently at each $(x, h)$ location in the scale space.

In the calculation of the quantile $q$ the following fact is used: if two locations $u_1$ and $u_2$ are sufficiently far apart relative to the bandwidth $h$, then $\hat f_h(u_1)$ and $\hat f_h(u_2)$ are independent, which implies that $\hat f'_h(u_1)$ and $\hat f'_h(u_2)$ are independent. The simultaneous confidence limit problem is then approximated by $m$ independent confidence intervals. The estimate of $m$ is calculated through $ESS(x, h)$, the estimated effective sample size:

\[ ESS(x, h) = \frac{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)}{K(0)}. \tag{5} \]

When the kernel $K$ is uniform, $ESS(x, h)$ is simply the number of data points in the window of width $h$ centered at $x$. For the Gaussian kernel the data points are downweighted according to the height of the kernel function. Next, $m$ is chosen to be the number of independent blocks ($m$ confidence intervals) of average size available from a dataset of size $n$:

\[ m(h) = \frac{n}{\operatorname{avg}_{x} ESS(x, h)}. \tag{6} \]

$ESS(x, h)$ can also be used to indicate where the smooth is based on sparse data, by highlighting the regions where $ESS(x, h) < 5$, the threshold suggested by Chaudhuri and Marron (1999). Therefore the calculation of the block size is modified, to avoid problems with small $ESS(x, h)$, to:

\[ m(h) = \frac{n}{\operatorname{avg}_{x \in D_h} ESS(x, h)}, \tag{7} \]

where $D_h = \{x : ESS(x, h) \ge 5\}$ is the set of locations where the data are dense. Assuming independence of the $m(h)$ blocks of data, the approximate simultaneous quantile for a $(1 - \alpha)100\%$ confidence interval is, following Chaudhuri and Marron (1999):

\[ q(h) = \Phi^{-1}\left( \frac{1 + (1 - \alpha)^{1/m(h)}}{2} \right), \tag{8} \]

where $\Phi$ is the standard normal distribution function.

For the derivative estimate $\hat f'_h(x)$ the confidence limits, depending on $h$, can be constructed:

\[ \left( \hat f'_h(x) - q(h)\, \widehat{sd}\big(\hat f'_h(x)\big),\; \hat f'_h(x) + q(h)\, \widehat{sd}\big(\hat f'_h(x)\big) \right), \tag{9} \]

where $q(h)$ is the appropriate quantile, and the calculation of $\widehat{sd}\big(\hat f'_h(x)\big)$ is based on the fact that the derivative estimator $\hat f'_h(x)$ is an average of the derivative kernel functions:

\[ \widehat{var}\big(\hat f'_h(x)\big) = \widehat{var}\left( n^{-1} \sum_{i=1}^{n} K'_h(x - X_i) \right) = n^{-1} s^2\big( K'_h(x - X_1), \ldots, K'_h(x - X_n) \big), \]

where $K_h(u) = h^{-1} K(u/h)$ and $s^2(k_1, \ldots, k_n)$ is the sample variance of $k_1, \ldots, k_n$.

In the SiZer map the horizontal axis shows $x$ and the vertical axis shows $h$. From the SiZer map it is possible to present the information, for given $x$ and $h$, about the positivity and negativity of the derivative of $E\hat f_h(x) = \int K_h(x - u) f(u)\,du$. The following color codes are used:
1. blue: $\hat f_h(x)$ is significantly increasing (zero is less than the lower confidence limit),
2. red: $\hat f_h(x)$ is significantly decreasing (zero is greater than the upper confidence limit),
3. purple: $\hat f_h(x)$ is not significantly increasing or decreasing (zero is within the confidence limits),
4. grey: indicates regions where the data are too sparse to make statements about significance; the effective sample size is less than 5.
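The steps (5)-(9) and the color coding can be put together in a short Python sketch. It is only an illustration under stated assumptions, not the Matlab code used in this paper: the function name `sizer_map`, the numeric labels (+1 for blue, -1 for red, 0 for purple, NaN for grey), the default $\alpha = 0.05$ and the evenly spread test sample are all hypothetical choices.

```python
import numpy as np
from statistics import NormalDist

def sizer_map(data, x_grid, bandwidths, alpha=0.05, ess_min=5.0):
    """Label every (h, x) location: +1 increasing, -1 decreasing,
    0 not significant, NaN where the data are too sparse (ESS < ess_min)."""
    data = np.asarray(data, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    n = data.size
    k0 = 1.0 / np.sqrt(2.0 * np.pi)                    # K(0) for the Gaussian kernel
    labels = np.full((len(bandwidths), x_grid.size), np.nan)
    for row, h in enumerate(bandwidths):
        u = (x_grid[:, None] - data[None, :]) / h      # (x - X_i) / h
        kern = k0 * np.exp(-0.5 * u**2)                # K(u)
        ess = kern.sum(axis=1) / k0                    # effective sample size, eq. (5)
        dense = ess >= ess_min                         # set D_h of dense locations
        if not dense.any():
            continue                                   # whole row stays grey
        m = n / ess[dense].mean()                      # number of blocks, eq. (7)
        q = NormalDist().inv_cdf((1.0 + (1.0 - alpha) ** (1.0 / m)) / 2.0)  # eq. (8)
        dkern = -u * kern / h**2                       # K_h'(x - X_i) = K'(u) / h^2
        deriv = dkern.mean(axis=1)                     # derivative estimate
        sd = np.sqrt(dkern.var(axis=1, ddof=1) / n)    # sqrt of sample variance / n
        lower, upper = deriv - q * sd, deriv + q * sd  # confidence limits, eq. (9)
        row_labels = np.zeros(x_grid.size)
        row_labels[lower > 0] = 1.0                    # significantly increasing
        row_labels[upper < 0] = -1.0                   # significantly decreasing
        row_labels[~dense] = np.nan                    # too sparse to decide
        labels[row] = row_labels
    return labels

# Hypothetical sample spread evenly over [-1, 1], one bandwidth, three locations.
sample = np.linspace(-1.0, 1.0, 200)
labels = sizer_map(sample, [-1.5, 0.0, 1.5], [1.0])
print(labels)
```

For this symmetric sample and a large bandwidth the location to the left of the data is labelled as increasing, the centre as not significant and the location to the right as decreasing, which is the pattern of a single significant peak. Plotting the label array with the four colors over a grid of $x$ values and a logarithmic grid of bandwidths would give a picture of the SiZer type.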
In the SiZer map the log scale is used for $h$ in the display (it gives smooths that are more equally spaced). The dotted white curves show effective window widths for each bandwidth, as intervals representing $\pm 2h$ ($\pm 2$ standard deviations of the Gaussian kernel). There is a variation of the SiZer map named the SiCon map (SIgnificant CONvexity), where statistical inference is made taking into account the second derivative, and regions of statistically significant curvature are highlighted (a special color code is used: cyan for significant concavity, downward curvature; orange for significant convexity, upward curvature; green for no significant curvature).

4. Application of SiZer map
In the literature there are examples of using the SiZer map in the analysis of economic data (Zambom and Dias, 2012), medical data (Skrovseth, Bellika, Godtliebsen, 2012) or geochemical data (Rudge, 2008). The application of the SiZer map is illustrated here by the analysis of carbon dioxide emission in the countries of the world. The data were downloaded from the data bank (http://data.worldbank.org/topic/environment [5..4]). Total carbon dioxide emission (in thousand metric tons) is available for [...]4 countries in the world for 1960-[...]. The last year was taken into account in the research. Samples of sizes [...], 30 and 50 countries were chosen, and on the basis of these samples the SiZer maps were obtained using Matlab code.

Figure 1 shows the results: the kernel density estimator for different values of the smoothing parameter (top) and the SiZer map (bottom) for sample size n = [...]. In the SiZer map, blue shows regions of significantly positive $\hat f'_h(x)$, red shows regions of significantly negative $\hat f'_h(x)$, purple shows regions where $\hat f_h(x)$ is not significantly increasing or decreasing, and grey shows regions where it is not possible to make inference. For large values of the bandwidth the density function significantly increases, then there is a region where SiZer is unable to distinguish, and then there is a region where the density function significantly decreases. The SiZer map results in a grey region for small values of the bandwidth, which means that it is not possible to separate signal and noise.
This situation is closely connected with the sample size. For such a small sample size the process of estimating the density function is rather difficult.
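The link between the sample size and the grey region can be sketched in Python through the effective sample size (5). The right-skewed samples below are synthetic and deterministic; they are not the emission data, and the helper names, the bandwidth $h = 0.05$ and the grid size are assumptions made only for this illustration.

```python
import numpy as np

def effective_sample_size(x_grid, data, h):
    """ESS(x, h) for the Gaussian kernel: sum of K((x - X_i) / h) / K(0)."""
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1)

def grey_fraction(data, h, ess_min=5.0, grid_size=401):
    """Share of grid locations between min and max of the data with ESS < ess_min."""
    x_grid = np.linspace(data.min(), data.max(), grid_size)
    return float(np.mean(effective_sample_size(x_grid, data, h) < ess_min))

def skewed_sample(n):
    """Hypothetical right-skewed sample on [0, 1] (not the emission data)."""
    return np.linspace(0.0, 1.0, n) ** 3

h = 0.05  # one fixed, fairly small bandwidth (assumed value)
fractions = {n: grey_fraction(skewed_sample(n), h) for n in (10, 30, 50)}
print(fractions)
```

For a fixed bandwidth the share of locations that would be drawn in grey falls as the sample grows, because more observations fall into each kernel window.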
[Figure 1: top panel "Family Overlay" (kernel density estimates for a range of bandwidths); bottom panel "Slope SiZer Map" (horizontal axis: x, vertical axis: log10(h)).]
Fig. 1. SiZer map for n = [...].

Figures 2-3 present SiZer maps for bigger sample sizes. It should be noted that as the sample size increases, the grey region becomes smaller.

[Figure 2: top panel "Family Overlay"; bottom panel "Slope SiZer Map" (horizontal axis: x, vertical axis: log10(h)).]
Fig. 2. SiZer map for n = 30.
[Figure 3: top panel "Family Overlay"; bottom panel "Slope SiZer Map" (horizontal axis: x, vertical axis: log10(h)).]
Fig. 3. SiZer map for n = 50.

Conclusion
The SiZer map is a very useful technique in determining the structure of data. It can be treated as a nonclassical method because of its multiple results. Taking into account not only one value of the smoothing parameter, as in the classical approach, but a range of values broadens the researcher's point of view. One special issue should, however, be underlined: the sample size. Too small a sample size prevents detailed analysis of the structure of the data. Further research should be carried out to determine the influence of the sample size on the results of the SiZer map.

Acknowledgements
This work was supported by the project number DEC-//B/HS4/746 from the National Science Centre.

References
Chaudhuri, P., & Marron, S. (1999). SiZer for exploration of structure of curves. JASA, 94, 807-823.
Chaudhuri, P., & Marron, S. (2000). Scale space view of curve estimation. The Annals of Statistics, 28, 408-428.
Domański, Cz., & Pruska, K. (2000). Nieklasyczne metody statystyczne. PWE, Warszawa.
Li, Q., & Racine, J. S. (2007). Nonparametric econometrics. Theory and practice, Princeton University Press, Princeton and Oxford.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist., 3.
Rosenblatt, M. (1956). Remarks on some nonparametric estimation of a density function, Ann. Math. Statist., 27.
Rudge, J. (2008). Finding peaks in geochemical distributions: A re-examination of the helium-continental crust correlation, Earth and Planetary Science Letters, 274, 179-188.
Silverman, B. (1996). Density estimation for statistics and data analysis, Chapman and Hall, London.
Skrovseth, S., Bellika, J., & Godtliebsen, F. (2012). Causality in scale space as an approach to change detection. Retrieved from http://www.plosone.org/article/fetchobject.action?uri=info%3adoi%2f10.1371%2fjournal.pone.0052253&representation=pdf.
Turner, L. ([...]3). Exploring structure of curves using SiZer. Retrieved from http://www.stat.ubc.ca/~webmaste/howto/statsoftware/misc/sizer/paper.pdf.
Zambom, A. Z., & Dias, R. (2012). A review of kernel density estimation with applications to econometrics. Retrieved from http://arxiv.org/pdf/1212.2812.pdf.