THE REDESIGNED CANADIAN MONTHLY WHOLESALE AND RETAIL TRADE SURVEY: A POSTMORTEM OF THE IMPLEMENTATION

ASA Section on Survey Research Methods
Julie Trépanier, Business Survey Methods Division, Statistics Canada, Ottawa, Canada K1A 0T6
Keywords: Retail Trade, Wholesale Trade, Size Stratification, NAICS, Births, Deaths, Misclassifications, Evaluation.
1. INTRODUCTION
The Monthly Wholesale and Retail Trade Survey (MWRTS) is one of the mission-critical surveys conducted by Statistics Canada (SC). It collects monthly retail and wholesale sales and inventories at the industrial and geographical levels. On a value-added basis, these two industries constitute approximately 12% of the Gross Domestic Product (GDP). The survey uses the same sample month after month, with the exception of a sample of births that is added monthly.
The last major redesign of the MWRTS was completed in 1988 using the 1980 Standard Industrial Classification (SIC) system. With the implementation of the North American Industry Classification System (NAICS) in 1997, it was clear that the MWRTS needed to be redesigned as a NAICS-based survey. Other objectives were to improve the coverage of non-employer businesses, to take advantage of the new Goods and Services Tax (GST) data, and to incorporate innovative solutions at the sampling and estimation stages to maintain the quality of the survey over time. In addition, the processing systems, which were based mostly on a mainframe computer environment, were becoming costly and inefficient. Under the Project to Improve Provincial Economic Statistics (PIPES) and its generic Unified Enterprises Survey (UES), SC's annual business survey program (including the annual retail and wholesale surveys) had also been redesigned in the late 1990s, and consistency between the MWRTS and UES methodologies needed to be improved. Reducing response burden where possible was a priority, to compensate for the expansion of the annual surveys in both coverage and detail.
The redesign started with a feasibility study launched in fiscal year 1999-2000. The newly available GST data were studied to determine how they could improve sampling and estimation for the MWRTS. The SC GST processing system was still being improved at that time. Consequently, the study concluded that, for the moment, annual GST sales should be used only in size stratification, an area needing improvement in the MWRTS. The sampling design methodology (Bérard 2001, Majkowski 2001) was developed in 2000-01, followed by the development of the editing and imputation processes in 2001-02. Estimation, including an outlier detection and treatment strategy (Matthews and Bérard 2002), was also developed. Fiscal year 2002-03 served to fine-tune the processes and to develop a backcasting approach for NAICS estimates (Fortier 2003). The new MWRTS sample was selected in April 2003. Fiscal year 2003-04, the fourth and final year of the redesign project, also permitted the completion of diagnostic tools to monitor processes (Majkowski et al. 2004). A complete test of the new MWRTS in parallel with the old MWRTS was performed for the December 2003 to April 2004 reference months. The new MWRTS released its first estimates, for the April 2004 reference month, in June 2004.
This paper describes the sampling design implementation and assumptions, as well as the estimation, in section 2. The new estimates are compared to the old MWRTS estimates in section 3, followed by an assessment of the assumptions made at the sampling design stage. The conclusion presents general observations on the redesign, including the aspects we would keep and those we would change if we were to start the project over.
2. SAMPLING DESIGN AND ESTIMATION
2.1 Sampling Design
The April 2003 sampling design is described in this section. The resulting sample is used to collect retail sales, wholesale sales and wholesale inventories, but the sampling design focuses primarily on sales. Retail inventories are collected for only a portion of the sample.
The frame is extracted from SC's Business Register (BR). The BR provides a four-level hierarchical statistical structure. Starting at the top, the levels are the enterprise, then the company, the establishment and the location. The MWRTS frame is made of establishments, both employers and non-employers, for which the first two digits of their NAICS code (i.e., the sector) are 41 (wholesale), 44 or 45 (retail). Excluded from this frame are retail establishments with NAICS 4541 (electronic shopping and mail-order houses), 4542 (vending machine operators), 45431 (fuel dealers) or 45439 (other direct selling establishments), and wholesale establishments with NAICS 41112 (oilseed and grain), 412 (petroleum products) or 419 (agents and brokers). Apart from being NAICS-based, the new and old MWRTS frames differ in two major ways. 1) Non-employer establishments are now included. In 1988 no frame existed for them, and a constant, outdated adjustment for their undercoverage was made to the retail estimates only. 2) The frame is built at the establishment level for consistency with the UES; it was previously built at the location level for retail.
Sampling units are defined as clusters of establishments within the same stratification trade group (TG), the same stratification geographical region (GEO) and under the same enterprise. This differs from the old MWRTS, where the sampling units were the companies of the retail locations or wholesale establishments. For retail, 4 TGs based on 3- to 5-digit NAICS and 16 GEOs are used, i.e., the 13 provinces and territories, with 3 separate regions for Montreal, Toronto and Vancouver. For wholesale, there are 16 TGs based on 3- to 5-digit NAICS and 13 GEOs (provinces and territories).
The clusters of establishments are first stratified by TG and GEO as defined above. An annual size measure is then associated with each cluster. In the old MWRTS, an estimated measure of Gross Business Income (GBI) provided by the BR was used to stratify the companies by size. Although the GBI is of good quality overall, the GBI of some large companies was sometimes underestimated. As a result, some large companies were assigned to take-some strata with large weights, creating an undesired impact on the estimates. An objective of the new MWRTS was to improve size stratification by using other data sources, such as annual sales provided by GST.
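To illustrate the frame rules above, the sketch below filters establishments by NAICS code and groups the remaining ones into sampling clusters keyed on (enterprise, TG, GEO). It is a minimal sketch and not SC's implementation. The `Establishment` record, its field names and the assumption that NAICS codes arrive as digit strings are all hypothetical. The mapping from NAICS to stratification TG is assumed to have been done already.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Establishment(NamedTuple):
    """Hypothetical frame record; the BR extract layout is not described in the paper."""
    est_id: str
    enterprise_id: str
    naics: str  # NAICS code as a digit string, e.g. "445110"
    tg: str     # stratification trade group, assumed already derived from NAICS
    geo: str    # stratification region (province/territory or Montreal/Toronto/Vancouver)


# Industries excluded from the frame, as listed in the text (matched as code prefixes).
EXCLUDED_PREFIXES = ("4541", "4542", "45431", "45439", "41112", "412", "419")


def in_frame(naics: str) -> bool:
    """True for sector 41 (wholesale) or 44/45 (retail), minus the excluded industries."""
    if naics[:2] not in ("41", "44", "45"):
        return False
    return not naics.startswith(EXCLUDED_PREFIXES)


def build_clusters(
    establishments: List[Establishment],
) -> Dict[Tuple[str, str, str], List[str]]:
    """One sampling cluster per (enterprise, TG, GEO) among in-frame establishments."""
    clusters: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for e in establishments:
        if in_frame(e.naics):
            clusters[(e.enterprise_id, e.tg, e.geo)].append(e.est_id)
    return dict(clusters)
```

The tuple key mirrors the cluster definition directly: two establishments share a cluster only if they share an enterprise, a TG and a GEO.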
A study performed in 2000-2001 (Bérard 2001) showed that TRIO would produce a more efficient size stratification. TRIO is the maximum of the GBI, annual GST sales and the revenue available from corporate income tax data (T2-revenue). A 10% reduction in size misclassification for employers was expected, and TRIO seemed particularly efficient at identifying large units. Survey data from the old MWRTS and the UES could also be used for size stratification, since the April 2003 sample was to be selected independently from the samples of these two surveys (no control of overlap). In summary, the size measure for a given establishment is set to: 1) annual sales from the 2002 MWRTS; else 2) annual sales from the 1999 or 2000 UES; else 3) TRIO = max(GBI, GST sales, T2-revenue). The cluster size measure is then equal to the sum of the size measures of its establishments. Statistics on the use of each source are presented in Table 3 later in this section. Section 3 presents the performance of this size measure now that monthly sales are available for the sampled clusters.
Once clusters are stratified by TG and GEO, the cluster size measure is used to identify the non-surveyed portion of the survey frame or, equivalently, to build the take-none strata. This portion of the population will be estimated from other data sources such as GST. SC uses a predetermined set of thresholds, often referred to as the Royce-Maranda thresholds. For each TG x GEO combination, a threshold is chosen from this set so that the clusters with a size measure below it do not represent more than 5% of the total size measure of the TG x GEO. These clusters comprise the take-none stratum of that TG x GEO. Some multi-cluster enterprises have clusters that fall both in and out of the take-none strata. In that case, all of their clusters are forced into the surveyed portion, to ensure that multi-cluster enterprises are either totally excluded or totally surveyed. Tables 2A and 2B show that, overall, 45.1% of the retail clusters and 68.1% of the wholesale clusters are in the non-surveyed portion.
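The size-measure hierarchy and the take-none rule can be sketched as follows. This is an illustration under stated assumptions, and the function names are hypothetical. Missing survey values are represented by `None`. The text says only that a threshold is chosen from the predetermined set; picking the *largest* candidate that keeps the take-none share at or below 5% is our reading of that rule. The enterprise rule then forces every cluster of an enterprise into the surveyed portion if any one of its clusters is at or above its TG x GEO threshold.

```python
from typing import Optional, Sequence, Set, Tuple


def size_measure(mwrts_2002: Optional[float], ues_1999_2000: Optional[float],
                 gbi: float, gst_sales: float, t2_revenue: float) -> float:
    """Establishment size measure: 2002 MWRTS, else 1999/2000 UES, else TRIO."""
    if mwrts_2002 is not None:
        return mwrts_2002
    if ues_1999_2000 is not None:
        return ues_1999_2000
    return max(gbi, gst_sales, t2_revenue)  # TRIO


def take_none_threshold(sizes: Sequence[float], candidates: Sequence[float],
                        max_share: float = 0.05) -> float:
    """Largest candidate threshold whose below-threshold clusters hold <= max_share of the total."""
    total = sum(sizes)
    chosen = 0.0  # 0.0 means no take-none stratum
    for t in sorted(candidates):
        below = sum(s for s in sizes if s < t)
        if below <= max_share * total:
            chosen = t
    return chosen


def take_none_clusters(clusters: Sequence[Tuple[str, str, float, float]]) -> Set[str]:
    """clusters: (cluster_id, enterprise_id, size, TG x GEO threshold).

    A cluster is take-none only if it and every other cluster of its
    enterprise fall below their respective thresholds."""
    surveyed_ents = {ent for _, ent, size, thr in clusters if size >= thr}
    return {cid for cid, ent, size, thr in clusters
            if size < thr and ent not in surveyed_ents}
```

In the usage below, enterprise `entA` has one cluster above its threshold, so its small cluster `c1` is pulled into the surveyed portion. Only `c3` stays take-none.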
Placing these clusters in the non-surveyed portion is a significant effort to reduce response burden.
Before the surveyed portion of the frame is stratified, particular clusters, such as those in multi-cluster enterprises, are pre-specified as take-all in their TG x GEO. Based on the remaining number of clusters in the surveyed portion of the TG x GEO, the following number of size strata is usually formed:
- 10 clusters or fewer: 1 take-all stratum
- 11 to 50 clusters: 1 take-all stratum and 1 take-some stratum
- More than 50 clusters: 1 take-all stratum and 2 take-some strata (large and small)
An algorithm developed by Lavallée and Hidiroglou (L-H) (1988) is widely used for the size stratification of SC business surveys. This iterative algorithm stratifies a highly skewed population into a take-all stratum and a number of take-some strata while minimizing the sample size required for a given level of relative precision. It assumes simple random sampling without replacement (SRSWOR) within the take-some strata, and any form of power allocation of the total sample size to the strata. The L-H algorithm needs a size measure highly correlated with the survey variables of interest. The new MWRTS uses the size measure described earlier (referred to as X), and a sample allocation proportional to the square root of the size measure is specified. The old MWRTS used the same sample allocation, but its size measure was the GBI. Its take-all strata were established by a method from Hidiroglou (1986). The boundaries between its two take-some strata were not optimal, being based on fixed BR boundaries.
For the new MWRTS, a modified version of the L-H algorithm (Ferland 2003) is used in each TG x GEO. It is very similar to the original L-H method, but it takes into account the expected rates of out-of-business (OOB) units in the sample. OOB units that are still unknown to the BR, and thus appear active on the BR, can fall on the MWRTS frame and sample. They will be found inactive by the survey, and perhaps by signals from administrative sources. In the interim, one can only estimate their percentage. These rates, based on old MWRTS results, are 10% for large take-some strata and 20% for small take-some strata. (Please note that the out-of-scope rate could have been considered here as well, but it was not. This is the rate of active clusters expected to be found outside retail and wholesale.) The L-H algorithm is modified so that the function to minimize in each TG x GEO in the Lavallée and Hidiroglou paper,
$$n = N_L + \frac{\sum_{h=1}^{L-1} N_h^2 S_h^2 / a_h}{(N c \bar{Y})^2 + \sum_{h=1}^{L-1} N_h S_h^2},$$
is replaced by
$$n = N_L + \frac{\sum_{h=1}^{L-1} N_h^2 \left[(1-p_h) S_h^2 + p_h (1-p_h) \bar{Y}_h^2\right] / a_h}{(N c \bar{Y})^2 + \sum_{h=1}^{L-1} N_h \left[(1-p_h) S_h^2 + p_h (1-p_h) \bar{Y}_h^2\right]},$$
where $n$ is the total sample size to be minimized. Each stratum is denoted $h$, with $h = L$ the take-all stratum and $h = 1, \dots, L-1$ the take-some strata. $N_h$ is the population size of $h$ and $N$ is the total population size. $S_h^2$ is the population variance of $h$. $\bar{Y}_h$ and $\bar{Y}$ are the population means of $h$ and of the whole TG x GEO. $a_h$ represents the sample size allocation to take-some stratum $h$; in the MWRTS case, $a_h = \sqrt{X_h} / \sum_{h=1}^{L-1} \sqrt{X_h}$. $p_h$ is the expected OOB rate in $h$, and $c$ is the target coefficient of variation.
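To make the modified objective concrete, the sketch below evaluates it for fixed stratum boundaries. The iterative boundary search of the L-H algorithm itself is not shown, and all function names are hypothetical. The sketch assumes that an OOB unit reports $0, so the stratum variance becomes $(1-p_h)S_h^2 + p_h(1-p_h)\bar{Y}_h^2$. It writes $(Nc\bar{Y})^2$ as $(cY)^2$, with $Y$ the population total. `expected_cv` recomputes the CV from per-stratum sample sizes, which is one way to carry out the CV check in step 3) of the iterative procedure.

```python
import math
from typing import List, Sequence, Tuple


def strata_layout(n_clusters: int) -> Tuple[int, int]:
    """(take-all strata, take-some strata) from the cluster-count rule in the text."""
    if n_clusters <= 10:
        return (1, 0)
    if n_clusters <= 50:
        return (1, 1)
    return (1, 2)


def sqrt_allocation(X: Sequence[float]) -> List[float]:
    """a_h proportional to the square root of each take-some stratum's total size X_h."""
    roots = [math.sqrt(x) for x in X]
    total = sum(roots)
    return [r / total for r in roots]


def oob_variance(S2: float, ybar: float, p: float) -> float:
    """Stratum variance when a fraction p of units are out of business and report $0."""
    return (1 - p) * S2 + p * (1 - p) * ybar ** 2


def lh_total_n(N_L: int, N: Sequence[int], S2: Sequence[float], ybar: Sequence[float],
               p: Sequence[float], a: Sequence[float], c: float, Y: float) -> float:
    """Modified L-H objective n for fixed boundaries (N, S2, ybar, p, a: take-some strata)."""
    v = [oob_variance(s, m, q) for s, m, q in zip(S2, ybar, p)]
    num = sum(Nh ** 2 * vh / ah for Nh, vh, ah in zip(N, v, a))
    den = (c * Y) ** 2 + sum(Nh * vh for Nh, vh in zip(N, v))
    return N_L + num / den


def expected_cv(N: Sequence[int], n: Sequence[float], S2: Sequence[float],
                ybar: Sequence[float], p: Sequence[float], Y: float) -> float:
    """CV of the stratified SRSWOR total; take-all strata add no variance."""
    v = [oob_variance(s, m, q) for s, m, q in zip(S2, ybar, p)]
    var = sum(Nh ** 2 * vh * (1 / nh - 1 / Nh) for Nh, nh, vh in zip(N, n, v))
    return math.sqrt(var) / Y
```

One useful sanity check is to allocate $n_h = a_h (n - N_L)$ using the $n$ returned by `lh_total_n`. Feeding those sizes back into `expected_cv` then returns exactly the target $c$, since the objective is derived by setting the variance equal to $(cY)^2$.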
The form of n that takes OOB rates into account was mentioned in Latouche (1988). It accounts for the effect of such units, which report $0, on the variance of the estimate. Integrating it into the L-H algorithm ensures that size stratification and sample size determination are optimally obtained to achieve the target CV c, even though OOB units, at rates $p_h$, will be found in the sample once it is surveyed. The target coefficients of variation (CVs) for the new MWRTS sales are presented in Table 1. The old MWRTS had higher target CVs for wholesale (e.g., 1.7% national).
Table 1: Target CVs for Sales
Level                              CV (%)
National, all industries           1.2
Provincial / territorial           2.5
Published TG, priority             2.5
Published TG, non-priority         3.5
Stratification TG x GEO            16.5
The 4 and 16 stratification TGs for retail and wholesale make up 19 and 15 published TGs, respectively. There are more stratification TGs than published TGs for two reasons: to create more homogeneous strata, and to satisfy the needs of the Quarterly Retail Commodity Survey. That survey collects commodity distribution from a second-phase sample selected from the first-phase retail sample. Some TGs, like new car dealers and grocery stores in retail, are considered priority TGs due to their importance in the national estimate.
Two adjustments to the take-some sample sizes provided by the modified L-H algorithm are planned to avoid an undesired impact on the estimates or their variances: 1) establishing a minimum sample size; and 2) capping design weights. Since these adjustments increase the final sample size and consequently decrease the expected CVs, higher CVs than those presented in Table 1 are used in the modified L-H. Before describing these adjustments, one must understand a second series of adjustments that is performed: oversampling for frame misclassifications and nonresponse.
Minimum sample size and maximum weight rules decrease the expected CVs. Oversampling for frame misclassifications and nonresponse, by contrast, is applied to prevent the observed CVs from exceeding the expected CVs because of misclassifications and nonresponse. Although this oversampling artificially increases the sample sizes given by the modified L-H method, the expected CVs used as input to the L-H method should not be adjusted. Ideally, like the expected OOB rates, expected misclassification and nonresponse rates should be integrated into the modified L-H method so that size stratification and sample size determination remain optimal. However, in the new MWRTS, expected frame misclassification and nonresponse rates are only used to inflate the sample sizes after the modified L-H. This inflation is the last step, and its rates are derived as follows.
1) Frame misclassifications: Out-of-scope cases and movements between TGs and GEOs are handled in an unbiased way by domain estimation. However, they can affect the CVs of the estimates as a result of stratification inefficiency. Based on the 2000 UES, the expected rate of out-of-scope units (live units moving out of retail and wholesale) is set to 15%. Movements between TGs average 2.7% for retail and 0.6% for wholesale, but vary by TG. GEO misclassifications are expected to occur rarely. An overall frame misclassification rate of 17% (15% for out-of-scope units plus 2% for TG movements) is assumed. The rates vary from 10% to 35% by TG.
2) Nonresponse: A 10% nonresponse rate was observed on average in the old MWRTS, and that rate is assumed for the new MWRTS.
Knowing the above oversampling rates, the adjustments for minimum sample size and maximum design weights are first performed as follows.
1) Ultimately, a minimum sample size of 8 clusters (10 in some cases) is desired. Let $r_h$ be the combined oversampling rate for misclassifications and nonresponse in stratum h. The temporary minimum sample size to use in h before the oversampling is then $8/(1 + r_h)$, truncated to the integer below. For example, if $r_h$ = 30%, the minimum sample size is the integer part of 8/1.30, i.e., 6.
2) Ultimately, sample sizes should ensure that the design weight does not exceed 10 for large take-some strata and 30 for small take-some strata (the maximum is lower for some TG x GEOs). As in 1), knowing $r_h$, the temporary maximum design weight to use for h before the oversampling is $10 \times (1 + r_h)$, rounded up to the next integer (assuming h is a large take-some stratum). For example, if $r_h$ = 30%, the maximum weight is 10 × 1.30 = 13.
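The two temporary limits, and one way of combining them with the final inflation, can be sketched with integer arithmetic. Rates are given in whole percent, which avoids floating-point surprises in the truncation and rounding-up steps. The text specifies only the two formulas and their worked examples. The order of operations inside the hypothetical `adjust_stratum`, and the rounding-up of the final oversampled size, are assumptions.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -((-a) // b)


def temp_min_sample_size(r_pct: int, final_min: int = 8) -> int:
    """final_min / (1 + r), truncated to the integer below (r in whole percent)."""
    return (final_min * 100) // (100 + r_pct)


def temp_max_weight(r_pct: int, final_max: int = 10) -> int:
    """final_max * (1 + r), rounded up to the next integer (r in whole percent)."""
    return ceil_div(final_max * (100 + r_pct), 100)


def adjust_stratum(n_lh: int, N_h: int, r_pct: int,
                   final_min: int = 8, final_max_weight: int = 10) -> int:
    """Apply the minimum-size and maximum-weight rules, then oversample by r."""
    n_min = temp_min_sample_size(r_pct, final_min)
    w_max = temp_max_weight(r_pct, final_max_weight)
    n_for_weight = ceil_div(N_h, w_max)  # smallest n with N_h / n <= w_max
    n = min(max(n_lh, n_min, n_for_weight), N_h)
    # Final oversampling for misclassification and nonresponse (rounding up assumed).
    return min(ceil_div(n * (100 + r_pct), 100), N_h)
```

With r = 30% the two helpers reproduce the worked examples (6 and 13). For a stratum of 1,000 clusters, the weight cap forces 77 clusters before oversampling and 101 after, so the final design weight is just under 10.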
In the old MWRTS, final design weights of 15 in the large take-some strata and 50 in the small take-some strata were not rare. Such weights had a large impact on the estimates when associated with a misclassified company, so lower weights were desirable in the new MWRTS.
In summary, the size stratification and sample size determination of the new MWRTS are performed iteratively as follows.
1) The modified L-H (including the OOB rates) is run by TG x GEO based on:
- pre-specified take-all clusters;
- CVs for the TG x GEO;
- 1 take-all stratum and 1 or 2 take-some strata;
- sample allocation proportional to the square root of the size measure;
- SRSWOR within strata.
2) Minimum sample size and maximum weight adjustments are made to the resulting modified L-H sample sizes.
3) Expected CVs by TG, GEO and TG x GEO are computed for all TG x GEOs, given the sample sizes in 2). The computation takes into account the OOB rates used in 1) and their impact on the variance of the estimates. If any of the expected CVs are above the target CVs (see Table 1), steps 1), 2) and 3) are repeated with appropriate new CVs by TG x GEO.
4) The TG x GEO population distribution and size stratification thresholds are examined graphically. For example, if the large take-some stratum population is still highly skewed, the threshold between the large take-some and take-all strata may be lowered. The revised threshold is then forced in step 1), and steps 2) and 3) are repeated. (Please note that 46% of the thresholds were lowered during this graphical analysis.)
5) Oversampling for frame misclassifications and nonresponse is finally performed.
Systematic sampling within strata was in fact used to select the April 2003 sample, although stratified SRSWOR is assumed at estimation. Before sample selection, clusters were ordered by their size measure. This technique was adopted to avoid an extreme sample (i.e., a sample made of either very large units or very small units).
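Size-ordered systematic selection within a stratum can be sketched as below. The fractional sampling interval N/n and the uniform random start are standard choices assumed here. The paper does not give its exact selection routine, and `systematic_sample` is a hypothetical name.

```python
import random
from typing import List, Sequence


def systematic_sample(sizes: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Indices of n units chosen systematically from a stratum sorted by size."""
    N = len(sizes)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    order = sorted(range(N), key=lambda i: sizes[i])  # smallest to largest
    step = N / n                                      # fractional sampling interval
    start = rng.random() * step                       # random start in [0, step)
    return [order[min(int(start + k * step), N - 1)] for k in range(n)]
```

Because the list is sorted by size, each selected unit comes from a different slice of the size distribution. A sample of 10 from 100 clusters therefore always contains exactly one cluster from each size decile, which is what rules out an extreme sample.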
Avoiding an extreme sample is particularly important when the sample is rarely changed, apart from the addition of a monthly sample of births. The results of the April 2003 sampling are presented below in Tables 2A and 2B. A large proportion of the sample (46.1% for retail and 72.2% for wholesale) is made of take-all clusters. This is largely due to the pre-specification of take-all clusters, mostly from multi-cluster enterprises. The distribution of employer and non-employer clusters differs between the surveyed and non-surveyed portions: non-employer clusters, which are smaller, are more present in the non-surveyed portion. The old MWRTS did not survey non-employers. This explains why, in Table 3, the size measure is less often based on old MWRTS data in the non-surveyed portion.

Table 2A: April 2003 Retail Sampling
Strata                       Population          Sample
                             Count       %       Count       %
Take-all:
  Pre-specified               4,045     2.1      4,045    32.0
  Modified L-H take-all       1,784     0.9      1,784    14.1
  Take-all total              5,829     3.1      5,829    46.1
Large take-some              19,462    10.3      3,171    25.1
Small take-some              78,181    41.5      3,659    28.9
Surveyed total              103,472    54.9     12,659     100
  Employer                   83,480    80.7
  Non-employer               19,992    19.3
Non-surveyed                 85,041    45.1          0       0
  Employer                   25,941    30.5
  Non-employer               59,100    69.5
Total                       188,513     100     12,659     100
Table 2B: April 2003 Wholesale Sampling
Strata                       Population          Sample
                             Count       %       Count       %
Take-all:
  Pre-specified               4,991     5.0      4,991    58.7
  Modified L-H take-all       1,148     1.1      1,148    13.5
  Take-all total              6,139     6.1      6,139    72.2
Large take-some               5,632     5.6      1,296    15.3
Small take-some              20,259    20.2      1,063    12.5
Surveyed total               32,030    31.9      8,498     100
  Employer                   28,911    90.3
  Non-employer                3,119     9.7
Non-surveyed                 68,350    68.1          0       0
  Employer                   22,221    32.5
  Non-employer               46,129    67.5
Total                       100,380     100      8,498     100
Table 3: Source of the Size Measure Used for the Population of Establishments (%)
Sources: Survey (MWRTS, UES) and Admin. (GBI, GST, T2). Columns: Retail (Surveyed, Non-Surveyed) and Wholesale (Surveyed, Non-Surveyed).
18.6 6.7 4.0 35.4 15. 1.3 0.8 44.8 37.4 15.7.0 5.8.8 31. 18.3 1. 0.4 44.6 3.8 1.0
For collection purposes, the clusters are converted into collection entities, i.e., the company tied to the establishments within the same TG. The collection entity is at the company level (between the enterprise and the establishment). Unlike the clusters, it does not consider the geographical dimension; instead, sales by GEO are collected from each collection entity. The 21,157 clusters in the retail and wholesale samples (12,659 + 8,498) are expected to translate into 16,38 collection entities. Because of expected death rates (out-of-business and out-of-scope) and special collection arrangements, 11,85 collection entities are expected to be collected in a regular month. This is a significant reduction in collection costs, as the old MWRTS had 16,75 to collect.
2.2 Sample Update
Since April 2003, births have been added to the sample monthly. A birth in the context of the new MWRTS is a brand new cluster of establishments. A new establishment joining an existing cluster (the enterprise already has establishments in the same TG x GEO) is not considered a birth. Birth clusters are stratified by TG, GEO and size using the same definitions and stratum boundaries as in April 2003, and they are sampled using the same sampling fractions. So far, the typical number of clusters birthed monthly in the population is 900 for retail and 400 for wholesale. Approximately 60% of them fall in the non-surveyed portion. Typically, 90 retail and 80 wholesale clusters are added to the sample.
Deaths have also been identified monthly since April 2003. A death in the context of the new MWRTS is a cluster that no longer has live and in-scope establishments. In other words, each of its establishments has died, ceased retail or wholesale activities, or moved to a different enterprise. Surveys identify deaths more quickly than administrative sources. Both surveys and administrative sources update the BR, and surveys then extract their latest universe from the BR to update their frame. The source of an update is difficult to establish, since a survey, an administrative source or both can identify a death. It is therefore difficult to establish whether the update is independent of the survey. For that reason, the MWRTS does not remove its deaths monthly from either its sample or its population. Instead, they are imputed with $0 and contribute to the variance estimate. Every 6 months or so, an unbiased death removal is performed (Trépanier et al. 1998).
Other than births and deaths, the same sample is used month after month. In 2000, monthly sample rotation was investigated for the new MWRTS (Majkowski 2001). Results showed that any reasonable monthly sample rotation would affect the month-to-month change estimate by at least 0.2 percentage points, which was considered too high.
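Because births reuse the April 2003 boundaries and sampling fractions, placing a birth cluster is a lookup. The sketch below is hypothetical and assumes a TG x GEO with two take-some strata and boundaries sorted in ascending order. A size equal to a boundary goes to the higher stratum, consistent with take-none meaning strictly below the threshold. The Bernoulli draw using the stratum's sampling fraction is only one way of "using the same sampling fractions"; the paper does not say how births are selected.

```python
import bisect
import random
from typing import Dict, Sequence

# Hypothetical labels, lowest to highest; boundaries[i] separates LABELS[i] and LABELS[i + 1].
LABELS = ("take-none", "small take-some", "large take-some", "take-all")


def birth_stratum(size: float, boundaries: Sequence[float]) -> str:
    """Place a birth cluster using the April 2003 size boundaries of its TG x GEO."""
    return LABELS[bisect.bisect_right(boundaries, size)]


def select_birth(stratum: str, fractions: Dict[str, float], rng: random.Random) -> bool:
    """Assumed Bernoulli selection with the stratum's April 2003 sampling fraction n_h / N_h."""
    return rng.random() < fractions[stratum]
```

A take-all birth has a sampling fraction of 1.0 and is always selected, while a take-none birth has a fraction of 0.0 and never is.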
To reduce response burden, a new sample should be selected every 4-5 years, together with a full restratification of the population. In the interim, unbiased death removals and partial restratifications are planned to maintain the efficiency of the sample. The MWRTS is also analysing options to replace sales obtained from direct data collection with sales modelled from GST data for simple units (units with a direct link to GST). This may be possible due to the improved processing of GST data (Dubreuil et al. 2003).

2.3 Estimation
The new MWRTS uses a simple expansion domain estimator for its surveyed portion,
$$\hat{Y}(d) = \sum_h \frac{N_h}{n_h} \sum_{i=1}^{n_h} y_{hi}(d),$$
where d is the domain of interest (e.g., TG, GEO) and Y(d) is the parameter of interest (e.g., total sales) in domain d. h denotes the strata, and $N_h$ and $n_h$ are the population and sample sizes of h. i denotes the clusters, with $y_{hi}(d) = y_{hi}$ if $i \in d$ and $y_{hi}(d) = 0$ otherwise, where $y_{hi}$ is the value of Y for cluster i in h. Estimated variances are also computed for the surveyed portion.
The non-surveyed portion's contribution is added to the estimates through a multiplicative adjustment factor applied to $\hat{Y}(d)$. For the first year of the new MWRTS, this factor is the ratio of the total April 2003 size measure in the non-surveyed portion to the total April 2003 size measure in the surveyed portion, within a given TG x GEO. As a result, the month-to-month change estimate is driven entirely by the surveyed portion. This approach has some weaknesses. However, it allows analysts and methodologists to concentrate their efforts on analysing and stabilising the surveyed portion. The adjustment factor methodology will be reviewed once the survey processes are stable.
Matthews and Bérard (2002) implemented an outlier detection and treatment methodology in the new MWRTS that provides a compromise between variance reduction and bias increase. It is based on a modified version of the Fuller (1991) Test and Treat method and on another method referred to as the Deflation Factor method. Its objective is to identify influential units within suspicious domains. A suspicious domain is one whose simple expansion domain estimate is greater than an expected value, e.g., the forecast estimate based on the time series. These influential units are deemed to have true survey response values, but their observed size is much larger than expected. A large observed value combined with a large weight seriously impacts the estimate.
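The expansion estimator and the first-year adjustment for the non-surveyed portion can be sketched as follows, using hypothetical names throughout. Each record is assumed to carry its stratum, its domain label, its value and an analyst-accepted outlier factor (1.0 when untreated). `n` must count every sampled cluster in the stratum, including those outside the domain and deaths imputed at $0. The adjustment is read as adding (non-surveyed size / surveyed size) × Ŷ(d) when the domain is a TG x GEO.

```python
from typing import Dict, Iterable, Tuple

# (stratum h, domain of cluster i, y_hi, accepted outlier factor f_hi; 1.0 if untreated)
Record = Tuple[str, str, float, float]


def domain_estimate(records: Iterable[Record], N: Dict[str, int],
                    n: Dict[str, int], domain: str) -> float:
    """Expansion estimator: sum over h of (N_h / n_h) * sum_i y_hi(d)."""
    total = 0.0
    for h, d, y, f in records:
        if d == domain:  # y_hi(d) = 0 outside the domain
            total += N[h] / n[h] * y * f
    return total


def with_non_surveyed(y_hat: float, x_non_surveyed: float, x_surveyed: float) -> float:
    """Add the non-surveyed portion via the April 2003 size-measure ratio."""
    return y_hat * (1 + x_non_surveyed / x_surveyed)
```

The size ratio is frozen at its April 2003 value, so the adjusted month-to-month change equals the change in the surveyed estimate.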
The outlier detection and treatment module proposes a factor which, if accepted by the analyst, is applied to the y-value to reduce its contribution to the domain estimate.

3. RESULTS AND EVALUATION

The new MWRTS sample was selected in April 2003. From May to November 2003, the sample was updated for births and deaths, but no data were collected from the units. In the summer of 2003, the sampled units were pre-contacted to introduce them to the survey and to verify that they were in scope. Survey data were then collected for the new MWRTS for the December 2003 to April 2004 reference months, in parallel with the old MWRTS data collection. The first estimates from the new MWRTS (for the April 2004 reference month) were released in June 2004. At the same time, the time series were revised and converted to NAICS-based TGs back to 1991 for retail and 1993 for wholesale. To do so, NAICS-based TG domain estimates were first produced from the old MWRTS for 1998 to 2003 (the period during which the NAICS classification was carried on the old MWRTS). A subset of these estimates, those for 1998 to 2001, was also used to establish macro-level conversion coefficients from each SIC-based TG to each NAICS-based TG. Backcast NAICS-based TG estimates were produced with these coefficients for the period prior to 1999 (Fortier 2003). This gave a NAICS-based TG time series from the old MWRTS. A final linkage was then performed using January 2004 estimates for retail and February 2004 estimates for wholesale. This linkage brings the series to the new level provided by the new MWRTS.

3.1 Estimates: Old vs. New

Table 4 presents the total retail and wholesale trade estimates from both the old and new MWRTS for March 2004. These differences were expected, since wholesale trade lost units to retail trade and manufacturing under NAICS.

Table 4: National Estimates from the Old and New MWRTS, March 2004 (in $ billions)

              Old (SIC)   New (NAICS)   % Diff.
Retail          26.07        27.00       +3.6%
Wholesale       40.53        39.92       -1.5%

3.2 Evaluation

This section evaluates some of the assumptions made at the sampling design stage, as well as the performance of the new size measure. The evaluation is, however, based on limited data, i.e., December 2003 to March 2004.

3.2.1. Death Rates

The death rates include both out-of-business and out-of-scope clusters. These were handled separately at the sampling design stage but are evaluated together here. As mentioned in 2.1, the out-of-business rates were set at 10% for large take-some strata and 20% for small take-some strata. The out-of-scope rate was 15%. Consequently, the expected death rates in the sample are 25% and 35% for large take-some and small take-some strata, respectively. Results observed for the March 2004 reference month are presented in Table 5. The death rates in the sample appear higher than those expected at the sampling stage. An unbiased death removal would remove the same proportion of deaths in and out of the sample. If one were performed, a significant proportion of deaths would still remain in the sample, but that proportion would be in line with the initial assumption.

Table 5: Dead Clusters in the Take-Some Surveyed (TS) Portion (%), March 2004

Strata                  In-Sample   Out-of-Sample
Retail:     Large TS       26.3          4.3
            Small TS       36.6          3.1
Wholesale:  Large TS       28.9          5.5
            Small TS       39.0          3.4

3.2.2. TG and GEO Movements

The TG and GEO movements based on March 2004 survey data are presented in Table 6.

Table 6: Overall TG and GEO Movements (%), March 2004

              TG     GEO
Retail        2.9    5.1
Wholesale     1.9    5.2

As mentioned in 2.1, the movements between TGs were expected to be 2.7% for retail and 0.6% for wholesale. The observed TG rates are slightly higher. Movements between GEOs were deemed negligible based on the UES. This made sense, since industrial misclassification is known to be more frequent than geographical misclassification on the BR. The observed GEO rates are nevertheless quite large, even larger than the TG rates. One explanation may be that, at the design stage, MWRTS estimated the misclassification rate from old information, i.e., 2000 UES information.

3.2.3. Response Rates

Table 7 presents response rates from December 2003 to March 2004. A response rate is the number of responding units divided by the total number of in-scope units. A 90% response rate (i.e., 10% nonresponse) was assumed at the sample design stage. As time goes by and the processes become more stable, that target response rate seems achievable, and it is in fact already met for retail.

Table 7: Response Rates (%), December 2003 to March 2004

              Dec. 2003   Jan. 2004   Feb. 2004   Mar. 2004
Retail           84.9        89.7        88.8        90.0
Wholesale        83.9        87.8        86.8        87.9

3.2.4. Efficiency of the Size Stratification

A preliminary analysis of the size stratification efficiency was performed.
Micro-level annualised sales were produced for the sample by multiplying the sum of the seasonally adjusted December 2003 to March 2004 sales by 3. The annualised sales of each cluster were compared with the size stratification thresholds to see whether the cluster appears to be correctly or incorrectly stratified. For each incorrectly stratified cluster, we checked whether it was under-stratified (its size measure underestimated its actual size) or over-stratified. Under-stratified cases are more problematic at estimation, because their sales value, which is larger than expected, may be combined with a large design weight. Efficiency was examined by April 2003 size measure source: MWRTS, UES and TRIO, and then each individual component of TRIO. Table 8 shows the results. The prespecified take-all clusters were excluded from this comparison, since they were not made take-all on the basis of size. Multi-establishment clusters whose establishments' size measures came from different sources were also excluded.

Table 8: Efficiency of the April 2003 Size Stratification, Preliminary Evaluation (%)

Size measure source (columns):      MWRTS, UES, TRIO, GBI, GST, T2
Retail, incorrectly stratified:     11.8 19.4 5.5 33.8 0. 7.4
Retail, under-stratified:           . 5.7.5.5 3.0 1.1
Wholesale, incorrectly stratified:  15.3 30.1 8.5 4. 0.9 7.4
Wholesale, under-stratified:        3. 7.1 3.0 3.0 3.3.4
Overall:                            1..8 4. 3.4

A non-negligible proportion of the clusters appear to be incorrectly stratified. Even so, the size measure meets its primary objective of avoiding under-stratified situations. UES does not perform as well as the other sources, but note that 1999 and 2000 information was used. Other results concern a hypothetical restratification in April 2004, based on the latest size measure but with the same stratum boundaries. In that case only 9.1% of the clusters in the retail population and 5.1% in the wholesale population would have changed size strata. This suggests that the incorrectly stratified cases in Table 8 are not due solely to a general increase in cluster size since April 2003.
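The classification procedure described in this subsection can be sketched in Python. This is a minimal sketch under stated assumptions, not the evaluation code actually used. Size strata are assumed to be defined by ascending boundary thresholds, and a cluster whose annualised sales equal a boundary is assumed to fall in the higher stratum. Prespecified take-all clusters and mixed-source multi-establishment clusters are assumed to have been filtered out beforehand. All names are hypothetical.

```python
import bisect
from collections import Counter


def annualise(monthly_sales):
    """Annualise four seasonally adjusted months (Dec. 2003 to Mar. 2004) as their sum times 3."""
    if len(monthly_sales) != 4:
        raise ValueError("expected exactly four monthly values")
    return sum(monthly_sales) * 3


def size_stratum(value, thresholds):
    """Index of the size stratum for `value`, given ascending stratum boundaries.

    A value equal to a boundary falls in the higher stratum (an assumption).
    """
    return bisect.bisect_right(thresholds, value)


def stratification_status(assigned_stratum, monthly_sales, thresholds):
    """Compare the stratum assigned from the April 2003 size measure with the
    stratum implied by the cluster's annualised sales."""
    observed = size_stratum(annualise(monthly_sales), thresholds)
    if observed == assigned_stratum:
        return "correct"
    # Actual size exceeds what the size measure suggested: under-stratified.
    return "under" if observed > assigned_stratum else "over"


def summarise(statuses):
    """Percent incorrectly stratified, and percent under-stratified (a subset of the former)."""
    counts = Counter(statuses)
    total = len(statuses)
    return {
        "incorrect_pct": 100 * (counts["under"] + counts["over"]) / total,
        "under_pct": 100 * counts["under"] / total,
    }
```

For example, with boundaries of 1 and 5 million, a cluster assigned to the smallest stratum whose four months sum to 1.1 million annualises to 3.3 million. It is therefore flagged as under-stratified.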
However, a more in-depth evaluation is needed before drawing any conclusions.

3.2.5. Coefficients of Variation

Finally, the estimated CVs for March 2004 are compared with the target CVs (see Table 1).
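The CV comparison can be made concrete with a sketch that computes a domain total and its CV for the simple expansion estimator. The paper does not state its variance formula. This sketch therefore assumes the textbook variance estimator for stratified simple random sampling without replacement, applied to the domain variable: $v(\hat{Y}(d)) = \sum_h N_h^2 (1 - n_h/N_h)\, s_h^2(d)/n_h$, with $\mathrm{CV} = \sqrt{v}/\hat{Y}(d)$. It ignores nonresponse and the non-surveyed adjustment. Names and data layout are hypothetical.

```python
import math
from statistics import variance


def domain_total_and_cv(strata, domain):
    """Domain total and CV under an assumed stratified SRSWOR variance estimator.

    strata: dict mapping stratum -> (N_h, units), where units is a list of
            (domain_label, y) pairs for the n_h sampled clusters.
    Returns (total, cv), with cv expressed as a fraction (0.05 means 5%).
    """
    total = 0.0
    var = 0.0
    for N_h, units in strata.values():
        n_h = len(units)
        # Domain variable y_hi(d): y if the cluster is in the domain, else 0.
        y_d = [y if dom == domain else 0.0 for dom, y in units]
        total += N_h / n_h * sum(y_d)
        # Strata with a single sampled cluster are skipped: the sample variance is undefined.
        # Take-all strata (n_h == N_h) contribute zero through the finite population correction.
        if n_h > 1:
            var += N_h ** 2 * (1 - n_h / N_h) * variance(y_d) / n_h
    cv = math.sqrt(var) / total if total else float("nan")
    return total, cv
```

Comparing the result with a target is then a simple check such as `cv <= target_cv`, applied per national, TG and province/territory domain.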

Table 9: Number of Domain Estimates Within or Not Within Target CVs, March 2004

                 Retail: within target CVs?   Wholesale: within target CVs?
                   Yes          No               Yes          No
National             1           0                 1           0
TG                  16           3                 1           3
Prov./Terr.         13           0                13           0

Target CVs are not achieved in some TGs for the following reasons: 1) these TGs often have the highest TG misclassification rates; 2) the variability in the annual size measure appears to underestimate the variability in monthly sales.

4. CONCLUSION

The new MWRTS predicted frame imperfections, such as deaths and TG movements, and nonresponse quite accurately, but GEO misclassification was underestimated. The sampling design accounted for these imperfections in two ways: a modified Lavallée-Hidiroglou algorithm that incorporates expected out-of-business rates, and inflated sample sizes. A size measure based on survey and administrative sources improved size stratification. As a result, most CVs were within their targets. In the future, the impact of frame misclassification and nonresponse could also be incorporated into the Lavallée-Hidiroglou algorithm. This would better preserve the optimality that the algorithm provides. A size measure that better reflects the monthly variability of the survey data could also be investigated. The 8-month delay between sample selection and actual data collection should be shortened, so that more up-to-date information can be used to estimate frame imperfection rates. If the budget permits, the 5-month period during which the new survey runs in parallel with the old one should be longer, to allow more time for fine-tuning and analysis.

5. ACKNOWLEDGEMENT

The author would like to express her sincere thanks to Catherine Dufour, Michel Ferland, Susie Fortier and Mark Majkowski for their valuable input into this paper, and to Richard Evans and Chris Mohl for their thorough review of the paper.

6. REFERENCES

Bérard, H. (2001), The Redesign of the Monthly Wholesale and Retail Trade Survey of Statistics Canada, Proceedings of the Survey Methods Section, Statistical Society of Canada, pp. 81-86.

Dubreuil, G., Hidiroglou, M. A. and Pierre, L. (2003), Use of Administrative Data in Modeling of Monthly Survey Data, Proceedings of the Survey Methods Section, Statistical Society of Canada [CD-ROM].

Ferland, M. (2003), Enhanced Lavallée-Hidiroglou Algorithm, Unpublished Manuscript, Ottawa: Statistics Canada.

Fortier, S. (2003), La conversion de données historiques selon un nouveau système de classification pour l'enquête mensuelle sur le commerce de gros et de détail [Converting historical data to a new classification system for the monthly wholesale and retail trade survey], Proceedings of the Survey Methods Section, Statistical Society of Canada [CD-ROM].

Fuller, W. A. (1991), Simple Estimators for the Mean of Skewed Populations, Statistica Sinica, 1, 137-158.

Hidiroglou, M. A. (1986), The Construction of a Self-Representing Stratum of Large Units in Survey Design, The American Statistician, 40, 27-31.

Latouche, M. (1988), Détermination, allocation et sélection de l'échantillon [Determination, allocation and selection of the sample], Recueil des textes des présentations du colloque sur les méthodes et domaines d'application de la statistique, Bureau de la statistique du Québec, pp. 69-78.

Lavallée, P. and Hidiroglou, M. A. (1988), On the Stratification of Skewed Populations, Survey Methodology, June 1988, Vol. 14, No. 1, pp. 33-43.

Majkowski, M. (2001), Maintaining Estimate Quality and Easing Response Burden in a Sub-Annual Business Survey, Proceedings of the Survey Research Methods Section, American Statistical Association [CD-ROM].

Majkowski, M., Bérard, H., Dufour, C. and Fortier, S. (2004), Monitoring Survey Processes of the Canadian Monthly Wholesale and Retail Trade Survey, Proceedings of the Survey Research Methods Section, American Statistical Association [CD-ROM].

Matthews, S. and Bérard, H. (2002), The Outlier Detection and Treatment Strategy for the Monthly Wholesale and Retail Trade Survey of Statistics Canada, Proceedings of the Survey Methods Section, Statistical Society of Canada [CD-ROM].

Trépanier, J., Babyak, C., Marchand, I., Bissonnette, J. and St-Pierre, M. (1998), Enhancements to the Canadian Monthly Wholesale and Retail Trade Survey, Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 487-492.