The household budget and expenditure data collection module (IOF 2014/2015) within a continuous multipurpose survey system (INCAF)

MZ:2014:08

Report from a fifth short term mission to the National Statistical Institute of Mozambique, Maputo, Mozambique

22 November-14 December 2014

within the framework of the AGREEMENT ON CONSULTING ON INSTITUTIONAL CAPACITY BUILDING, ECONOMIC STATISTICS AND RELATED AREAS between INE and Scanstat

David J. Megill

Ref: Contract DARH/2008 /004
October, 2012

Address: David J. Megill, 1504 Kenwood Ave., Alexandria, VA 22302, US
E-Mail: davidmegill@yahoo.com
Telephone: 1-703-824-0292

Table of Contents

1 INTRODUCTION AND TERMS OF REFERENCE
2 ACTIVITIES DURING THE MISSION
2.1. Summary of Sample Design for IOF 2014/15
2.2. Weighting Procedures for IOF 2014/15
2.3. Procedures for Calculating Sampling Error
2.4. Capacity Building
3 FINDINGS AND RECOMMENDATIONS
APPENDIX 1. Persons Contacted

1. INTRODUCTION AND TERMS OF REFERENCE

The Instituto Nacional de Estatística (INE) is conducting the Inquérito sobre o Orçamento Familiar (IOF) 2014/15, or Household Budget Survey (HBS), in a nationally-representative sample of 11,592 households in 1,236 sample census enumeration areas (EAs) over the 12-month period from August 2014 to July 2015. This survey was designed as a module of the Inquérito Contínuo de Agregados Familiares (INCAF), or Continuous Household Survey, which is a multipurpose household survey with a quarterly employment component, and the HBS, designed to obtain income and expenditure data for all four quarters to represent seasonality. One of the objectives of the IOF is to provide measures of poverty and other socioeconomic indicators, and to provide information on consumption needed for national accounts. The sample of households for IOF is treated as a panel, and each sample household is interviewed each quarter in a different period of the month.

The first quarter of data collection for IOF was conducted between 8 August and 7 November 2014. A small team of Scanstat short-term consultants began working with the INE staff on 24 November to review the first quarter of IOF data and to develop procedures for producing preliminary results from these data. Based on this review, the team also made recommendations for improving the data quality for the remaining three quarters of data collection.

The Terms of Reference for this first mission of the Sampling Consultant were stated as follows:

Objective: During the first mission the Sampling Consultant will focus on assessing the current status of the INCAF/IOF sampling and estimation procedures following the first quarter of data collection, in order to identify any issues that need to be addressed. One objective of this mission will be to finalize the weighting procedures for the INCAF/IOF first quarter results, including the adjustment of weights to take into account non-response.

Activities: The assessment will include a review of summary information from the listing of households in sample enumeration areas (EAs), and the distribution of the completed household interviews for the first quarter of INCAF/IOF. Sampling errors and design effects for key INCAF/IOF estimates such as the quarterly unemployment rate and average household expenditures will be tabulated and reviewed to assess the level of precision and the efficiency of the sample design. The methodology for maintaining the panel of households and the longitudinal analysis will be reviewed. The replacement of non-interview households will also be examined.

Expected outputs: Based on the findings, the Sampling Consultant will make recommendations for improving different aspects of the methodology. Throughout this visit the Sampling Consultant will work closely with the INE Statisticians to provide on-the-job training. A half-day seminar on the INCAF/IOF sampling and estimation methodology will be presented for the INE staff at the end of this visit.

Reporting: The Sampling Consultant will submit a report on findings and recommendations on the INCAF/IOF sampling and estimation methodology.

One purpose of this mission report is to document the methodology for calculating the weights for IOF. Since the weighting procedures depend on the sampling methodology, this report also summarizes the IOF sample design.

The calculation of sampling errors for selected IOF indicators can only be accomplished once the IOF data edits are complete and the weighted survey data file for the first quarter is considered final. Therefore this activity will be followed up during the second mission of the Sampling Consultant around March 2015. The main activity of this mission, the calculation of the IOF weights, was only completed by the middle of the third week, given that it took time to compile the sampling frame information for all the sample clusters needed for the calculation of the weights.

The sampling consultant worked closely with Arão Balate, Director, Direcção de Censos e Inquéritos, Basílio Cubula, INE Sampling Statistician, and other INE staff in implementing the weighting procedures for the IOF 2014/15. He also collaborated with his Scanstat consultant colleagues, Lars Lundgren and Anne Abelset. He appreciates their collaboration, and he would also like to thank Dr. João Loureiro, INE President, Manuel Gaspar, INE Vice-President, Antônio Adriano, Director Adjunto, Direcção de Censos e Inquéritos, Cristóvão Muahio, Chief, Departamento de Metodologia e Amostragem (DMA), and Eugénio Matavel, Programmer, for their support.

2. ACTIVITIES DURING THE MISSION

The data collection for the first quarter of IOF 2014/15 was completed around 7 November. Since computer-assisted personal interviewing (CAPI) was conducted using tablets that included initial edits in the field, the IOF data files should be available soon after the end of the first quarter. The occupation and industry information for employment was coded later in the office, but it would still be possible to produce some of the key unemployment tables soon after the data files are received from the field. In the case of the expenditure data, paper questionnaires were used, and the data were then supposed to be entered in the field. However, INE decided to enter all the expenditure data again in the office, which resulted in some delay in the availability of these data.

Once the full data set from the household interviews for the first quarter of IOF became available during the first week of this mission, Megill produced aggregated data at the sample cluster level to verify the concatenation of the data and determine whether data from any sample clusters were missing. Initially it was found that nine sample clusters (enumeration areas, or EAs) were missing, so the data processing staff checked on the status of these sample EAs. There were three EAs in Sofala Province that could not be enumerated because of security reasons. In the case of the other missing EAs, the INE staff contacted the field staff to obtain back-up copies of the data files. By the end of the second week of this mission the data files for all the EAs except for the three in Sofala were obtained and merged with the national data file for the first quarter of IOF. The weights were calculated based on the final IOF data set that will be used for the first quarter analysis.

The weighting procedures for the IOF 2014/15 depend on the sample design, so first it was necessary to review the sampling methodology used for the survey. This sample design is described below.
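The cluster-level check described above (aggregating the household file by sample EA and comparing it with the list of sample EAs) can be illustrated with a short sketch. It is not the program used during the mission: the function name, the `ea_code` field and the example codes are assumptions made purely for illustration.

```python
from collections import Counter


def cluster_completion_report(sample_ea_codes, household_records):
    """Aggregate household records to the cluster (EA) level.

    Returns the number of household records per EA, the sample EAs
    with no records at all, and any EA codes found in the data that
    are not in the sample list. The field name `ea_code` is an
    assumption, not the name used in the IOF files.
    """
    sample_set = set(sample_ea_codes)
    counts = Counter(rec["ea_code"] for rec in household_records)
    missing = sorted(ea for ea in sample_set if counts[ea] == 0)
    unexpected = sorted(ea for ea in counts if ea not in sample_set)
    return counts, missing, unexpected


# Hypothetical example: three sample EAs, one with no data received.
sample_eas = ["EA001", "EA002", "EA003"]
records = [
    {"ea_code": "EA001"},
    {"ea_code": "EA001"},
    {"ea_code": "EA003"},
]
counts, missing, unexpected = cluster_completion_report(sample_eas, records)
print(missing)     # sample EAs to follow up with the field staff
print(unexpected)  # codes that may point to a concatenation problem
```

The per-EA counts returned by the same function are also the numbers of completed interviews needed later for the weight adjustment.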

2.1. Summary of Sample Design for IOF 2014/15

A stratified multi-stage sample design was used for selecting the sample for the IOF 2014/15. The sampling frame was based on the Master Sample (Amostra Mãe) developed by INE from the 2007 Mozambique Census of Population and Housing (Recenseamento Geral da População e Habitação, RGPH 2007). The sampling methodology for the Master Sample is described in the report on "Recomendações Metodológicas para o Desenho da Amostra Mãe em Base ao RGPH 2007 de Moçambique" (David J. Megill and Carlos Creva Singano, November 2010).

The sampling methodology involves three stages of selection. The primary sampling units (PSUs) selected at the first stage are based on the supervisory areas (áreas de controlo) defined for the RGPH 2007. Each supervisory area has about 3 to 5 enumeration areas (EAs), which are operational segments defined on maps for the census enumeration. At the first sampling stage the PSUs were selected systematically with probability proportional to size (PPS) within each stratum. The measure of size for each PSU was based on the number of households in the RGPH 2007 frame. At the second stage one EA was selected with PPS within each PSU. A listing of households was conducted within each sample EA, which is the frame for selecting a sample of households at the third sampling stage.

The PSUs in the Master Sample are stratified by province, urban and rural areas. The urban stratum of each province is further divided into substrata consisting of cities and other urban areas. A few very large cities are also divided into socioeconomic substrata, which were defined based on the RGPH 2007 socioeconomic data. In the case of the rural stratum of each province, the PSUs were classified by agroecological zone, which was used as a sorting variable to provide implicit stratification. The sampling frame was also sorted geographically to provide additional implicit stratification.

The main component of the first stage sample for IOF 2014/15 consisted of the 752 sample EAs selected for the INCAF 2012. The sample EAs for that survey had been previously selected from the Master Sample, stratified by province, urban and rural stratum. In order to improve the level of precision for the provincial-level estimates of key indicators, the total number of sample EAs for IOF was increased to 1,236, so 484 additional sample EAs were selected from the Master Sample, using the same systematic PPS selection procedures within each stratum.

One reason that the INCAF sample EAs were used for the IOF is that the listing of households for that survey conducted in 2012 could be used again for the IOF in order to reduce the cost of the fieldwork. In this case it was only necessary to conduct a new listing in the additional sample of 484 sample EAs. The overlap in the sample EAs between the INCAF and IOF would also provide a greater correlation between the two samples, which should improve the level of precision for the estimates of trends (differences) over time for the unemployment rate and labor force indicators.

At the last sampling stage 11 sample households were selected from the listing for each urban EA, and 8 households were selected for each rural EA. A reserve of sample households was also selected in each EA for replacing any sample household that could not be interviewed for any reason. Table 1 shows the distribution of sample EAs and households selected for the IOF 2014/15 by province, urban and rural stratum.

Table 1. Distribution of Sample EAs and Sample Households for IOF 2014/15, by Province and Urban/Rural Stratum

Province                Urban      Urban      Rural      Rural      Total      Total
                          EAs households        EAs households        EAs households
Niassa                     32        352         64        512         96        864
Cabo Delgado               44        484         60        480        104        964
Nampula                    60        660        104        832        164      1,492
Zambézia                   52        572        124        992        176      1,564
Tete                       40        440         68        544        108        984
Manica                     40        440         56        448         96        888
Sofala                     60        660         44        352        104      1,012
Inhambane                  40        440         52        416         92        856
Gaza                       40        440         48        384         88        824
Maputo Província           60        660         48        384        108      1,044
Maputo Cidade             100      1,100          -          -        100      1,100
Total                     568      6,248        668      5,344      1,236     11,592

One additional step used in the sampling implementation for IOF was that small EAs (for example, with fewer than 50 households) were combined with adjacent EAs in the census frame to form a larger cluster that was listed. Some large EAs (for example, with more than 200 households) were subdivided into smaller segments, and one segment was randomly selected for the listing. Although the original listing form was designed to include this information for combined and sub-divided EAs, unfortunately the tablet system used for capturing the data in the field did not keep this information. This affects the calculation of the weights, as described later in the section on weighting procedures.

Although the final sample of EAs for the IOF 2014/15 was selected in different phases for the INCAF and the additional IOF sample, the same sampling procedures were used for each phase. That is, at the first stage the PSUs were selected systematically with PPS within each stratum, and one EA was selected within each PSU with PPS. Therefore the estimation procedures for calculating the weights and the sampling errors will be based on the assumption that all the IOF sample EAs within each stratum were selected at the same time using these procedures.

In most EAs all the original sample households that could not be interviewed were replaced, in which case there were exactly 11 completed household interviews for sample urban EAs and 8 completed household interviews for sample rural EAs. However, in some EAs there were more non-interview households than replacements, so that fewer sample households had completed interviews. In a few cases the interviewers completed 12 household interviews in an urban sample EA. This is not a problem, since the number of completed interviews is taken into account in the weighting procedures, as described later in this section.
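The systematic PPS selection used within each stratum can be sketched as follows. This is a common textbook implementation (cumulate the measures of size in frame order and step through them with a fixed interval from a random start), offered as an illustration rather than as the program INE used; the function names and the household counts are hypothetical.

```python
def systematic_pps(sizes, n, random_start):
    """Select n units systematically with probability proportional to size.

    sizes: measure of size of each frame unit, in frame (sorted) order.
    random_start: a number in [0, 1), for example from random.random().
    Returns the indexes of the selected units. A unit whose size exceeds
    the sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    selected = []
    cumulative = 0.0  # cumulated size of the units before `index`
    index = 0
    for k in range(n):
        point = (random_start + k) * interval
        # Move forward until the selection point falls inside unit `index`.
        while index < len(sizes) - 1 and point >= cumulative + sizes[index]:
            cumulative += sizes[index]
            index += 1
        selected.append(index)
    return selected


def pps_probability(size, total_size, n):
    """Selection probability of one unit when n units are drawn with PPS."""
    return n * size / total_size


# Hypothetical stratum with seven PSUs and their census household counts.
psu_households = [120, 80, 200, 150, 50, 100, 300]
chosen = systematic_pps(psu_households, n=3, random_start=0.5)
print(chosen)
print([pps_probability(psu_households[i], sum(psu_households), 3) for i in chosen])
```

Because the frame is walked in its sorted order, sorting it by agroecological zone and geography before selection is what produces the implicit stratification mentioned above. Calling the same function with `n=1` on the EAs of a selected PSU gives a single PPS draw for the second stage.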

2.2. Weighting Procedures for INCAF/IOF 2014/15

In order for the sample estimates from the IOF 2014/15 to be representative of the population, it is necessary to multiply the data by a sampling weight, or expansion factor. The basic weight for each sample household would be equal to the inverse of its probability of selection (calculated by multiplying the probabilities at each sampling stage). The sampling probabilities at each stage of selection are maintained in an Excel spreadsheet with information from the sampling frame for each sample EA so that the corresponding overall probabilities and corresponding weights can be calculated. This section first describes the weights based on the actual probabilities of selection, followed by weight adjustment procedures that were needed to compensate for deficiencies in the sampling information.

Based on the sampling procedures for the Master Sample and the IOF 2014/15, the overall probability of selection for the IOF sample households can be expressed as follows:

$$p_{hij} = \frac{n_h \times M_{hi}}{M_h} \times \frac{M_{hij}}{M_{hi}} \times \frac{n'_h}{n_h} \times p_{Shij} \times \frac{m_{hij}}{M'_{hij}} = \frac{n'_h \times M_{hij} \times p_{Shij} \times m_{hij}}{M_h \times M'_{hij}},$$

where:

$p_{hij}$ = probability of selection for the sample households in the j-th sample EA of the i-th sample PSU in stratum h
$n_h$ = number of sample EAs selected in stratum h for the Master Sample
$M_h$ = total number of households in the RGPH 2007 frame for stratum h
$M_{hi}$ = total number of households in the RGPH 2007 frame for the i-th sample PSU in stratum h
$M_{hij}$ = total number of households in the RGPH 2007 frame for the j-th sample EA of the i-th sample PSU in stratum h
$n'_h$ = number of EAs selected in stratum h for the IOF 2014/15
$p_{Shij}$ = probability of selection for the selected segment in a large sample EA that is sub-divided; this probability is equal to 1 for all EAs that are not segmented
$m_{hij}$ = number of sample households selected in the j-th sample EA of the i-th sample PSU in stratum h
$M'_{hij}$ = total number of households listed in the j-th sample EA of the i-th sample PSU in stratum h

The different components of this probability of selection correspond to the individual sampling stages. The probability of selecting a segment in a large EA ($p_{Shij}$) depends on the type of selection procedure that is used. If one segment is selected randomly with equal probability, this probability would be calculated as follows:

$$p_{Shij} = \frac{1}{S_{hij}},$$

where:

$S_{hij}$ = total number of segments in the j-th large sample EA of the i-th sample PSU in stratum h

In the case of a small EA that was combined with another EA in the same PSU for the listing, the measure of size $M_{hij}$ was based on the sum of the number of households in the Census frame for the combined EAs.

The basic sampling weight, or expansion factor, is calculated as the inverse of this probability of selection. Based on the previous expression for the probability, the weight can be simplified as follows:

$$W_{hij} = \frac{M_h \times M'_{hij}}{n'_h \times M_{hij} \times p_{Shij} \times m_{hij}},$$

where:

$W_{hij}$ = basic weight for the sample households in the j-th sample EA of the i-th sample PSU in stratum h

During the first quarter of data collection for the IOF 2014/15, three of the 1,236 sample EAs could not be enumerated because of security problems. In this case it is necessary to adjust the weights for the corresponding strata. The weights are also adjusted to take into account any non-interviews that could not be replaced. The weight adjusted for missing sample EAs and sample households that could not be replaced can be expressed as follows:

$$W'_{hij} = \frac{M_h \times M'_{hij}}{n'_h \times M_{hij} \times p_{Shij} \times m_{hij}} \times \frac{n'_h}{n''_h} \times \frac{m_{hij}}{m'_{hij}} = \frac{M_h \times M'_{hij}}{n''_h \times M_{hij} \times p_{Shij} \times m'_{hij}},$$

where:

$W'_{hij}$ = adjusted basic weight for the sample households in the j-th sample EA of the i-th sample PSU in stratum h
$n''_h$ = number of EAs enumerated in stratum h for the IOF 2014/15
$m'_{hij}$ = number of sample households with completed interviews in the j-th sample EA of the i-th sample PSU in stratum h, including replacement households

Although an attempt was made to obtain all the information needed to calculate this adjusted basic weight based on the probabilities of selection, it was not possible to obtain the information on the EAs that were combined or sub-divided. The spreadsheet with the information from the frame for each sample EA did not include the number of households for each sample EA in the RGPH 2007 ($M_{hij}$), so it was necessary to merge this information from a database with all the Census EAs by matching the geographic codes. All the EAs were matched to the Census database except for about 13 EAs. However, these measures of size did not take into account the small EAs that were combined with other EAs in the same PSU. Another problem is that it was not possible to obtain information on the subdivided EAs in order to calculate the probability $p_{Shij}$. For this reason it was necessary to calculate approximate weights based on the available information.

The approximate probabilities were based on using the number of households from the listing for each EA as an approximate measure of size. In this case the approximate adjusted weights for the IOF 2014/15 sample households were calculated as follows:

$$W''_{hij} = \frac{M_h \times M'_{hij}}{n''_h \times M'_{hij} \times m'_{hij}} = \frac{M_h}{n''_h \times m'_{hij}},$$

where:

$W''_{hij}$ = approximate adjusted basic weight for the sample households in the j-th sample EA of the i-th sample PSU in stratum h

It can be seen in this formula that the final adjusted weight is similar for all sample households within each stratum, varying only by the number of completed household interviews in each EA. Given the procedure for replacing non-interview sample households, the number of completed household interviews is exactly 11 for most sample urban EAs, and 8 for most sample rural EAs.

The effect of this approximate weighting procedure is to adjust the weights to the distribution of the frame based on the RGPH 2007. Therefore this weighting procedure does not take into account any differential growth rate of the urban and rural strata by province following the RGPH 2007. However, in the next step these approximate weights are adjusted based on population projections by province, urban and rural stratum, as described later in this weighting section of the report.
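The probability and weight expressions in this section translate directly into a few small functions. The sketch below is only an illustration: the function and argument names are invented here to mirror the symbols in the text, and all the numbers in the example are hypothetical rather than taken from the IOF frame.

```python
def selection_probability(n_h, M_h, M_hi, M_hij, n_iof, p_seg, m_hij, M_listed):
    """Overall probability of selection of a sample household (p_hij)."""
    first_stage = n_h * M_hi / M_h   # PSU selected with PPS for the Master Sample
    second_stage = M_hij / M_hi      # one EA selected with PPS within the PSU
    subsample = n_iof / n_h          # ratio n'_h / n_h from the formula
    last_stage = m_hij / M_listed    # households selected from the listing
    return first_stage * second_stage * subsample * p_seg * last_stage


def basic_weight(M_h, M_listed, n_iof, M_hij, p_seg, m_hij):
    """Basic weight W_hij, the inverse of the selection probability."""
    return (M_h * M_listed) / (n_iof * M_hij * p_seg * m_hij)


def adjusted_weight(M_h, M_listed, n_enum, M_hij, p_seg, m_completed):
    """W'_hij: adjusted for EAs not enumerated and unreplaced non-interviews."""
    return (M_h * M_listed) / (n_enum * M_hij * p_seg * m_completed)


def approximate_weight(M_h, n_enum, m_completed):
    """W''_hij: listing count used as the measure of size, no segment term."""
    return M_h / (n_enum * m_completed)


# Hypothetical urban EA: all values are made up for the example.
p = selection_probability(n_h=80, M_h=100_000, M_hi=500, M_hij=120,
                          n_iof=40, p_seg=1.0, m_hij=11, M_listed=150)
w = basic_weight(M_h=100_000, M_listed=150, n_iof=40, M_hij=120,
                 p_seg=1.0, m_hij=11)
w_adj = adjusted_weight(M_h=100_000, M_listed=150, n_enum=39, M_hij=120,
                        p_seg=1.0, m_completed=10)
w_approx = approximate_weight(M_h=100_000, n_enum=39, m_completed=10)
print(p, w, w_adj, w_approx)
```

In this example the approximate weight differs from the adjusted weight by the factor 120/150, the ratio of the census count to the listing count for the EA. The two coincide whenever those counts are equal and the EA is not segmented.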
As long as these population projections are reasonably accurate, the weighted estimates from the IOF 2014/15 will reflect the actual distribution of the population by province, urban and rural stratum. Therefore the final adjusted weights will reduce some of the bias in the distribution of the weighted population by stratum.

Another reason to have confidence in the final adjusted weights is that the probability of selection of the sample PSUs at the first sampling stage is known, and the last stage probability of selection of the households from the listing is known. Although we do not know the exact probability of selection of the EA within the sample PSU for the cases where the EA was combined or sub-divided, we know that the final cluster was randomly selected with PPS (based on the EA) within the sample PSU. In this case we use the number of households listed in the cluster as the approximate measure of size.

It should also be pointed out that apparently the weights for the first quarter of INCAF 2012 suffered from a similar problem with the lack of information for sample EAs that were combined or sub-divided. Since the basic weights were not adjusted to take this problem into account, the INCAF 2012 weights were more variable. Using the basic design weights (prior to the adjustment based on population projections), the weighted total population from the INCAF 2012 data was only about 17.1 million, considerably lower than the corresponding population from the RGPH 2007. This illustrates the problem with a potential under-count in the listing, as well as the lack of information for EAs that were sub-divided for the listing. However, the INCAF 2012 weights were also adjusted based on the population projections, so this will improve the comparability of the weighted estimates from the IOF 2014/15 with those from the INCAF 2012.

As mentioned above, the adjusted basic weights for the IOF sample households will provide a weighted distribution by province, urban and rural stratum that is consistent with the RGPH 2007. In order to reflect the growth in the population by stratum between 2007 and the time of the IOF 2014/15 data collection, the preliminary weights were adjusted based on population projections. The INE demographers had used a demographic analysis model with the data from the RGPH 1997 and 2007 and estimates of different parameters from the 2013 Demographic and Health Survey (DHS) and other sources to produce tables on population projections for each province, urban and rural stratum, by individual year up to 2040. The weight adjustment factor based on the projected total population by province, urban and rural stratum can be expressed as follows:

$$A_h = \frac{P_h}{\sum_{i \in h} \sum_{j} \sum_{k} W''_{hij} \times p_{hijk}},$$

where:

$A_h$ = adjustment factor for the weights of the IOF sample households in stratum h (province, urban/rural)
$P_h$ = projected total population for stratum h for the mid-point of the data collection period for the first quarter of IOF, based on demographic analysis
$W''_{hij}$ = adjusted basic design weight for the sample households in the j-th sample EA of the i-th sample PSU in stratum h
$p_{hijk}$ = number of persons in the k-th sample household in the j-th sample EA of the i-th sample PSU in stratum h

The denominator of the adjustment factor $A_h$ is the estimated weighted total population in stratum h from the IOF data using the preliminary adjusted basic design weights.
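The adjustment factor can be reproduced numerically for one stratum. The sketch below uses the Niassa Urban figures that the report publishes in Tables 2 and 3, and obtains the mid-point population with the exponential interpolation between the 1 July 2014 and 1 July 2015 projections that the report describes for 23 September 2014. The function names and the input layout for households are assumptions made for illustration, and the two-household stratum at the end is hypothetical.

```python
import math
from datetime import date


def interpolate_population(p_start, p_end, t_start, t_end, t_target):
    """Exponential (constant growth rate) interpolation between two dates."""
    fraction = (t_target - t_start).days / (t_end - t_start).days
    return p_start * math.exp(math.log(p_end / p_start) * fraction)


def weighted_population(households):
    """Denominator of A_h: sum of weight x persons over sample households.

    `households` is an iterable of (approximate_weight, persons) pairs;
    this layout is an assumption made for the example.
    """
    return sum(weight * persons for weight, persons in households)


def adjustment_factor(projected_population, weighted_pop):
    """A_h: projected population divided by the weighted survey population."""
    return projected_population / weighted_pop


# Niassa Urban, using the figures reported in Tables 2 and 3.
p_mid = interpolate_population(
    372_176, 388_202, date(2014, 7, 1), date(2015, 7, 1), date(2014, 9, 23)
)
a_niassa_urban = adjustment_factor(p_mid, 274_659)
print(round(p_mid), round(a_niassa_urban, 4))

# Hypothetical two-household stratum, to show how the denominator is built.
toy_households = [(250.0, 4), (250.0, 6)]
a_toy = adjustment_factor(3_000, weighted_population(toy_households))
print(a_toy)
```

The first printed pair should be close to the interpolated population and the adjustment factor that the report lists for Niassa Urban. Small differences in the last digit can arise from rounding in the published tables.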
Te preliminary weigts for all te sample ouseolds witin a stratum were multiplied by te corresponding adjustment factor for te stratum to obtain te final adjusted weigts, as follows: W A were: W" A, W A final adjusted weigt for te sample ouseolds in te j-t sample EA of te i-t sample PSU in stratum After te adjustment factors were applied to te weigts of eac stratum, te final weigted survey estimates of total population by stratum were consistent wit te 11

corresponding population projections. Of course te accuracy of te estimates of total population based on te adjusted weigts depends on te quality of te population projections by stratum. Te population projections wic INE generated for eac year reflect te mid-point of te year, or 1 July. For te adjustment of te weigts, it is ideal to use te population projections for te mid-point of te data collection period for te survey. In te case of te first quarter of IOF, te data collection was conducted between 8 August and 7 November, so te mid-point was estimated as 23 September 2014. Using te population projections by province, urban and rural stratum for 1 July 2014 and 1 July 2015, an interpolation based on exponential growt was used to estimate te population for 23 September 2014, using te following formula: P P 14 e P ln P 15 14 t IOF t t15 t 14 14 were: P projected total population for stratum on 23 September 2014 P 14 population projection for stratum on 1 July 2014 P 15 population projection for stratum on 1 July 2015 t IOF - t 14 number of days between 1 July 2014 and 23 September 2014 (tat is, 84 days) t 15 - t 14 number of days between 1 July 2014 and 1 July 2015 (tat is, 365 days) Table 2 presents te INE population projections by province, urban and rural stratum, for 1 July 2014 and 1 July 2015, and te corresponding interpolated population estimates for 23 September 2014. Table 2. Mozambique Population Projections by Province, Urban and Rural Stratum for 2014 and 2015, and Interpolated Population for Mid-Point of IOF Data Collection Period for First Quarter 2014 2015 IOF - 2014 Province and Stratum 1 July 1 July 23 Sept. 
Niassa Urban                372,176     388,202     375,805
Niassa Rural              1,221,307   1,268,704   1,232,055
Cabo Delgado Urban          444,864     463,038     448,982
Cabo Delgado Rural        1,417,221   1,430,118   1,420,179
Nampula Urban             1,549,414   1,615,298   1,564,334
Nampula Rural             3,338,425   3,393,495   3,351,019
Zambézia Urban              958,355   1,008,281     969,621
Zambézia Rural            3,724,080   3,794,084   3,740,075
Tete Urban                  327,752     341,385     330,840
Tete Rural                2,090,829   2,176,059   2,110,143
Manica Urban                447,430     460,597     450,426
Manica Rural              1,418,871   1,472,925   1,431,132
Sofala Urban                725,458     737,503     728,212
Sofala Rural              1,273,851   1,311,173   1,282,345
Inhambane Urban             349,499     359,253     351,720
Inhambane Rural           1,125,819   1,140,226   1,129,118
Gaza Urban                  358,546     365,350     360,101
Gaza Rural                1,033,526   1,051,460   1,037,626
Maputo Province Urban     1,145,642   1,200,866   1,158,122
Maputo Province Rural       492,989     508,192     496,447
Maputo City               1,225,868   1,241,702   1,229,494
Mozambique               25,041,922  25,727,911  25,198,155

Table 3 shows the population projections for the mid-point of the IOF data collection period for the first quarter, the IOF weighted estimates of total population by stratum based on the adjusted design weights, and the corresponding weight adjustment factor for the sample household weights in each stratum. It can be seen in Table 3 that the weight adjustment factors vary from 0.8885 for Cabo Delgado Rural to 1.4808 for Maputo Province Urban.

Table 3. Mozambique Population Projections and IOF Weighted Estimates of Total Population by Province, Urban and Rural Stratum, and Corresponding Weight Adjustment Factors

Province and Stratum        Projected          Weighted      Weight
                           Population   Population IOF,  Adjustment
                             23-09-14     First Quarter      Factor
Niassa Urban                  375,805           274,659      1.3683
Niassa Rural                1,232,055         1,157,637      1.0643
Cabo Delgado Urban            448,982           386,203      1.1626
Cabo Delgado Rural          1,420,179         1,598,312      0.8885
Nampula Urban               1,564,334         1,206,509      1.2966
Nampula Rural               3,351,019         3,298,091      1.0160
Zambézia Urban                969,621           697,511      1.3901
Zambézia Rural              3,740,075         3,422,909      1.0927
Tete Urban                    330,840           223,590      1.4797
Tete Rural                  2,110,143         1,647,588      1.2807
Manica Urban                  450,426           373,314      1.2066
Manica Rural                1,431,132         1,179,326      1.2135
Sofala Urban                  728,212           771,575      0.9438
Sofala Rural                1,282,345         1,242,540      1.0320
Inhambane Urban               351,720           289,589      1.2145
Inhambane Rural             1,129,118         1,009,156      1.1189
Gaza Urban                    360,101           297,715      1.2095
Gaza Rural                  1,037,626           942,200      1.1013
Maputo Province Urban       1,158,122           782,079      1.4808
Maputo Province Rural         496,447           396,739      1.2513
Maputo City                 1,229,494         1,138,478      1.0799

Megill worked closely with Basílio Cubula on the calculation of weights for the first quarter of data for IOF 2014/15 using these procedures. First Cubula compiled a spreadsheet with the information from the frame for the 1,236 sample EAs selected for IOF 2014/15, including the number of households listed in each sample EA. Megill used the IOF household data file for the first quarter to tabulate the number of sample households with completed interviews in each EA. He also identified sample EAs that were missing, as described previously in this report. He consulted with various INE staff to try to obtain information on the EAs that were combined or sub-divided. Since this information was not available, Megill developed the approximate weighting procedures described above. The final weights were produced in the middle of the third week of this mission.

2.3. Procedures for Calculating Sampling Errors

In the publication of the results for the IOF 2014/15 it is important to include a statement on the accuracy of the survey data. In addition to presenting tables with calculated sampling errors and confidence intervals for the most important survey estimates, the different sources of nonsampling error should be described.

The most common estimates to be calculated from the data for IOF will be in the form of totals and ratios. The survey estimate of a total can be expressed as follows:

$$\hat{Y} = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \sum_{k=1}^{m_{hi}} W_{Ahi}\, y_{hik},$$

where:

$L$ = number of strata (province, urban/rural) in the domain
$y_{hik}$ = value of variable $y$ for the k-th sample household in the i-th sample PSU in stratum $h$

Here $W_{Ahi}$ is the adjusted weight for the sample households in the i-th sample PSU in stratum $h$, $n_h$ is the number of sample PSUs in stratum $h$, and $m_{hi}$ is the number of sample households in the i-th sample PSU in stratum $h$.

The survey estimate of a ratio is defined as follows:

$$\hat{R} = \frac{\hat{Y}}{\hat{X}},$$

where $\hat{Y}$ and $\hat{X}$ are estimates of totals for variables $y$ and $x$, respectively, calculated as specified previously.

In the case of a stratified multi-stage sample design, means and proportions are special types of ratios.
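The estimators of a total and a ratio given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Household` record, the function names and all sample values are hypothetical and are not taken from the IOF data files. The denominator variable `x` defaults to 1, which corresponds to the special case of a mean or a proportion.

```python
from typing import Iterable, NamedTuple


class Household(NamedTuple):
    """One sample household (hypothetical record layout)."""
    stratum: str    # stratum h (province, urban/rural)
    psu: str        # sample PSU (EA) i within the stratum
    weight: float   # adjusted household weight W_Ahi
    y: float        # numerator variable
    x: float = 1.0  # denominator variable; 1 for a mean or proportion


def estimate_total(households: Iterable[Household], variable: str = "y") -> float:
    """Weighted total: sum of weight * value over all sample households."""
    return sum(h.weight * getattr(h, variable) for h in households)


def estimate_ratio(households: Iterable[Household]) -> float:
    """Ratio of two weighted totals, R = Y / X."""
    households = list(households)
    x_hat = estimate_total(households, "x")
    if x_hat == 0:
        raise ValueError("the estimated denominator total is zero")
    return estimate_total(households, "y") / x_hat


# Hypothetical sample: weights and y values are invented for illustration.
sample = [
    Household("Niassa Urban", "EA01", 100.0, 5.0),
    Household("Niassa Urban", "EA01", 100.0, 3.0),
    Household("Niassa Urban", "EA02", 120.0, 4.0),
    Household("Niassa Rural", "EA03", 200.0, 6.0),
]

total_y = estimate_total(sample)  # weighted total of y
mean_y = estimate_ratio(sample)   # weighted mean, since x = 1 for every household
```

With `x` left at 1 the denominator is simply the sum of the weights, so `estimate_ratio` returns the weighted mean of `y`.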
In the case of the mean, the variable x, in the denominator of the ratio, is defined to equal 1 for each element so that the denominator is the sum of the weights. For a proportion, the variable x in the denominator is also defined to equal 1 for all elements; the variable y in the numerator is binary and is defined to equal either 0 or 1, depending on the absence or presence, respectively, of a specified characteristic for the element.

The standard error, or square root of the variance, is used to measure the sampling error, although it may also include a small variable part of the nonsampling error. The variance estimator should take into account the different aspects of the sample design, such as the stratification and clustering. Programs available for calculating the variances for survey data from stratified multi-stage sample designs, such as IOF, include Stata and the Complex Samples module of SPSS. Both of these software packages use a linearized Taylor series variance estimator. The Complex Samples module of SPSS can be used for calculating the sampling errors for survey estimates of totals, means, proportions and other types of ratios. For each estimate, the SPSS tables show the standard error, coefficient of variation (CV), 95 percent confidence interval, the design effect (DEFF) and the number of observations. The design effect is defined as the ratio of the variance of an estimate from a complex (stratified, multi-stage) sample to the variance of the same estimate from a simple random sample of the same size. It is a relative measure of the sampling efficiency. Most of the design effects are greater than 1 given the clustering effects in the sample design.

The variance estimator for a total used by SPSS Complex Samples and Stata can be expressed as follows:

Variance Estimator of a Total

$$V(\hat{Y}) = \sum_{h=1}^{L} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( \hat{Y}_{hi} - \frac{\hat{Y}_h}{n_h} \right)^2,$$

where:

$$\hat{Y}_{hi} = \sum_{k=1}^{m_{hi}} W_{Ahi}\, y_{hik}, \qquad \hat{Y}_h = \sum_{i=1}^{n_h} \hat{Y}_{hi}$$

The variance estimator of a ratio used by these statistical software packages can be expressed as follows:

Variance Estimator of a Ratio

$$V(\hat{R}) = \frac{1}{\hat{X}^2} \left[ V(\hat{Y}) + \hat{R}^2\, V(\hat{X}) - 2 \hat{R}\, COV(\hat{X}, \hat{Y}) \right],$$

where:

$$COV(\hat{X}, \hat{Y}) = \sum_{h=1}^{L} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( \hat{X}_{hi} - \frac{\hat{X}_h}{n_h} \right) \left( \hat{Y}_{hi} - \frac{\hat{Y}_h}{n_h} \right)$$
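As a rough check on the formulas above, the linearized variance estimators of a total and a ratio can be sketched in plain Python. This is a minimal sketch, not the code used by SPSS or Stata: the record layout (dictionaries with the keys `stratum`, `psu`, `weight`, `y` and `x`), the function names and the sample values are all hypothetical. It assumes at least two sample PSUs in every stratum and ignores any finite population correction.

```python
import math
from collections import defaultdict


def psu_totals(records, variable):
    """Weighted PSU totals, returned as {stratum: {psu: total}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        totals[r["stratum"]][r["psu"]] += r["weight"] * r[variable]
    return totals


def covariance_of_totals(records, var_a, var_b):
    """Sum over strata of n_h/(n_h - 1) times the PSU-level cross products."""
    a, b = psu_totals(records, var_a), psu_totals(records, var_b)
    cov = 0.0
    for h in a:
        n_h = len(a[h])
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two sample PSUs")
        a_mean = sum(a[h].values()) / n_h  # stratum total divided by n_h
        b_mean = sum(b[h].values()) / n_h
        cross = sum((a[h][i] - a_mean) * (b[h][i] - b_mean) for i in a[h])
        cov += n_h / (n_h - 1) * cross
    return cov


def variance_of_total(records, variable):
    """The variance of a total is the covariance of the total with itself."""
    return covariance_of_totals(records, variable, variable)


def variance_of_ratio(records, y="y", x="x"):
    """Linearized variance of the ratio R = Y / X."""
    y_hat = sum(r["weight"] * r[y] for r in records)
    x_hat = sum(r["weight"] * r[x] for r in records)
    r_hat = y_hat / x_hat
    return (variance_of_total(records, y)
            + r_hat ** 2 * variance_of_total(records, x)
            - 2 * r_hat * covariance_of_totals(records, x, y)) / x_hat ** 2


# Hypothetical sample: two strata with two PSUs each; all values are invented.
sample = [
    {"stratum": "S1", "psu": "A", "weight": 10.0, "y": 2.0, "x": 1.0},
    {"stratum": "S1", "psu": "A", "weight": 10.0, "y": 4.0, "x": 1.0},
    {"stratum": "S1", "psu": "B", "weight": 10.0, "y": 1.0, "x": 1.0},
    {"stratum": "S1", "psu": "B", "weight": 10.0, "y": 3.0, "x": 1.0},
    {"stratum": "S2", "psu": "C", "weight": 5.0, "y": 6.0, "x": 1.0},
    {"stratum": "S2", "psu": "D", "weight": 5.0, "y": 2.0, "x": 1.0},
    {"stratum": "S2", "psu": "D", "weight": 5.0, "y": 2.0, "x": 1.0},
]

total_y = sum(r["weight"] * r["y"] for r in sample)
se_total = math.sqrt(variance_of_total(sample, "y"))  # standard error of the total
cv_total = se_total / total_y                         # coefficient of variation
se_mean = math.sqrt(variance_of_ratio(sample))        # standard error of the mean of y
```

Because the deviations are taken between PSU totals within each stratum, the sketch reflects both the stratification and the clustering of the design, which is the point made in the text about the choice of variance estimator.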

In the variance estimator of a ratio, $V(\hat{Y})$ and $V(\hat{X})$ are calculated according to the formula for the variance of a total.

Since the final weighted data set for the first quarter of IOF 2014/15 was not available during this mission, it was not possible to tabulate the sampling errors for selected indicators. However, this will be followed up during the second mission, with the tabulation of sampling errors for key estimates from the first and second quarters of IOF data.

2.4. Capacity Building

As mentioned previously in this report, Megill worked closely with Basílio Cubula and other INE counterparts to provide on-the-job training in developing the weighting procedures for IOF 2014/15. On the last day of this mission Megill gave a half-day training session to the INE staff on the methodology used for calculating the weights for the IOF first quarter. There should be more time available for such training in the next mission, when IOF panel data will be available for the second quarter. Although the final weighted data file for the IOF first quarter was not available during the first mission for the calculation of sampling errors, this will also be followed up in the next mission.

3. FINDINGS AND RECOMMENDATIONS

The main findings during this mission are discussed in the previous section, and the highlights are summarized here.

Although the data collection for the first quarter of IOF 2014/15 was successful and the survey data appear to have reasonable quality, there were some important lessons learned. Sampling information related to combining small sample EAs and sub-dividing large sample EAs prior to the listing operation appears to have been lost. This information would be needed to calculate the exact probabilities and corresponding weights for the IOF sample households. Since this information was not available, it was necessary to calculate approximate weights which were then adjusted based on the population projections by province, urban and rural stratum, as described in this report.
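The adjustment to the population projections summarized above can be illustrated with a few rows of Table 3. The sketch below assumes that the weight adjustment factor of a stratum is the projected population divided by the IOF weighted population estimate, which reproduces the published factors, and that the factor is applied multiplicatively to the household weights in that stratum. The function name and the design weight of 250 are hypothetical.

```python
def adjustment_factor(projected_population: float, weighted_population: float) -> float:
    """Ratio of the projected population to the survey-weighted estimate."""
    if weighted_population <= 0:
        raise ValueError("the weighted population estimate must be positive")
    return projected_population / weighted_population


# (projected population 23-09-14, IOF weighted population) taken from Table 3
table3 = {
    "Niassa Urban": (375_805, 274_659),
    "Cabo Delgado Rural": (1_420_179, 1_598_312),
    "Maputo Province Urban": (1_158_122, 782_079),
}

factors = {
    stratum: adjustment_factor(projected, weighted)
    for stratum, (projected, weighted) in table3.items()
}

# Hypothetical approximate design weight for one Niassa Urban household;
# applying the factor multiplicatively is an assumption of this sketch.
design_weight = 250.0
adjusted_weight = design_weight * factors["Niassa Urban"]

for stratum, factor in factors.items():
    print(f"{stratum}: {factor:.4f}")
```

A factor above 1 means the weighted survey estimate fell short of the projection for that stratum, and a factor below 1 means it exceeded the projection.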
Since the IOF is based on a panel of households that are enumerated each quarter, it will be necessary to use the same approximate weighting procedures for all quarters. However, it is recommended that for future surveys the information from each sampling stage should be carefully recorded and maintained for the calculation of the probabilities of selection and corresponding design weights.

Conceptually, a complete listing of households in the sample EAs reflects the overall average growth in the number of households across all the sample EAs, so the weighted estimates of the total population would also show a corresponding increase. Therefore the design weights depend on the updating of the sampling frame based on the listing, and if the listing for some sample EAs is not complete, this will lead to a downward bias in the weighted population estimates from the survey data. Ideally, the survey should rely on a high-quality updated listing of households in each sample EA, with weights based on the sampling probabilities, to reflect the differential population growth by province, urban and rural stratum. Although it is too late to correct this for IOF 2014/15, this is an important lesson learned for improving future surveys. The population projections are based on the growth rates between the last two censuses and general demographic assumptions, so they do not always accurately reflect the actual differential growth rates by urban and rural stratum within each province. For this reason it is not advisable to rely routinely on the population projections for adjusting the probability-based weights.

The second mission is tentatively scheduled for March 2015, following the data collection for the second quarter of IOF 2014/15. It is recommended that the IOF data files for the first and second quarters of IOF be ready prior to that mission, so that the second quarter weights can be calculated during the first week, and more time will be available for the calculation of sampling errors and other aspects of the analysis.

APPENDIX 1. Persons Contacted

Instituto Nacional de Estatística (INE)
Dr. João Loureiro, INE President
Manuel Gaspar, INE Vice-President
Arão Balate, Director, Direcção de Censos e Inquéritos
Antônio Adriano, Deputy Director, Direcção de Censos e Inquéritos
Cristóvão Muaio, Chief, Departamento de Metodologia e Amostragem
Basílio Cubula, Sampling Statistician
Eugênio Matavel, Programmer, INE
Carlos Creva, former INE Sampling Statistician
Tomás Bernardo

Scanstat
Lars Lundgren, Household Surveys Consultant
Anne Abelset, IT Consultant
Lars Carlsson, Resident Advisor