Improving Capacity for Crime Reporting: Data Quality and Imputation Methods Using State Incident-Based Reporting System Data July 31, 2014 Justice Research and Statistics Association 720 7th Street, NW, Third Floor Washington, DC 20001 www.jrsa.org
Improving State Capacity for Crime Reporting: Data Quality and Imputation Methods Using State Incident-Based Reporting System (IBRS) Data JRSA Webinar July 31, 2014 Christina LaValle Research Specialist West Virginia Statistical Analysis Center
Accuracy of Crime Reporting Reporting data "as is" is often not accurate. IBR data are an essential data source for tracking crime trends in states without victimization surveys. Imputed data allow for more stable crime trends. WV IBR data are well suited to study data quality and imputation: 100% population and crime coverage with NIBRS; an established history of NIBRS reporting (since 2006); relative stability and consistency in crime reporting.
Imputation Research in WV The WV Statistical Analysis Center, with funding from JRSA, has conducted two research projects on data quality and imputation methods using WV incident-based reporting (IBR) data. The first project used one year of state IBR data to develop data quality techniques and several alternative imputation methods for partial and non-reporting agencies. The second project tested and validated the data quality techniques and imputation methods on longitudinal state IBR data.
WV NIBRS Imputation Reports Research reports for both projects are located on the WVSAC and JRSA web sites. http://www.djcs.wv.gov/sac/Documents/WV_ImputeReportJan2013_Final.pdf http://www.djcs.wv.gov/sac/Documents/WV_Impute2ReportJan2014_Final.pdf
Imputation Background FBI methods have been used since 1958. Recently, researchers have worked to improve imputation methods by using more advanced computational techniques and by accounting for seasonality and other data issues. WV methods incorporate these recommendations: they are advanced enough to be accurate yet accessible with reasonable guidance, and they use seasonality and variation in reporting patterns between agencies.
Process of Applying Imputation Methods 1. Obtain crime data (ORI, agency name, crime counts). 2. Identify missing data (zero reports), manually inspect, and classify as missing data or true zeros. 3. Identify irregular data, manually inspect, and classify as outlier or acceptable. 4. Identify non-reporting agencies. 5. Obtain population estimates for municipal police departments and county sheriff departments. 6. Obtain MSA status for all non-municipal agencies. 7. Apply imputation methods. 8. Calculate statistics.
1. Obtain Crime Data Three datasets are needed: aggregate violent, aggregate property, and aggregate non-index crimes. Data must be formatted with one row per agency and columns for ORI, agency name, and monthly crime counts.
2. Identifying Missing Data No variable or value is assigned for missing data. Zeros in crime data can mean either that no crime occurred (true zero) or that data were not reported (missing data). Guidelines were developed by closely inspecting three years of data and finding common patterns.
Zero Classification Guidelines
Zero classification guidelines use helper variables:
NCZ: number of consecutive months where all crime counts are zero
Total P: total number of property crimes
PopStatus: population status (population or zero-population)
1. If any crime count in the same month for violent, property, or non-index crimes is non-zero, any zero reported is a true zero.
2. If Total P > 25 and NCZ > 0, zeros are identified as missing.
3. If NCZ ≥ 4, zeros are identified as missing.
4. Guidelines 2 and 3 may not hold for zero-population agencies.
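The guidelines above could be sketched roughly as follows. This is a minimal illustration, not the WVSAC macro: the function names and the labels "reported", "missing", and "inspect" are hypothetical, Total P is assumed to be the agency's property crime total over the months supplied, NCZ is assumed to be the length of the run of all-zero months containing a given month, and guideline 3 is read as NCZ of 4 or more. Zeros that no guideline resolves are left for manual inspection.

```python
def run_lengths(flags):
    """For each position, length of the consecutive run of True values it belongs to (0 if False)."""
    out = [0] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1
            for k in range(i, j):
                out[k] = j - i
            i = j
        else:
            i += 1
    return out


def classify_zeros(violent, prop, non_index, zero_population=False):
    """Label each month of one agency's data (hypothetical labels).

    'reported': at least one crime count is non-zero, so any zero is a true zero (guideline 1)
    'missing' : all counts are zero and guideline 2 or 3 applies
    'inspect' : all counts are zero but no guideline settles it (includes guideline 4)
    """
    all_zero = [v == 0 and p == 0 and n == 0
                for v, p, n in zip(violent, prop, non_index)]
    ncz = run_lengths(all_zero)   # assumed reading of NCZ
    total_p = sum(prop)           # assumed reading of Total P
    labels = []
    for is_zero, run in zip(all_zero, ncz):
        if not is_zero:
            labels.append("reported")
        elif zero_population:
            labels.append("inspect")   # guideline 4: rules 2 and 3 may not hold
        elif total_p > 25 or run >= 4:
            labels.append("missing")   # guideline 2 (NCZ > 0 holds here) or 3
        else:
            labels.append("inspect")
    return labels
```

In this sketch an agency with 38 property crimes for the year and two isolated all-zero months has both months flagged as missing, while a small agency with 5 property crimes has only a five-month all-zero run flagged.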
Data Format
Entering Data
Running the Missing Data Macro
Manually Inspecting the Data
Preparing Data for Outlier Detection
3. Identify Irregular Data Two complementary techniques and graphical analysis are used. Outlier statistics were developed by WVSAC. Ratio of Ranges (Rr): identifies the agency with suspected outliers. Ratio to Median (Yi): identifies the month containing suspected outliers. Graphical analysis: histogram, dot plot, line chart.
Outlier Detection Formulas: Ratio of Ranges, Rr; Ratio to Median, Yi
Preparing for Outlier Detection
Data Set Up
Running the Outlier Detection Macro
Manually Inspecting the Data
Graphical Analysis - Visually Depicting Data
Preparing Data for Imputation
4. Identify Non-reporting Agencies Data needed: Complete state agency list (State Police or state repository).
5. Obtain Population Estimates Data needed: Population data (U.S. Census Bureau). Population 2000-2009: http://www.census.gov/popest/data/cities/totals/2009/sub-est2009-states.html Population 2010-2012: http://www.census.gov/popest/data/cities/totals/2012/sub-est2012.html
6. Obtain MSA Status Data needed: County MSA status (U.S. Census Bureau). County FIPS codes 2010: https://www.census.gov/geo/reference/codes/cou.html County MSA 2003-2009: http://www.census.gov/population/metro/data/defhist.html
7. Imputation Methods WV imputation methods. Partial reporting agencies (missing 1-9 months of data): missing months are imputed using a moving average of reported months. Non-reporting agencies (missing 10-12 months of data): the missing crime total is imputed using adjusted population and crime rates from associated population groups.
WV Partial Reporting Imputation Method
Crime Total = Partial Crime Total + Q1*(N1) + Q2*(N2) + Q3*(N3) + Q4*(N4)
Nx = number of missing values per period (can vary from 0 to 3)
Q1 = average of December, January, February crime counts
Q2 = average of March, April, May crime counts
Q3 = average of June, July, August crime counts
Q4 = average of September, October, November crime counts
If N1 = 3, then Q1 = minimum[Q2, Q3, Q4].
If N2 or N4 = 3, then Q2 = Q4 (the fully missing quarter takes the value of the other).
If N3 = 3, then Q3 = maximum[Q1, Q2, Q4].
If N2 and N4 = 3, then Q2 = Q4 = average[Q1, Q3].
If data for three entire quarters are missing, the average of the remaining values is used for the respective Qx.
Example: WV Partial Reporting Imputation Method

Quarter  Q1   Q1   Q1   Q2   Q2   Q2   Q3   Q3   Q3   Q4   Q4   Q4
Agency   Dec  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov
Data1    63   52   67   66   69   55   57   71   90   72   99   73
Data2    .    .    .    .    .    55   57   .    90   72   .    73

(. = missing month)

Quarterly values for Data2: Q1 = 55, Q2 = 55, Q3 = 73.5, Q4 = 72.5
Crime Total = (55+57+90+72+73) + 55*(3) + 55*(2) + 73.5*(1) + 72.5*(1)

Crime Total (original): 834
Crime Total without imputation: 347
Crime Total with imputation: 768
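The partial reporting rules and the example above can be traced with a short sketch. This is an illustration under stated assumptions, not the WVSAC implementation: the function name `impute_partial` and the dictionary layout are hypothetical, missing months are represented as `None`, and when more than one quarter is entirely missing the minimum and maximum are taken over the quarters that do have reported data.

```python
from statistics import mean

# Quarter definitions from the slide (Q1 starts in December).
QUARTERS = {
    "Q1": ("Dec", "Jan", "Feb"),
    "Q2": ("Mar", "Apr", "May"),
    "Q3": ("Jun", "Jul", "Aug"),
    "Q4": ("Sep", "Oct", "Nov"),
}


def impute_partial(counts):
    """Estimate an annual crime total for a partial reporter.

    counts maps month name -> crime count, or None when the month is missing.
    """
    reported = {q: [counts[m] for m in months if counts[m] is not None]
                for q, months in QUARTERS.items()}
    n_missing = {q: 3 - len(vals) for q, vals in reported.items()}
    q_avg = {q: mean(vals) for q, vals in reported.items() if vals}
    if not q_avg:
        raise ValueError("no reported months; use the non-reporting method")

    est = dict(q_avg)
    if len(q_avg) == 1:
        # Three entire quarters missing: reuse the one remaining average.
        only = next(iter(q_avg.values()))
        est = {q: only for q in QUARTERS}
    else:
        if "Q1" not in q_avg:
            est["Q1"] = min(q_avg.values())
        if "Q3" not in q_avg:
            est["Q3"] = max(q_avg.values())
        q2_gone, q4_gone = "Q2" not in q_avg, "Q4" not in q_avg
        if q2_gone and q4_gone:
            est["Q2"] = est["Q4"] = mean([q_avg["Q1"], q_avg["Q3"]])
        elif q2_gone:
            est["Q2"] = q_avg["Q4"]
        elif q4_gone:
            est["Q4"] = q_avg["Q2"]

    partial_total = sum(sum(vals) for vals in reported.values())
    return partial_total + sum(est[q] * n_missing[q] for q in QUARTERS)


data2 = {"Dec": None, "Jan": None, "Feb": None, "Mar": None, "Apr": None, "May": 55,
         "Jun": 57, "Jul": None, "Aug": 90, "Sep": 72, "Oct": None, "Nov": 73}
print(impute_partial(data2))  # 768.0, matching the slide's imputed total
```

Because the function multiplies each quarterly estimate by the number of missing months in that quarter, a full reporter passes through unchanged.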
WV Non-reporting Imputation Method Assign population estimates to municipal and county agencies. Assign MSA status to non-municipal agencies (i.e., county, state police, DNR, task forces, etc.). Group agencies by population group. Calculate group crime rates by summing the total crime, dividing by the total population for all full reporting agencies in each group, and multiplying by 100,000. Apply the imputation to non-reporting agencies by multiplying the non-reporting agency's population by the group crime rate, then dividing by 100,000.
WV Non-reporting population groups
Crime Total = (Population Group Crime Rate * Agency's Population) / 100,000

Group  WV Population Groups
1      25,000+
2      10,000-24,999
3      5,000-9,999
4      2,500-4,999
5      1,000-2,499 + colleges
6      Less than 1,000
7      Not Applicable
8      Non-MSA counties & State Police
9      MSA counties & State Police
Example: WV Non-reporting Imputation Method (overestimate)

Agency  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Data1   70   58   72   71   70   100  67   88   82   78   72   65
Data2   .    .    .    .    .    .    .    .    .    .    .    .

Agency        Population  MSA  Group  Group Crime Rate
Municipal PD  16,406      N/A  2      6,043

Crime Total (original): 893
Crime Total without imputation: 0
Crime Total with imputation: 991
Example: WV Non-reporting Imputation Method (underestimate)

Agency  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Data1   123  83   103  108  97   94   106  123  102  74   100  91
Data2   .    .    .    .    .    .    .    .    .    .    .    .

Agency        Population  MSA  Group  Group Crime Rate
Municipal PD  17,112      N/A  2      6,043

Crime Total (original): 1,204
Crime Total without imputation: 0
Crime Total with imputation: 1,034
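The non-reporting calculation in the two examples above can be reproduced with a few lines. This is a sketch under stated assumptions: the function names are hypothetical, `population_group` covers only municipal agencies with a positive population (groups 1-6) and does not handle colleges, zero-population agencies, or the county and State Police groups, and totals are rounded to the nearest whole crime to match the slides.

```python
def population_group(population):
    """Map a municipal agency's population to WV population groups 1-6."""
    if population <= 0:
        raise ValueError("zero-population agencies are not covered by this sketch")
    lower_bounds = [(25_000, 1), (10_000, 2), (5_000, 3), (2_500, 4), (1_000, 5)]
    for lower, group in lower_bounds:
        if population >= lower:
            return group
    return 6


def group_crime_rate(full_reporters):
    """Crime rate per 100,000 for one group.

    full_reporters is a list of (annual_crime_total, population) pairs,
    one per full reporting agency in the group.
    """
    total_crime = sum(crime for crime, _ in full_reporters)
    total_population = sum(pop for _, pop in full_reporters)
    return total_crime * 100_000 / total_population


def impute_non_reporting(population, rate_per_100k):
    """Estimated annual crime total for a non-reporting agency."""
    return population * rate_per_100k / 100_000


# Both example agencies fall in group 2, whose rate on the slides is 6,043.
print(population_group(16_406), round(impute_non_reporting(16_406, 6_043)))  # 2 991
print(population_group(17_112), round(impute_non_reporting(17_112, 6_043)))  # 2 1034
```

The same group rate produces an overestimate for the first agency (991 against 893 reported) and an underestimate for the second (1,034 against 1,204), which is the behavior the two slides illustrate.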
8. Calculate statistics Now that data is imputed, you can conduct analyses and create crime trends.
[Figure: two line charts of annual property crime counts, 2007-2011. Panels: "Property Crimes, No imputation" and "Property Crimes, with Imputation". X-axis: Year. Y-axis: Crime Count (Property), 40,000 to 50,000.]
Results: Data Quality and Imputation Methods Imputation can reasonably estimate missing data that would otherwise go undetected and uncounted, and it offers a way to strengthen data quality. About 20% of agencies had data quality issues, a share that remained consistent across all years. Alternative imputation methods for partial and non-reporting agencies were more accurate than the methods used by the FBI. Based on mean absolute error (MAE) and root mean squared error (RMSE), the research suggests that reliable state crime totals can be estimated when up to 40% of the data are missing.
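For readers unfamiliar with the two accuracy measures, the sketch below shows how MAE and RMSE are computed. The inputs are only the two non-reporting example agencies from the earlier slides (original totals 893 and 1,204 against imputed totals 991 and 1,034); they are not the study's results, and the function names are hypothetical.

```python
from math import sqrt


def mae(actual, estimated):
    """Mean absolute error: average size of the errors, ignoring sign."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


def rmse(actual, estimated):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual))


actual = [893, 1204]     # original crime totals from the example slides
imputed = [991, 1034]    # imputed crime totals from the example slides

print(mae(actual, imputed))   # 134.0
print(rmse(actual, imputed))  # slightly larger than the MAE
```

RMSE is never smaller than MAE on the same data, so a wide gap between the two signals that a few agencies account for most of the error.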
Future Directions Testing the imputation methods using other state or jurisdictional data. Investigating imputation methods that do not depend on population data. Developing imputation methods for zero-population agencies.
Questions & Contact Questions? Christina LaValle christina.r.lavalle@wv.gov Stephen Haas stephen.m.haas@wv.gov