Fall 2015 Solutions. Biostats691F: Practical Data Management and Statistical Computing


Assignment 8: Creating a Preliminary Data Report - The Fetal Lung Maturity Study

Data for the study were available in .csv format in the file flmraw.csv. The birthweight variable contained missing values (unusual characters) and was dropped from the file. The data included 141 records.

A check for duplicate records showed one repeated ID (patid=71) with 2 different FLM values. Since it couldn't be determined which record is correct, or whether one is an ID error, both records were dropped from the analysis file. (Note other options: could keep 1 of these records and set the FLM value to missing (.), while keeping "hi" for flm50/flm70; or could keep both, re-assigning ID numbers. Make it clear what was done!!)

PROGRAM: hw10_2013p1.sas
Duplicate records?

patid   flm   resp   blood   gestage   term      flm50   flm70
   71   160   OK     none       .      unknown   hi      hi
   71    80   OK     none       .      unknown   hi      hi

The analysis file included 139 observations with 9 variables:

Data Set Name   HW10.FLM1                                Observations         139
Member Type     DATA                                     Variables            9
Engine          V9                                       Indexes              0
Created         Tuesday, December 03, 2013 01:20:10 PM   Observation Length   72
Filename        e:\sasexamples\flm1.sas7bdat

Variables in Creation Order

#   Variable   Type   Len   Format     Label
1   flm        Num    8                FLM:*Assay*Value
2   resp       Num    8     RESPFMT.   RESP:*illness
3   blood      Num    8     BLDFMT.    BLOOD:*in*sample
4   gestage    Num    8                GESTAGE:*at*delivery
5   patid      Num    8                PATID:*ID #
6   term       Num    8     TERMFMT.   TERM:*at*delivery
7   flm50      Num    8     LHFMT.     FLM50:*assay*<50
8   flm70      Num    8     LHFMT.     FLM70:*assay*<70
9   resp2      Num    8     YNFMT.     RESP2:*ill at *birth

A review of variable values indicates 20 (14%) cases with respiratory illness (18 RDS, 2 TTN).

RESP:*illness

resp   Frequency   Percent
OK          119     85.61
RDS          18     12.95
TTN           2      1.44

RESP2:*ill at *birth

resp2   Frequency   Percent
No           119     85.61
Yes           20     14.39

solutions8_2015.docx - 1 - 12/03/2013

No cases were postdates; 25 (18%) were fullterm, 98 (71%) were preterm, and 16 (12%) were of unknown gestational age.

TERM:*at*delivery

                                  Cumulative   Cumulative
term       Frequency   Percent    Frequency    Percent
---------------------------------------------------------
unknown         16      11.51          16        11.51
preterm         98      70.50         114        82.01
fullterm        25      17.99         139       100.00

Blood was found in 33 (24%) of the samples.

BLOOD:*in*sample

blood   Frequency   Percent
---------------------------
none         106     76.26
blood         33     23.74

A review of the distribution of FLM values indicates that they ranged from 9 to 217, with a median (Q1-Q3) of 86 (52-117). Two extreme values were noted (214 for patid=138, 217 for patid=139), but these are retained in the data as legitimate unless we later learn that they are out of range. Using 50 as a cutoff value, 33 (24%) would be considered at risk for immature lungs; using 70 as the cutoff, 53 (38%) would be considered at risk.

FLM values

Quantile      Estimate
100% Max         217
99%              214
95%              160
90%              160
75% Q3           117
50% Median        86
25% Q1            52
10%               25
5%                12
1%                10
0% Min             9

Extreme Observations

---------Lowest---------     ---------Highest--------
Value   patid   gestage      Value   patid   gestage
    9     210      24.6        160     260      32.0
   10     219      28.7        160     263      38.7
   10      96      29.3        160     270      39.7
   10      34       .          214     138      38.0
   11     257      24.0        217     139      36.0

FLM50:*assay*<50

flm50   Frequency   Percent
---------------------------
hi           106     76.26
lo            33     23.74

FLM70:*assay*<70

flm70   Frequency   Percent
---------------------------
hi            86     61.87
lo            53     38.13


A review of the association of FLM with the outcome of respiratory illness indicates that 3 cases (15% of cases) would not be flagged as at risk using a cutoff value of 50. Using 70 as the cutpoint, 1 case (5% of cases) would be missed.

flm50 (FLM50:*assay*<50)   resp2 (RESP2:*ill at *birth)

Frequency
Row Pct
Col Pct        No       Yes     Total
hi            103         3       106
            97.17      2.83
            86.55     15.00
lo             16        17        33
            48.48     51.52
            13.45     85.00
Total         119        20       139

flm70 (FLM70:*assay*<70)   resp2 (RESP2:*ill at *birth)

Frequency
Row Pct
Col Pct        No       Yes     Total
hi             85         1        86
            98.84      1.16
            71.43      5.00
lo             34        19        53
            64.15     35.85
            28.57     95.00
Total         119        20       139
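The missed-case percentages quoted above can be recomputed directly from the crosstab cell counts. The step below is a minimal sketch, not part of the original programs: the data set name flmsens and its variable names are made up for illustration, and the counts are typed in from the "Yes" and "No" columns of the two tables.

```sas
** Sketch only: cell counts are typed in from the two crosstabs **;
** flmsens and its variable names are hypothetical              **;
data flmsens;
  input cutoff missed flagged falsepos;
  cases = missed + flagged;              /* all cases with respiratory illness (20) */
  pct_missed  = 100 * missed  / cases;   /* ill, but assay result "hi"              */
  sensitivity = 100 * flagged / cases;   /* ill, and assay result "lo"              */
  pct_falsepos = 100 * falsepos / 119;   /* not ill (119), but assay result "lo"    */
  datalines;
50 3 17 16
70 1 19 34
;
run;

proc print data=flmsens noobs;
  var cutoff cases missed pct_missed sensitivity pct_falsepos;
  format pct_missed sensitivity pct_falsepos 6.2;
run;
```

Laying the two cutoffs side by side makes the trade-off visible: moving the cutoff from 50 to 70 lowers the missed cases from 15% to 5%, while the share of infants without illness who are flagged rises from 13.45% to 28.57% (the "lo" column percentages in the tables).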

(Beyond scope of material covered before the assignment, but FYI.) The following figures are examples of using PROC PLOT and PROC SGPLOT to examine the association of gestational age and FLM while showing a third factor (blood in sample, or outcome (RDS/TTN)). These indicate that the one case with respiratory distress (Y) that is missed by both cutpoints also had blood in the sample (b).

PROC PLOT:

[Figure: Plot of gestage*flm=resp2. Vertical axis: GESTAGE:*at*delivery (20 to 50). Horizontal axis: FLM:*Assay*Value (0 to 250). Plotting symbol is the value of resp2 (N, Y). NOTE: 16 missing. 51 hidden.]

[Figure: Plot of gestage*flm=blood. Vertical axis: GESTAGE:*at*delivery (20 to 50). Horizontal axis: FLM:*Assay*Value (0 to 250). Plotting symbol is the value of blood (n, b). NOTE: 16 missing. 51 hidden.]

PROC SGPLOT:

[Figures: scatter plots of FLM by gestational age, grouped by resp2 and by blood; images not available.]

Programs:

OPTIONS PAGESIZE=60 LINESIZE=115 NOCENTER NONUMBER NODATE ERRORS=3;
***********************************************************;
*** PROJECT: BE691f Assignment 10                       ***;
*** DATE: DEC 2013                                      ***;
*** FILE: hw10_2013p1.sas                               ***;
*** PROGRAMMER: penny pekow                             ***;
*** RE: reading data into SAS                           ***;
***     create formats                                  ***;
***     create term variable                            ***;
***     create FLM 50,70 indicators                     ***;
***     check for duplicate records                     ***;
***     check for outliers, odd values                  ***;
***     label, format, store data                       ***;
*** ************************************************    ***;
*** INPUT FILES: flmraw.txt data translated             ***;
***              from Mac system                        ***;
*** OUTPUT FILES: flm1.sas7bdat                         ***;
***              on PEKOW3 \SASexamples                 ***;
***********************************************************;
TITLE1 'PROGRAM: hw10_2013p1.sas ';

libname hw10 'e:\sasexamples';   ** edit to correct directory **;

** create and save formats **;
proc format cntlout=hw10.flmfmt1;
  value respfmt 0='OK' 1='RDS' 2='TTN';
  value bldfmt  0='none' 1='blood';
  value lhfmt   0='hi' 1='lo';
  value ynfmt   0='No' 1='Yes';
  value termfmt . = 'unknown'
                1 = 'preterm'
                2 = 'fullterm'
                3 = 'postdate';
run;

** read data into temp SAS file **;
data flm1(drop=bw);
  ** reminder: edit filename to correct directory **;
  infile 'e:\sasexamples\flmraw.csv' missover delimiter=',';
  input flm resp blood gestage bw patid;

  ** create term variable **;
  if gestage <= 0 then term = .;
  else if 0 < gestage < 37 then term = 1;
  else if 37 <= gestage <= 42 then term = 2;
  else if 42 < gestage < 48 then term = 3;

  ** create indicators of LOW test: flm <50, <70 **;
  flm50 = (0 < flm < 50);
  flm70 = (0 < flm < 70);

  label flm     = 'FLM:*Assay*Value'
        resp    = 'RESP:*illness'
        blood   = 'BLOOD:*in*sample'
        gestage = 'GESTAGE:*at*delivery'
        patid   = 'PATID:*ID #'
        term    = 'TERM:*at*delivery'
        flm50   = 'FLM50:*assay*<50'
        flm70   = 'FLM70:*assay*<70';
  format resp respfmt. blood bldfmt. term termfmt. flm50 flm70 lhfmt.;
run;

** check for duplicate records **;
proc sort data=flm1;
  by patid;
run;

data dups;
  set flm1;
  by patid;
  if first.patid = 0 or last.patid = 0;   /* keeps cases with repeat ids */
run;

proc print data=dups;
  id patid;
  title2 'Duplicate records?';
run;

** save data after deleting dups **;
data hw10.flm1;
  set flm1;
  ** delete duplicates (both) since can't tell **;
  ** if it is two people or one, and if just one **;
  ** don't know which is valid **;
  if patid=71 then delete;

  ** create 0/1 indicator of resp illness **;
  ** 1 = rds or ttn **;
  resp2 = (resp > 0);
  label resp2 = 'RESP2:*ill at *birth';
  format resp2 ynfmt.;
run;

proc contents position data=hw10.flm1;
  title2 'FLM DATA Contents';
run;

** simple freq and descriptive stats to look at values **;
proc freq data=hw10.flm1;
  tables resp resp2 blood flm50 flm70 / nocum;
  tables term;
  tables term / missing;
  title2;
run;

proc univariate plot data=hw10.flm1;
  id patid gestage;
  var flm gestage;
run;
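The report's note on duplicates mentions an alternative to deleting both patid=71 records: keep one of them and set its FLM value to missing, while keeping the flm50/flm70 indicators. A minimal sketch of that option follows. It assumes the work data set flm1 from the program above, already sorted by patid; the output name flm1alt is hypothetical and does not appear in the original programs.

```sas
** Sketch only: alternative handling of repeated ids **;
** flm1alt is a hypothetical data set name           **;
data flm1alt;
  set flm1;                               /* assumes flm1 is sorted by patid        */
  by patid;
  if first.patid = 0 then delete;         /* drop 2nd and later records of an id    */
  if last.patid = 0 then flm = .;         /* kept record of a repeated id: FLM      */
                                          /* value unknown, so set it to missing    */
  resp2 = (resp > 0);                     /* same 0/1 illness indicator as before   */
  label resp2 = 'RESP2:*ill at *birth';
  format resp2 ynfmt.;
run;
```

This keeps 140 of the 141 records instead of 139. It is only defensible here because both patid=71 records have FLM values above 70 (160 and 80), so flm50 and flm70 are "hi" whichever record is correct; if the two records disagreed on either indicator, that indicator would need to be set to missing as well.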

OPTIONS PAGESIZE=60 LINESIZE=80 NOCENTER NONUMBER NODATE errors=3;
***********************************************************;
*** PROJECT: PH691f Assignment 10                       ***;
*** DATE: DEC 2013                                      ***;
*** FILE: hw10_2013p2.sas                               ***;
*** PROGRAMMER: penny pekow                             ***;
*** RE: crosstabs and plots                             ***;
*** ************************************************    ***;
*** INPUT FILES: flm1.sas7bdat on PEKOW3 \SASexamples   ***;
***********************************************************;
TITLE1 'PROGRAM: hw10_2013p2.sas ';

libname hw10 'e:\sasexamples';

** read in formats **;
proc format cntlin=hw10.flmfmt1;
run;

** cross-tabulation of flm indicators and resp outcome **;
proc freq data=hw10.flm1;
  tables (flm50 flm70) * resp2 / nopercent;
run;

proc plot data=hw10.flm1 vpercent=50 hpercent=50;
  plot flm * gestage = resp2 / href=37 haxis = 22 to 42 by 5
                               vref=50 70 vaxis = 0 to 250 by 50;
  plot flm * gestage = blood / href=37 haxis = 22 to 42 by 5
                               vref=50 70 vaxis = 0 to 250 by 50;
run;

ods graphics on / imagename='flm1group';
proc sgplot data=hw10.flm1;
  ** define variable labels to use as axis labels **;
  label gestage='Gestational Age (weeks)' flm='Lung Maturity Assay';
  ** define x,y and variable to label data points **;
  scatter y=flm x=gestage / group=resp2;
run;   ** end the step before the next ODS statement takes effect **;
ods graphics off;

ods graphics on / imagename='flm1blood';
proc sgplot data=hw10.flm1;
  ** define variable labels to use as axis labels **;
  label gestage='Gestational Age (weeks)' flm='Lung Maturity Assay';
  ** define x,y and variable to label data points **;
  scatter y=flm x=gestage / group=blood;
run;
ods graphics off;
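The PROC PLOT calls mark the 37-week term boundary and the two assay cutoffs with href= and vref=, but the PROC SGPLOT calls do not. As a sketch of how the same reference lines could be added to the SGPLOT version, the step below uses the REFLINE statement; it is an illustration rather than part of the original assignment, and the dashed line pattern is an arbitrary choice.

```sas
** Sketch only: SGPLOT scatter with the cutoffs drawn as reference lines **;
proc sgplot data=hw10.flm1;
  label gestage='Gestational Age (weeks)' flm='Lung Maturity Assay';
  scatter y=flm x=gestage / group=resp2;
  refline 50 70 / axis=y lineattrs=(pattern=dash);   /* FLM cutoffs of 50 and 70 */
  refline 37    / axis=x lineattrs=(pattern=dash);   /* preterm/fullterm boundary */
run;
```

With the lines drawn, cases of respiratory illness that sit above a cutoff (the missed cases in the crosstabs) can be read straight off the plot. Changing group=resp2 to group=blood gives the matching plot for blood in the sample.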