Fall 2015 Solutions
Biostats691F: Practical Data Management and Statistical Computing
Assignment 8: Creating a Preliminary Data Report - The Fetal Lung Maturity Study

Data for the study were available in .csv format in the file flmraw.csv. The file contained missing values (coded as unusual characters) for birthweight, so birthweight was dropped from the file. The data included 141 records. A check for duplicate records showed one repeated ID (patid 71) with 2 different FLM values. Since it couldn't be determined which record is correct, or whether one is an ID error, both were dropped from the analysis file. (Note other options: could keep 1 of these records and set the flm value to missing (.), while keeping "hi" for flm50/flm70; or could keep both records, re-assigning ID numbers. Make it clear what was done!)

PROGRAM: hw10_2013p1.sas
Duplicate records?

    patid    flm    resp    blood    gestage    term       flm50    flm70
       71    160    OK      none           .    unknown    hi       hi
       71     80    OK      none           .    unknown    hi       hi

The analysis file included 139 observations with 9 variables:

    Data Set Name    HW10.FLM1                                 Observations          139
    Member Type      DATA                                      Variables             9
    Engine           V9                                        Indexes               0
    Created          Tuesday, December 03, 2013 01:20:10 PM    Observation Length    72
    Filename         e:\sasexamples\flm1.sas7bdat

    Variables in Creation Order

    #    Variable    Type    Len    Format      Label
    1    flm         Num       8                FLM:*Assay*Value
    2    resp        Num       8    RESPFMT.    RESP:*illness
    3    blood       Num       8    BLDFMT.     BLOOD:*in*sample
    4    gestage     Num       8                GESTAGE:*at*delivery
    5    patid       Num       8                PATID:*ID #
    6    term        Num       8    TERMFMT.    TERM:*at*delivery
    7    flm50       Num       8    LHFMT.      FLM50:*assay*<50
    8    flm70       Num       8    LHFMT.      FLM70:*assay*<70
    9    resp2       Num       8    YNFMT.      RESP2:*ill at *birth

A review of variable values indicates 20 (14%) cases with respiratory illness (18 RDS, 2 TTN).

    RESP:*illness

    resp    Frequency    Percent
    OK            119      85.61
    RDS            18      12.95
    TTN             2       1.44

    RESP2:*ill at *birth

    resp2    Frequency    Percent
    No             119      85.61
    Yes             20      14.39

solutions8_2015.docx - 1-12/03/2013
No cases were postdates; 25 (18%) were fullterm, 98 (71%) were preterm, and 16 (12%) were of unknown gestational age.

    TERM:*at*delivery

                                         Cumulative    Cumulative
    term        Frequency    Percent     Frequency       Percent
    --------------------------------------------------------------
    unknown            16      11.51            16         11.51
    preterm            98      70.50           114         82.01
    fullterm           25      17.99           139        100.00

Blood was found in 33 (24%) of the samples.

    BLOOD:*in*sample

    blood    Frequency    Percent
    ------------------------------
    none           106      76.26
    blood           33      23.74

A review of the distribution of FLM values indicates that they ranged from 9 to 217, with a median (Q1-Q3) of 86 (52-117). Two extreme values were noted (214 for patid=138, 217 for patid=139), but these are retained in the data as legitimate unless we later learn that they are out of range. Using 50 as a cutoff value, 33 (24%) would be considered at risk for immature lungs; using 70 as the cutoff, 53 (38%) would be considered at risk.

    FLM values

    Quantile       Estimate
    100% Max            217
    99%                 214
    95%                 160
    90%                 160
    75% Q3              117
    50% Median           86
    25% Q1               52
    10%                  25
    5%                   12
    1%                   10
    0% Min                9

    Extreme Observations

    -----------Lowest-----------    ----------Highest-----------
    Value    patid    gestage       Value    patid    gestage
        9      210       24.6         160      260       32.0
       10      219       28.7         160      263       38.7
       10       96       29.3         160      270       39.7
       10       34          .         214      138       38.0
       11      257       24.0         217      139       36.0

    FLM50:*assay*<50

    flm50    Frequency    Percent
    ------------------------------
    hi             106      76.26
    lo              33      23.74

    FLM70:*assay*<70

    flm70    Frequency    Percent
    ------------------------------
    hi              86      61.87
    lo              53      38.13
Review of the association of FLM with the outcome of respiratory illness indicates that 3 cases (15% of the 20 cases with respiratory illness) would be missed, that is, not flagged as at risk, using a cutoff value of 50. Using 70 as the cutpoint, 1 case (5% of cases) would be missed.

    flm50(flm50:*assay*<50)    resp2(resp2:*ill at *birth)

    Frequency
    Row Pct
    Col Pct         No       Yes     Total
    hi             103         3       106
                 97.17      2.83
                 86.55     15.00
    lo              16        17        33
                 48.48     51.52
                 13.45     85.00
    Total          119        20       139

    flm70(flm70:*assay*<70)    resp2(resp2:*ill at *birth)

    Frequency
    Row Pct
    Col Pct         No       Yes     Total
    hi              85         1        86
                 98.84      1.16
                 71.43      5.00
    lo              34        19        53
                 64.15     35.85
                 28.57     95.00
    Total          119        20       139
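The "missed" percentages quoted above are the column percentages in the "hi" row among cases with respiratory illness. As a quick check, the arithmetic can be redone from the cell counts alone. The following is only a minimal sketch: the data set name missed and the variable names cutoff, ill_hi, ill_lo, ill_total, pct_missed and pct_caught are illustrative and are not part of the assignment programs.

```sas
** hypothetical check: recompute missed-case percentages from the cell counts **;
data missed;
  input cutoff ill_hi ill_lo;              /* counts from the Yes column above  */
  ill_total  = ill_hi + ill_lo;            /* all cases with respiratory illness */
  pct_missed = 100 * ill_hi / ill_total;   /* ill cases classified 'hi'          */
  pct_caught = 100 * ill_lo / ill_total;   /* ill cases classified 'lo'          */
  datalines;
50 3 17
70 1 19
;
run;

proc print data=missed noobs;
run;
```

With the counts above, pct_missed should reproduce the 15% (cutoff 50) and 5% (cutoff 70) shown in the Col Pct entries of the crosstabs.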
(Beyond the scope of material covered before the assignment, but FYI.) The following figures are examples of using PROC PLOT and PROC SGPLOT to examine the association of gestational age and FLM while showing a third factor (blood in sample, or outcome (RDS/TTN)). These indicate that the one case with respiratory illness (Y) that is missed by both cutpoints also had blood in the sample (b).

PROC PLOT:

[Character plot not reproduced: Plot of gestage*flm=resp2. Vertical axis: GESTAGE:*at*delivery (20 to 50). Horizontal axis: FLM:*Assay*Value (0 to 250). Plotting symbols: N, Y. NOTE: 16 missing. 51 hidden.]

[Character plot not reproduced: Plot of gestage*flm=blood. Vertical axis: GESTAGE:*at*delivery (20 to 50). Horizontal axis: FLM:*Assay*Value (0 to 250). Plotting symbols: n, b. NOTE: 16 missing. 51 hidden.]
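PROC SGPLOT:

[Figures not reproduced: scatter plots of FLM against gestational age, one grouped by resp2 and one grouped by blood, as produced by hw10_2013p2.sas.]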
Programs:

OPTIONS PAGESIZE=60 LINESIZE=115 NOCENTER NONUMBER NODATE ERRORS=3;
***********************************************************;
*** PROJECT: BE691f Assignment 10                       ***;
*** DATE: DEC 2013                                      ***;
*** FILE: hw10_2013p1.sas                               ***;
*** PROGRAMMER: penny pekow                             ***;
*** RE: reading data into SAS                           ***;
***     create formats                                  ***;
***     create term variable                            ***;
***     create FLM 50,70 indicators                     ***;
***     check for duplicate records                     ***;
***     check for outliers, odd values                  ***;
***     label, format, store data                       ***;
*** ************************************************    ***;
*** INPUT FILES: flmraw.csv data translated             ***;
***              from Mac system                        ***;
*** OUTPUT FILES: flm1.sas7bdat                         ***;
***               on PEKOW3 \SASexamples                ***;
***********************************************************;
TITLE1 'PROGRAM: hw10_2013p1.sas ';

libname hw10 'e:\sasexamples';  ** edit to correct directory **;

** create and save formats **;
proc format cntlout=hw10.flmfmt1;
  value respfmt 0='OK' 1='RDS' 2='TTN';
  value bldfmt  0='none' 1='blood';
  value lhfmt   0='hi' 1='lo';
  value ynfmt   0='No' 1='Yes';
  value termfmt . = 'unknown'
                1 = 'preterm'
                2 = 'fullterm'
                3 = 'postdate';
run;

** read data into temp SAS file (bw is read but not kept) **;
data flm1(drop=bw);
  ** reminder: edit filename to correct directory **;
  infile 'e:\sasexamples\flmraw.csv' missover delimiter=',';
  input flm resp blood gestage bw patid;

  ** create term variable (missing gestage is <= 0, so term stays missing) **;
  if gestage <= 0 then term = .;
  else if 0 < gestage < 37 then term = 1;
  else if 37 <= gestage <= 42 then term = 2;
  else if 42 < gestage < 48 then term = 3;
  ** create indicators of LOW test: flm <50, <70 **;
  flm50 = (0 < flm < 50);
  flm70 = (0 < flm < 70);

  label flm     = 'FLM:*Assay*Value'
        resp    = 'RESP:*illness'
        blood   = 'BLOOD:*in*sample'
        gestage = 'GESTAGE:*at*delivery'
        patid   = 'PATID:*ID #'
        term    = 'TERM:*at*delivery'
        flm50   = 'FLM50:*assay*<50'
        flm70   = 'FLM70:*assay*<70';
  format resp respfmt. blood bldfmt. term termfmt. flm50 flm70 lhfmt.;
run;

** check for duplicate records **;
proc sort data=flm1;
  by patid;
run;

data dups;
  set flm1;
  by patid;
  if first.patid = 0 or last.patid = 0;  /* keeps cases with repeat ids */
run;

proc print data=dups;
  id patid;
  title2 'Duplicate records?';
run;

** save data after deleting dups **;
data hw10.flm1;
  set flm1;
  ** delete duplicates (both) since can't tell     **;
  ** if it is two people or one, and if just one   **;
  ** don't know which is valid                     **;
  if patid=71 then delete;

  ** create 0/1 indicator of resp illness **;
  ** 1 = rds or ttn **;
  resp2 = (resp > 0);
  label resp2 = 'RESP2:*ill at *birth';
  format resp2 ynfmt.;
run;

proc contents position data=hw10.flm1;
  title2 'FLM DATA Contents';
run;

** simple freq and descriptive stats to look at values **;
proc freq data=hw10.flm1;
  tables resp resp2 blood flm50 flm70 / nocum;
  tables term;
  tables term / missing;
  title2;
run;

proc univariate plot data=hw10.flm1;
  id patid gestage;
  var flm gestage;
run;
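The note at the top of the report mentions an alternative to deleting both records for the repeated ID: keep one record and set flm to missing, while keeping "hi" for flm50/flm70. A minimal sketch of that option follows, as a hypothetical variant and not what was done for this report. The data set name flm1_alt is illustrative, and the sketch assumes flm1 has already been sorted by patid as in the program above.

```sas
** hypothetical alternative: keep one record per repeated ID, flm set to missing **;
data flm1_alt;                            /* illustrative name, not in the original */
  set flm1;                               /* must already be sorted by patid        */
  by patid;
  if first.patid = 0 then delete;         /* drop the second and later repeats      */
  if last.patid = 0 then flm = .;         /* kept record of a repeated ID:          */
                                          /* the assay values conflict              */
  /* flm50 and flm70 were computed before this step and are left unchanged;   */
  /* that is only reasonable here because both records for patid 71 are 'hi'. */
run;
```

The resp2 indicator, label, and format would then be added exactly as in the final data step of hw10_2013p1.sas. Under this option the analysis file would have 140 records rather than 139, so the report would need to say so.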
OPTIONS PAGESIZE=60 LINESIZE=80 NOCENTER NONUMBER NODATE errors=3;
***********************************************************;
*** PROJECT: PH691f Assignment 10                       ***;
*** DATE: DEC 2013                                      ***;
*** FILE: hw10_2013p2.sas                               ***;
*** PROGRAMMER: penny pekow                             ***;
*** RE: crosstabs and plots                             ***;
*** ************************************************    ***;
*** INPUT FILES: flm1.sas7bdat on PEKOW3 \SASexamples   ***;
***********************************************************;
TITLE1 'PROGRAM: hw10_2013p2.sas ';

libname hw10 'e:\sasexamples';

** read in formats **;
proc format cntlin=hw10.flmfmt1;
run;

** cross-tabulation of flm indicators and resp outcome **;
proc freq data=hw10.flm1;
  tables (flm50 flm70) * resp2 / nopercent;
run;

** character plots: plotting symbol is the first letter of the formatted value **;
proc plot data=hw10.flm1 vpercent=50 hpercent=50;
  plot flm * gestage = resp2 / href=37 haxis = 22 to 42 by 5
                               vref=50 70 vaxis = 0 to 250 by 50;
  plot flm * gestage = blood / href=37 haxis = 22 to 42 by 5
                               vref=50 70 vaxis = 0 to 250 by 50;
run;

ods graphics on / imagename='flm1group';
proc sgplot data=hw10.flm1;
  ** define variable labels to use as axis labels **;
  label gestage='gestational Age (weeks)' flm='lung Maturity Assay';
  ** define x,y and variable to label data points **;
  scatter y=flm x=gestage / group=resp2;
run;  ** end the step before turning ODS graphics off **;
ods graphics off;

ods graphics on / imagename='flm1blood';
proc sgplot data=hw10.flm1;
  ** define variable labels to use as axis labels **;
  label gestage='gestational Age (weeks)' flm='lung Maturity Assay';
  ** define x,y and variable to label data points **;
  scatter y=flm x=gestage / group=blood;
run;  ** end the step before turning ODS graphics off **;
ods graphics off;