
Curtis Miller
4/10/14
MATH 3080 Final Project

Problem 1: Car Data

The first question asks for an analysis of car data. The data were collected from the Kelley Blue Book by S. Kuiper in 2008 and include a number of variables on 2005 General Motors vehicles: the vehicle's suggested price; the number of miles the vehicle has been driven (mileage); the manufacturer of the vehicle (make); the model of the vehicle; the specific version of the model (trim); the body type; the number of cylinders in the engine; the engine size in liters (liter); the number of doors; whether the vehicle has cruise control; whether it has upgraded speakers; and whether it has leather seats.

I developed linear regression models to describe the price of the vehicle as a function of the other variables. My first model was a linear regression model that did not transform the data other than turning the categorical variables (make, model, trim, type, cruise, sound, and leather) into dummy variables. For cruise, sound, and leather, there is a single dummy variable that takes the value 1 if the respective condition is met and 0 otherwise. Make, model, trim, and type have more than two categories, so a dummy is created for every category except one, the baseline (so, for example, there are dummy variables for all but one make of car). The coefficients of all the variables are then estimated and the resulting model analyzed. All estimated models for the car data are listed in Table 1.

The coefficients for the type dummy variables and for liter could not be estimated in model 1 and were dropped by R. Those variables must have been linearly dependent on other variables in the model, that is, exact linear combinations of them; with perfect collinearity their effects cannot be separated from those of the other variables, so no unique estimate exists. Many coefficients are statistically different from zero, and model 1 has the highest adjusted R² of all the models. But the diagnostic plots of the model (Figure 2) show that model 1 does not satisfy the Gauss-Markov assumptions of linear regression (or the normality assumption used for inference) very well. The plot of residuals vs. fitted values shows that the variance of the residuals is not constant and that the residuals are not independent of the fitted values. The Q-Q plot suggests that the residuals are not normally distributed, which violates another vital assumption. The plot of residuals vs. leverage indicates that there are influential data points. The plot of residuals vs. index shows a few very large residuals, which is troubling. The plots of residuals vs. model and residuals vs. mileage indicate that the variance of the residuals depends on these two important variables. On the other hand, the plot of actual vs. predicted price is fairly linear, which indicates that linear relationships are probably appropriate. Together, these factors suggest that the first model has problems, and I created my other models to try to rectify them.

For model 2, I took a different approach to choosing which variables to include. Using Figure 1, I examined scatterplots of the variables plotted against each other, along with their correlation coefficients. I concluded that some of the variables were too similar to each other, raising concerns about multicollinearity, so I dropped the variables that appeared to be redundant: doors and cylinders. After examining the coefficients in model 1, I decided that trim did not contribute much to the model, so I dropped the trim dummy variables as well. All other variables remained in the model, unmodified. The adjusted R² of the model dropped (from 0.99 to 0.98), but some of the problems apparent in the diagnostic plots of the first model improved, though the model is still not perfect (see Figure 3).
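The dummy coding described above, and the reason R dropped some coefficients, can be illustrated with a minimal Python sketch. It uses hypothetical toy data (six made-up cars, not the Kelley Blue Book sample) and assumes R's default treatment coding, in which the first factor level in alphabetical order is the baseline. It builds the dummy columns, then shows that adding a dummy for the baseline as well makes one column an exact linear combination of the others, so the design matrix loses rank; R's `lm()` reports the coefficients of such aliased columns as `NA`. The helper names are illustrative only.

```python
# Hypothetical toy data: the make of six cars (not the Kelley Blue Book sample).
makes = ["Buick", "Cadillac", "Chevrolet", "Buick", "Saturn", "Cadillac"]


def treatment_dummies(values):
    """One 0/1 column per level except the first (alphabetical) level, the baseline."""
    levels = sorted(set(values))
    baseline, kept = levels[0], levels[1:]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return baseline, kept, rows


def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


baseline, kept, dummies = treatment_dummies(makes)
# Design matrix: intercept column followed by the non-baseline dummies.
X = [[1] + row for row in dummies]
# Adding a baseline dummy too makes the intercept equal the sum of all dummies.
X_trap = [row + [1 if v == baseline else 0] for row, v in zip(X, makes)]

print("baseline:", baseline, "dummies:", kept)
print("columns:", len(X[0]), "rank:", matrix_rank(X))
print("columns:", len(X_trap[0]), "rank:", matrix_rank(X_trap))
```

With one dummy per non-baseline make, the four columns are linearly independent; the fifth, baseline column adds no rank, which is the same situation that forces a regression package to drop a coefficient.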
In model 2, the variance of the residuals depends less on the fitted values than in model 1, and the Q-Q plot suggests the residuals are closer to normally distributed, but there is still heteroskedasticity in the residuals and they are still not normally distributed. The residuals vs. leverage plot indicates that there are still influential data points, and the residuals vs. index plot is still not perfect, but the other plots have improved. Meanwhile, most of the coefficients in the second model are statistically different from zero, except for the coefficients of cruise and sound; leather is statistically significant only at the 5% level. So I considered two more models.

The third model I estimated is the same as the second, except that I removed the variables cruise, sound, and leather. Dropping those variables changed little: almost no coefficient changed its level of significance (only the Classic dummy moved, from the 1% to the 5% level), the adjusted R² did not change, and the diagnostic plots (Figure 4) did not change much. Thus leather, cruise, and sound do not appear to contribute much to predicting the price of a car.

The fourth model is identical to the third except for mileage: I transformed the mileage variable, estimating a coefficient for log(mileage)¹ rather than for mileage as a linear variable. My motivation was that mileage need not have a constant impact on price. In the other models, a one-unit change in mileage (one extra mile) is associated with a change of β dollars in price, where β is the mileage coefficient. After the transformation, a 1% change in mileage is associated with a change of approximately β/100 dollars in price (this I learned in econometrics). I found this more theoretically appealing; the difference in price between the 100th and the 200th mile need not be the same as the difference between the 1,100th and the 1,200th mile. However, the fourth model did not fare as well as the second and third models. It has the lowest adjusted R² of all the models (0.97), and its diagnostic plots (Figure 5) are not much better than those of the third model; while its residuals appear somewhat more homoskedastic and better centered, they are less normal than the residuals of the third model. Worse, the residuals vs. mileage plot deteriorated; the variance of the residuals appears to depend on mileage. I conclude that the fourth model is not much of an improvement over the third.

¹ Note that log is the natural log, sometimes denoted ln.

Given the choice among these four models, I prefer the third. It fits the data reasonably well, and while it does not satisfy the assumptions necessary for statistical inference in linear regression very well, it fares better on that score than any other model estimated. The third model also excludes variables that were found to have minimal impact on price (apart from some dummies). Thus I consider the third model the best model for predicting price.

Problem 2: Dolphin Data

The second question asks for an analysis of the sound pressure of dolphin sonar signals compared to the distance (range) from the dolphin to the target. The data are from Marianne Rasmussen and were collected off the coast of Iceland near Keflavik. The pressures were corrected for water density and were expected to increase with distance.

The first model I estimated was a simple linear model:

$$\mathrm{SoundPressure}_i = \beta_0 + \beta_1\,\mathrm{Range}_i + \varepsilon_i$$

The estimated regression coefficients are listed in Table 2, along with the coefficients of the other regression equations I estimated. As expected, the coefficient for range is positive and statistically different from zero. However, the adjusted R² (0.554) is not very high. Furthermore, Figure 6 shows that the assumptions necessary for statistical inference are violated. The Q-Q plot indicates that the residuals are not normally distributed, and the plot of residuals vs. fitted values shows evidence of heteroskedasticity. The plot of residuals vs. index shows a few large residuals but otherwise appears fine. The residuals vs. range plot, though, shows that the residuals are not independent of range, which is undesirable. The scatter plot of the data (with the estimated regression line in red) not only shows evidence of heteroskedasticity but also suggests that a linear model is not appropriate: the data are curved, and whether a point tends to fall above or below the line depends on the range. The plot of actual vs. predicted values also shows a bend rather than a straight line, further evidence that a linear model may not be appropriate for these data.

This analysis suggests that a nonlinear model would fit the data better. I consider two alternatives to the linear model: a logarithmic model and a quadratic model. The logarithmic model (the second model) is:

$$\mathrm{SoundPressure}_i = \beta_0 + \beta_1 \log(\mathrm{Range}_i) + \varepsilon_i$$

When interpreting this model, a 1% change in range is associated with a change of approximately β1/100 in the sound pressure (about 0.07 dB, given the estimate in Table 2). The quadratic model (the third model) is:

$$\mathrm{SoundPressure}_i = \beta_0 + \beta_1\,\mathrm{Range}_i + \beta_2\,\mathrm{Range}_i^2 + \varepsilon_i$$

There is no simple interpretation of the coefficients of the quadratic model.

The diagnostic plots for the logarithmic and quadratic models are shown in Figure 7 and Figure 8, respectively. Both models have similar benefits and pitfalls. Their adjusted R² values are higher than that of the linear model, and the logarithmic model's is the highest of all. The Q-Q plots of the residuals of both models look more normal than that of the linear model, but neither is perfect, so the normality assumption does not hold for either model. Judging by the residuals vs. fitted values and residuals vs. range plots, both models appear less heteroskedastic than the linear model, though neither is perfectly homoskedastic, and the quadratic model's plot suggests that the distribution of its residuals may not be independent of the fitted values. For both models, the actual vs. predicted value plots are much more linear than for the linear model, with the logarithmic model's plot the most linear of all. Finally, the scatterplots show that the estimated curves of the logarithmic and quadratic models follow the general shape of the data, though the logarithmic model appears to provide the best fit of all the models.

Given the choice among all the models, I consider the logarithmic model the best. Not only does it have the highest adjusted R², but its diagnostic plots are also the least problematic of all the models (though not perfect). The linear model is the worst, and the quadratic model falls in the middle. So I recommend the logarithmic model for describing the data. According to the linear model, the predicted sound pressure of a dolphin's sonar signal at a range of nine meters is 207.169 dB; the logarithmic model predicts 207.647 dB. Since I prefer the logarithmic model, I believe its prediction is better (though the difference between the two is very small).
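The nine-meter predictions can be reproduced from the coefficients printed in Table 2. The sketch below is a minimal check, assuming log denotes the natural log and using the rounded four-decimal coefficients, so the logarithmic prediction comes out near 207.646 rather than exactly the reported 207.647; the small gap is consistent with the report having used unrounded coefficients. The function names are illustrative, not from the original analysis, and the quadratic prediction is included only for comparison.

```python
import math

# Coefficients as printed in Table 2 (rounded to four decimals).
LINEAR = {"intercept": 195.9923, "range": 1.2419}
LOGARITHMIC = {"intercept": 192.2367, "log_range": 7.0133}
QUADRATIC = {"intercept": 192.3104, "range": 2.6703, "range_sq": -0.1004}


def predict_linear(r):
    """Model 1: b0 + b1 * Range."""
    return LINEAR["intercept"] + LINEAR["range"] * r


def predict_log(r):
    """Model 2: b0 + b1 * log(Range), with log the natural logarithm."""
    return LOGARITHMIC["intercept"] + LOGARITHMIC["log_range"] * math.log(r)


def predict_quadratic(r):
    """Model 3: b0 + b1 * Range + b2 * Range^2."""
    return QUADRATIC["intercept"] + QUADRATIC["range"] * r + QUADRATIC["range_sq"] * r ** 2


models = [("linear", predict_linear), ("logarithmic", predict_log), ("quadratic", predict_quadratic)]
for name, model in models:
    print(f"{name:<12} {model(9):.3f} dB at a range of 9 m")
```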

Appendix: Tables and Figures

Table 1: Regression tables for car prices

| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | 26408.03 (514.17)*** | 24658.68 (1713.45)*** | 24761.12 (1707.75)*** | 40869.22 (2077.97)*** |
| Mileage | -0.19 (0.00)*** | -0.18 (0.01)*** | -0.18 (0.01)*** | |
| Make Cadillac | 39454.35 (572.73)*** | 30297.33 (719.38)*** | 30490.59 (719.13)*** | 30184.94 (806.97)*** |
| Make Chevrolet | -6518.50 (484.13)*** | -4232.57 (461.38)*** | -4243.81 (462.60)*** | -4146.36 (518.58)*** |
| Make Pontiac | -7952.81 (574.94)*** | -3976.87 (1043.12)*** | -3868.11 (1039.43)*** | -4147.29 (1164.93)*** |
| Make SAAB | 5253.56 (879.93)*** | 2980.00 (1042.18)** | 3103.02 (1045.17)** | 2947.59 (1171.56)* |
| Make Saturn | -7037.11 (568.86)*** | -5919.03 (647.33)*** | -5957.52 (649.34)*** | -5857.36 (727.86)*** |
| Model 9_3 | -2410.90 (783.76)** | 3616.52 (923.20)*** | 3535.74 (925.98)*** | 3415.01 (1038.00)** |
| Model 9_3 HO | 3808.90 (670.55)*** | 4650.13 (895.08)*** | 4645.06 (897.86)*** | 4626.28 (1006.58)*** |
| Model 9_5 | 5246.46 (536.60)*** | 6248.72 (775.92)*** | 6428.74 (775.30)*** | 6234.83 (868.91)*** |
| Model 9_5 HO | 5801.01 (782.73)*** | 4718.39 (812.42)*** | 4878.94 (812.07)*** | 4802.66 (910.40)*** |
| Model AVEO | -6419.73 (437.35)*** | -3753.77 (979.35)*** | -3672.80 (977.58)*** | -4153.52 (1095.24)*** |
| Model Bonneville | 5405.50 (410.77)*** | 326.64 (1122.51) | 272.45 (1120.88) | 596.48 (1256.22) |
| Model Cavalier | -4420.89 (375.26)*** | -3392.03 (752.12)*** | -3038.35 (740.40)*** | -3337.42 (829.65)*** |
| Model Century | -7216.45 (404.09)*** | -6441.44 (625.47)*** | -6410.25 (627.75)*** | -6300.84 (703.77)*** |
| Model Classic | -2892.15 (491.26)*** | -2347.87 (854.20)** | -2007.48 (845.16)* | -2276.38 (947.11)* |
| Model Cobalt | -3466.84 (376.16)*** | -2386.63 (755.98)** | -2037.12 (743.19)** | -2313.26 (832.83)** |
| Model Corvette | 19642.07 (500.65)*** | 10560.64 (1060.94)*** | 10638.66 (1064.63)*** | 10853.81 (1193.30)*** |
| Model CST-V | -17632.51 (571.79)*** | -13833.28 (861.43)*** | -13975.70 (862.98)*** | -13601.00 (967.44)*** |
| Model CTS | -32789.49 (568.84)*** | -22165.54 (1040.36)*** | -22067.12 (1042.23)*** | -21552.46 (1169.35)*** |
| Model Deville | -29643.86 (571.59)*** | -19919.45 (614.67)*** | -20040.34 (611.81)*** | -19610.94 (686.86)*** |
| Model G6 | 4262.34 (558.08)*** | 479.31 (939.62) | 370.00 (935.59) | 519.85 (1048.65) |
| Model Grand Am | -495.74 (576.66) | -3574.18 (719.71)*** | -3372.41 (719.22)*** | -3107.03 (805.93)*** |
| Model Grand Prix | 1735.12 (560.04)** | -1677.15 (1027.83) | -1851.74 (1022.80) | -1723.31 (1146.43) |
| Model GTO | 13984.50 (599.27)*** | 4514.45 (1761.26)* | 4519.35 (1764.87)* | 5018.34 (1977.92)* |
| Model Impala | 1095.66 (428.52)* | 380.33 (424.12) | 425.67 (424.63) | 2.42 (476.00) |
| Model Ion | -2833.01 (523.98)*** | -569.52 (630.76) | -478.48 (620.08) | -851.39 (694.49) |
| Model Lacrosse | -348.90 (568.38) | -2202.68 (413.53)*** | -2202.72 (415.02)*** | -2204.77 (465.32)*** |
| Model Lesabre | -1690.57 (569.23)** | -3915.49 (448.71)*** | -3912.65 (450.22)*** | -3997.29 (504.62)*** |
| Model Malibu | -608.26 (419.67) | -2225.72 (419.52)*** | -1991.64 (407.58)*** | -2160.24 (456.81)*** |
| Model STS-V6 | -25399.21 (570.89)*** | -16665.20 (825.74)*** | -16675.26 (826.17)*** | -16435.07 (926.75)*** |
| Model STS-V8 | -19845.82 (571.67)*** | -13457.73 (713.22)*** | -13541.59 (713.45)*** | -13573.96 (799.72)*** |
| Model Sunfire | -2572.78 (595.68)*** | -3842.32 (709.12)*** | -3598.75 (706.93)*** | -3337.73 (792.67)*** |
| Trim Aero Sedan 4D | -6468.94 (404.21)*** | | | |
| Trim Aero Wagon 4D | -4920.24 (571.83)*** | | | |
| Trim Arc Conv 2D | 3535.82 (404.06)*** | | | |
| Trim Arc Sedan 4D | -3583.32 (407.73)*** | | | |
| Trim Arc Wagon 4D | -3074.13 (573.22)*** | | | |
| Trim AWD Sportwagon 4D | 1411.07 (405.98)*** | | | |
| Trim Conv 2D | 5035.17 (594.25)*** | | | |
| Trim Coupe 2D | 71.16 (434.33) | | | |
| Trim Custom Sedan 4D | -2863.05 (404.25)*** | | | |
| Trim CX Sedan 4D | -2732.57 (404.40)*** | | | |
| Trim CXL Sedan 4D | -1387.57 (404.72)*** | | | |
| Trim DHS Sedan 4D | 4598.27 (567.99)*** | | | |
| Trim DTS Sedan 4D | 5246.46 (568.11)*** | | | |
| Trim GT Coupe 2D | 373.47 (572.57) | | | |
| Trim GT Sedan 4D | 641.11 (493.68) | | | |
| Trim GT Sportwagon | 1065.74 (405.20)** | | | |
| Trim GTP Sedan 4D | 3506.23 (553.55)*** | | | |
| Trim GXP Sedan 4D | 2346.55 (405.53)*** | | | |
| Trim Linear Conv 2D | 7117.24 (404.51)*** | | | |
| Trim Linear Wagon 4D | -3821.23 (573.43)*** | | | |
| Trim LS Coupe 2D | 651.03 (434.50) | | | |
| Trim LS Hatchback 4D | 1289.81 (405.56)** | | | |
| Trim LS MAXX Hback 4D | 1068.17 (487.55)* | | | |
| Trim LS Sedan 4D | 1000.76 (368.95)** | | | |
| Trim LS Sport Coupe 2D | 455.87 (496.72) | | | |
| Trim LS Sport Sedan 4D | 1082.19 (496.37)* | | | |
| Trim LT Coupe 2D | 4013.17 (595.03)*** | | | |
| Trim LT Hatchback 4D | 1537.76 (412.91)*** | | | |
| Trim LT MAXX Hback 4D | 1479.81 (487.13)** | | | |
| Trim LT Sedan 4D | 1127.20 (368.21)** | | | |
| Trim MAXX Hback 4D | 727.98 (489.19) | | | |
| Trim Quad Coupe 2D | 1546.22 (477.68)** | | | |
| Trim SE Sedan 4D | -1389.63 (405.10)*** | | | |
| Trim Sedan 4D | -92.62 (400.79) | | | |
| Trim Special Ed Ultra 4D | 1653.00 (568.69)** | | | |
| Trim SS Coupe 2D | 5412.80 (597.04)*** | | | |
| Trim SS Sedan 4D | 6183.01 (510.69)*** | | | |
| Trim SVM Hatchback 4D | -794.84 (407.22) | | | |
| Cruise | 69.40 (101.52) | 24.74 (156.78) | | |
| Sound | 211.26 (79.76)** | 215.67 (121.17) | | |
| Leather | 295.36 (92.87)** | 333.38 (140.46)* | | |
| Type Coupe | | -6186.96 (355.42)*** | -6210.08 (356.04)*** | -6158.61 (399.07)*** |
| Type Hatchback | | -6292.90 (414.19)*** | -6220.53 (414.57)*** | -5997.48 (464.43)*** |
| Type Sedan | | -6476.88 (321.90)*** | -6478.99 (322.82)*** | -6365.97 (361.74)*** |
| Type Wagon | | -5743.29 (525.64)*** | -5814.46 (526.84)*** | -5595.90 (590.57)*** |
| Liter | | 2348.71 (429.13)*** | 2401.27 (429.69)*** | 2300.32 (481.58)*** |
| log(mileage) | | | | -1982.02 (88.98)*** |
| R² | 0.99 | 0.98 | 0.98 | 0.98 |
| Adj. R² | 0.99 | 0.98 | 0.98 | 0.97 |
| Num. obs. | 804 | 804 | 804 | 804 |
| F statistic | 1307.22 | 957.44 | 1026.87 | 813.05 |

*** p < 0.001, ** p < 0.01, * p < 0.05. Models predicting vehicle price (standard errors of the coefficients are listed in parentheses).
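The two interpretations of the mileage effect can be compared numerically with the coefficients printed in Table 1. The sketch below is a minimal illustration, assuming the table's rounded values (-1982.02 for log(mileage) in model 4 and -0.18 per mile in model 3) and holding all other regressors fixed; the helper names are hypothetical.

```python
import math

# Coefficients copied from Table 1 (as printed, i.e. rounded to two decimals).
BETA_LOG_MILEAGE = -1982.02  # Model 4: dollars per unit of log(mileage)
BETA_MILEAGE = -0.18         # Model 3: dollars per mile


def log_model_change(old_miles, new_miles, beta=BETA_LOG_MILEAGE):
    """Model 4 predicted price change when mileage moves from old_miles to new_miles:
    beta * (log(new) - log(old)), other regressors held fixed."""
    return beta * (math.log(new_miles) - math.log(old_miles))


def linear_model_change(old_miles, new_miles, beta=BETA_MILEAGE):
    """Model 3 predicted price change: beta * (new - old), the same for every mile."""
    return beta * (new_miles - old_miles)


# A 1% increase in mileage: the beta/100 rule of thumb vs. the exact log difference.
print("rule of thumb:", round(BETA_LOG_MILEAGE / 100, 2))
print("exact 1% change:", round(log_model_change(100, 101), 2))

# The report's example: mile 100 -> 200 versus mile 1,100 -> 1,200.
for start in (100, 1100):
    print(start, "->", start + 100,
          "log model:", round(log_model_change(start, start + 100), 2),
          "linear model:", round(linear_model_change(start, start + 100), 2))
```

With these values, the rule of thumb gives about -$19.82 for a 1% increase in mileage and the exact log difference about -$19.72. The log model also implies a much larger price drop between miles 100 and 200 than between miles 1,100 and 1,200, whereas the linear model implies the same drop for both.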

Figure 1: Scatterplots (above diagonal) and correlations (below diagonal) of car data

Figure 2: Diagnostic plots of model 1 of the car data

Figure 3: Diagnostic plots of model 2 of the car data

Figure 4: Diagnostic plots of model 3 of the car data

Figure 5: Diagnostic plots of model 4 of the car data

Table 2: Regression table for sound pressure versus range

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 195.9923 (0.1820)*** | 192.2367 (0.2191)*** | 192.3104 (0.3009)*** |
| Range | 1.2419 (0.0276)*** | | 2.6703 (0.0995)*** |
| log(Range) | | 7.0133 (0.1315)*** | |
| Range² | | | -0.1004 (0.0068)*** |
| R² | 0.5542 | 0.6355 | 0.6074 |
| Adj. R² | 0.5540 | 0.6352 | 0.6069 |
| Num. obs. | 1634 | 1634 | 1634 |

*** p < 0.001, ** p < 0.01, * p < 0.05. Statistical models predicting SoundPressure from Range (standard errors of the coefficients are listed in parentheses).
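Adjusted R² penalizes extra terms, which matters when comparing the one-term linear and logarithmic models with the two-term quadratic model. As a minimal sketch, the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with k the number of non-intercept terms, can be checked against Table 2. Because R² is printed to four decimals, the recomputed values are expected to match the printed adjusted R² only to within about 0.0001.

```python
def adjusted_r2(r2, n_obs, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), k = non-intercept terms."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)


N_OBS = 1634  # Num. obs. in Table 2
# Printed R^2 and number of non-intercept terms for each model in Table 2.
MODELS = {
    "Model 1 (linear)": (0.5542, 1),
    "Model 2 (logarithmic)": (0.6355, 1),
    "Model 3 (quadratic)": (0.6074, 2),
}

for name, (r2, k) in MODELS.items():
    print(f"{name}: adjusted R^2 = {adjusted_r2(r2, N_OBS, k):.4f}")
```

With n = 1634 the penalty is small, so the ranking by adjusted R² is the same as the ranking by R², and the logarithmic model stays on top.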

Figure 6: Diagnostic plots of model 1 of the dolphin data

Figure 7: Diagnostic plots of model 2 of the dolphin data

Figure 8: Diagnostic plots of model 3 of the dolphin data