To: Professor Roger Bohn & Hyeonsu Kang
Subject: Big Data, Assignment April 13th
From: xxxx (anonymized)
Date: 4/11/2016

Data Preparation:
1. Separate the trany variable into Manual, a dummy that takes the value 1 or 0 (1 means manual, 0 means automatic), and Speed, a categorical variable with categories such as 3-speed, 4-speed and 5-speed.
2. Convert tcharger and scharger to dummy variables that take the value 1 or 0 (1 means turbocharged, 0 means not turbocharged; likewise for scharger and supercharged).
3. Convert displ from liters to gallons.
4. Convert the year variable into a dummy variable, after 2014, which indicates whether the car is from before or after 2014.
5. Generate an interaction term, after 2014 * Manual.

Part A:
Summary: Our target is MPG. Besides including the factors that we think are important to MPG, we also tried model variations such as adding the interaction term manual ## after 2014 and removing variables one at a time. Our conclusion is that the variables included in Model 1 explain MPG best, and the interpretation of its results makes sense in the real world.

Model 1:
Based on my understanding of the topic, I believe several important factors affect the MPG of a car: number of cylinders, engine displacement, drive type, fuel type, manual or automatic transmission, year made, and whether the car is turbocharged or supercharged.
Unit: all relevant variables in the model are expressed in gallons.
Target: comb08
Input: cylinders, displ_gal, drive, fueltype1, manual, speed, year, tcharger, scharger
Other parameters: seed: 424565, partition: 70/30/0
MPG = -240.35
      - 0.44 * # of cylinders
      - 7.25 * engine displacement
      - {0.58 if drive 4-Wheel Drive, 1.58 if drive 4-Wheel or All-Wheel Drive, -0.08 if drive All-Wheel Drive, -2.07 if drive Front-Wheel Drive, 1.77 if drive Part-time 4-Wheel Drive, 0.17 if drive Rear-Wheel Drive}
      - {5.19 if fueltype1 midgrade, 7.85 if fueltype1 natural gas, 6.99 if fueltype1 premium, 7.1 if fueltype1 regular}
      + {0.69 if manual, 0 if automatic}
      + 0.13 * year
      - {1.19 if turbocharged, 0 if not}
      - {0.84 if supercharged, 0 if not}

This training output shows that all variables included in the model are significant except the drive types Rear-Wheel Drive and All-Wheel Drive. In addition, number of cylinders, engine displacement, drive type, fuel type, turbocharging, and supercharging are all negatively associated with MPG, while year and manual transmission are positively associated with MPG.

Evaluation: The result makes sense: as technology becomes more advanced, cars made more recently have higher MPG. Also, manual cars on average have higher MPG than automatic cars. The Pseudo R-square is 0.80, which means that 80% of the variation in MPG can be explained by the predictors.

Figure 1

Model 2:
Far more cars in 2014 are automatic than manual, but that may not be true at the start of the data set. We therefore generated an after 2014 variable, which is 1 if the car was produced after 2014 and 0 if it was produced before 2014. We then created the interaction term after 2014 * manual to see whether there is an additional effect for manual cars produced after 2014.
Unit: all relevant variables in the model are expressed in gallons.
Target: comb08
Input: cylinders, displ_gal, drive, fueltype1, manual, year after 2014, tcharger, scharger, year after 2014##manual
Other parameters: seed: 4245346, partition: 70/30/0

MPG = 25.98
      - 0.46 * # of cylinders
      - 6.24 * engine displacement
      + {1.35 if drive 4-Wheel Drive, -0.35 if drive 4-Wheel or All-Wheel Drive, 1.66 if drive All-Wheel Drive, 3.43 if drive Front-Wheel Drive, -0.39 if drive Part-time 4-Wheel Drive, 0.17 if drive Rear-Wheel Drive}
      + {0.55 if manual, 0 if automatic}
      + {3.28 if year after 2014, 0 if before 2014}
      - {0.46 if turbocharged, 0 if not}
      - {0.68 if supercharged, 0 if not}
      + {-0.19 if manual after 2014, 0 if not manual after 2014}

Evaluation: As Figure 2 shows, the Pseudo R-square is 0.66, which is lower than that of Model 1. In addition, the regression equation for Model 2 above is harder to interpret than that of Model 1. In particular, it is hard to explain why the coefficient on the interaction term manual after 2014 is negative.

Figure 2:
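The five data preparation steps listed at the top of this memo can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run in Rattle: the raw encodings (a trany string such as "Manual 5-spd", a non-empty flag for charged engines), the strict "year > 2014" cutoff, and the output name manual_after_2014 are all assumptions.

```python
GALLONS_PER_LITER = 0.264172  # US liquid gallons in one liter


def prepare(row):
    """Apply the five preparation steps to one vehicle record (a dict)."""
    # Step 1: split trany (assumed format "Manual 5-spd") into a dummy and a speed label.
    kind, _, speed = row["trany"].partition(" ")
    manual = 1 if kind.lower() == "manual" else 0

    # Step 2: treat any non-empty flag as charged (assumed raw encoding).
    tcharger = 1 if row.get("tcharger") else 0
    scharger = 1 if row.get("scharger") else 0

    # Step 3: convert displacement from liters to gallons.
    displ_gal = round(row["displ"] * GALLONS_PER_LITER, 2)

    # Step 4: dummy for model years after 2014 (strict cutoff is an assumption).
    after_2014 = 1 if row["year"] > 2014 else 0

    # Step 5: interaction term after 2014 * Manual.
    return {
        "manual": manual,
        "speed": speed,
        "tcharger": tcharger,
        "scharger": scharger,
        "displ_gal": displ_gal,
        "after_2014": after_2014,
        "manual_after_2014": manual * after_2014,
    }


example = prepare({"trany": "Manual 5-spd", "tcharger": "T", "scharger": "",
                   "displ": 2.0, "year": 2015})
print(example)
```

Rounding displ_gal to two decimals mirrors the displ_ga levels printed in Appendix A (0.24, 0.26, 0.29, ...), which correspond to 0.9, 1.0 and 1.1 liters.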
Part B:
Summary: The target is Manual. We compare different models to find the one with the highest prediction accuracy based on the confusion matrix. We conclude that Model 2, which includes Speed and vclass, is the best model: it correctly predicts 92% of the validation dataset.

Model 1:
Unit: all relevant variables in the model are expressed in gallons.
Target: Manual
Input: city08, Co2TailpipeGpm, cylinders, displ_gal, drive, fueltype1, year, highway08, tcharger, scharger
Other parameters: seed: 424346, partition: 70/30/0

Manual (0,1) = 113.72 + β1 * mpg in city + β2 * Co2TailpipeGpm + β3 * # of cylinders + β4 * displacement in gallons + β5 * drive type + β6 * fuel type + β7 * year + β8 * mpg on highway + β9 * turbocharged + β10 * supercharged

Coefficients Table:
Tables 1 and 2 show that the model correctly predicts 7003 automatic cases, which is 63% of the validation data, and 947 manual cases, which is 9% of the validation data. The model misclassifies the remaining 28%.

Table 1 Confusion Matrix of Model 1 Prediction on Validation dataset (count)

                       Predicted
                       Automatic   Manual
Actual   Automatic     7003        441
         Manual        2716        947

Table 2 Confusion Matrix of Model 1 Prediction on Validation dataset (percentage)

                       Predicted               Error Rate
                       Automatic   Manual
Actual   Automatic     0.63        0.04        0.06
         Manual        0.24        0.09        0.74

Model 2:
We include the speed and vclass variables and drop drive and fueltype1.
Unit: all relevant variables in the model are expressed in gallons.
Target: Manual
Input: city08, Co2TailpipeGpm, cylinders, displ_gal, speed, vclass, year after 2014, highway08, tcharger, scharger
Other parameters: seed: 4245346, partition: 70/30/0

Manual (0,1) = 515.82 + β1 * mpg in city + β2 * Co2TailpipeGpm + β3 * # of cylinders + β4 * displacement in gallons + β5i * speed type + β6i * EPA vehicle size class + β7 * year + β8 * mpg on highway + β9 * turbocharged + β10 * supercharged
* As the coefficient table shows, the Speed variable is insignificant in the model. We therefore tried excluding Speed; however, the confusion matrix then showed a dramatic drop in accuracy, so we decided to keep it in the model.

Tables 3 and 4 show that the model correctly predicts 6698 automatic cases, which is 62% of the validation data, and 3227 manual cases, which is 30% of the validation data. The model misclassifies the remaining 8%. The accuracy rate is higher than that of the previous model.

Table 3 Confusion Matrix of Model 2 Prediction on Validation dataset (count)

                       Predicted
                       Automatic   Manual
Actual   Automatic     6698        442
         Manual        474         3227

Table 4 Confusion Matrix of Model 2 Prediction on Validation dataset (percentage)

                       Predicted               Error Rate
                       Automatic   Manual
Actual   Automatic     0.62        0.04        0.06
         Manual        0.04        0.30        0.13
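The percentages and error rates in the Part B confusion matrices follow directly from the counts. The short Python sketch below recomputes them; the function name confusion_summary and its argument names are illustrative choices, not part of the Rattle output.

```python
def confusion_summary(auto_auto, auto_manual, manual_auto, manual_manual):
    """Summarize a 2x2 confusion matrix given as (actual_predicted) counts."""
    total = auto_auto + auto_manual + manual_auto + manual_manual
    return {
        # Each cell as a share of all validation cases.
        "shares": [[auto_auto / total, auto_manual / total],
                   [manual_auto / total, manual_manual / total]],
        # Error rate per actual class: misclassified cases / cases in that class.
        "error_automatic": auto_manual / (auto_auto + auto_manual),
        "error_manual": manual_auto / (manual_auto + manual_manual),
        # Overall accuracy: correctly predicted cases / all cases.
        "accuracy": (auto_auto + manual_manual) / total,
    }


model1 = confusion_summary(7003, 441, 2716, 947)   # counts from Table 1
model2 = confusion_summary(6698, 442, 474, 3227)   # counts from Table 3

for name, m in (("Model 1", model1), ("Model 2", model2)):
    print(name,
          "accuracy=%.2f" % m["accuracy"],
          "error_automatic=%.2f" % m["error_automatic"],
          "error_manual=%.2f" % m["error_manual"])
```

With these counts, overall accuracy rises from about 72% for Model 1 to about 92% for Model 2, and almost all of the gain comes from the manual class, whose error rate falls from 0.74 to 0.13.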
Part C:
Summary: After running a linear regression and classifying cars by transmission type, we want to build a model that shows how the input variables interact with one another in determining the combined MPG for fuel type 1. Including these interaction terms will provide a more accurate model of the relationship between the inputs and the target variable. Specifically, the age of the car should be interacted with the input variables because cars are engineered to become more efficient over time. For our working model, we grouped the years into decades, with the variable names eighties, nineties, twothousand, and twoten. Respectively, these variables flag model years from the 1980s, the 1990s, the 2000s, and the 2010s.

Model without interaction terms
Unit: all relevant variables in the model are expressed in gallons.
Target: comb08
Input: cylinders, displ_gal, drive, fueltype1, manual, year dummies, tcharger, scharger
Other parameters: seed: 12345, partition: 70/30/0

Before creating models with interaction terms, we ran a model with the year dummy variables we created. We use the results from this model as the baseline against which we compare our interaction models. The Rattle output is shown in Appendix A, and the graphical results are shown below:

A) Graphical depiction of the data distribution
B) Predicted vs. Observed Model
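The decade dummies described in the Part C summary, and the interaction columns built from them, could be constructed along the following lines. This is a hedged Python sketch rather than the Rattle workflow itself; the helper names decade_dummies and interact are hypothetical, while the column names follow the appendix output (for example manual_eighties).

```python
# First model year of each decade dummy named in the memo.
DECADES = {"eighties": 1980, "nineties": 1990, "twothousand": 2000, "twoten": 2010}


def decade_dummies(year):
    """Return one 0/1 flag per decade for a given model year."""
    return {name: 1 if start <= year < start + 10 else 0
            for name, start in DECADES.items()}


def interact(dummies, value, prefix):
    """Multiply each decade flag by another variable to form interaction columns."""
    return {"%s_%s" % (prefix, name): flag * value
            for name, flag in dummies.items()}


flags = decade_dummies(1994)
print(flags)
print(interact(flags, 1, "manual"))      # manual car from the 1990s
print(interact(flags, 95.0, "pv4"))      # illustrative passenger volume value
```

Because every model year in this range falls in exactly one decade, the four flags always sum to one. That is consistent with R reporting twoten (and the matching twoten interaction) as NA, "not defined because of singularities", in the appendices.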
Model with year##manual
Next, we included the interaction between the year dummies and whether the car has a manual transmission. This is an important interaction to observe because we hypothesize that cars manufactured in more recent years are less likely to have a manual transmission. Thus, this pattern could be correlated with patterns observed in MPG. The model is as follows:

comb08 = β0 + city08 + β1 * co2tailpipegpm + β2 * cylinders + β3 * displ_ga + β4 * drive + ... + βK * eighties*manual + β(K+1) * nineties*manual + β(K+2) * twothousand*manual + β(K+3) * twoten*manual

When the model was run in Rattle, it produced the following results:

1 A) Predicted vs. Observed, Model 1

The summary of the linear regression model is shown in Appendix B.
The model's Pseudo R-squared is 0.995, which shows that the observed points fit the predicted model very well. Also, the regression analysis shows that two of the interaction terms are statistically significant, implying that they have some effect on the target variable.

Model with year##pv4
In addition, we created an interaction term between the 4-door passenger volume and year. This is an important interaction to observe because we hypothesize that cars with greater volume have lower fuel efficiency, as the car needs to move more weight.

comb08 = β0 + city08 + β1 * co2tailpipegpm + β2 * cylinders + β3 * displ_ga + β4 * drive + ... + βK * eighties*pv4 + β(K+1) * nineties*pv4 + β(K+2) * twothousand*pv4 + β(K+3) * twoten*pv4

Rattle produced the following results:

2 A) Predicted vs. Observed, Model 2

The summary of the linear regression model is shown in Appendix C.
While the Pseudo R-squared stayed the same, including this interaction term changed the β coefficients of the model. However, none of the coefficients on the interaction terms is statistically significant, implying that the interaction between passenger volume and decade has no detectable effect on the target variable.

Model with year##co2tailpipegpm
For our third interaction term, we interacted tailpipe CO2 in grams/mile with year. This is an important interaction to observe because newer cars, which face stricter emission standards, tend to emit less tailpipe CO2. We want to see whether this decrease in CO2 emissions is also associated with better MPG.

comb08 = β0 + city08 + β1 * co2tailpipegpm + β2 * cylinders + β3 * displ_ga + β4 * drive + ... + βK * eighties*co2tailpipegpm + β(K+1) * nineties*co2tailpipegpm + β(K+2) * twothousand*co2tailpipegpm + β(K+3) * twoten*co2tailpipegpm

Rattle produced the following results:

3 A) Predicted vs. Observed, Model 3

The summary of the linear regression model is shown in Appendix D.
Once again, the R-squared value stayed the same, indicating that the observed data points fit the predicted model well. In this model, all estimated interaction terms are highly statistically significant, implying that each of them has an effect on the target variable.

Other Issues:
In addition to the inputs provided in the vehicle data from the U.S. Department of Energy, it would be useful to know whether the car has air conditioning and other technological systems (such as a sound system), given that they also consume energy and would therefore influence MPG. Furthermore, it would be useful to include the total weight of the car, because we hypothesize that heavier cars need more energy to move.
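The R-squared values discussed in Part C can be cross-checked against the ANOVA tables in the appendices, since R-squared equals the explained sum of squares divided by the total sum of squares. The Python sketch below does this for the year##manual model using the Sum Sq column of Appendix B. The figures are the rounded integers printed by Rattle, so the result is approximate.

```python
# Sum Sq column from the Appendix B ANOVA table (rounded as printed).
sum_sq = {
    "city08": 620941, "co2tailpipegpm": 6545, "cylinders": 793, "displ": 4,
    "engid": 249, "camodel": 19, "fuelcost08": 41, "highway08": 7919,
    "pv4": 1, "manual": 5, "after2014": 0, "yousavespend": 2,
    "tcharger": 1, "scharger": 0, "eighties": 1, "nineties": 0,
    "twothousand": 2, "drive2": 0, "vclass2": 2, "manual_eighties": 1,
    "manual_nineties": 1, "manual_twothousand": 0,
}
residual_ss = 3230  # Residuals row of the same table

explained_ss = sum(sum_sq.values())
total_ss = explained_ss + residual_ss

# R-squared = explained variation / total variation.
r_squared = explained_ss / total_ss

# Share of total variation attributed to city08, the first term entered.
city08_share = sum_sq["city08"] / total_ss

print("R-squared ~ %.4f" % r_squared)
print("city08 share ~ %.4f" % city08_share)
```

The result agrees with the Multiple R-squared of 0.995 reported in Appendix B. In this sequential ANOVA table, city08 alone is credited with roughly 97% of the total sum of squares, which suggests that most of the fit in the Part C models comes from city08 and highway08 rather than from the interaction terms.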
Appendix A Call: lm(formula = comb08 ~., data = crs$dataset[crs$train, c(crs$input, crs$target)]) Residuals: Min 1Q Median 3Q Max -1.19066-0.23276 0.01995 0.22896 1.57990 Coefficients: (26 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) -2.4217340467 2.3400979427-1.035 0.300733 city08 0.6090828411 0.0020068185 303.507 < 2e-16 co2tailpipegpm -0.0016837384 0.0002421873-6.952 3.69e-12 cylinders 0.0300722922 0.0045195960 6.654 2.92e-11 displ_ga0.24 0.4415180039 0.5490757508 0.804 0.421341 displ_ga0.26 0.6227234984 0.4299244537 1.448 0.147505 displ_ga0.29-0.1740191334 0.4461602370-0.390 0.696512 displ_ga0.32 0.1405816565 0.4333594153 0.324 0.745638 displ_ga0.34-0.0026512686 0.4300561851-0.006 0.995081 displ_ga0.37 0.0061692889 0.4314898060 0.014 0.988593 displ_ga0.40 0.0449831282 0.4293255099 0.105 0.916554 displ_ga0.42-0.0990142384 0.4291427527-0.231 0.817530 displ_ga0.45 0.1177813083 0.4324885784 0.272 0.785368 displ_ga0.48-0.1993026091 0.4292233469-0.464 0.642413 displ_ga0.50-0.1008289525 0.4296291619-0.235 0.814453 displ_ga0.53-0.2615914370 0.4291570218-0.610 0.542168 displ_ga0.55-0.2063038773 0.4317562447-0.478 0.632779 displ_ga0.58-0.2883648985 0.4294400469-0.671 0.501915 displ_ga0.61-0.3707779357 0.4295014980-0.863 0.387995 displ_ga0.63-0.2755870628 0.4294260912-0.642 0.521037 displ_ga0.66-0.3702465319 0.4294215081-0.862 0.388587 displ_ga0.69-0.3942615069 0.4300460895-0.917 0.359262 displ_ga0.71-0.3662314277 0.4299430321-0.852 0.394326 displ_ga0.74-0.4958097993 0.4298438425-1.153 0.248731 displ_ga0.77-0.2756814633 0.4305727069-0.640 0.522005 displ_ga0.79-0.4257032573 0.4296508322-0.991 0.321787 displ_ga0.82-0.5448379662 0.4304759811-1.266 0.205646 displ_ga0.85-0.4290285873 0.4299730824-0.998 0.318385 displ_ga0.87-0.4644220712 0.4301054966-1.080 0.280248 displ_ga0.90-0.4865367761 0.4301716993-1.131 0.258054 displ_ga0.92-0.4261846287 0.4297778113-0.992 0.321383 displ_ga0.95-0.4628637816 0.4299440871-1.077 0.281684 
displ_ga0.98-0.4371734614 0.4300367836-1.017 0.309356 displ_ga1.00-0.4831604006 0.4299472848-1.124 0.261123 displ_ga1.03-0.3905848852 0.4304664591-0.907 0.364229 displ_ga1.06-0.4237980866 0.4299797115-0.986 0.324328 displ_ga1.08-0.4822379361 0.4325727330-1.115 0.264941 displ_ga1.11-0.4340042361 0.4302495595-1.009 0.313116 displ_ga1.14-0.4231416487 0.4299315141-0.984 0.325023 displ_ga1.16-0.4430726521 0.4308181492-1.028 0.303751 displ_ga1.19-0.3896138506 0.4326336659-0.901 0.367830 displ_ga1.22-0.4248368319 0.4304720307-0.987 0.323697 displ_ga1.24-0.5291894460 0.4307254930-1.229 0.219234
displ_ga1.27-0.4309839131 0.4309622989-1.000 0.317296 displ_ga1.29-0.4178189496 0.4304350810-0.971 0.331712 displ_ga1.32-0.3807172994 0.4303375289-0.885 0.376330 displ_ga1.37-0.2529379476 0.4306861984-0.587 0.557014 displ_ga1.40-0.4809927855 0.4304726387-1.117 0.263852 displ_ga1.43-0.2798629557 0.4309188752-0.649 0.516050 displ_ga1.45-0.3153015091 0.4317434422-0.730 0.465215 displ_ga1.48-0.3455303936 0.4314751423-0.801 0.423248 displ_ga1.51-0.3225660130 0.4304322452-0.749 0.453623 displ_ga1.53-0.2072795606 0.4315957953-0.480 0.631045 displ_ga1.56-0.0967387965 0.4310866243-0.224 0.822443 displ_ga1.59-0.4388297383 0.4310778680-1.018 0.308696 displ_ga1.61-0.3501673163 0.4406162348-0.795 0.426783 displ_ga1.64-0.5796730668 0.4307297986-1.346 0.178382 displ_ga1.66-0.2513822254 0.4477250739-0.561 0.574485 displ_ga1.69-0.7554028830 0.4368398324-1.729 0.083778 displ_ga1.72-0.4257060716 0.4328929054-0.983 0.325421 displ_ga1.74-1.1910209672 0.4409356695-2.701 0.006915 displ_ga1.77-0.0750071583 0.4342276733-0.173 0.862860 displ_ga1.80 0.2361059868 0.4319065934 0.547 0.584617 displ_ga1.85-0.6952471462 0.4524661118-1.537 0.124411 displ_ga1.96 0.0407839627 0.4635198799 0.088 0.929887 displ_ga2.11-0.3882762779 0.4452706766-0.872 0.383217 displ_ga2.19-0.7556153140 0.4575766517-1.651 0.098682 displ_ga2.22-0.6185814792 0.4499721729-1.375 0.169234 displ NA NA NA NA drive4-wheel Drive -0.0318667866 0.0300924054-1.059 0.289627 drive4-wheel or All-Wheel Drive 0.0282168216 0.0246319951 1.146 0.251999 driveall-wheel Drive -0.0518695383 0.0273701671-1.895 0.058089 drivefront-wheel Drive 0.0248945676 0.0241373636 1.031 0.302377 drivepart-time 4-Wheel Drive -0.1024583698 0.0458330699-2.235 0.025396 driverear-wheel Drive -0.0032680520 0.0231572398-0.141 0.887773 engid -0.0000007213 0.0000001799-4.009 6.13e-05 CA.model 0.0056180687 0.0158867061 0.354 0.723617 fuelcost08-0.0021230274 0.0005455026-3.892 9.97e-05 fueltype1midgrade Gasoline -0.4740609076 0.0610182925-7.769 8.21e-15 fueltype1natural 
Gas -0.4105540877 0.0661723638-6.204 5.58e-10 fueltype1premium Gasoline -0.0967315827 0.0378983052-2.552 0.010704 fueltype1regular Gasoline -0.2426262231 0.0223865307-10.838 < 2e-16 highway08 0.3151744603 0.0018576810 169.660 < 2e-16 pv4-0.0000105232 0.0000821307-0.128 0.898049 tranyauto (AV-S8) 0.2557313011 0.4858385280 0.526 0.598635 tranyauto (AV) 0.5001211883 0.4341691465 1.152 0.249372 tranyautomatic (A1) -0.0743128639 0.5422835873-0.137 0.891003 tranyautomatic (A6) 0.0635311951 0.3844732984 0.165 0.868755 tranyautomatic (AM5) -0.1882752770 0.4222429051-0.446 0.655678 tranyautomatic (AV-S6) 0.2151677048 0.3670105420 0.586 0.557699 tranyautomatic (AV) 0.3233230596 0.4856389045 0.666 0.505565 tranyautomatic (S4) 0.1838921008 0.3446767571 0.534 0.593678 tranyautomatic (S5) 0.1996004214 0.3438718695 0.580 0.561617 tranyautomatic (S6) 0.1653573036 0.3436987787 0.481 0.630442 tranyautomatic (S7) 0.0969365960 0.3446460811 0.281 0.778510 tranyautomatic (S8) 0.1219030812 0.3440830930 0.354 0.723129 tranyautomatic (S9) 0.1442612937 0.3644300188 0.396 0.692216 tranyautomatic (variable gear ratios) 0.2237698798 0.3439189922 0.651 0.515280 tranyautomatic 3-spd 0.1890650062 0.3438178835 0.550 0.582394
tranyautomatic 4-spd 0.1766820992 0.3437139971 0.514 0.607230 tranyautomatic 5-spd 0.1655396398 0.3437267565 0.482 0.630093 tranyautomatic 6-spd 0.1258491044 0.3438357719 0.366 0.714357 tranyautomatic 6spd 0.5999999532 0.4856950238 1.235 0.216715 tranyautomatic 7-spd 0.1399613149 0.3438959979 0.407 0.684021 tranyautomatic 8-spd 0.0935405364 0.3452224470 0.271 0.786426 tranyautomatic 9-spd 0.1689719222 0.3489465213 0.484 0.628224 tranymanual 3-spd 0.2150273413 0.3468771160 0.620 0.535333 tranymanual 4-spd 0.2194354147 0.3439336887 0.638 0.523469 tranymanual 5-spd 0.2169655996 0.3437309840 0.631 0.527911 tranymanual 6-spd 0.1721377975 0.3436955587 0.501 0.616486 tranymanual 7-spd 0.2832647875 0.3486179040 0.813 0.416492 Manual NA NA NA NA TransmissionAutomatic NA NA NA NA TransmissionManual NA NA NA NA Speed(A6) NA NA NA NA Speed(AM5) NA NA NA NA Speed(AV-S6) NA NA NA NA Speed(AV-S8) NA NA NA NA Speed(AV) NA NA NA NA Speed(S4) NA NA NA NA Speed(S5) NA NA NA NA Speed(S6) NA NA NA NA Speed(S7) NA NA NA NA Speed(S8) NA NA NA NA Speed(S9) NA NA NA NA Speed(variable NA NA NA NA Speed3-spd NA NA NA NA Speed4-spd NA NA NA NA Speed5-spd NA NA NA NA Speed6-spd NA NA NA NA Speed6spd NA NA NA NA Speed7-spd NA NA NA NA Speed8-spd NA NA NA NA Speed9-spd NA NA NA NA VClassLarge Cars -0.0085732664 0.0133195371-0.644 0.519801 VClassMidsize Cars -0.0218198510 0.0092506680-2.359 0.018345 VClassMidsize Station Wagons -0.0219022280 0.0216235051-1.013 0.311123 VClassMidsize-Large Station Wagons -0.0311604611 0.0179830179-1.733 0.083149 VClassMinicompact Cars -0.0089238577 0.0161247146-0.553 0.579976 VClassMinivan - 2WD -0.0219167856 0.0252889401-0.867 0.386140 VClassMinivan - 4WD -0.0091248337 0.0583729806-0.156 0.875783 VClassSmall Pickup Trucks 0.0071505635 0.0215443880 0.332 0.739968 VClassSmall Pickup Trucks 2WD 0.0430138488 0.0241627597 1.780 0.075060 VClassSmall Pickup Trucks 4WD 0.0468520499 0.0328094214 1.428 0.153303 VClassSmall Sport Utility Vehicle 2WD 0.0509802021 
0.0263436609 1.935 0.052978 VClassSmall Sport Utility Vehicle 4WD 0.0616203600 0.0257132723 2.396 0.016563 VClassSmall Station Wagons -0.0097006935 0.0130871864-0.741 0.458558 VClassSpecial Purpose Vehicle -0.3308800415 0.3439988129-0.962 0.336128 VClassSpecial Purpose Vehicle 2WD 0.0331181023 0.0202052971 1.639 0.101210 VClassSpecial Purpose Vehicle 4WD 0.0195015867 0.0276706212 0.705 0.480956 VClassSpecial Purpose Vehicles -0.0118675557 0.0160625099-0.739 0.460014 VClassSpecial Purpose Vehicles/2wd -0.2031794718 0.2430615501-0.836 0.403209 VClassSpecial Purpose Vehicles/4wd 0.4869481677 0.3439006274 1.416 0.156801 VClassSport Utility Vehicle - 2WD -0.0315782968 0.0147310961-2.144 0.032071 VClassSport Utility Vehicle - 4WD 0.0121376849 0.0150854216 0.805 0.421060
VClassStandard Pickup Trucks -0.0162506804 0.0154491830-1.052 0.292865 VClassStandard Pickup Trucks 2WD 0.0210839112 0.0173209310 1.217 0.223521 VClassStandard Pickup Trucks 4WD 0.0295671913 0.0195013908 1.516 0.129492 VClassStandard Pickup Trucks/2wd -0.2161035755 0.2433116583-0.888 0.374455 VClassStandard Sport Utility Vehicle 2WD -0.1296779192 0.0358232486-3.620 0.000295 VClassStandard Sport Utility Vehicle 4WD -0.0136021005 0.0280767608-0.484 0.628063 VClassSubcompact Cars -0.0196238297 0.0096562414-2.032 0.042140 VClassTwo Seaters 0.0223171480 0.0141514439 1.577 0.114803 VClassVans 0.0148544860 0.0174657506 0.850 0.395060 VClassVans Passenger 0.1142571793 0.2430979585 0.470 0.638356 VClassVans, Cargo Type 0.0341493669 0.0243140042 1.405 0.160179 VClassVans, Passenger Type 0.0335305680 0.0274663308 1.221 0.222178 after.2014 0.0451878412 0.0112822924 4.005 6.21e-05 year 0.0008192440 0.0010650260 0.769 0.441768 yousavespend -0.0003376612 0.0001114336-3.030 0.002447 tcharger -0.0310274840 0.0088460455-3.507 0.000453 scharger 0.0015965937 0.0183814730 0.087 0.930784 decade 0.0031447466 0.0011324864 2.777 0.005493 Eighties 0.0700271077 0.0235398159 2.975 0.002934 Nineties 0.0143929768 0.0140589847 1.024 0.305960 Twothousand NA NA NA NA Twoten NA NA NA NA --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 0.3431 on 24306 degrees of freedom (1463 observations deleted due to missingness) Multiple R-squared: 0.9952, Adjusted R-squared: 0.9952 F-statistic: 3.394e+04 on 150 and 24306 DF, p-value: < 2.2e-16 ==== ANOVA ==== Analysis of Variance Table Response: comb08 Df Sum Sq Mean Sq F value Pr(>F) city08 1 584032 584032 4962188.4905 < 2.2e-16 *** co2tailpipegpm 1 6217 6217 52818.0944 < 2.2e-16 *** cylinders 1 814 814 6919.1595 < 2.2e-16 *** displ_ga 64 1566 24 207.9411 < 2.2e-16 *** drive 6 410 68 580.4267 < 2.2e-16 *** engid 1 346 346 2939.5246 < 2.2e-16 *** CA.model 1 7 7 58.8118 1.800e-14 *** fuelcost08 1 18 18 153.2497 < 2.2e-16 *** fueltype1 4 434 109 922.6904 < 2.2e-16 *** highway08 1 5227 5227 44412.6484 < 2.2e-16 *** pv4 1 1 1 6.0192 0.014158 * trany 27 10 0 3.2908 1.680e-08 *** VClass 33 13 0 3.3870 1.836e-10 *** after.2014 1 3 3 27.1192 1.929e-07 *** year 1 0 0 4.1220 0.042340 * yousavespend 1 1 1 8.9761 0.002738 ** tcharger 1 1 1 9.8312 0.001718 ** scharger 1 0 0 0.0800 0.777252 decade 1 0 0 0.3279 0.566930
Eighties 1 3 3 21.4492 3.652e-06 *** Nineties 1 0 0 1.0481 0.305960 Residuals 24306 2861 0 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 [1] "\n" Time taken: 1.81 secs Rattle timestamp: 2016-04-12 17:30:01 seungkookang Appendix B Call: lm(formula = comb08 ~., data = crs$dataset[crs$train, c(crs$input, crs$target)]) Residuals: Min 1Q Median 3Q Max -1.17706-0.24987 0.05533 0.25276 2.55808 Coefficients: (2 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) 3.1954909050 0.4467419732 7.153 8.73e-13 *** city08 0.6427607886 0.0014711425 436.913 < 2e-16 *** co2tailpipegpm -0.0012357748 0.0000838742-14.734 < 2e-16 *** cylinders 0.0077209069 0.0033522699 2.303 0.021276 * displ -0.0016039293 0.0047443164-0.338 0.735310 engid 0.0000003545 0.0000001654 2.143 0.032145 * camodel -0.0010733426 0.0159615113-0.067 0.946387 fuelcost08-0.0015585928 0.0003494274-4.460 8.22e-06 *** highway08 0.3308656241 0.0014524238 227.802 < 2e-16 *** pv4-0.0000042527 0.0000663520-0.064 0.948897 manual 0.0562400288 0.0120178053 4.680 2.89e-06 *** after2014 0.0254553553 0.0096299110 2.643 0.008214 ** yousavespend -0.0002894438 0.0000702708-4.119 3.82e-05 *** tcharger 0.0264131146 0.0076658533 3.446 0.000571 *** scharger 0.0009885123 0.0178359165 0.055 0.955802 eighties 0.0542186073 0.0110431385 4.910 9.18e-07 *** nineties 0.0294131844 0.0097203029 3.026 0.002481 ** twothousand 0.0314295099 0.0087056403 3.610 0.000307 *** twoten NA NA NA NA drive2-0.0035878072 0.0014659451-2.447 0.014394 * vclass2 0.0011035693 0.0002602736 4.240 2.24e-05 *** manual_eighties -0.0510219812 0.0157425601-3.241 0.001193 ** manual_nineties -0.0413133618 0.0150074304-2.753 0.005912 ** manual_twothousand -0.0132139148 0.0150172224-0.880 0.378913 manual_twoten NA NA NA NA --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 0.3596 on 24984 degrees of freedom (913 observations deleted due to missingness) Multiple R-squared: 0.995, Adjusted R-squared: 0.9949 F-statistic: 2.238e+05 on 22 and 24984 DF, p-value: < 2.2e-16
==== ANOVA ==== Analysis of Variance Table Response: comb08 Df Sum Sq Mean Sq F value Pr(>F) city08 1 620941 620941 4802791.6066 < 2.2e-16 *** co2tailpipegpm 1 6545 6545 50626.4060 < 2.2e-16 *** cylinders 1 793 793 6130.7913 < 2.2e-16 *** displ 1 4 4 31.7813 0.000000017441 *** engid 1 249 249 1924.2543 < 2.2e-16 *** camodel 1 19 19 149.5094 < 2.2e-16 *** fuelcost08 1 41 41 316.1082 < 2.2e-16 *** highway08 1 7919 7919 61247.5166 < 2.2e-16 *** pv4 1 1 1 8.6551 0.0032645 ** manual 1 5 5 35.7265 0.000000002301 *** after2014 1 0 0 1.6807 0.1948397 yousavespend 1 2 2 16.6191 0.000045829379 *** tcharger 1 1 1 10.3929 0.0012666 ** scharger 1 0 0 0.0015 0.9688293 eighties 1 1 1 6.0919 0.0135871 * nineties 1 0 0 0.4910 0.4834671 twothousand 1 2 2 12.9609 0.0003187 *** drive2 1 0 0 3.6298 0.0567666. vclass2 1 2 2 17.0331 0.000036853093 *** manual_eighties 1 1 1 6.2586 0.0123655 * manual_nineties 1 1 1 7.9475 0.0048191 ** manual_twothousand 1 0 0 0.7743 0.3789125 Residuals 24984 3230 0 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 [1] "\n" Time taken: 0.16 secs Rattle timestamp: 2016-04-12 19:02:48 seungkookang Appendix C Call: lm(formula = comb08 ~., data = crs$dataset[crs$train, c(crs$input, crs$target)]) Residuals: Min 1Q Median 3Q Max -1.17824-0.24869 0.05627 0.25044 2.55458 Coefficients: (2 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) 3.2272922820 0.4468372799 7.223 5.25e-13 *** city08 0.6427759517 0.0014738486 436.121 < 2e-16 *** co2tailpipegpm -0.0012329050 0.0000842435-14.635 < 2e-16 *** cylinders 0.0072352669 0.0033529847 2.158 0.03095 * displ -0.0024375377 0.0047468297-0.514 0.60760
engid 0.0000002992 0.0000001650 1.814 0.06970. camodel -0.0014902465 0.0159773616-0.093 0.92569 fuelcost08-0.0015790754 0.0003494875-4.518 6.26e-06 *** highway08 0.3308121388 0.0014528274 227.702 < 2e-16 *** pv4 0.0000669432 0.0001075188 0.623 0.53354 manual 0.0275425500 0.0052528650 5.243 1.59e-07 *** after2014 0.0242161186 0.0096159931 2.518 0.01180 * yousavespend -0.0002939711 0.0000702829-4.183 2.89e-05 *** tcharger 0.0258103639 0.0076893961 3.357 0.00079 *** scharger -0.0001941384 0.0178426566-0.011 0.99132 eighties 0.0431405710 0.0108952110 3.960 7.53e-05 *** nineties 0.0212642384 0.0099760462 2.132 0.03306 * twothousand 0.0299242747 0.0092199242 3.246 0.00117 ** twoten NA NA NA NA drive2-0.0030513643 0.0014636570-2.085 0.03710 * vclass2 0.0010801695 0.0002602231 4.151 3.32e-05 *** pv4_eighties -0.0001779384 0.0001559518-1.141 0.25389 pv4_nineties -0.0000853722 0.0001412213-0.605 0.54550 pv4_twothousand -0.0000276421 0.0001340777-0.206 0.83666 pv4_twoten NA NA NA NA --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 0.3597 on 24984 degrees of freedom (913 observations deleted due to missingness) Multiple R-squared: 0.9949, Adjusted R-squared: 0.9949 F-statistic: 2.237e+05 on 22 and 24984 DF, p-value: < 2.2e-16 ==== ANOVA ==== Analysis of Variance Table Response: comb08 Df Sum Sq Mean Sq F value Pr(>F) city08 1 620941 620941 4800204.7357 < 2.2e-16 *** co2tailpipegpm 1 6545 6545 50599.1377 < 2.2e-16 *** cylinders 1 793 793 6127.4891 < 2.2e-16 *** displ 1 4 4 31.7642 0.000000017595 *** engid 1 249 249 1923.2178 < 2.2e-16 *** camodel 1 19 19 149.4288 < 2.2e-16 *** fuelcost08 1 41 41 315.9380 < 2.2e-16 *** highway08 1 7919 7919 61214.5276 < 2.2e-16 *** pv4 1 1 1 8.6504 0.0032729 ** manual 1 5 5 35.7073 0.000000002324 *** after2014 1 0 0 1.6798 0.1949600 yousavespend 1 2 2 16.6102 0.000046046040 *** tcharger 1 1 1 10.3873 0.0012705 ** scharger 1 0 0 0.0015 0.9688377 eighties 1 1 1 6.0886 0.0136123 * nineties 1 0 0 0.4908 0.4835850 twothousand 1 2 2 12.9540 0.0003199 *** drive2 1 0 0 3.6278 0.0568333. vclass2 1 2 2 17.0239 0.000037031459 *** pv4_eighties 1 0 0 1.1323 0.2872931 pv4_nineties 1 0 0 0.3407 0.5594526 pv4_twothousand 1 0 0 0.0425 0.8366640
Residuals 24984 3232 0 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 [1] "\n" Time taken: 0.15 secs Rattle timestamp: 2016-04-12 19:05:26 seungkookang ====================================================================== Appendix D Call: lm(formula = comb08 ~., data = crs$dataset[crs$train, c(crs$input, crs$target)]) Residuals: Min 1Q Median 3Q Max -1.19842-0.25012 0.05325 0.25507 2.60744 Coefficients: (2 not defined because of singularities) Estimate Std. Error t value Pr(> t ) (Intercept) 4.5214029735 0.4577515874 9.877 < 2e-16 *** city08 0.6386045447 0.0015038259 424.653 < 2e-16 *** co2tailpipegpm -0.0020604433 0.0001067956-19.293 < 2e-16 *** cylinders 0.0141782988 0.0033792704 4.196 2.73e-05 *** displ -0.0014896710 0.0047308524-0.315 0.752852 engid 0.0000007499 0.0000001676 4.475 7.68e-06 *** camodel -0.0012997322 0.0159072961-0.082 0.934881 fuelcost08-0.0022685356 0.0003528810-6.429 1.31e-10 *** highway08 0.3294072346 0.0014506165 227.081 < 2e-16 *** pv4 0.0000027545 0.0000661010 0.042 0.966762 manual 0.0294139636 0.0052208719 5.634 1.78e-08 *** after2014 0.0102421362 0.0096595491 1.060 0.289013 yousavespend -0.0004340549 0.0000709699-6.116 9.73e-10 *** tcharger 0.0234235601 0.0076492885 3.062 0.002200 ** scharger 0.0095584364 0.0177890690 0.537 0.591051 eighties -0.3483295443 0.0313131942-11.124 < 2e-16 *** nineties -0.2663131469 0.0302353996-8.808 < 2e-16 *** twothousand -0.1041053862 0.0296096084-3.516 0.000439 *** twoten NA NA NA NA drive2-0.0038898558 0.0014577052-2.668 0.007624 ** co2_eighties 0.0008298540 0.0000642096 12.924 < 2e-16 *** co2_nineties 0.0006210918 0.0000619338 10.028 < 2e-16 *** co2_twothousand 0.0003173050 0.0000618303 5.132 2.89e-07 *** co2_twoten NA NA NA NA vclass2 0.0014707108 0.0002610411 5.634 1.78e-08 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 0.3583 on 24984 degrees of freedom (913 observations deleted due to missingness) Multiple R-squared: 0.995, Adjusted R-squared: 0.995 F-statistic: 2.253e+05 on 22 and 24984 DF, p-value: < 2.2e-16 ==== ANOVA ====
Analysis of Variance Table Response: comb08 Df Sum Sq Mean Sq F value Pr(>F) city08 1 620941 620941 4835985.1962 < 2.2e-16 *** co2tailpipegpm 1 6545 6545 50976.3009 < 2.2e-16 *** cylinders 1 793 793 6173.1631 < 2.2e-16 *** displ 1 4 4 32.0010 1.558e-08 *** engid 1 249 249 1937.5534 < 2.2e-16 *** camodel 1 19 19 150.5427 < 2.2e-16 *** fuelcost08 1 41 41 318.2929 < 2.2e-16 *** highway08 1 7919 7919 61670.8173 < 2.2e-16 *** pv4 1 1 1 8.7149 0.0031591 ** manual 1 5 5 35.9735 2.028e-09 *** after2014 1 0 0 1.6923 0.1933043 yousavespend 1 2 2 16.7340 4.314e-05 *** tcharger 1 1 1 10.4647 0.0012183 ** scharger 1 0 0 0.0015 0.9687218 eighties 1 1 1 6.1340 0.0132673 * nineties 1 0 0 0.4944 0.4819596 twothousand 1 2 2 13.0505 0.0003038 *** drive2 1 0 0 3.6548 0.0559179. co2_eighties 1 11 11 82.7279 < 2.2e-16 *** co2_nineties 1 9 9 66.6638 3.371e-16 *** co2_twothousand 1 3 3 23.7731 1.091e-06 *** vclass2 1 4 4 31.7422 1.780e-08 *** Residuals 24984 3208 0 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 [1] "\n" Time taken: 0.14 secs Rattle timestamp: 2016-04-12 19:07:00 seungkookang ======================================================================