INSTITUTE AND FACULTY OF ACTUARIES
CURRICULUM 2019
SPECIMEN SOLUTIONS
Subject CS1B
Actuarial Statistics
Question 1

(i)
# Data entry
before <- c(155, 152, 146, 153, 146, 160, 139, 148)
after <- c(145, 147, 123, 137, 141, 142, 140, 138)

# define x as the pair-wise differences of the study's results
x <- before - after

# define and calculate intermediate variables
mx <- mean(x)
sdx <- sd(x)
nx <- length(x)
alpha <- 0.1 # 1-0.9 (the confidence level)
t_quantile <- qt(p = alpha / 2, # two-sided interval, so divide alpha by 2
                 df = nx - 1,
                 lower.tail = FALSE) # gives upper tail i.e. P(X > x)

# assuming x follows a normal distribution with unknown variance:
c_int <- c(mx - t_quantile * sdx / sqrt(nx),
           mx + t_quantile * sdx / sqrt(nx))
c_int
[1]  5.46661 16.03339

ALTERNATIVE SOLUTIONS

Using the t.test function

t.test(x = before - after, conf.level = 0.9)

        One Sample t-test

data:  before - after
t = 3.8549, df = 7, p-value = 0.006253
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
  5.46661 16.03339
sample estimates:
mean of x
    10.75

Restricting function output to only return the required confidence interval

t.test(x = before - after, conf.level = 0.9)$conf.int
[1]  5.46661 16.03339
attr(,"conf.level")
[1] 0.9

Using the Paired t-test functionality of the t.test function

t.test(x = before, y = after, paired = TRUE, conf.level = 0.9)
        Paired t-test

data:  before and after
t = 3.8549, df = 7, p-value = 0.006253
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
  5.46661 16.03339
sample estimates:
mean of the differences
                  10.75

[8]

(ii)
# define and calculate intermediate values
mu <- 10
t_stat <- (mx - mu) / (sdx / sqrt(nx))
pval <- pt(t_stat, df = nx - 1, lower.tail = FALSE)
pval
[1] 0.3978637

With a p-value of 0.3979, we do not reject the null hypothesis at the 0.01 significance level.

ALTERNATIVE SOLUTIONS

Using the t.test function, which also returns the p-value

t.test(x = before - after, alternative = "greater", mu = 10, conf.level = 0.99)

        One Sample t-test

data:  before - after
t = 0.26894, df = 7, p-value = 0.3979
alternative hypothesis: true mean is greater than 10
99 percent confidence interval:
 2.389646      Inf
sample estimates:
mean of x
    10.75

Restrict the t.test function output to only include the p-value

t.test(x = before - after, alternative = "greater", mu = 10, conf.level = 0.99)$p.value
[1] 0.3978637

Using the Paired t-test functionality of the t.test function

t.test(x = before, y = after, paired = TRUE, alternative = "greater", mu = 10,
       conf.level = 0.99)

        Paired t-test

data:  before and after
t = 0.26894, df = 7, p-value = 0.3979
alternative hypothesis: true difference in means is greater than 10
99 percent confidence interval:
 2.389646      Inf
sample estimates:
mean of the differences
                  10.75

[7]
[Total 15]

Question 2

(i)
(a)
Posterior mean given as Z*x + (1-Z)*alpha/beta, where alpha/beta is the prior (gamma) mean of λ.
{2}

M = 1000                 # number of Monte Carlo repetitions
alpha = 100              # prior gamma shape
beta = 1                 # prior gamma rate
Z = 1/(beta+1)           # credibility factor
pm = X = numeric(M)      # storage for posterior means and simulated claims
for(m in 1:M){
  lam = rgamma(1,shape=alpha,rate=beta)   # draw lambda from the prior
  x = rpois(1,lam)                        # draw one Poisson observation
  X[m] = x
  pm[m] = Z*x + (1-Z)*alpha/beta          # credibility (posterior mean) estimate
}
{10}

(b)
hist(pm,main="histogram of posterior means",
     xlab="posterior mean",ylab="frequency")
{3}
[15]

(ii)
(a)
round(mean(pm),3)
round(var(pm),3)

MC mean and variance of posterior mean estimates: 99.904, 51.167.
{2}

(b)
mg = numeric(M)
for(m in 1:M){
  lam = rgamma(1,shape=alpha,rate=beta)
  x = rpois(1,lam)
  y = rgamma(1000,shape=alpha+x, rate=beta+1)   # sample from the Gamma posterior
  mg[m] = mean(y)
}
round(mean(mg),3)
round(var(mg),3)

MC mean and variance of posterior from Gamma samples: 99.900, 51.238.
{10}
[12]

Note that this solution to part (ii) uses a new set of Monte Carlo repetitions. This is not necessary, and full credit can be given for combining parts (i) and (ii) in a single exercise. The precise numerical values for the means and variances will differ from implementation to implementation.
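As a sanity check on the simulated values in part (ii), the theoretical moments can be worked out by hand. The sketch below assumes the setup used in the simulation code (a single observation x | λ ~ Poisson(λ) and a prior λ ~ Gamma(α, β) with β a rate parameter); it is an illustrative derivation rather than a required part of the solution.

```latex
\begin{align*}
f(\lambda \mid x)
  &\propto e^{-\lambda}\lambda^{x} \times \lambda^{\alpha-1}e^{-\beta\lambda}
   = \lambda^{\alpha+x-1}e^{-(\beta+1)\lambda}
   \;\Rightarrow\; \lambda \mid x \sim \mathrm{Gamma}(\alpha+x,\ \beta+1) \\[4pt]
E[\lambda \mid x]
  &= \frac{\alpha+x}{\beta+1}
   = Z\,x + (1-Z)\,\frac{\alpha}{\beta},
   \qquad Z = \frac{1}{\beta+1} \\[4pt]
E\big[E[\lambda \mid x]\big]
  &= Z\,E[x] + (1-Z)\,\frac{\alpha}{\beta} = \frac{\alpha}{\beta} \\[4pt]
\operatorname{Var}(x)
  &= E[\lambda] + \operatorname{Var}(\lambda)
   = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^{2}} \\[4pt]
\operatorname{Var}\big(E[\lambda \mid x]\big)
  &= Z^{2}\operatorname{Var}(x)
   = \frac{1}{(\beta+1)^{2}}\left(\frac{\alpha}{\beta} + \frac{\alpha}{\beta^{2}}\right)
\end{align*}
```

With α = 100 and β = 1 this gives a mean of 100 and a variance of 200/4 = 50, in line with the simulated 99.904 and 51.167. Averaging 1000 Gamma draws per repetition adds a further Monte Carlo term of roughly E[(α + x)/(β + 1)²]/1000 = 0.05 to the variance, so a value slightly above 50 is also expected for the Gamma-sample version.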
(iii)
The similarity of the Monte Carlo estimates of the mean and variance and those from the Gamma(α + x, β + 1) sample demonstrates that the posterior distribution for the Poisson/Gamma credibility model is Gamma(α + x, β + 1).
[3]
[Total 30]

Question 3

(i)
All values are non-negative integers, with some values greater than 1, so use the Poisson distribution as the error structure.
[5]

(ii)
model <- glm(formula = Claim.number ~ Age + factor(car.group) + Area + factor(ncd) + Gender,
             data = datatrain,
             family = poisson())
summary(model)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3203  -0.5624  -0.4445  -0.3185   3.4254

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -1.895413   0.245584  -7.718 1.18e-14 ***
Age                          -0.008371   0.001231  -6.802 1.03e-11 ***
factor(car.group)2            0.283162   0.285122   0.993 0.320648
factor(car.group)3            0.311122   0.282852   1.100 0.271356
factor(car.group)4            0.116037   0.292479   0.397 0.691563
factor(car.group)5            0.578672   0.265600   2.179 0.029352 *
factor(car.group)6            0.912683   0.251178   3.634 0.000279 ***
factor(car.group)7            0.260364   0.287397   0.906 0.364968
factor(car.group)8            0.914236   0.253831   3.602 0.000316 ***
factor(car.group)9            0.877120   0.252408   3.475 0.000511 ***
factor(car.group)10           0.914426   0.250011   3.658 0.000255 ***
factor(car.group)11           0.799044   0.250434   3.191 0.001420 **
factor(car.group)12           1.025303   0.245168   4.182 2.89e-05 ***
factor(car.group)13           1.011678   0.248305   4.074 4.61e-05 ***
factor(car.group)14           1.118560   0.242695   4.609 4.05e-06 ***
factor(car.group)15           1.103179   0.244711   4.508 6.54e-06 ***
factor(car.group)16           0.996932   0.247455   4.029 5.61e-05 ***
factor(car.group)17           1.128584   0.242389   4.656 3.22e-06 ***
factor(car.group)18           1.198728   0.239721   5.001 5.72e-07 ***
factor(car.group)19           1.422781   0.238307   5.970 2.37e-09 ***
factor(car.group)20           1.317913   0.238579   5.524 3.31e-08 ***
AreaEast Midlands             0.146664   0.132611   1.106 0.268739
AreaLondon                    0.318305   0.126773   2.511 0.012045 *
AreaNI                        0.393303   0.125065   3.145 0.001662 **
AreaNorth East               -0.060812   0.138426  -0.439 0.660439
AreaNorth West               -0.193799   0.143745  -1.348 0.177590
AreaSouth East               -0.323157   0.151830  -2.128 0.033303 *
AreaSouth West               -0.097663   0.142546  -0.685 0.493256
AreaWales                    -0.309704   0.148283  -2.089 0.036744 *
AreaWest Midlands            -0.068206   0.141474  -0.482 0.629730
AreaYorkshire and the Humber  0.100272   0.132276   0.758 0.448421
factor(ncd)1                 -0.456078   0.086523  -5.271 1.36e-07 ***
factor(ncd)2                 -0.679426   0.091303  -7.441 9.96e-14 ***
factor(ncd)3                 -0.885118   0.096849  -9.139  < 2e-16 ***
factor(ncd)4                 -0.963244   0.101502  -9.490  < 2e-16 ***
factor(ncd)5                 -1.097532   0.104299 -10.523  < 2e-16 ***
GenderMale                    0.259982   0.064879   4.007 6.14e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4569.9  on 7999  degrees of freedom
Residual deviance: 4085.6  on 7963  degrees of freedom
AIC: 6455.7
[10]

(iii)
Male policyholders have a higher mean number of reported claims than female policyholders (by exp(0.259982) - 1 = 29.7%). The difference is significant (p-value = 6.14e-05).
[10]

(iv)
Compare with the null model: the deviance is reduced by 484.3 (from 4569.9 to 4085.6) while the degrees of freedom reduce by 36 (from 7999 to 7963). The observed reduction in deviance (484.3) is very large compared with typical values of the χ² distribution with 36 degrees of freedom, so the fitted model is significant/good. (Alternatively, compare the deviance of the fitted model (4085.6) with the χ² distribution with 7963 degrees of freedom.)
[10]

(v)
(a)
datatrain$Age2 <- datatrain$Age^2
{2}

(b)
model1 <- glm(formula = Claim.number ~ Age + Age2 + factor(car.group) + Area + factor(ncd) + Gender,
              data = datatrain,
              family = poisson())
summary(model1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.2760  -0.5532  -0.4261  -0.3063   3.2856

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -4.118e-01  2.816e-01  -1.462 0.143613
Age                          -7.268e-02  6.329e-03 -11.484  < 2e-16 ***
Age2                          5.603e-04  5.405e-05  10.366  < 2e-16 ***
factor(car.group)2            2.913e-01  2.852e-01   1.022 0.307011
factor(car.group)3            2.854e-01  2.829e-01   1.009 0.313049
factor(car.group)4            1.361e-01  2.925e-01   0.465 0.641617
factor(car.group)5            5.608e-01  2.656e-01   2.111 0.034750 *
factor(car.group)6            8.935e-01  2.512e-01   3.557 0.000376 ***
factor(car.group)7            2.729e-01  2.875e-01   0.949 0.342393
factor(car.group)8            9.135e-01  2.539e-01   3.598 0.000320 ***
factor(car.group)9            8.766e-01  2.524e-01   3.473 0.000515 ***
factor(car.group)10           9.127e-01  2.501e-01   3.650 0.000263 ***
factor(car.group)11           8.012e-01  2.505e-01   3.198 0.001383 **
factor(car.group)12           1.039e+00  2.452e-01   4.239 2.25e-05 ***
factor(car.group)13           9.643e-01  2.483e-01   3.883 0.000103 ***
factor(car.group)14           1.109e+00  2.427e-01   4.568 4.93e-06 ***
factor(car.group)15           1.111e+00  2.445e-01   4.543 5.54e-06 ***
factor(car.group)16           9.760e-01  2.474e-01   3.945 7.99e-05 ***
factor(car.group)17           1.131e+00  2.424e-01   4.667 3.06e-06 ***
factor(car.group)18           1.188e+00  2.397e-01   4.957 7.14e-07 ***
factor(car.group)19           1.420e+00  2.383e-01   5.962 2.49e-09 ***
factor(car.group)20           1.322e+00  2.386e-01   5.540 3.03e-08 ***
AreaEast Midlands             9.654e-02  1.327e-01   0.727 0.466981
AreaLondon                    3.190e-01  1.268e-01   2.516 0.011863 *
AreaNI                        3.837e-01  1.251e-01   3.067 0.002161 **
AreaNorth East               -6.318e-02  1.385e-01  -0.456 0.648255
AreaNorth West               -2.015e-01  1.438e-01  -1.401 0.161092
AreaSouth East               -3.268e-01  1.519e-01  -2.151 0.031438 *
AreaSouth West               -1.148e-01  1.426e-01  -0.805 0.420806
AreaWales                    -3.166e-01  1.484e-01  -2.134 0.032820 *
AreaWest Midlands            -8.999e-02  1.415e-01  -0.636 0.524886
AreaYorkshire and the Humber  1.060e-01  1.323e-01   0.801 0.423004
factor(ncd)1                 -4.530e-01  8.657e-02  -5.233 1.67e-07 ***
factor(ncd)2                 -6.705e-01  9.130e-02  -7.344 2.07e-13 ***
factor(ncd)3                 -8.753e-01  9.692e-02  -9.031  < 2e-16 ***
factor(ncd)4                 -9.487e-01  1.016e-01  -9.342  < 2e-16 ***
factor(ncd)5                 -1.089e+00  1.043e-01 -10.439  < 2e-16 ***
GenderMale                    2.657e-01  6.496e-02   4.091 4.30e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4569.9  on 7999  degrees of freedom
Residual deviance: 3981.3  on 7962  degrees of freedom
AIC: 6353.4
{12}

(c)
The p-value of the age-squared coefficient (< 2e-16) shows that it is significant. Also, the residual deviance falls by 104.3 (from 4085.6 to 3981.3) for a change of one degree of freedom, which is far more than twice the change in degrees of freedom. So the variable is significantly associated with the number of reported claims.
{6}
[20]
[Total 55]

END OF MARKING SCHEDULE