INSTITUTE AND FACULTY OF ACTUARIES CURRICULUM 2019 SPECIMEN SOLUTIONS. Subject CS1B Actuarial Statistics

Question 1

(i)

# Data entry
before <- c(155, 152, 146, 153, 146, 160, 139, 148)
after <- c(145, 147, 123, 137, 141, 142, 140, 138)

# define x as the pair-wise differences of the study's results
x <- before - after

# define and calculate intermediate variables
mx <- mean(x)
sdx <- sd(x)
nx <- length(x)
alpha <- 0.1 # 1-0.9 (the confidence level)
t_quantile <- qt(p = alpha / 2, # two-sided interval need to divide alpha by 2
                 df = nx - 1,
                 lower.tail = FALSE) # gives upper tail i.e. P(X > x)

# assuming x follows a normal distribution with unknown variance:
c_int <- c(mx - t_quantile * sdx / sqrt(nx),
           mx + t_quantile * sdx / sqrt(nx))
c_int
[1] 5.46661 16.03339

ALTERNATIVE SOLUTIONS

Using the t.test function

t.test(x = before - after, conf.level = 0.9)

One Sample t-test

data: before - after
t = 3.8549, df = 7, p-value = 0.006253
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
 5.46661 16.03339
sample estimates:
mean of x
    10.75

Restricting function output to only return the required confidence interval

t.test(x = before - after, conf.level = 0.9)$conf.int
[1] 5.46661 16.03339
attr(,"conf.level")
[1] 0.9

Using the Paired t-test functionality of the t.test function

t.test(x = before, y = after, paired = TRUE, conf.level = 0.9)

Paired t-test

data: before and after
t = 3.8549, df = 7, p-value = 0.006253
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 5.46661 16.03339
sample estimates:
mean of the differences
                  10.75

[8]

(ii)

# define and calculate intermediate values
mu <- 10
t_stat <- (mx - mu) / (sdx / sqrt(nx))
pval <- pt(t_stat, df = nx - 1, lower.tail = FALSE)
pval
[1] 0.3978637

With a p-value of 0.3979 we do not reject the null hypothesis at the 0.01 significance level.

ALTERNATIVE SOLUTIONS

Using the t.test function, which also returns the p-value

t.test(x = before - after, alternative = "greater", mu = 10, conf.level = 0.99)

One Sample t-test

data: before - after
t = 0.26894, df = 7, p-value = 0.3979
alternative hypothesis: true mean is greater than 10
99 percent confidence interval:
 2.389646 Inf
sample estimates:
mean of x
    10.75

Restricting the t.test function output to only include the p-value

t.test(x = before - after, alternative = "greater", mu = 10, conf.level = 0.99)$p.value
[1] 0.3978637

Using the Paired t-test functionality of the t.test function

t.test(x = before, y = after, paired = TRUE, alternative = "greater", mu = 10,
       conf.level = 0.99)

Paired t-test

data: before and after
t = 0.26894, df = 7, p-value = 0.3979
alternative hypothesis: true difference in means is greater than 10
99 percent confidence interval:
 2.389646 Inf
sample estimates:
mean of the differences
                  10.75

[7]
[Total 15]

Question 2

(i) (a) Posterior mean given as Z*x + (1-Z)*alpha/beta, where alpha/beta is the prior (gamma) mean of λ. {2}

M = 1000
alpha = 100
beta = 1
Z = 1/(beta+1)
pm = X = numeric(M) # storage for the M simulated posterior means and claim counts
for(m in 1:M){
  lam = rgamma(1,shape=alpha,rate=beta) # draw λ from the gamma prior
  x = rpois(1,lam) # draw one observation given λ
  X[m] = x
  pm[m] = Z*x + (1-Z)*alpha/beta
}
{10}

(b)

hist(pm,main="histogram of posterior means", xlab="posterior mean",ylab="frequency")

{3}
[15]

(ii) (a)

round(mean(pm),3)
round(var(pm),3)

(b) MC mean and variance of posterior mean estimates: 99.904, 51.167. {2}

mg = numeric(M) # storage for the M Monte Carlo posterior means
for(m in 1:M){
  lam = rgamma(1,shape=alpha,rate=beta)
  x = rpois(1,lam)
  y = rgamma(1000,shape=alpha+x, rate=beta+1) # 1000 draws from the Gamma(alpha + x, beta + 1) posterior
  mg[m] = mean(y)
}
round(mean(mg),3)
round(var(mg),3)

MC mean and variance of posterior from Gamma samples: 99.900, 51.238. {10}
[12]

Note that this solution to part (ii) uses a new set of Monte Carlo repetitions. This is not necessary, and full credit can be given for combining parts (i) and (ii) in a single exercise. Clearly the precise numerical values for the means and variances will differ from implementation to implementation.
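The Monte Carlo figures quoted in part (ii) can be sanity-checked against closed-form values. The sketch below is illustrative only and is not part of the marking schedule: the seed and the names lam_sim, x_sim, pm_sim, theory_mean and theory_var are assumptions introduced here. It relies on the law of total variance, Var(X) = E[λ] + Var(λ) = alpha/beta + alpha/beta^2, so that Var(Z*X + (1-Z)*alpha/beta) = Z^2 * Var(X).

```r
# Illustrative sketch: vectorised simulation of the posterior means,
# compared with the closed-form mean and variance.
set.seed(2019)  # assumed seed, chosen here only for reproducibility

M <- 1000
alpha <- 100
beta <- 1
Z <- 1 / (beta + 1)

# rgamma and rpois are vectorised, so no explicit loop is needed
lam_sim <- rgamma(M, shape = alpha, rate = beta)
x_sim <- rpois(M, lam_sim)
pm_sim <- Z * x_sim + (1 - Z) * alpha / beta

# Closed-form values: E[pm] = alpha/beta, Var(pm) = Z^2 * (alpha/beta + alpha/beta^2)
theory_mean <- alpha / beta
theory_var <- Z^2 * (alpha / beta + alpha / beta^2)

c(mc_mean = mean(pm_sim), theory_mean = theory_mean)
c(mc_var = var(pm_sim), theory_var = theory_var)
```

With alpha = 100 and beta = 1 the closed-form values are 100 and 50, which is in line with the reported estimates of 99.904 and 51.167 once Monte Carlo error is allowed for.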

(iii) The similarity of the Monte Carlo estimates of the mean and variance and those from the Gamma(α + x, β + 1) sample demonstrates that the posterior distribution for the Poisson/Gamma credibility model is the Gamma(α + x, β + 1). [3]
[Total 30]

Question 3

(i) All values are non-negative integers, with some values greater than 1, so use the Poisson distribution as the error structure. [5]

(ii)

model <- glm(formula = Claim.number ~ Age + factor(car.group) + Area + factor(ncd) + Gender,
             data = datatrain,
             family = poisson())
summary(model)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3203  -0.5624  -0.4445  -0.3185   3.4254

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -1.895413   0.245584  -7.718 1.18e-14 ***
Age                          -0.008371   0.001231  -6.802 1.03e-11 ***
factor(car.group)2            0.283162   0.285122   0.993 0.320648
factor(car.group)3            0.311122   0.282852   1.100 0.271356
factor(car.group)4            0.116037   0.292479   0.397 0.691563
factor(car.group)5            0.578672   0.265600   2.179 0.029352 *
factor(car.group)6            0.912683   0.251178   3.634 0.000279 ***
factor(car.group)7            0.260364   0.287397   0.906 0.364968
factor(car.group)8            0.914236   0.253831   3.602 0.000316 ***
factor(car.group)9            0.877120   0.252408   3.475 0.000511 ***
factor(car.group)10           0.914426   0.250011   3.658 0.000255 ***
factor(car.group)11           0.799044   0.250434   3.191 0.001420 **
factor(car.group)12           1.025303   0.245168   4.182 2.89e-05 ***
factor(car.group)13           1.011678   0.248305   4.074 4.61e-05 ***
factor(car.group)14           1.118560   0.242695   4.609 4.05e-06 ***
factor(car.group)15           1.103179   0.244711   4.508 6.54e-06 ***
factor(car.group)16           0.996932   0.247455   4.029 5.61e-05 ***
factor(car.group)17           1.128584   0.242389   4.656 3.22e-06 ***
factor(car.group)18           1.198728   0.239721   5.001 5.72e-07 ***
factor(car.group)19           1.422781   0.238307   5.970 2.37e-09 ***
factor(car.group)20           1.317913   0.238579   5.524 3.31e-08 ***
AreaEast Midlands             0.146664   0.132611   1.106 0.268739
AreaLondon                    0.318305   0.126773   2.511 0.012045 *
AreaNI                        0.393303   0.125065   3.145 0.001662 **
AreaNorth East               -0.060812   0.138426  -0.439 0.660439
AreaNorth West               -0.193799   0.143745  -1.348 0.177590
AreaSouth East               -0.323157   0.151830  -2.128 0.033303 *
AreaSouth West               -0.097663   0.142546  -0.685 0.493256
AreaWales                    -0.309704   0.148283  -2.089 0.036744 *
AreaWest Midlands            -0.068206   0.141474  -0.482 0.629730
AreaYorkshire and the Humber  0.100272   0.132276   0.758 0.448421
factor(ncd)1                 -0.456078   0.086523  -5.271 1.36e-07 ***
factor(ncd)2                 -0.679426   0.091303  -7.441 9.96e-14 ***
factor(ncd)3                 -0.885118   0.096849  -9.139  < 2e-16 ***
factor(ncd)4                 -0.963244   0.101502  -9.490  < 2e-16 ***
factor(ncd)5                 -1.097532   0.104299 -10.523  < 2e-16 ***
GenderMale                    0.259982   0.064879   4.007 6.14e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 4569.9 on 7999 degrees of freedom
Residual deviance: 4085.6 on 7963 degrees of freedom
AIC: 6455.7

[10]

(iii) Male policyholders have a higher mean number of reported claims (by exp(0.259982) - 1 = 29.7%) than female policyholders. The difference is significant (p-value = 6.14e-5). [10]

(iv) Compare to the null model: the deviance is reduced by 484.3 while the degrees of freedom reduce by 36. The observed difference in deviance (484.3) is very high compared to the values of the χ² distribution with 36 degrees of freedom, so the fitted model is significant/good (alternatively, compare the deviance of the fitted model (4085.6) to the χ² distribution with 7963 degrees of freedom). [10]

(v) (a)

datatrain$Age2 = datatrain$Age^2 # R is case-sensitive: the column is Age, and the model below uses Age2
{2}

(b)

model1 <- glm(formula = Claim.number ~ Age + Age2 + factor(car.group) + Area + factor(ncd) + Gender,
              data = datatrain,
              family = poisson())
summary(model1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.2760  -0.5532  -0.4261  -0.3063   3.2856

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -4.118e-01  2.816e-01  -1.462 0.143613
Age                          -7.268e-02  6.329e-03 -11.484  < 2e-16 ***
Age2                          5.603e-04  5.405e-05  10.366  < 2e-16 ***
factor(car.group)2            2.913e-01  2.852e-01   1.022 0.307011
factor(car.group)3            2.854e-01  2.829e-01   1.009 0.313049
factor(car.group)4            1.361e-01  2.925e-01   0.465 0.641617
factor(car.group)5            5.608e-01  2.656e-01   2.111 0.034750 *
factor(car.group)6            8.935e-01  2.512e-01   3.557 0.000376 ***
factor(car.group)7            2.729e-01  2.875e-01   0.949 0.342393
factor(car.group)8            9.135e-01  2.539e-01   3.598 0.000320 ***
factor(car.group)9            8.766e-01  2.524e-01   3.473 0.000515 ***
factor(car.group)10           9.127e-01  2.501e-01   3.650 0.000263 ***
factor(car.group)11           8.012e-01  2.505e-01   3.198 0.001383 **
factor(car.group)12           1.039e+00  2.452e-01   4.239 2.25e-05 ***
factor(car.group)13           9.643e-01  2.483e-01   3.883 0.000103 ***
factor(car.group)14           1.109e+00  2.427e-01   4.568 4.93e-06 ***
factor(car.group)15           1.111e+00  2.445e-01   4.543 5.54e-06 ***
factor(car.group)16           9.760e-01  2.474e-01   3.945 7.99e-05 ***
factor(car.group)17           1.131e+00  2.424e-01   4.667 3.06e-06 ***
factor(car.group)18           1.188e+00  2.397e-01   4.957 7.14e-07 ***
factor(car.group)19           1.420e+00  2.383e-01   5.962 2.49e-09 ***
factor(car.group)20           1.322e+00  2.386e-01   5.540 3.03e-08 ***
AreaEast Midlands             9.654e-02  1.327e-01   0.727 0.466981
AreaLondon                    3.190e-01  1.268e-01   2.516 0.011863 *
AreaNI                        3.837e-01  1.251e-01   3.067 0.002161 **
AreaNorth East               -6.318e-02  1.385e-01  -0.456 0.648255
AreaNorth West               -2.015e-01  1.438e-01  -1.401 0.161092
AreaSouth East               -3.268e-01  1.519e-01  -2.151 0.031438 *
AreaSouth West               -1.148e-01  1.426e-01  -0.805 0.420806
AreaWales                    -3.166e-01  1.484e-01  -2.134 0.032820 *
AreaWest Midlands            -8.999e-02  1.415e-01  -0.636 0.524886
AreaYorkshire and the Humber  1.060e-01  1.323e-01   0.801 0.423004
factor(ncd)1                 -4.530e-01  8.657e-02  -5.233 1.67e-07 ***
factor(ncd)2                 -6.705e-01  9.130e-02  -7.344 2.07e-13 ***
factor(ncd)3                 -8.753e-01  9.692e-02  -9.031  < 2e-16 ***
factor(ncd)4                 -9.487e-01  1.016e-01  -9.342  < 2e-16 ***
factor(ncd)5                 -1.089e+00  1.043e-01 -10.439  < 2e-16 ***
GenderMale                    2.657e-01  6.496e-02   4.091 4.30e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 4569.9 on 7999 degrees of freedom
Residual deviance: 3981.3 on 7962 degrees of freedom
AIC: 6353.4

{12}

(c) The p-value of the age squared coefficient shows that it is significant. Also, the deviance is reduced by 104.3 (from 4085.6 to 3981.3) for a change of one degree of freedom, which is more than twice the change in degrees of freedom. So the variable is significantly associated with the number of reported claims. {6}
[20]
[Total 55]

END OF MARKING SCHEDULE
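The deviance comparisons in Question 3 parts (iv) and (v)(c), and the gender relativity in part (iii), can be reproduced from the printed summaries alone. The following is a minimal sketch, assuming only the deviances, degrees of freedom and the GenderMale coefficient reported in the output; all variable names are illustrative and not part of the marking schedule.

```r
# Deviances and degrees of freedom as reported in the two summaries
null_dev <- 4569.9; null_df <- 7999
dev0 <- 4085.6; df0 <- 7963   # model without Age2
dev1 <- 3981.3; df1 <- 7962   # model1, with Age2

# Part (iv): fitted model against the null model
drop_null <- null_dev - dev0                 # reduction in deviance
ddf_null <- null_df - df0                    # change in degrees of freedom
crit_null <- qchisq(0.95, df = ddf_null)     # 5% critical value of chi-square
p_null <- pchisq(drop_null, df = ddf_null, lower.tail = FALSE)

# Part (v)(c): effect of adding Age2
drop_age2 <- dev0 - dev1
ddf_age2 <- df0 - df1
p_age2 <- pchisq(drop_age2, df = ddf_age2, lower.tail = FALSE)

# Part (iii): relative increase in mean claim count for male policyholders
rel_male <- exp(0.259982) - 1

round(c(drop_null = drop_null, ddf_null = ddf_null, crit_null = crit_null), 3)
c(p_null = p_null, p_age2 = p_age2)
round(rel_male, 3)
```

If the datatrain data set and both fitted objects are available, anova(model, model1, test = "Chisq") should give the same comparison for part (v)(c) directly from the fitted models.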