I found this chapter more enjoyable than expected (not quite enjoyable the way ice cream is enjoyable, but still), essentially because it re-introduced me to some statistical ideas I hadn’t looked at for a long time. In some respects it was a trip back in time, to simple discussions of MLEs and method of moments estimators, the first things I learned in the last course I took in a physical classroom.

A quick review of the actual material. We are presented with the main probability distributions used to model the size of individual insurance claims: exponential, normal, lognormal, gamma, Pareto, generalised Pareto, Burr and Weibull. It is observed that the distributions of individual claims are often positively skewed, which clearly influences the choice of this list. Much of the chapter is taken up with defining the characteristics of these distributions: their distribution functions and moments.
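To put a number on "positively skewed", here is a small Python sketch of the standard closed-form coefficients of skewness for a few of the distributions in that list. The function names are my own, and the Pareto is assumed to be the two-parameter form with survival function (λ/(λ+x))^α; its skewness only exists when α > 3, which is one way of seeing how heavy that tail is.

```python
import math

# Closed-form coefficients of skewness, E[(X - mean)^3] / sd^3, for some of
# the claim-size models above. Location and scale parameters drop out.

def skew_normal() -> float:
    # Symmetric, so zero skewness.
    return 0.0

def skew_exponential() -> float:
    # Exponential(lambda): always 2, whatever the rate.
    return 2.0

def skew_gamma(alpha: float) -> float:
    # Gamma(alpha, lambda): 2 / sqrt(alpha); equals the exponential at alpha = 1.
    return 2.0 / math.sqrt(alpha)

def skew_lognormal(sigma: float) -> float:
    # Lognormal(mu, sigma^2): (w + 2) * sqrt(w - 1) with w = exp(sigma^2).
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

def skew_pareto(alpha: float) -> float:
    # Pareto(alpha, lambda): the third moment, and hence the skewness,
    # only exists for alpha > 3.
    if alpha <= 3:
        raise ValueError("skewness undefined for alpha <= 3")
    return 2.0 * (1.0 + alpha) / (alpha - 3.0) * math.sqrt((alpha - 2.0) / alpha)

if __name__ == "__main__":
    print("normal         ", skew_normal())
    print("exponential    ", skew_exponential())
    print("gamma, alpha=4 ", skew_gamma(4.0))
    print("lognormal, s=1 ", round(skew_lognormal(1.0), 3))
    print("Pareto, alpha=4", round(skew_pareto(4.0), 3))
```

The normal is the odd one out with zero skewness; everything else on the list leans to the right, some of it a long way.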

The second part of the material is a discussion of three methods of fitting distributions to data or, more precisely, of finding the parameters that best fit the data, given a particular choice of distribution. After you have done this, the only place to go is obviously some sort of formal test of goodness of fit, and the anointed method in the CT6 material is the chi-squared test. A solid choice.
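As a sketch of how fitting and testing hang together, here is one way to fit a lognormal by maximum likelihood and by the method of moments, then compute Pearson's chi-squared statistic for each fit. The data are simulated rather than real claims, the interval edges are arbitrary, and all function names are hypothetical. The degrees of freedom follow the usual textbook rule (number of intervals, minus one, minus the number of fitted parameters), and with 5 intervals and 2 parameters that leaves 2, for which the chi-squared tail probability happens to be exactly exp(-x/2).

```python
import math
import random
import statistics

def fit_lognormal_mle(xs):
    # MLE for the lognormal: mean and divide-by-n variance of the log claims.
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    return mu, math.sqrt(statistics.pvariance(logs, mu))

def fit_lognormal_mom(xs):
    # Method of moments: match E[X] = exp(mu + sigma^2 / 2) and
    # Var[X] = E[X]^2 * (exp(sigma^2) - 1) to the sample mean and variance.
    m = statistics.fmean(xs)
    v = statistics.pvariance(xs, m)
    sigma2 = math.log(1.0 + v / m ** 2)
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)

def lognormal_cdf(x, mu, sigma):
    return statistics.NormalDist(mu, sigma).cdf(math.log(x))

def chi_squared_statistic(xs, edges, mu, sigma):
    # Pearson statistic: sum of (O - E)^2 / E over the intervals
    # (edges[0], edges[1]], ..., (edges[-1], infinity).
    n = len(xs)
    bounds = list(edges) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for x in xs if lo < x <= hi)
        p_hi = 1.0 if hi == math.inf else lognormal_cdf(hi, mu, sigma)
        p_lo = 0.0 if lo <= 0 else lognormal_cdf(lo, mu, sigma)
        expected = n * (p_hi - p_lo)
        stat += (observed - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    rng = random.Random(2024)
    claims = [rng.lognormvariate(7.0, 1.0) for _ in range(500)]  # simulated
    edges = [0, 500, 1000, 2000, 5000]
    for name, fit in (("MLE", fit_lognormal_mle), ("MoM", fit_lognormal_mom)):
        mu, sigma = fit(claims)
        stat = chi_squared_statistic(claims, edges, mu, sigma)
        # 5 intervals - 1 - 2 fitted parameters = 2 degrees of freedom.
        p_value = math.exp(-stat / 2.0)
        print(f"{name}: mu={mu:.3f} sigma={sigma:.3f} chi2={stat:.2f} p={p_value:.3f}")
```

The two fits generally differ: the method of moments is driven by the sample mean and variance, which a few large claims can pull around, while the MLE works on the logged data.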

The material is rounded out with some commentary on mixture distributions, which may or may not be important for modelling individual losses, but will definitely be important for modelling portfolios of losses where both the number of losses and the size of these losses are random variables.
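A classic mixture result, and a neat link back to the list of distributions: if a claim is exponential with rate Λ, and Λ itself is gamma distributed with shape α and rate δ, then P(X > x) = E[exp(-Λx)] = (δ/(δ+x))^α, which is the two-parameter Pareto survival function. The Python sketch below checks this by simulation; the parameter values and function names are illustrative assumptions.

```python
import random

def sample_exp_gamma_mixture(alpha, delta, n, rng):
    # Two-stage sampling: draw the rate Lambda ~ Gamma(shape=alpha, rate=delta),
    # then X | Lambda ~ Exponential(Lambda).
    # random.gammavariate takes a *scale* parameter, hence 1 / delta.
    samples = []
    for _ in range(n):
        lam = rng.gammavariate(alpha, 1.0 / delta)
        samples.append(rng.expovariate(lam))
    return samples

def pareto_survival(x, alpha, lam):
    # Two-parameter Pareto: P(X > x) = (lam / (lam + x)) ** alpha.
    return (lam / (lam + x)) ** alpha

def empirical_survival(samples, x):
    # Proportion of simulated values exceeding x.
    return sum(1 for s in samples if s > x) / len(samples)

if __name__ == "__main__":
    rng = random.Random(42)
    alpha, delta = 3.0, 2.0
    xs = sample_exp_gamma_mixture(alpha, delta, 50_000, rng)
    for x in (0.5, 1.0, 2.0, 5.0):
        print(f"x={x}: simulated {empirical_survival(xs, x):.4f}, "
              f"Pareto {pareto_survival(x, alpha, delta):.4f}")
```

Mixing over an uncertain rate fattens the tail: each component is a light-tailed exponential, but the mixture is a Pareto.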

From now on, I intend to refer to the US/Canadian exam system, where the equivalent SOA/CAS/CIA exam is Exam C/4. I have two reasons for doing this.

Number 1: Some of my readers may be in the US or Canada. According to the stats WordPress gives me, half my readers this week were in the US, albeit from a very small readership so far.

Number 2: There are many free resources available for the SOA/CAS exams, probably far more than for the Institute and Faculty of Actuaries exams. Most likely this simply follows from the size of the United States. Whatever the reason, I figure that knowing which material is equivalent gives people on either side of the Atlantic access to many more teaching aids and practice questions. In future posts, I aim to mention some of these free resources where they apply.

In the present case, the CT6 core reading covers topics that also appear in the SOA/CAS syllabus, grouped together in a similar way. In fact, the main difference is that the SOA/CAS material goes further, presenting some extra hypothesis tests and more material on graphical methods (or so I infer from blogs and forums dealing with preparation for this exam, and also from the text ‘Loss Models: From Data to Decisions’ by Klugman, Panjer and Willmot, which appears to follow the US-Canadian Exam C/4 material very closely in its choice of topics).

I am developing a habit of writing at least one thing I would do differently if I were writing the material on a particular topic. In my last post I said I would introduce the exponential family earlier. This time: the C/4 exam apparently makes some mention of graphical methods of assessing goodness of fit. Why not CT6? Not even the simple Q-Q plot is there. (If you haven’t met the Q-Q plot, take the path of least resistance and let Wikipedia save you: http://en.wikipedia.org/wiki/Q-Q_plot)

It’s intuitive, it’s easy to interpret, and it highlights differences in skewness and tail weight, exactly the kind of differences this chapter of CT6 emphasises. It can be done instantly in R, or with a little bit of stuffing around in Excel.

It is as simple as plotting the quantiles of your data set against the quantiles of a proposed distribution. If the points lie on a straight line at 45 degrees, the two match exactly. The further the points move from this ideal, the worse the match. If they match in the middle but diverge at either or both ends, one has heavier tails than the other. If they match at one end but not at the other, they have different skewness.
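For anyone without R to hand, the quantile pairs behind a Q-Q plot take only a few lines of Python. This sketch assumes the common plotting position (i - 0.5)/n (other conventions exist) and uses an exponential as the proposed distribution; the helper names are hypothetical. The demo compares simulated lognormal "claims" with an exponential fitted by maximum likelihood, so the top points would typically sit above the 45-degree line, the heavier-tail signature described above.

```python
import math
import random
import statistics

def exponential_quantile(p, rate):
    # Inverse of F(x) = 1 - exp(-rate * x).
    return -math.log(1.0 - p) / rate

def qq_pairs(data, quantile_fn):
    # Pair the i-th smallest observation with the model quantile at the
    # plotting position (i - 0.5) / n. Returns (model, sample) tuples.
    xs = sorted(data)
    n = len(xs)
    return [(quantile_fn((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

if __name__ == "__main__":
    rng = random.Random(3)
    claims = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]  # simulated
    rate = 1.0 / statistics.fmean(claims)  # exponential MLE
    pairs = qq_pairs(claims, lambda p: exponential_quantile(p, rate))
    # Inspect the upper tail, where heavier-tailed data pull away from the line.
    for model_q, sample_q in pairs[-5:]:
        print(f"model {model_q:7.2f}   sample {sample_q:7.2f}")
```

With matplotlib installed, `plt.scatter(*zip(*pairs))` followed by `plt.axline((0, 0), slope=1)` draws the plot itself, with the 45-degree reference line.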

‘How to Lie with Statistics’ makes statistics seem like it’s all about the graphs. And why not? Let’s get visual!
