Archive | Bayesian Statistics

Hatfields and McCoys

18 Nov

The recently published cartoon by xkcd below tore the scab off the festering tension between the Hatfields and the McCoys of the statistical world – ‘Bayesians’ and ‘Frequentists’.

http://imgs.xkcd.com/comics/frequentists_vs_bayesians.png

Bayesians vs Frequentists

While these terms could simply refer to categories of statistical tools, or differing viewpoints on the way statistics and probability can be interpreted, they have turned into the names of two warring factions.
Some of the flavour of the argument, as re-ignited by the cartoon, can be seen here: http://andrewgelman.com/2012/11/16808/

For a longer view, see here for a review of a book charting the development of statistics and the Bayesian/Frequentist rivalry.
In the meantime, when learning about Bayesian statistics for actuarial exams you can rise above it all and just choose the best tool for the job. A hammer doesn’t mind if sometimes you use a screwdriver.

Credibility Theory Continued

14 Nov

It is straightforward to express the problem of credibility: given a well-established premium of interest, how much credibility should be given to smaller, more specific data which supports a different parameter?
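
In symbols (this is the general shape of the answer rather than a formula from any one CT6 model), every method in these chapters ends up proposing a credibility-weighted average

$$P = Z\bar{X} + (1-Z)\mu, \qquad 0 \le Z \le 1,$$

where $\bar{X}$ is the mean of the smaller, more specific data, $\mu$ is the well-established collateral premium, and $Z$ is the credibility factor that the competing methods estimate in different ways.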

There are libraries of methods to assess credibility, but the CT6 syllabus focuses on a small number of Bayesian methods, all of which could be called empirical Bayes in the sense used by statisticians such as George Casella (although the CT6 notes use ‘Empirical Bayes’ to refer to specific non-parametric methods in Chapter 6). Loss Models explores a larger selection of methods than the CT6 core reading, and skimming some of the non-CT6 methods, or even just reading the table of contents, can be a good way to contextualise what’s in the core reading.

Note that both Loss Distributions and the SOA/CAS syllabus include more methods of assessing credibility than the British CT6 core reading. R.E. Beard comments in Risk Theory (and possibly the comment can also be found in Daykin’s Practical Risk Theory?) that the study of credibility was pursued with more enthusiasm in the States than in the UK.

In the first of the two chapters on credibility in the CT6 core reading, then, three models are presented. The first is a thumbnail sketch of limited fluctuation credibility, the ‘old school’ credibility method, which is treated in more detail in Loss Distributions or in one of the references from my last post.
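
As a rough sketch of the flavour of limited fluctuation credibility (the details and notation vary from text to text): full credibility is granted once the observed quantity lies, with probability $p$, within a tolerance $r$ of its mean. For Poisson claim counts this leads to the classic standard

$$n_{\text{full}} = \left(\frac{z_{(1+p)/2}}{r}\right)^2,$$

so with $p = 0.9$ and $r = 0.05$, $n_{\text{full}} = (1.645/0.05)^2 \approx 1082$ expected claims. Partial credibility is then commonly assigned as $Z = \min\bigl(1, \sqrt{n/n_{\text{full}}}\bigr)$.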

More serious treatment is given to the two Bayesian models: Poisson-gamma for counts data (the number of claims arriving) and normal-normal for claim severity. These are both common empirical Bayes methods, and the first appears as the illustrative example in the Wikipedia entry for Empirical Bayes (which also links to a related motor accident example) – note, though, that the treatment there gives different equations to those in the core reading, since it is for a single-observation model.
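
As a minimal computational sketch (the parameter names alpha and beta below are my own labels in the rate parametrisation, not necessarily the core reading’s): with claim counts $x_1, \ldots, x_n \sim \text{Poisson}(\lambda)$ and prior $\lambda \sim \text{Gamma}(\alpha, \beta)$, the posterior is $\text{Gamma}(\alpha + \sum x_i,\ \beta + n)$, and its mean is exactly a credibility-weighted average:

```python
# Poisson-gamma credibility: a minimal sketch, not the core reading's notation.
# Prior: lambda ~ Gamma(alpha, beta) (rate parametrisation, mean alpha/beta).
# Data: n periods of claim counts, each Poisson(lambda).

def poisson_gamma_posterior(counts, alpha, beta):
    n = len(counts)
    alpha_post = alpha + sum(counts)   # shape updates with total claims
    beta_post = beta + n               # rate updates with number of periods
    z = n / (n + beta)                 # credibility factor Z = n / (n + beta)
    prior_mean = alpha / beta
    sample_mean = sum(counts) / n
    posterior_mean = z * sample_mean + (1 - z) * prior_mean
    return alpha_post, beta_post, z, posterior_mean

# Illustrative example: a book estimate of 0.2 claims/year (alpha=2, beta=10)
# blended with 5 years of individual experience.
print(poisson_gamma_posterior([0, 1, 0, 0, 1], alpha=2, beta=10))
```

With five years of data and beta = 10, Z = 5/15 = 1/3, so the estimate moves a third of the way from the book value of 0.2 towards the observed 0.4 claims per year.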

The normal-normal model is also a commonly used empirical Bayes model, and slightly more complex than the Poisson-gamma model. It is a little harder to find free web resources for the normal-normal model, although there is a great introductory paper which covers it here -> www.biostat.jhsph.edu/~fdominic/teaching/…/Casella.EmpBayes.pdf (provided you are not put off by a more ‘mathy’ treatment). The normal-normal model is also summarised here -> http://www.biostat.jhsph.edu/~fdominic/teaching/BM/2-4.pdf, where it is called the normal model for unknown mean, known variance.
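
In the same hedged spirit (again my own parameter names, not the core reading’s): observations $X_i \sim N(\theta, \sigma^2)$ with $\sigma^2$ known and prior $\theta \sim N(\mu, \tau^2)$ give a posterior mean that is once more a credibility-weighted average, with $Z = n\tau^2/(n\tau^2 + \sigma^2)$:

```python
# Normal-normal credibility: a minimal sketch with illustrative parameter names.
# Data: n observations, each N(theta, sigma2), with sigma2 known.
# Prior: theta ~ N(mu, tau2).

def normal_normal_posterior(xs, sigma2, mu, tau2):
    n = len(xs)
    xbar = sum(xs) / n
    z = n * tau2 / (n * tau2 + sigma2)        # credibility factor
    posterior_mean = z * xbar + (1 - z) * mu  # shrinks xbar towards the prior mean
    posterior_var = tau2 * sigma2 / (n * tau2 + sigma2)
    return z, posterior_mean, posterior_var

# Illustrative example: noisy claim severities shrunk towards a book value of 100.
print(normal_normal_posterior([90.0, 120.0, 105.0], sigma2=400.0, mu=100.0, tau2=100.0))
```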

Both of these models are treated in widely available texts, in forms identical to the core reading’s, except that those texts don’t present formulae for finding Z – the credibility factor is really only needed to make the Bayesian models comparable to the limited fluctuation approach.
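
For reference, the two credibility factors used in the sketches above (standard results, in my notation rather than the core reading’s):

$$Z_{\text{Poisson-gamma}} = \frac{n}{n + \beta}, \qquad Z_{\text{normal-normal}} = \frac{n\tau^2}{n\tau^2 + \sigma^2}.$$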

Next time, we shall look at what the Core Reading calls ‘Empirical Bayes Credibility’ and the rest of the world knows as a non-parametric sub-species of empirical Bayes. This seems to be an area of the Core Reading which is frequently panned on forums, and hopefully we can find some other resources which are a tad clearer.


Chapter 2: Bayesian Statistics

15 Oct

Bayesian statistics is one of the few areas in the actuarial syllabus I’ve seen before, but when I first encountered it as a beginning statistics major, it made no sense to me – neither how to do it, nor what it was for.

To understand why Bayesian statistics might be important to an actuary, the best thing to do is to read the rest of the CT6 (or C/4) notes. To understand why and how it is interesting to a statistician, you could read the Scholarpedia article -> http://www.scholarpedia.org/article/Bayesian_statistics

Despite being relatively short, the Scholarpedia version is magisterial and comprehensive – as one might expect, given that it was written and reviewed by some of the biggest names currently working in the field.

For another short and sweet view of the topic, one could also read the introduction under the heading ‘What is Bayesian Analysis?’ at bayesian.org (a small amount of scrolling may be required before you reach it as of today, 15/10/2012, due to election notices).

For me, an obvious omission from the ActEd treatment of this topic is that, after emphasising the use of conjugate priors, there is no discussion of how to find the damn things, and not much discussion of diffuse priors either. On the first point, the notes make it seem as though conjugate priors are usually available, whereas they are very rare outside the exponential family (although the exponential family does, of course, contain some of the most-used probability distributions). The strangeness is compounded because understanding exponential families of distributions is essential to understanding generalised linear models, so this family is taught later in the same subject.
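
To make the exponential family connection concrete (a standard result, stated here in one common parametrisation rather than the notes’): if the likelihood has the natural exponential family form

$$f(x \mid \theta) = h(x)\exp\{\theta\,T(x) - A(\theta)\},$$

then priors of the form

$$\pi(\theta) \propto \exp\{\eta\,\theta - \nu A(\theta)\}$$

are conjugate, because multiplying by the likelihood simply updates $\eta \to \eta + T(x)$ and $\nu \to \nu + 1$, leaving the posterior in the same family. Outside this setting there is generally no such closed-form machinery, which is why conjugate priors are rare.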

With respect to diffuse priors, it should be noted that they are also difficult critters, and it is hard to find truly non-informative priors. James Berger, one of the heavyweights of Bayesian decision theory, apparently only admits the existence of four (the word ‘apparently’ appears because my only reference is my third-year Bayesian statistics lecture notes, and the quote is not referenced there; most likely it appears in this paper -> http://www.stat.duke.edu/~berger/papers/catalog.html (1985), but I can’t be certain, because I only found the paper a second before rewriting this parenthesis and reading it will have to wait), although I think at least one of his four is a set of priors rather than a single specific distribution.

To give weight to my rant about how ‘easy’ it is to find conjugate priors, I give below the recipe for finding them proposed by Raiffa and Schlaifer (not quite the originators of the term and the concept, but they appear to have given natural conjugates a lot of momentum), as written in S. James Press’s Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, Second Edition, a text available as a Dover reprint for only slightly more than a nominal amount.

“…write the density or likelihood function for the observable random variables and then interchange the roles of the observable random variables and the parameters, assuming the latter to be random and the former to be fixed and known. Modifying the proportionality constant appropriately so that the new ‘density’ integrates to unity and letting the fixed parameters be arbitrary provides a density that is in this sense ‘conjugate’ to the original.”

Not terrifyingly difficult, but maybe not so trivial that it wouldn’t be a distraction if you weren’t specifically expecting to be tested on it?
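
As a quick worked illustration of the recipe (my example, not Press’s): take a Poisson sample $x_1, \ldots, x_n$ with mean $\lambda$. The likelihood is

$$L(\lambda; x) \propto \lambda^{\sum x_i} e^{-n\lambda}.$$

Interchanging roles – reading this as a ‘density’ in $\lambda$ with the data held fixed – and letting the now-fixed exponents be arbitrary constants $a - 1$ and $b$ gives the kernel $\lambda^{a-1}e^{-b\lambda}$, which becomes a $\text{Gamma}(a, b)$ density once the proportionality constant is fixed so it integrates to unity. The gamma prior of the Poisson-gamma credibility model above drops straight out of the recipe.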