Archive | December, 2013

Data Mining in Insurance

17 Dec

I have been working on a project which uses data mining techniques to predict insurance outcomes. I have been putting off writing up the resources I found for a long time, even though part of the point of this blog was to diarise the stuff I found during exactly this sort of research. I think that originally I wanted to give relatively detailed summaries of these items, but I am beginning to realise that I am in danger of never writing them up at all.

This first pamphlet is a good high-level summary of data mining techniques and how they can be applied to some general insurance problems. Handy if you need to explain concepts to a non-technical person.

The paper below emphasizes CART across a range of insurance contexts and, like the study that follows it, discusses hybridising CART and MARS techniques (not surprisingly, as they are by the same authors).

Below is a comprehensive study of claim size prediction using a hybrid CART/MARS model. Interestingly, the hybridisation is achieved within a single model, rather than by creating separate models within an ensemble, for example by boosting. In fact, the authors don’t address the topic of boosting at all, which is possibly a more obvious approach. This presentation is a more detailed look at one of the examples from the paper above.

This last is a more specific look at text mining in relation to a topic which is one of the concerns of the CT6 exam – claim prediction – but obviously using techniques not currently set for examination in the actuarial exam system.

Sad News

17 Dec

Normal Deviate is no more! Vale and vive!

Hmmm, I have no cats, but my posting is haphazard! Hopefully, I can rise to Normal Deviate’s standard of targeted and thoughtful blogging as my blogging prowess matures.

The Epitome of Data Science

3 Dec

Christian Robert is a leading Bayesian statistician, and, like many Bayesian statisticians, an avid blogger (really, frequentists don’t seem to blog as much. Or maybe there are really only Bayesian and ambivalent/agnostic statisticians these days).

Christian generously posts what he is doing with his classes on his blog (or ‘og, as he prefers). For a few years now, he has held a seminar series on classic papers (list found here). Last week, one of his students found a paper not included on the list which in some ways symbolises the meaning of data science as the place where statistics meets computer science.

The paper is here:

And here is Christian’s write-up of his student’s seminar, with his own response to the paper:

The paper is simply a proposal of how to calculate some commonly used statistics on data too big to fit in memory, using the approach of chopping the data set into smaller pieces. Christian raises some mathematical concerns.
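To give a flavour of the chop-it-into-pieces idea, here is a minimal Python sketch. It is not the method of the paper, just an illustration I put together: it computes the mean and sample variance of a data set one chunk at a time, keeping only a small summary per chunk and merging the summaries with the standard pairwise update formula. The function names (chunk_summary, merge, chunked_mean_variance) are my own inventions. These two statistics happen to merge exactly; many others do not, which is where approximations and mathematical concerns come in.

```python
from typing import Iterable, Sequence, Tuple

# A summary is (count, mean, sum of squared deviations from the mean).
Summary = Tuple[int, float, float]


def chunk_summary(chunk: Sequence[float]) -> Summary:
    """Summarise one chunk that is small enough to hold in memory."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries into the summary of the pooled data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def chunked_mean_variance(chunks: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Mean and sample variance, seeing only one chunk at a time."""
    total: Summary = (0, 0.0, 0.0)
    for chunk in chunks:
        if len(chunk) > 0:  # skip empty chunks
            total = merge(total, chunk_summary(chunk))
    n, mean, m2 = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return mean, variance
```

In practice the chunks would come from reading a file a block at a time rather than from a list in memory, but the merging step would be the same.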

In some ways, though, the correctness of the approach is not as interesting as the fact that academic statisticians are putting serious effort into dealing with the obstacles thrown up by datasets being greater than computers’ ability to process them, which will hopefully lead to the discipline of statistics having more of a Big Data voice. It is weird, though, that by doing this sort of work, we have gone full circle to the pre-computing age, where finding workable approximations to allow calculation by hand of statistics on data with a few hundred rows was a serious topic of interest. All of which makes re-reading the review of Quenouille’s Rapid Statistical Calculations (which I have never seen for sale anywhere) a slightly odd experience when the reviewer says that computers have made that sort of thing irrelevant!