Archive | October, 2013

Logistic Regression: a Predictive Modelling View

25 Oct

As suggested in my last post but one, I am attempting a parallel reading of Kuhn and Johnson’s Applied Predictive Modeling and Hastie, Tibshirani and Friedman’s Elements of Statistical Learning. Kuhn and Johnson themselves give implicit support to this approach, recommending Hastie et al. as a more deeply mathematical companion to their applied text. Hence, this is something of an experiment to see how reading the two together ‘works’. There is obviously a large amount of material across the two texts, so I will only attempt a reading of a small and hopefully representative selection of topics.

Kuhn and Johnson almost don’t seem to have their hearts in the discussion on logistic regression. Their conclusion is that there are better ways to model binary data, and logistic regression is just the warm-up act for those. More precisely, they prefer algorithms that learn more of the model structure from the data, and so require less manual work from the modeller.

For mine, this is a pity, as clients who claim they want the most accurate model may later turn out to really want the most explainable model. It may be that the authors just think that so much has been written on the topic, and enough of it sufficiently well, that there is nothing new to say. They recommend Regression Modeling Strategies (Harrell, 2001) as a text for learning more about logistic regression.

I don’t have this book, but from the course notes made available by the author, it does look like a text with a great deal of useful advice for practitioners. According to the book’s website (which has a great many useful links but also a great many dead links), there is a second edition due any time now (intended for September 2013), so maybe hold off until it appears.

Hastie et al’s approach is, as expected, more mathematical than that of Kuhn and Johnson. They derive the equations for the conditional class probabilities in the multinomial case, in which the log-odds are linear in the predictors, and observe that the two-class model simplifies the equations considerably.
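As a rough sketch of what that derivation looks like (the notation here is my own assumption and may not match the book symbol for symbol): with K classes and class K as the reference, each log-odds is modelled as a linear function of the predictors, and the two-class case collapses to a single equation.

```latex
% Multinomial case: one linear log-odds equation per non-reference class
\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)}
  = \beta_{k0} + \beta_k^{T} x, \qquad k = 1, \dots, K-1

% Two-class case (K = 2): a single equation, writing p(x) = \Pr(G = 1 \mid X = x)
\log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta^{T} x
\quad\Longleftrightarrow\quad
p(x) = \frac{\exp(\beta_0 + \beta^{T} x)}{1 + \exp(\beta_0 + \beta^{T} x)}
```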

In the Hastie et al treatment, this derivation is necessary as they proceed to explain how to estimate the parameters using the Newton-Raphson algorithm – providing the nuts and bolts for someone who wants to program the algorithm for themselves. Kuhn and Johnson assume that you think more like engineers than maths researchers, and are happy for someone else to do the programming for you.
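For anyone tempted to program it themselves, here is a minimal sketch of Newton-Raphson for the two-class model in plain Python. It is my own illustration, not code from either book: the function names (solve_linear, predict, fit_logistic) are made up, and there are no safeguards for perfectly separable data, where the estimates diverge.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def predict(beta, x):
    """Fitted probability of class 1 for one predictor row x (no intercept column)."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Two-class logistic regression fitted by Newton-Raphson.

    xs: list of predictor rows (without the intercept column)
    ys: list of 0/1 responses
    Returns [intercept, slope_1, ..., slope_p].
    """
    k = len(xs[0]) + 1
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # score vector X^T (y - p)
        hess = [[0.0] * k for _ in range(k)]  # X^T W X, with W = diag(p (1 - p))
        for x, y in zip(xs, ys):
            row = [1.0] + list(x)             # prepend the intercept term
            p = predict(beta, x)
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += row[i] * (y - p)
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        # Newton step: beta_new = beta_old + (X^T W X)^{-1} X^T (y - p)
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Calling fit_logistic([[0.5], [1.0], [1.5], [2.0]], [0, 1, 0, 1]) returns an intercept and one slope. In practice you would let R's glm or a similar routine do this for you, which is rather Kuhn and Johnson's point.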

While not everyone may have been craving such a strong dose of matrix algebra and numerical optimisation, Hastie et al’s example is useful if you need a refresher, or just extra advice, on how logistic regression models are traditionally interpreted. Kuhn and Johnson add little in this area, instead briefly calculating their example’s ROC curve and AUC in order to compare it with other classifier models. Hastie et al also provide high level advice on how to choose variables when using a logistic regression model.
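For readers who have not met the ROC curve and AUC before, the calculation can be sketched in a few lines of Python. This is only an illustration with made-up scores and labels, not the example from either book, and the function names (roc_points, auc) are my own.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: predicted probabilities of class 1
    labels: true classes coded 0/1 (both classes must be present)
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score; lowering the threshold admits one
    # group of tied scores at a time.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example: six cases, three in each class.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))  # 8 of the 9 positive/negative pairs are ranked correctly
```

The AUC is the proportion of positive/negative pairs in which the positive case gets the higher score, which is why it is convenient for comparing classifiers that produce scores on different scales.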

Possibly this was not an ideal starting place, as the best aspects of the logistic regression discussion in Kuhn and Johnson were the pointers to other resources. We will see later how things pan out when looking at linear discriminant analysis.

Just Plain Silliness

21 Oct

Result of search for lectures given by Professor F Harrell, Chair of Biostatistics at Vanderbilt University. I guess the YouTube search engine got it a bit right, as there is a clear biostats theme.

Predictive Modeling

21 Oct

Over the next few blog posts, which may be intermittent, but hopefully with smaller gaps than the last couple, we are going to take a sideways tour into predictive modelling, which is closer to what I am currently doing than strictly actuarial studies. Just as before, for me the purpose is to force close study, and if others can benefit, that’s a bonus.

Recently I received from a riparian bookselling website the book Applied Predictive Modeling (Kuhn and Johnson, 2013) (note one ‘l’), having ordered it only three months earlier. As the title suggests, the thrust of this text is to introduce predictive modeling techniques (whether originating as data mining or statistical techniques) in the context of their application to problem solving, rather than with respect to their theoretical origins or with a view to critiquing them, mathematically or otherwise. In fact, the authors suggest The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Hastie et al) as a good theoretical companion.

As a device for forcing reading with a critical mind, I propose to read and compare the sections of both books dealing with the same topics, starting with the topics I am personally most familiar with, before moving to a couple of areas newer to me. Part of the object is to discover, or at least partially uncover, where the practical and the theoretical differ, and where one gives way to the other and back again.

Before the end of this tour we will also look at the sections on data pre-processing and ‘other considerations’ which bookend the discussions of individual modelling techniques. In some ways these sections are the most important, as they provide an especial opportunity for the authors to discuss the practice of modelling, the book’s raison d’être and strength, as well as being the areas in this text that are least often discussed in other texts.