As suggested in my last post but one, I am attempting a parallel reading of Kuhn and Johnson’s Applied Predictive Modeling and Hastie, Tibshirani and Friedman’s The Elements of Statistical Learning. Kuhn and Johnson themselves give implicit support to this approach, recommending Hastie et al. as a more deeply mathematical companion to their applied text. Hence, this is something of an experiment to see how reading the two together ‘works’. There is obviously a large amount of material across the two texts, so I will only attempt a reading of a small and hopefully representative selection of topics.

Kuhn and Johnson almost don’t seem to have their hearts in the discussion of logistic regression. Their conclusion is that there are better ways to model binary data, and logistic regression is just the warm-up act for those. More precisely, they seem to prefer more automated algorithms, which need less intervention from the analyst.

For mine, this is a pity, as clients who claim they want the most accurate model may later turn out to really want the most explainable model. It may be that Kuhn and Johnson simply think that so much has been written on the topic, and enough of it sufficiently well, that there is nothing new to say. They recommend Regression Modeling Strategies (Harrell, 2001) as a text for learning more about logistic regression.

I don’t have this book, but from the course notes made available by the author (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf) it does look like a text with a great deal of useful advice for practitioners. According to the book’s website (which has a great many useful links but also a great many dead links), there is a second edition due any time now (intended for September 2013), so maybe hold off until it appears.

Hastie et al.’s approach is, as expected, more mathematical than Kuhn and Johnson’s. They derive the linear log-odds equations for the conditional class probabilities in the multinomial case, and observe that the two-class model simplifies the equations considerably.
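For reference, the equations in question look roughly like this. This is my own transcription, assuming the usual convention of K classes with the last class as the reference, so the notation may differ in detail from the book.

```latex
% Multinomial case: K - 1 log-odds, each linear in x
\log\frac{\Pr(G=k \mid X=x)}{\Pr(G=K \mid X=x)} = \beta_{k0} + \beta_k^{T}x,
\qquad k = 1,\dots,K-1

% Solving for the class probabilities
\Pr(G=k \mid X=x) = \frac{\exp(\beta_{k0}+\beta_k^{T}x)}
                         {1+\sum_{l=1}^{K-1}\exp(\beta_{l0}+\beta_l^{T}x)},
\qquad k = 1,\dots,K-1

% Two-class case (K = 2): only one linear function remains
\log\frac{p(x)}{1-p(x)} = \beta_0 + \beta^{T}x,
\qquad
p(x) = \frac{\exp(\beta_0+\beta^{T}x)}{1+\exp(\beta_0+\beta^{T}x)}
```

With two classes there is a single coefficient vector to estimate, which is why the fitting algorithm is usually presented for that case.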

In the Hastie et al. treatment, this derivation is necessary because they go on to explain how to estimate the parameters using the Newton-Raphson algorithm, providing the nuts and bolts for someone who wants to program the algorithm for themselves. Kuhn and Johnson assume that you think more like an engineer than a maths researcher, and are happy for someone else to do the programming for you.
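To give a flavour of what programming it yourself involves, here is a minimal sketch of Newton-Raphson for the two-class model in plain Python. It is not code from either book: the function names `solve` and `fit_logistic` and the toy data are my own, and it assumes the data are not perfectly separable (otherwise the estimates diverge). Each iteration applies the update beta_new = beta_old + (X'WX)^(-1) X'(y - p), where p holds the current fitted probabilities and W is diagonal with entries p(1 - p).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Newton-Raphson for two-class logistic regression.

    xs: list of feature lists (no intercept column); ys: list of 0/1 labels.
    Returns [intercept, coef_1, ...].
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
                 for r in rows]
        # Gradient of the log-likelihood: X'(y - p)
        grad = [sum(r[j] * (y - pr) for r, y, pr in zip(rows, ys, probs))
                for j in range(p)]
        # Negative Hessian: X'WX with W = diag(p(1 - p))
        hess = [[sum(r[j] * r[k] * pr * (1.0 - pr) for r, pr in zip(rows, probs))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:  # stop once the update is negligible
            break
    return beta


# Toy data (made up for illustration): one predictor, overlapping classes
xs = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(xs, ys)
print(beta)  # [intercept, slope]
```

In practice you would let R’s `glm` or an equivalent routine do this for you, which is precisely the Kuhn and Johnson attitude.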

Not everyone will have been craving such a strong dose of matrix algebra and numerical optimisation. Still, if you need a refresher, or just extra advice, on how logistic regression models are traditionally interpreted, Hastie et al.’s example performs this function. Kuhn and Johnson add little in this area, instead briefly calculating their example’s ROC curve and AUC in order to compare it with other classifier models. Hastie et al. also provide high-level advice on how to choose variables when using a logistic regression model.
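For anyone who has not met AUC before, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by brute-force pairwise comparison. It is an illustration only, with a made-up function name `auc` and made-up scores, and is not how either book computes it.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by comparing every positive/negative pair.

    scores: predicted probabilities (or any ranking score); labels: 0/1 classes.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: three of the four positive/negative pairs are ranked correctly
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))  # 0.75
```

Because it depends only on how the cases are ranked, the same number can be computed for any classifier that outputs a score, which is what makes it convenient for comparing logistic regression with other models.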

Possibly this was not an ideal starting place, as the best aspects of the logistic regression discussion in Kuhn and Johnson were the pointers to other resources. We will see later how things pan out when looking at linear discriminant analysis.