Archive | January, 2014

Historical Musings on R

31 Jan

I don’t usually reblog, but this seems like an interesting link.

Does what it says on the tin!

It’s actually almost a reblog itself, being basically a YouTube conversation between S originator John Chambers and celebrity statistician Trevor Hastie.

Kaggle Leaderboard Weirdness

29 Jan

Earlier this week, after about half a dozen false starts, I finally posted a legal entry to a Kaggle competition. Then, when I saw how far off the pace I was, I posted another half a dozen over the course of a day, improving very slightly each time. If the competition ran for a decade, I reckon I’d have a pretty good chance of winning…

While I now understand how addictive Kaggle is – it hits the sweet spot between instant gratification and highly delayed gratification – I find the leaderboard kind of weird and frustrating, because so many people upload the benchmark: the trivial solution the competition organisers upload to draw a line in the sand. In this competition, the benchmark is a file of all zeroes.

This time yesterday, around a hundred of the roughly 180 entries were just the benchmark. Today, for some reason, all the earlier entries appear to have been removed, leaving only about thirty – but twenty of those are the benchmark again! I get that people just want to upload something so they can say they participated, but this many all-zero files is getting out of hand.
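To make the ‘line in the sand’ idea concrete, here is a minimal sketch of an all-zero benchmark and of counting how many leaderboard entries simply tie with it. The post doesn’t say what the competition’s columns or scoring metric are, so the `id`/`prediction` column names, the use of mean absolute error, and every function name below are hypothetical stand-ins chosen for illustration, and the numbers in the usage example are toy values rather than anything from the real leaderboard.

```python
import csv
import io


# Hypothetical helpers: the column names and metric are assumptions,
# not the real competition's format.
def all_zero_benchmark(ids):
    """Return an all-zero submission: one (id, 0.0) row per test case."""
    return [(i, 0.0) for i in ids]


def write_submission(rows):
    """Serialise submission rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "prediction"])
    writer.writerows(rows)
    return buf.getvalue()


def mean_absolute_error(actual, predicted):
    """Stand-in metric: average absolute difference (lower is better)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def count_benchmark_ties(leaderboard_scores, benchmark_score, tol=1e-9):
    """Count leaderboard entries whose score matches the benchmark's."""
    return sum(1 for s in leaderboard_scores if abs(s - benchmark_score) <= tol)


if __name__ == "__main__":
    # Toy data only, purely to exercise the functions.
    ids = [1, 2, 3, 4]
    actual = [0.0, 2.0, 0.0, 1.0]
    bench = all_zero_benchmark(ids)
    bench_score = mean_absolute_error(actual, [p for _, p in bench])
    print(bench_score)  # 0.75
    print(write_submission(bench))
    print(count_benchmark_ties([0.75, 0.75, 0.225, 0.75, 0.5], bench_score))  # 3
```

Any real entry has to beat `bench_score` to say anything at all; under a different metric the benchmark’s score would change, but the idea of a trivial baseline that everyone can reproduce stays the same.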

2014: New Year’s Plans (Dreams?)

14 Jan

This is the first in a short series, and covers my R and computer programming pipe dreams for 2014. Another post will cover my maths and statistics pipe dreams, and who knows, I may find there are other dreams not covered at all.

To a certain extent, these pipe dreams begin to make concrete the drift away from actuarial studies that some of the more careful readers may have noticed. Since I left engineering and effectively became a predictive modeler, a lot of the impetus to complete actuarial studies has fallen away. To me, though, the two areas are certainly related, and in support of this claim I present exhibit A: my earlier post on ‘Data Mining in the Insurance Industry’, which covers papers explaining how to achieve some of the goals of CT6 by different means.

My immediate plans, then, come in three buckets – learn more maths, learn more computer programming and learn more statistics (in which category I include statistical and machine learning). The aim of the first two is obviously to support the last aim, so the selection of topics will be somewhat influenced by this consideration.

In this post, I will just talk about computer programming, as I ramble enough as it is without trying to cover three different areas of self-learning. I am taking my cues in this area from a couple of blog posts by Cosma Shalizi, in which he makes the case for computer programming as a vital skill for statisticians and gives some basic pointers on what this means in practice.

Shalizi’s first piece of advice is to take a real programming class, or, if you can’t do that, read a real programming book. He recommends Structure and Interpretation of Computer Programs, and seeing as it is available for free, I say ‘that will do just fine’.

SICP, as it seems to be popularly known, teaches programming via the functional programming language Scheme. I would like to learn a little about functional programming, but I would also like to learn a programming language that is more commonly used for data analysis. Hence, in addition to reading SICP, I want to read Think Python, which is also free, but which teaches the Python language (obviously).

Both of these books are listed, along with many others, on the GitHub Free Programming Books page.
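For a flavour of the functional style SICP teaches, here is a rough Python rendering of the kind of higher-order procedure the book introduces early on (section 1.3.1, ‘Procedures as Arguments’): a general `sum` that takes the term to add and the step function as arguments. The names are adapted to Python (`sum_terms` and `next_` avoid shadowing built-ins), so treat this as a sketch of the idea rather than a transcription of the book’s Scheme code.

```python
def sum_terms(term, a, next_, b):
    """Recursively sum term(x) for x = a, next_(a), next_(next_(a)), ... up to b."""
    if a > b:
        return 0
    return term(a) + sum_terms(term, next_(a), next_, b)


def inc(n):
    return n + 1


def cube(x):
    return x * x * x


def sum_cubes(a, b):
    # The specific procedure is just the general one with functions plugged in.
    return sum_terms(cube, a, inc, b)


def sum_integers(a, b):
    # An anonymous identity function serves as the term.
    return sum_terms(lambda x: x, a, inc, b)


if __name__ == "__main__":
    print(sum_cubes(1, 10))  # 3025
    print(sum_integers(1, 10))  # 55
    # The same result in more idiomatic Python, using built-ins.
    print(sum(map(cube, range(1, 11))))  # 3025
```

The recursive version mirrors the Scheme style closely, but Python does not optimise tail calls and caps recursion depth (around 1000 frames by default), so for long ranges the built-in `sum`/`map` form is the more natural choice in Python.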