Christian Robert is a leading Bayesian statistician and, like many Bayesian statisticians, an avid blogger (really, frequentists don’t seem to blog as much — or maybe there are only Bayesian and ambivalent/agnostic statisticians these days).

Christian generously posts what he is doing with his classes on his blog (or ‘og, as he prefers). For a few years now, he has held a seminar series on classic papers (list found here: https://www.ceremade.dauphine.fr/~xian/M2classics.html). Last week, one of his students presented a paper not included on the list which in some ways symbolises data science as the place where statistics meets computer science:

The paper is here:

http://www.personal.psu.edu/users/j/x/jxz203/lin/Lin_pub/2013_ASMBI.pdf

And here is Christian’s write-up of his student’s seminar, with his own response to the paper:

http://xianblog.wordpress.com/2013/11/29/reading-classics-3-2/

The paper proposes a way to calculate some commonly used statistics on data too big to fit in memory, by chopping the data set into smaller pieces, computing on each piece, and combining the results. Christian raises some mathematical concerns.
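The paper’s actual estimators are more elaborate than this, but the basic chunking idea — reduce each piece to a few sufficient statistics, then merge — can be sketched as follows (the function names and the toy data are mine, not the paper’s):

```python
def chunk_stats(chunk):
    """Reduce one chunk to its sufficient statistics:
    count, sum, and sum of squares."""
    n = len(chunk)
    s = sum(chunk)
    ss = sum(x * x for x in chunk)
    return n, s, ss

def combine(per_chunk):
    """Merge per-chunk statistics into the overall mean and
    (population) variance, without ever holding all the data."""
    n = sum(c[0] for c in per_chunk)
    s = sum(c[1] for c in per_chunk)
    ss = sum(c[2] for c in per_chunk)
    mean = s / n
    var = ss / n - mean * mean
    return mean, var

# Stand-in for a data set too big for memory: process it one
# chunk at a time, keeping only three numbers per chunk.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
mean, var = combine([chunk_stats(c) for c in chunks])
```

For mean and variance the merge is exact; the interesting (and, per Christian’s concerns, delicate) cases are statistics that don’t decompose this neatly. Note also that the naive sum-of-squares formula above is numerically unstable for real data — in practice one would use a stabler pairwise or streaming update.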

In some ways, though, the correctness of the approach is less interesting than the fact that academic statisticians are putting serious effort into the obstacles thrown up by datasets that exceed computers’ capacity to process them, which will hopefully give the discipline of statistics more of a Big Data voice.

It is weird, though, that this sort of work brings us full circle to the pre-computing age, when finding workable approximations that allowed statistics to be calculated by hand on data with a few hundred rows was a serious topic of interest. All of which makes re-reading the review (http://www.tandfonline.com/doi/abs/10.1080/00207547308929950#.Up5q8MQW2Cl) of Quenouille’s Rapid Statistical Calculations (a book I have never seen for sale anywhere) a slightly odd experience, given that the reviewer says computers have made that sort of thing irrelevant!