Archive | R

R Packages for Managers

22 Apr

Roger Peng, in his e-text ‘Mastering Data Science’, remarks in passing that if you are going to do something twice in R you should write a function, but if you’re going to do it three times you should write a package. (He’s actually self-plagiarising from his own book, Executive Data Science, which I don’t have.)

When writing about functions and packages in R, Peng advances several of the usual arguments in favour of their use, such as avoiding rework and creating more readable code. In my opinion, just listing those standard reasons undersells the benefits of creating functions and packages, especially in a corporate environment.

A huge challenge in a corporate environment is to convert employee knowledge and experience into organisational knowledge, when a lack of time (and sometimes of people) breeds a culture of getting things out the door quickly without pausing for reflection or, crucially, documentation. Hence, if an employee goes out the door, their knowledge and experience go with them. Asking people to write packages which collect the processes they applied during a particular project keeps a substantial part of that knowledge inside the organisation.

The other undersold virtue of writing functions and packages is that it is an antidote to R turning into a command line environment rather than a software environment. That is, it moves users away from inputting strings of R commands, effectively making themselves part of the program, to writing something closer to conventional programs, though usually small ones.

In my own work, I see a particular opening for moving activity toward functions and packages, as we are trying to sell the same idea to three different potential customers. Each sale involves a similar process: providing toy examples customised to the potential customer’s data set, then doing similar work across each customer’s full data set. With three being Peng’s threshold for creating a package (and with some rework likely for individual customers, e.g. the same task performed on different date ranges), I seem to be squarely in the category that needs to write packages.
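
As a rough sketch of what "the same task performed on different date ranges" might look like once wrapped in a function: the function name, the column names and the toy data below are all made up for illustration, and a real version would depend on what each customer's data actually looks like.

```r
# Hypothetical helper: summarise one customer's data over a chosen date range.
# `date_col` and `value_col` are assumed column names, not taken from any real project.
summarise_period <- function(data, start, end,
                             date_col = "date", value_col = "amount") {
  dates <- as.Date(data[[date_col]])
  in_range <- dates >= as.Date(start) & dates <= as.Date(end)
  values <- data[[value_col]][in_range]
  data.frame(
    start = as.Date(start),
    end   = as.Date(end),
    n     = length(values),
    total = sum(values),
    mean  = if (length(values) > 0) mean(values) else NA_real_
  )
}

# Toy data standing in for one customer's data set (invented for this example).
customer_a <- data.frame(
  date   = as.Date("2024-01-01") + 0:9,
  amount = 1:10
)

# The same call can be repeated per customer and per date range.
result <- summarise_period(customer_a, "2024-01-03", "2024-01-05")
print(result)
```

Once a handful of helpers like this exist, they could be gathered into a package (base R's package.skeleton() is one way to create the initial directory layout), which is where the documentation and the retained knowledge come in.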

Lattice R Users Group Talk

14 Aug

Last night I gave a talk on the Lattice package of R, held together by the idea that Lattice is an expression of Bill Cleveland’s overall philosophy of visualisation. I don’t know that I put my argument very clearly, but I think the fact of having an argument made the talk a tiny bit less incoherent!

After the talk there was some discussion of a few things – the use of colour schemes for one, but also two questions I didn’t have answers to, although I made a couple of totally wrong guesses!

Question 1: Can you include a numeric scale on plots from Lattice’s trivariate functions (and how)?

Answer: Yes, there is a scales argument, which must be supplied as a list.

Hence, given that you have your surface data frame pre-prepared (called surface here purely as a placeholder name):

wireframe(z ~ x * y, data = surface, scales = list(z = list(arrows = FALSE, distance = 1)))
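
For anyone wanting to try this without a data set to hand, here is a minimal self-contained sketch. The surface is invented for the example, and it uses the simpler top-level form scales = list(arrows = FALSE), which asks for numeric scales on all three axes rather than on z alone.

```r
library(lattice)

# Made-up surface for illustration: z = sin(x) * cos(y) on a regular grid.
surface <- expand.grid(
  x = seq(-3, 3, length.out = 30),
  y = seq(-3, 3, length.out = 30)
)
surface$z <- sin(surface$x) * cos(surface$y)

# arrows = FALSE replaces the default direction arrows with
# tick marks and numeric labels on the axes.
p <- wireframe(z ~ x * y, data = surface,
               scales = list(arrows = FALSE))

# Lattice plots are objects; they are drawn when printed.
print(p)
```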

Question 2: Can you use the print() function to arrange graphics objects produced from multiple R graphics packages?

So far as I can tell, the answer is a qualified ‘yes’. The qualification is that you need to be working with a graphics package which produces a storable graphics object: lattice obviously does, and it looks like ggplot2 does also. Another package I selected at random, vcd, does not, however.
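
A sketch of how this might look with one lattice plot and one ggplot2 plot on the same page. It relies on both packages drawing with grid, uses the built-in mtcars data purely as an example, and skips the ggplot2 half if that package is not installed. Note that the two print methods take different layout arguments (position for lattice, a grid viewport for ggplot2), so this is a workable combination rather than a single unified interface.

```r
library(lattice)
library(grid)

# A lattice plot is stored as a "trellis" object; nothing is drawn until print().
lat_plot <- xyplot(mpg ~ wt, data = mtcars)

# Draw the lattice plot in the left half of the page.
# position = c(xmin, ymin, xmax, ymax) as proportions of the page;
# more = TRUE signals that more plots will follow on the same page.
print(lat_plot, position = c(0, 0, 0.5, 1), more = TRUE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_plot <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) +
    ggplot2::geom_point()
  # ggplot2's print method accepts a grid viewport; when one is given,
  # it draws into that region without starting a new page.
  print(gg_plot, vp = viewport(x = 0.75, y = 0.5, width = 0.5, height = 1))
}
```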