Selling Data Science

9 May

Creating sales documents and pitches that list out all the shiny new things that our data science application can do is very tempting. We worked hard on those features and everyone will appreciate them, right?

Well, not really. For one, it’s very likely your target audience doesn’t have the technical background to understand the point of what you’re selling. After all, if they had your technical skills, they wouldn’t be thinking of hiring a data scientist – they’d just be doing it themselves.

The next problem is that you can’t trust the customer to realise how your solution helps them out of their present predicament. Moreover, it’s disrespectful to make them do your job for you. Hence, you need to make sure your pitch joins the dots between what you intend to do for the customer and how it will make their life easier.

In sales parlance this is known as ‘selling the benefits’ – that is, making it clear to the potential customer how buying your product will improve their lives – an idea encapsulated in the phrase ‘nobody wants to buy a bed – they want a good night’s sleep’. The rub is that in most data science scenarios the problem that corresponds to the potential benefit is a business problem – such as reduced inventory or decreased cost of sales – rather than a human problem, such as getting a good night’s sleep.

Therefore, completing the journey from feature to benefit requires some knowledge of your customer’s business (everyone knows the benefits of a good night’s sleep – and the horrors of not getting one – but far fewer understand the fine points of mattress springing and bed construction) and the ability to explain the links. This last is crucial, as the benefits of your work are too important to allow your customer an opportunity to miss them.

What all this means in the end is that the approach of inspecting data sets in the hope of finding ‘insights’ will often fail, and may border on being dangerous. Instead, you need to start with what your customer is trying to achieve and what problems they are facing, before seeing which problems correspond to data that can be used to build tools to overcome them.

Timothy, Paul and Data Science

7 May

Like any other atheist who regularly attends an evangelical church, I often find myself wondering how to apply the sermon to my life. A recent example that was easier to apply than most was a sermon from a guest preacher on succession planning.

Part of the point for this preacher is that he’s a kind of mentor for a number of churches, so he traipses around Australia advising other pastors how to do things better – and also sees them failing, often for predictable reasons. Hence, when he spoke about succession, he was talking from experience.

Of course, succession planning isn’t specifically about churches. The phrase is more commonly heard in corporate settings. His solution to the problem – in the end a call to spread the Gospel, not all that surprisingly from an evangelical preacher – initially seemed to have no application to the corporate world, but after a little reflection turned out to be very applicable.

By the pastor’s logic, the gospel was effectively the knowledge needed to participate in his religion. So, by extension, succession planning was about the transfer of knowledge. In a way, this is not a revolutionary idea – of course succession planning is about the transfer of knowledge of a working environment, of customers, and of the skills to get a job done.

But the emphasis is so often on the leader of an organisation and (too often, in both senses) his immediate reports. Hence the emphasis is on the knowledge that they will bring into the company, the skills in running a business they learnt elsewhere that they will apply to your company. It’s like judging future converts by the abilities they bring to a church from outside – their ability to speak in public, to be great fundraisers – rather than the pastor’s idea of succession planning through the transfer of knowledge from within.

The alternative is succession planning from the ground up – making succession planning begin with the people who make your product or provide your service, and the people who secure your customers. In a very real way, they are your business. Certainly, in my own career in manufacturing, I’ve seen the results of failing to ensure knowledge is transferred from the people who make the product to others. In short, when they retire, there are delays and defects as people attempt to re-discover their skills.

The previous blog post was about one way that skills can be transferred – the knowledge of a process can be converted into a computer program, with sensible commenting and documentation. Not a solution in itself, and not the only alternative, but an extra tool that can be employed. From this perspective, the sermon was another way of seeing the big-picture way that tool can be employed in a Data Science setting.

R Packages for Managers

22 Apr

Roger Peng, in his e-text ‘Mastering Data Science’, makes an off-hand comment to the effect that if you are going to do something twice in R, write a function, but if you’re going to do it three times, write a package (actually he’s self-plagiarising from his own book, Executive Data Science, which I don’t have).

When writing about functions and packages in R, Peng advances several of the usual arguments in favour of their use, such as avoiding rework, creating more readable code and so on. In my opinion, just listing off those standard reasons undersells the benefits of creating functions and packages, especially in a corporate environment.

A huge challenge in a corporate environment is to capture employee knowledge and experience, in an environment where lack of time and sometimes of people breeds a culture of getting things out the door quickly without pausing for reflection or, crucially, documentation. Hence, if an employee goes out the door, their knowledge and experience goes with them. Asking people to write packages which collect the processes they applied during a particular project keeps a substantial part of that knowledge inside the organisation.

The other undersold virtue of writing functions and packages is that they are an antidote to R turning into a command-line environment rather than a software environment. That is, they move users away from inputting strings of R commands, effectively making themselves part of the program, towards writing something closer to conventional programs, though usually small ones.
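
As a sketch of what this looks like in practice (the file, column and function names here are invented for illustration), compare a run of console commands with the same logic captured as a small, documented function:

# Console style: the analyst is effectively part of the program
# sales <- read.csv("sales.csv")
# sales <- sales[sales$region == "NSW", ]
# mean(sales$revenue)

# The same logic as a function: the process itself is now recorded in code
mean_revenue <- function(path, region) {
  # Read the raw extract and restrict to one region before summarising
  sales <- read.csv(path)
  sales <- sales[sales$region == region, ]
  mean(sales$revenue, na.rm = TRUE)
}

# Usage: mean_revenue("sales.csv", "NSW")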

In my own work, I see a particular opening for moving activity toward functions and packages as we try to sell the same idea to three different potential customers, which involves a similar process of providing customised (i.e. tailored to the potential customer’s data set) toy examples before doing similar work across each customer’s data set. With three being Peng’s threshold for creating packages (and there will likely be some rework for individual customers, e.g. the same task performed on different date ranges), I seem to be squarely in the category that needs to write packages.
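
As a hedged sketch of what such a package might contain (the function and column names are hypothetical), the customer-specific rework reduces to arguments, so the shared analysis is written once rather than three times:

# Hypothetical: one packaged function, with the customer-specific details
# (the data set and the date range) pushed into arguments
toy_example <- function(customer_data, date_from, date_to) {
  # Restrict to the date range this customer asked about
  window <- customer_data[customer_data$date >= date_from &
                            customer_data$date <= date_to, ]
  # The shared analysis of the windowed data goes here; a plain summary
  # stands in for it in this sketch
  summary(window)
}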

Coding is not the answer (for every question)

23 Nov

There is a movement gathering steam at the moment with the aim of proliferating coding education, in itself a fine idea. Computers are everywhere, they are harder to detect than before, and people need to know when someone is using a computer to game them – understanding a little bit of computer science is somewhere between very helpful and essential for these things: Defence Against the Dark Arts for the contemporary developed world.

Somewhere out of this movement has emerged a second movement proclaiming that teaching coding will teach people to think – seemingly an insufficient number of people were thinking until programmers started banding together to enlighten us.

Yes, part of my objection is to the slightly condescending way these people relate to the rest of us rather than to their actual arguments, but I still have objections to the content of their argument as well. These mainly stem from the fact that they’re arguing for programming as a way of teaching thinking as though other ways of learning to think were not available. In point of fact, the notion of teaching thinking is at least as old as Socratic philosophy, and exists in a wide variety of forms, from both Western and non-Western perspectives.

Sometimes coding proponents go as far as to suggest that coding is an ideal way to learn maths or logic. Maybe they have a point about logic – formal logic studies are maybe too esoteric for a lot of tastes.

On the other hand, to recommend programming as a way of learning maths is kind of odd. You can only learn maths by learning maths. Natural aptitude for maths is highly correlated with natural aptitude for programming – it’s hard to imagine that those weak at maths will have an easy time in programming. The strong ones will learn whichever they spend time on – either way, time spent coding is just time away from maths.

This last is the crux of it – the proponents of coding in schools discuss the idea as though there are several hours per week of fallow time up for grabs. There are not. Something else has to go to make room for time spent coding. My personal guess is that most of the proponents of the coding-in-schools idea are thinking of something in the humanities rather than a science or maths subject (although at least one self-identified software developer commenting on another blog wanted to reduce arithmetic teaching in schools – as if our society wasn’t innumerate enough!). I’ve read a number of data scientist CVs – if I could change the education of that group of people, I’d be taking coding out and putting English lit in.

MOOCs and Mathematics Self-Learning

21 Oct

Over time, this blog has morphed from being about actuarial self-learning to being more about mathematics and statistics self-learning, reflecting my personal career peregrinations. I kind of hope that such readers as there are aren’t too turned off – to me it seems there is a lot of crossover. It also seems, at least from reading forums for actuarial learners (weak evidence, so feel free to provide your own counterpoint), that insufficient mathematics is at the root of a lot of the difficulties that actuarial students discover along the way.

I intend to soon write a blog piece about my attempts to learn linear algebra and group theory via the slightly indirect route of learning about symmetry groups in crystallography, but today I just want to make a quick observation – the MOOC revolution isn’t coming, at least not yet, at least for mathematics.

2013 appeared to be the year of the MOOC. There were new MOOCs springing up all the time, in an ever wider array of subjects. Today it seems like the revolution has stalled – looking at Class Central, the MOOC aggregator, there are only six entries under the heading ‘Mathematics and Statistics’ that recently started or are starting soon, of which three are in languages other than English (there were also 17 courses in progress, of which 5 were non-English, and two were kind of maths meta-courses à la ‘How to learn maths’). At one stage, MathBabe forecast that MOOC offerings in the maths subjects most associated with non-mathematicians – mostly single and multivariable calculus and elementary linear algebra, probably also intro stats for non-statisticians – would put tertiary maths departments out of a job.

To me, unless there is growth in the courses offered – including at least a selection of the standard undergrad maths major subjects (so far, no English-language abstract algebra or number theory) – MOOCs, for better or worse, just aren’t going to take over the world of teaching, or even be a supplement for students beyond first year.

Self Learning Mathematics

9 Sep

The theme of this blog has always been self-learning – we started with self-learning actuarial studies, have dabbled in self-learning predictive modelling, and now we are looking at self-(re)learning mathematics, in order to make a deeper push into predictive modelling and statistics.

I was reminded of the self-learning angle the other day when I stumbled across this blog:

http://latinandgreekselftaught.blogspot.com.au/2011/05/teaching-yourself-latin-and-greek.html

which charts the adventures of a gentleman re-learning the Latin and Ancient Greek he had learnt up to the point he left tertiary education, now that his time in the workforce has ended.

We have in common that there is an element of dishonesty in calling this ‘self-learning’ – the blogger above left tertiary education with an enviable grasp of the languages, helped by lecturers at university and probably his high school teachers before that. He wasn’t going to stumble because the ablative case was too weird to understand, or become disheartened by deponent verbs.

In my case, I am re-learning some material that I have seen before and learning some material that I haven’t, and next year I hope to take Linear Algebra, Abstract Algebra and Number Theory courses as non-award subjects to make sure that I have learned the material correctly.

Compared to the blogger, at least I have the advantage that where I have seen the material before, it is only four or five years rather than 35 or 40 since I worked with it. At the same time, half the motivation is to study some branches of mathematics that I think I should have studied before taking somewhat more advanced subjects – Linear Algebra especially, which is obviously a foundation of statistics and spectral analysis.

My current foray into re-learning linear algebra is being supported by Serge Lang’s Introduction to Linear Algebra, which seems to have a terrible reputation among Amazon reviewers and commenters in places like Math Stack Exchange. I think the reason is that the pace is fairly brisk.

For my own part, I find the brevity a little refreshing, even when I am looking at material I have never seen before (or at least have no memory of seeing before!). The best part is the portability, which allows me to put it in a coat pocket and take it wherever I am going (something not true of the calculus text I used, by Anton, Bivens and Davis). Despite its brevity, it also seems to reach material advanced enough for my purposes – just short of the lecture notes for the course I plan to do next year, without the distracting ‘matrix operation’ notation, and covering just about all of the same topics within the subject.

I also mentioned before that I had taken some more advanced studies in statistics and spectral analysis than my command of Linear Algebra ought to have allowed – it is certainly pleasurable to have various puzzles and obstacles from past studies resolved, although frustrating in the sense that I could have done better at the time with just a smidgen more Linear Algebra knowledge at my fingertips.

Lattice R Users Group Talk

14 Aug

Last night I gave a talk on the Lattice package for R, held together by the idea that Lattice is an expression of Bill Cleveland’s overall philosophy of visualisation. I don’t know that I put my argument very clearly, but I think the fact of having an argument made the talk a tiny bit less incoherent!

After the talk there was some discussion of a few things – the use of colour schemes for one, but also two questions I didn’t have answers to, although I made a couple of totally wrong guesses!

Question 1: Can you include a numeric scale on Lattice trivariate plots (and how)?

Answer – Yes: there is a scales argument, which must be given as a list.

Hence, given you have your surface data frame pre-prepared:

wireframe(z ~ x * y, data = your.surface.data, scales = list(z = list(arrows = FALSE, distance = 1)))
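
For a self-contained illustration (a sketch using the built-in volcano matrix rather than a pre-prepared surface):

library(lattice)

# Build a surface data frame from the built-in volcano height matrix
surf <- expand.grid(x = 1:nrow(volcano), y = 1:ncol(volcano))
surf$z <- as.vector(volcano)

# arrows = FALSE replaces the default direction arrows with numeric tick labels
wireframe(z ~ x * y, data = surf,
          scales = list(arrows = FALSE, distance = 1))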

Question 2: Can you use the print() function to arrange graphics objects produced from multiple R graphics packages?

So far as I can tell, the answer is a qualified ‘yes’, where the qualification is that you need to be working with graphics packages which produce storeable graphics objects – lattice obviously does, and it looks like ggplot2 does too. Another package I selected at random, vcd, does not, however.
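
As a minimal sketch of mixing the two on one page (using mtcars, with an arbitrary side-by-side layout): lattice’s print method takes split and more arguments, while ggplot2’s print method accepts a grid viewport via its vp argument.

library(lattice)
library(ggplot2)
library(grid)

p.lat <- xyplot(mpg ~ wt, data = mtcars)              # a storeable trellis object
p.gg  <- ggplot(mtcars, aes(wt, mpg)) + geom_point()  # a storeable ggplot object

# Left half of the page: lattice's split = c(column, row, ncolumns, nrows)
print(p.lat, split = c(1, 1, 2, 1), more = TRUE)

# Right half: draw the ggplot into a grid viewport on the same page
print(p.gg, vp = viewport(x = 0.75, y = 0.5, width = 0.5, height = 1))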