Some Thoughts on CRISP-DM (a response)

The always-thoughtful Alex Oftelie at Bentz Whalley Flessner recently posted about the Cross Industry Standard Process for Data Mining (CRISP-DM) and wondered how often it is truly realized. His ultimate question: “Is [CRISP-DM] a firm road map to successful data-mining, or does it suggest merely an outline of processes that is malleable?”

I’m not sure how others feel, but I have a hard time thinking of the process as “firm” or unmalleable, for a couple of reasons.

As its name indicates, CRISP-DM is a standard: it provides a good starting framework. It is generally well-accepted, and seems to have a solid reputation as a common best practice. But like any standard or best practice, it will be most effective if we adapt it to fit the existing situation and business need. This flexibility allows us to capitalize on our capacity for creativity.

Just this weekend, I ran across some great thoughts on standards in Show Me the Numbers by Stephen Few. Chapter 12 is entitled “The Interplay of Standards and Innovation,” and in it Few notes “A good set of standards … provides a framework for innovation in the face of important … challenges. Standards and innovation are partners in the pursuit of excellence.” There may certainly be good reasons to bend the rules of CRISP-DM, especially in the pursuit of creative, effective innovation.

As I thought this through, I was also brought back to my early music theory days in college. In the first term of music theory coursework, we were subjected to loads of “rules” about voice-leading, harmonization, and the like. In reality, these weren’t really rules about how music must be written; they were guidelines: if you wanted your compositions to sound like certain Classical composers, you would be wise to observe them! Really excellent compositions in the Classical style often demonstrate a clear understanding of these concepts while still taking liberties to skirt them for the sake of creativity.

Similarly, as we work on analytics projects, we do ourselves a favor by understanding CRISP-DM and where we are within that construct, while allowing ourselves the flexibility to veer from the established path to get where we need to go.


When models and intuition collide…

“All models are wrong, but some are useful.”

This is a quote (from George E.P. Box, I believe) that I use pretty much any time I talk about the statistical models I create for our major and planned gift fundraising programs. It’s nice in that it covers my rear and makes clear that, while our models are helpful and do point us in the right direction, they are not foolproof: we’re going to get some false positives and false negatives from time to time.
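To make “false positives and false negatives” concrete, here’s a minimal sketch (using made-up labels, not our actual donor data) that tallies how a binary “likely major donor” prediction can be right and wrong in both directions:

```python
# Hypothetical illustration: counting the four outcomes of a binary
# "likely major donor" model. All labels below are invented.

def confusion_counts(actual, predicted):
    """Tally true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1      # model said donor, and they were
        elif p and not a:
            counts["fp"] += 1      # model said donor, reality said no
        elif not p and a:
            counts["fn"] += 1      # model missed a real donor
        else:
            counts["tn"] += 1      # model correctly passed
    return counts

# Ten hypothetical prospects: whether each became a major donor
# (actual) and whether the model predicted it (predicted).
actual    = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(confusion_counts(actual, predicted))
# {'tp': 3, 'fp': 2, 'tn': 4, 'fn': 1}
```

Even a model that is right seven times out of ten, as in this toy example, will hand a gift officer a couple of duds (the false positives) and quietly skip a real donor (the false negative).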

I found myself reiterating the quote again today as I defended my models to one of our frontline fundraisers.

We had found a couple of prospects who had scored well in the modeling process, and they came to our attention on a wealth screening. From a numerical perspective, these folks looked great! They were predicted to be major donors, and we confirmed that they likely had the assets to be able to make big gifts.

But the problem was that, while they looked great on paper, neither had the hallmark intuitive indicators of a good prospect. In fact, many of their attributes raised red flags for this particular gift officer:

  • poor (if any) giving history;
  • non-grads;
  • had attended few (if any) college events; and
  • one of them had even let the College know back in ’97 that he wasn’t too fond of us.

This gift officer had a pretty hard time wrapping his head around the idea that he should even try to see these prospects.

I did my best to sell the model: “It takes a lot of non-intuitive things into consideration, so it catches things we’d never think of!” and “We do have some major donors who match some of these criteria, so it’s not entirely out of the question that he could be a major donor!” But ultimately, I know he wasn’t convinced.

So I was left in an unresolved quandary this afternoon: what do we do about people who score very high on predictive models, but who look terrible according to all the traditional “good prospect” attributes? We can’t just write them off, because then we’re essentially throwing out the model because it doesn’t match our fundraising paradigm. (And this is precisely one of the key benefits of statistical modeling: it brings to our attention those people we wouldn’t think to find on our own.)

On the other hand, don’t instinct and experience play a role in the prospecting process? I truly think there is a place for both science and art in prospect research, so shouldn’t we embrace this notion and let these prospects get vetoed, so we avoid wasting staff time and energy on people who probably won’t pan out? (And for the record, if it weren’t for the models, I’d discount these prospects entirely – they had few redeeming qualities as potential donors. This gift officer had a pretty good point.)
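One possible middle ground between the two positions is to let the model score and the traditional red flags both have a vote: prospects who score high and look good get the full treatment, while high-scoring “ugly ducklings” get a cheaper first touch. The sketch below is purely illustrative; the thresholds and the “red flag” count are hypothetical, not anything we’ve calibrated:

```python
# Hypothetical triage rule for when the model and traditional
# indicators disagree. Thresholds here are illustrative only.

def triage(model_score, red_flags):
    """Suggest a next step from a 0-1 model score and a red-flag count.

    red_flags might count things like: poor giving history, non-grad,
    few events attended, known negative sentiment.
    """
    if model_score >= 0.8 and red_flags == 0:
        return "assign to gift officer"
    if model_score >= 0.8 and red_flags >= 3:
        # High score but bad on paper: test the water cheaply
        # before asking anyone to sit in a living room.
        return "low-cost outreach first"
    if model_score < 0.3:
        return "deprioritize"
    return "research further"

# A prospect who scores well but has poor giving history, no degree,
# and no event attendance (3 red flags):
print(triage(0.9, 3))   # -> low-cost outreach first
```

Something like this wouldn’t resolve the philosophical question, but it would let the model’s surprises get a shot without spending a full gift-officer visit on every one of them.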

I’m not sure what my conclusion is yet, but I lean towards at least giving these ugly-duckling prospects a shot. Sure, I’m not the one who has to make the cold call or sit in these folks’ living rooms, but it seems like we should at least make a respectable effort.

Now if I can just convince this gift officer…

The best fundraising analytics/modeling blog I’ve seen…

I’m always on the lookout for good resources to inform and improve the work I do, especially when it comes to prospect research, analytics, predictive modeling, and fundraising. Generally, it’s hard to say there’s a lot out there that deals with all of these topics. You can find things on prospect research; boatloads of people seem to write about fundraising; analytics and predictive modeling are mushrooming in a number of sectors — and so is the writing about them. But there don’t seem to be a whole lot of people writing about analytics in fundraising, so I was really pleasantly surprised to stumble onto Kevin MacDonell’s blog, “CoolData.”

Kevin clearly knows his stuff when it comes to analytics and statistical techniques. He covers a wealth of topics, and while he claims to be a “non-expert,” his thorough understanding of the nearly-always-complex subject matter is exemplified by how well he explains things in his posts. (Kevin was a journalism major. I’m guessing that, based on the quality of my own writing here, you can tell that I was not.)

As any CRISP-DM process model enthusiast will attest, a quality predictive modeling project requires good business understanding and good data understanding. By extension, I would opine that any quality fundraising analytics professional will possess a solid understanding of both the business (fundraising) and the data, along with the techniques required to deal with that data. By FURTHER extension, the best fundraising analytics resources also get to the heart of good business understanding and good data understanding. CoolData definitely hits the mark where this is concerned.

Kevin appears to have started CoolData a mere six months ago, and he has already populated the blog with LOTS of great posts. I look forward to seeing what else CoolData will cover.