Some Thoughts on CRISP-DM (a response)

The always-thoughtful Alex Oftelie at Bentz Whalley Flessner recently posted about the Cross Industry Standard Process for Data Mining (CRISP-DM) and wondered how often it is truly realized. His ultimate question: “Is [CRISP-DM] a firm road map to successful data-mining, or does it suggest merely an outline of processes that is malleable?”

I can’t speak for others, but I have a hard time thinking of the process as “firm” or unmalleable, for a couple of reasons.

As its name indicates, CRISP-DM is a standard: it provides a good starting framework. It is generally well accepted and seems to have a solid reputation as a common best practice. But like any standard or best practice, it will be most effective if we adapt it to fit the existing situation and business need. This flexibility allows us to capitalize on our capacity for creativity.

Just this weekend, I ran across some great thoughts on standards in Show Me the Numbers by Stephen Few. Chapter 12 is entitled “The Interplay of Standards and Innovation,” and in it Few notes, “A good set of standards … provides a framework for innovation in the face of important … challenges. Standards and innovation are partners in the pursuit of excellence.” There may well be good reasons to bend the rules of CRISP-DM, especially in the pursuit of creative, effective innovation.

As I thought through this, I was also brought back to my early music theory days in college. In the first term of music theory coursework, we were subjected to loads of “rules” about voice-leading, harmonization, and the like. These weren’t really rules about how music must be written; they were guidelines. If you wanted your compositions to sound like those of certain Classical composers, you would be wise to observe them! Really excellent compositions in the Classical style often demonstrate a clear understanding of these concepts, while still taking liberties with them for the sake of creativity.

Similarly, as we work on analytics projects, we do ourselves a favor by understanding CRISP-DM and where we are within that construct, while allowing ourselves the flexibility to veer from the established path to get where we need to go.


2 responses to “Some Thoughts on CRISP-DM (a response)”

  1. David Bramantoro

    Nice thoughts. But do you have any good and complete examples to show CRISP-DM is really powerful for both “classical” and “modern” problem of data mining?

  2. Probably not – my experience with CRISP-DM is largely tangential and theoretical. I’m more interested in the broader idea of applying flexible creativity to rules/guidelines so that we can do better work.

    That said, I am curious how you would define the “‘classical’ and ‘modern’ problem of data mining.”
