Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
IBM SPSS Modeler Cookbook

You're reading from   IBM SPSS Modeler Cookbook If you've already had some experience with IBM SPSS Modeler this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

Arrow left icon
Product type Paperback
Published in Oct 2013
Publisher Packt
ISBN-13 9781849685467
Length 382 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Toc

Table of Contents (11) Chapters Close

Preface 1. Data Understanding FREE CHAPTER 2. Data Preparation – Select 3. Data Preparation – Clean 4. Data Preparation – Construct 5. Data Preparation – Integrate and Format 6. Selecting and Building a Model 7. Modeling – Assessment, Evaluation, Deployment, and Monitoring 8. CLEM Scripting A. Business Understanding Index

Using CHAID stumps when interviewing an SME

In this recipe we will learn how to use the interactive mode of the CHAID Modeling node to explore data. The name stump comes from the idea that we grow just one branch and stop. The exploration will have the goal of answering five questions:

  1. What variables seem predictive of the target?
  2. Do the most predictive variables make sense?
  3. What questions are most useful to pose to the Subject Matter Experts (SMEs) about data quality?
  4. What is the potential value of the favorite variables of the SMEs?
  5. What missing data challenges are present in the data?

Getting ready

We will start with a blank stream.

How to do it...

To use CHAID stumps:

  1. Add a Source node to the stream for the cup98lrn reduced vars2.txt file. Ensure that the field delimiter is Tab and that the Strip lead and trail spaces option is set to Both.
  2. Add a Type node and declare TARGET_B as flag and as the target. Set TARGET_D, RFA_2, RFA_2A, and RFA_2F, RFA_2R to None.
    How to do it...
  3. Add a CHAID Modeling node and make sure that it is in interactive mode.
    How to do it...
  4. Click on the Run button. From the menus choose Tree | Grow Branch with Custom Split. Then click on the Predictors button.
    How to do it...
  5. Allow the top variable, LASTGIFT, to form a branch. Note that LASTGIFT does not seem to have missing values.
    How to do it...
  6. Further down the list, the RAMNT_ series variables do have missing values. Placing the mouse on the root node (Node 0) choose Tree | Grow Branch with Custom Split again.
    How to do it...
  7. The figure shows RAMNT_8, but your results may differ somewhat as CHAIDtakes an internal partition and therefore does not use all of the data. The slight differences can change the ranking of similar variables. Allow the branch to grow on your selected variable.
    How to do it...
  8. Now we will break away the missing data into its own category. Repeat the steps leading up to this branch, but before clicking on the Grow button, select Custom and at the bottom, set Missing values into as Separate Node.
    How to do it...
  9. Sometimes SMEs will have a particular interest in a variable because it has been known to be valuable in the past, or they are invested in the variable in some way. Even though it is well down the list, choose the variable Wealth2 and force it to branch while ensuring that missing values are placed into a Separate node.
    How to do it...

How it works...

There are several advantages to exploring data in this way with CHAID. If you have accidentally included perfect predictors it will become obvious in a hurry. This recipe is dedicated to this phenomenon. Another advantage is that most SMEs find CHAID rather intuitive. It is easy to see what the relationships are without extensive exposure to the technique. Meanwhile, as an added benefit, the SMEs are becoming acquainted with a technique that might also be used during the modeling phase. As we have seen, CHAID can show missing data as a Separate node. This feature is shown to be useful in the Binning scale variables to address missing data recipe in Chapter 3, Data Preparation – Clean. By staying in interactive mode, the trees are kept simple; also, we can force any variable to branch even if it is not near the top of the list. Often SMEs can be quite adamant that a variable is important, while the data shows them otherwise. There are countless reasons why this might be the case, and the conversation should be allowed to unfold. One is likely to learn a great deal trying to figure out why a variable that seemed promising is not performing well in the CHAID model.

Let's examine the CHAID tree a bit more closely. The root node shows the total sample size and the percentage in each of the two categories. In the figures in this recipe, the red group is the donors group. Notice that the more recent their LASTGIFT was, the more likely that they donated. Starting with 8.286 percent for the less than or equal to 9 group, dropping down to 3.476 percent for the less than 19 group. Note that when you add up the child nodes, you get the same number as the number in the root node.

How it works...

It is recommended that you take a screenshot of at least the top 10 or so variables of interest to management or SMEs. It is a good precaution to place the images on slides, since you will be able to review and discuss without waiting for Modeler to process. Having said that, it is an excellent idea to be ready to further explore the data using this technique on live data during the meeting.

See also

  • The Using the Feature Selection node creatively to remove or decapitate perfect predictors recipe in Chapter 2, Data Preparation – Select
  • The Binning scale variables to address missing data recipe in Chapter 3, Data Preparation – Clean
You have been reading a chapter from
IBM SPSS Modeler Cookbook
Published in: Oct 2013
Publisher: Packt
ISBN-13: 9781849685467
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image