Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
IBM SPSS Modeler Essentials

You're reading from   IBM SPSS Modeler Essentials Effective techniques for building powerful data mining and predictive analytics solutions

Arrow left icon
Product type Paperback
Published in Dec 2017
Publisher Packt
ISBN-13 9781788291118
Length 238 pages
Edition 1st Edition
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Keith McCormick Keith McCormick
Author Profile Icon Keith McCormick
Keith McCormick
Jesus Salcedo Jesus Salcedo
Author Profile Icon Jesus Salcedo
Jesus Salcedo
Arrow right icon
View More author details
Toc

The data mining process (as a case study)


As Chapter 9Introduction to Modeling Options in IBM SPSS Modeler will illustrate, there are many different types of data mining projects. For example, you may wish to create customer segments based on products purchased or service usage, so that you can develop targeted advertising campaigns. Or you may want to determine where to better position products in your store, based on customer purchase patterns. Or you may want to predict which students will drop out of school, so that you can provide additional services before this happens.

In this book, we will be using a dataset where we are trying to predict which people have incomes above or below $50,000. We may be trying to do this because we know that people with incomes above $50,000 are much more likely to purchase our products, given that previous work found that income was the most important predictor regarding product purchase. The point is that regardless of the actual data that we are using, the principles that we will be showing apply to an infinite number of data mining problems; whether you are trying to determine which customers will purchase a product, or when you will need to replace an elevator, or how many hotels rooms will be booked on a given date, or what additional complications might occur during surgery, and so on.

As was mentioned previously, Modeler supports the entire data mining process. The figure shown next illustrates exactly how Modeler can be used to compartmentalize each aspect of the CRISP-DM process model:

In Chapter 2The Basics of Using IBM SPSS Modeler, you will become familiar with the Modeler graphic user interface. In this chapter, we will be using screenshots to illustrate how Modeler represents various data mining activities. Therefore the following figures in this chapter are just providing an overview of how different tasks will look within Modeler, so for the moment do not worry about how each image was created, since you will see exactly how to create each of these in later chapters.

First and foremost, every data mining project will need to begin with well-defined business objectives. This is crucial for determining what you are trying to accomplish or learn from a project, and how to translate this into data mining goals. Once this is done, you will need to assess the current business situation and develop a project plan that is reasonable given the data and time constraints.

Once business and data mining objectives are well defined, you will need to collect the appropriate data. Chapter 3, Importing Data into Modeler will focus on how to bring data into Modeler. Remember that data mining typically uses data that was collected during the normal course of doing business, therefore it is going to be crucial that the data you are using can really address the business and data mining goals:

Once you have data, it is very important to describe and assess its quality. Chapter 4Data Quality and Exploration will focus on how to assess data quality using the Data Audit node:

Once the Data Understanding phase has been completed, it is time to move on to the Data Preparation phase. The Data Preparation phase is by far the most time consuming and creative part of a data mining project. This is because, as was mentioned previously, we are using data that was collected during the normal course of doing business, therefore the data will not be clean, it will have errors, it will include information that is not relevant, it will have to be restructured into an appropriate format, and you will need to create many new variables that extract important information. Thus, due to the importance of this phase, we have devoted several chapters to addressing these issues. Chapter 5Cleaning and Selecting Data will focus on how to select the appropriate cases, by using the Select node, and how to clean data by using the Distinct and Reclassify nodes:

Chapter 6, Combining Data Files will continue to focus on the Data Preparation phase by using both the Append and Merge nodes to integrate various data files:

Finally, Chapter 7Deriving New Fields will focus on constructing additional fields by using the Derive node:

At this point we will be ready to begin exploring relationships within the data. In Chapter 8Looking for Relationships Between Fields we will use the Distribution, Matrix, Histogram, Means, Plot, and Statistics nodes to uncover and understand simple relationships between variables:

Once the Data Preparation phase has been completed, we will move on to the Modeling phase. Chapter 9Introduction to Modeling Options in IBM SPSS Modeler will introduce the various types of models available in Modeler and then provide an overview of the predictive models. It will also discuss how to select a modeling technique. Chapter 10Decision Tree Models will cover the theory behind decision tree models and focus specifically on how to build a CHAID model. We will also use a Partition node to generate a test design; this is extremely important because only through replication can we determine whether we have a verifiable pattern:

Chapter 11Model Assessment and Scoring is the final chapter in this book and it will provide readers with the opportunity to assess and compare models using the Analysis node. The Evaluation node will also be introduced as a way to evaluate model results:

Finally, we will spend some time discussing how to score new data and export those results to another application using the Flat File node:

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image