You're reading from IBM SPSS Modeler Essentials Effective techniques for building powerful data mining and predictive analytics solutions

Product type Paperback

Published in Dec 2017

Publisher Packt

ISBN-13 9781788291118

Length 238 pages

Edition 1st Edition

Tools

IBM SPSS

Concepts

Data Mining

Authors (2):

Keith McCormick

Jesus Salcedo

Introduction to data mining

In this chapter, we will place IBM SPSS Modeler and its use in a broader context. Modeler was developed as a tool to perform data mining. Although the phrase predictive analytics is more common now, when Modeler was first developed in the 1990s, this type of analytics was almost universally called data mining. The meaning of data mining has shifted somewhat since then: it now tends to emphasize exploration, especially in the context of big data, and is sometimes used specifically for the mining of private data that has been collected. This will not be our use of the term. Data mining can be defined in the following way:

Data mining is the search of data, accumulated during the normal course of doing business, in order to find and confirm the existence of previously unknown relationships that can produce positive and verifiable outcomes through the deployment of predictive models when applied to new data.

Several points are worth emphasizing:

The data is not new
The data that can solve the problem was not collected solely to perform data mining
The data miner is not testing known relationships (neither hypotheses nor hunches) against the data
The patterns must be verifiable
The resulting models must be capable of something useful
The resulting models must actually work when deployed on new data

In the late 1990s, a process called the Cross-Industry Standard Process for Data Mining (CRISP-DM) was developed. We will be drawing heavily from that tradition in this chapter, and CRISP-DM can be a powerful way to organize your work in Modeler. Our use of this process to organize this book's material is what prompts us to use the term data mining. It is worth noting that the team that first developed Modeler, originally called Clementine, and the team that wrote CRISP-DM have some members in common.