Analytics, predictive analytics, and data visualization
In January 2006, Thomas H. Davenport, a well-known American academic and author, published an article in Harvard Business Review called Competing on Analytics. In it, he explains the need for analytics as follows:
"Organizations are competing on analytics not just because they can—business today is awash in data and data crunchers—but also because they should. At a time when firms in many industries offer similar products and use comparable technologies, business processes are among the last remaining points of differentiation. And analytics competitors wring every last drop of value from those processes."
After this article was published, companies across industries began learning how to use traditional and new data sources to gain a competitive advantage. But what is analytics?
Today, the term analytics is used to describe different techniques and methods that extract new knowledge from data and communicate it. The term comprises statistics, data mining, machine learning, operations research, data visualization, and many other areas.
An important point is that analytics does not provide any new value or advantage by itself; it helps people make better decisions. Analytics is about replacing decisions based on feelings and intuition with decisions based on data and evidence.
Predictive analytics is a subset of analytics whose objective is to extract knowledge from data and use it to predict something. Eric Siegel, in his book Predictive Analytics, describes it as:
"Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions."
In real life, a perfectly accurate prediction is rarely possible, but we can still extract a lot of value from predictions with low accuracy. Think of an insurance company that has many claims to review but only a few people to review them. The company knows that some claims are fraudulent, but it doesn't have enough people or time to review them all. It can choose claims at random, or it can develop a system that selects the claims with a higher probability of fraud. If the system's predictions are better than guessing, the company will improve its fraud detection efforts and save a lot of money on fraudulent claims.
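The insurance example can be made concrete with a small sketch. The claims, the `fraud_score` values, and the review capacity below are invented for illustration; the score stands in for whatever a real predictive model would output. The sketch compares the expected number of frauds found by random review with the number found by reviewing the highest-scored claims first.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: int
    fraud_score: float  # hypothetical model output: estimated probability of fraud
    is_fraud: bool      # ground truth, only known after a manual review


def expected_hits_random(claims, capacity):
    """Expected number of frauds found when `capacity` claims are reviewed at random."""
    fraud_rate = sum(c.is_fraud for c in claims) / len(claims)
    return capacity * fraud_rate


def hits_by_score(claims, capacity):
    """Frauds found when the `capacity` highest-scored claims are reviewed."""
    ranked = sorted(claims, key=lambda c: c.fraud_score, reverse=True)
    return sum(c.is_fraud for c in ranked[:capacity])


# Made-up data: 10 claims, 3 of them fraudulent. The scores are imperfect:
# some legitimate claims rank above a fraudulent one.
claims = [
    Claim(1, 0.90, True),
    Claim(2, 0.80, False),
    Claim(3, 0.70, True),
    Claim(4, 0.60, False),
    Claim(5, 0.50, False),
    Claim(6, 0.40, True),
    Claim(7, 0.30, False),
    Claim(8, 0.20, False),
    Claim(9, 0.10, False),
    Claim(10, 0.05, False),
]

capacity = 4  # the reviewers can only look at 4 claims
random_hits = expected_hits_random(claims, capacity)  # 4 * 3/10 = 1.2 on average
model_hits = hits_by_score(claims, capacity)          # 2 frauds among the top 4

print(f"Random review: {random_hits:.1f} frauds expected")
print(f"Score-based review: {model_hits} frauds found")
```

Even though the scores misrank several claims, the score-based review finds more frauds than random selection would on average, which is the point of the paragraph above.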
As we've seen, everything is about helping people make better decisions. For this reason, we have to communicate the insights we've discovered from data in an intuitive, easy-to-understand way, especially when we deal with complex problems. Data visualization can help us communicate our discoveries to our users. The term data visualization is used in many disciplines with many different meanings. We use it to describe the visual representation of data; our main goal is to communicate information clearly and efficiently to business users.
In this introduction, we've used the term value many times, so it's important to have an intuitive definition. We develop software solutions to obtain a business benefit; generally, we want to increase income or reduce costs. This business benefit has an economic value; the difference between this economic value and the cost of developing the solution is the value we obtain.
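As a quick illustration of this definition, the following sketch computes the value of a hypothetical fraud detection project. The function name `project_value` and all the figures are assumptions made up for this example.

```python
def project_value(business_benefit, development_cost):
    """Value = economic value of the business benefit minus the development cost."""
    return business_benefit - development_cost


# Made-up figures for illustration only.
extra_frauds_caught = 40       # hypothetical: additional fraudulent claims detected per year
average_claim_amount = 3_000   # hypothetical: average amount of a fraudulent claim
development_cost = 45_000      # hypothetical: cost of building the solution

business_benefit = extra_frauds_caught * average_claim_amount  # 120,000
value = project_value(business_benefit, development_cost)      # 75,000

print(f"Business benefit: {business_benefit}")
print(f"Value obtained: {value}")
```

A negative result would mean that the solution costs more to develop than the benefit it brings.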
Usually, a predictive analytics project follows some common steps that we call the predictive analytics process:
- Problem definition: Before we start, we need to understand the business problem and the goals.
- Extract and load data: An analytics application starts with raw data that is stored in a database, files, or other systems. We need to extract data from its original location and load it into our analytics tools.
- Prepare data: Sometimes, the data needs transformation because of its format or because of poor quality.
- Create a model: In this step, we will develop the predictive model.
- Performance evaluation: After creating the model, we'll evaluate its performance.
- Deploy the model and create a visual application: In the last step, we will deploy the predictive model and create the application for the business user.
The steps in this process don't have strict boundaries; sometimes, we go back and forth between them.
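The steps above can be sketched end to end in a few lines. This is only a toy outline under stated assumptions: the CSV data, the column names, and the single-threshold "model" are all invented for illustration and are not the tools or methods a real project would use. In particular, the model is evaluated on the same rows it was built from, whereas a real project would evaluate on separate data.

```python
import csv
import io

# Step 1 (problem definition) happens outside the code: here we assume the
# goal is to flag insurance claims that are likely to be fraudulent.

# Made-up raw data; one row has a missing amount.
RAW_CSV = """claim_id,amount,is_fraud
1,1200,0
2,9800,1
3,,0
4,700,0
5,8700,1
6,1500,0
"""


def extract_and_load(source):
    """Step 2: read raw rows from their original location (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(source)))


def prepare(rows):
    """Step 3: drop rows with a missing amount and convert text to numbers."""
    return [
        {"amount": float(r["amount"]), "is_fraud": int(r["is_fraud"])}
        for r in rows
        if r["amount"]
    ]


def create_model(rows):
    """Step 4: a deliberately naive model: one amount threshold placed halfway
    between the mean fraudulent amount and the mean legitimate amount."""
    fraud = [r["amount"] for r in rows if r["is_fraud"]]
    legit = [r["amount"] for r in rows if not r["is_fraud"]]
    threshold = (sum(fraud) / len(fraud) + sum(legit) / len(legit)) / 2
    return lambda amount: amount > threshold


def evaluate(model, rows):
    """Step 5: share of rows where the prediction matches the known label."""
    correct = sum(model(r["amount"]) == bool(r["is_fraud"]) for r in rows)
    return correct / len(rows)


rows = prepare(extract_and_load(RAW_CSV))
model = create_model(rows)
accuracy = evaluate(model, rows)

# Step 6: "deployment" is reduced here to calling the model on a new claim.
new_claim_flagged = model(9000.0)

print(f"Rows after preparation: {len(rows)}")
print(f"Accuracy on the prepared rows: {accuracy:.2f}")
print(f"New claim flagged for review: {new_claim_flagged}")
```

Each function maps to one step, which also makes it easy to go back and rerun an earlier step when a later one reveals a problem.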