Python Machine Learning By Example

You're reading from Python Machine Learning By Example: The easiest way to get into machine learning

Product type: Paperback
Published in: May 2017
Publisher: Packt
ISBN-13: 9781783553112
Length: 254 pages
Edition: 1st Edition
Authors (2):
Yuxi (Hayden) Liu
Ivan Idris

Table of Contents (9 chapters)

Preface
1. Getting Started with Python and Machine Learning
2. Exploring the 20 Newsgroups Dataset with Text Analysis Algorithms
3. Spam Email Detection with Naive Bayes
4. News Topic Classification with Support Vector Machine
5. Click-Through Prediction with Tree-Based Algorithms
6. Click-Through Prediction with Logistic Regression
7. Stock Price Prediction with Regression Algorithms
8. Best Practices

Generalizing with data

The good thing about data is that there is a lot of it in the world. The bad thing is that it is hard to process. The challenges stem from the diversity and noisiness of the data. We humans usually process data coming in through our ears and eyes. These inputs are transformed into electrical or chemical signals. On a very basic level, computers and robots also work with electrical signals, which are then translated into ones and zeroes. However, we program in Python in this book, and at that level we normally represent data as numbers, images, or text. Images and text are not very convenient to work with directly, so we need to transform them into numerical values.
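As an illustration of turning text into numerical values, here is a minimal sketch that counts how often each word occurs in a document. The helper names `build_vocabulary` and `to_count_vector` and the two sample sentences are made up for this example; real text pipelines usually add steps such as punctuation handling that are left out here.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(document, vocabulary):
    """Return one count per vocabulary word; words outside it are ignored."""
    counts = Counter(document.lower().split())
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical sample documents, chosen only for illustration.
documents = ["the cat sat", "the dog sat on the cat"]
vocabulary = build_vocabulary(documents)
vectors = [to_count_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
print(vectors)
```

Each document becomes a list of integers with one position per vocabulary word, which is a form that numerical algorithms can work with.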

Supervised learning in particular resembles studying for an exam. We have a set of practice questions and the actual exams. We should be able to answer exam questions without having seen the answers to them beforehand. This is called generalization: we learn something from our practice questions and hopefully are able to apply this knowledge to other, similar questions. In machine learning, these practice questions are called training sets or training samples. They are where the models derive patterns from. The actual exams are testing sets or testing samples. They are where the models are eventually applied, and how well the learned patterns carry over to them is what it's all about. Sometimes, between practice questions and actual exams, we have mock exams to assess how well we will do in the actual ones and to aid revision. These mock exams are called validation sets or validation samples in machine learning. They help us verify how well the models will perform in a simulated setting; we then fine-tune the models accordingly in order to achieve better results.
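The split into practice questions, mock exams, and actual exams can be sketched as follows. The function name `split_samples`, the 60/20/20 proportions, and the fixed seed are assumptions made for this example, not values prescribed by the text.

```python
import random


def split_samples(samples, train_percent=60, validation_percent=20, seed=42):
    """Shuffle the samples, then cut them into training, validation, and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    train_end = len(shuffled) * train_percent // 100
    validation_end = train_end + len(shuffled) * validation_percent // 100
    return (
        shuffled[:train_end],                # practice questions
        shuffled[train_end:validation_end],  # mock exams
        shuffled[validation_end:],           # actual exams
    )


# Hypothetical data: 100 samples identified by their index.
samples = list(range(100))
training_set, validation_set, testing_set = split_samples(samples)

print(len(training_set), len(validation_set), len(testing_set))
```

Shuffling before cutting is a common precaution against any ordering in the original data leaking into one of the sets.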

An old-fashioned programmer would talk to a business analyst or other expert, then implement a rule that, for instance, adds a certain value multiplied by another value, corresponding to tax rules. In a machine learning setting, we instead give the computer example input values and example output values. Or, if we are more ambitious, we can feed the program the actual tax texts and let the machine process the data further, just as an autonomous car doesn't need a lot of human input.
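To make the contrast concrete, the sketch below places a hand-coded rule next to a rate estimated from examples. The 20% rate, the sample incomes, and the function names are all hypothetical, and the least-squares fit through the origin is just one simple way to learn such a rule.

```python
def handwritten_tax(income):
    """Rule-based approach: the programmer hard-codes a hypothetical 20% rate."""
    return 0.2 * income


def learn_rate(incomes, taxes):
    """Least-squares estimate of rate in tax = rate * income (line through the origin)."""
    numerator = sum(x * y for x, y in zip(incomes, taxes))
    denominator = sum(x * x for x in incomes)
    return numerator / denominator


# Hypothetical example inputs and outputs given to the computer.
example_incomes = [1000.0, 2000.0, 3000.0]
example_taxes = [200.0, 400.0, 600.0]

learned_rate = learn_rate(example_incomes, example_taxes)
predicted_tax = learned_rate * 5000.0  # apply the learned rule to an unseen income

print(learned_rate, predicted_tax, handwritten_tax(5000.0))
```

In the first function the rate comes from a person who read the rules. In the second it is recovered from the input and output examples alone.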

This implicitly means that there is some function, for instance a tax formula, that we are trying to figure out. In physics, we have almost the same situation: we want to know how the universe works and formulate laws in a mathematical language. Since we don't know the actual function, all we can do is measure the error that is produced and try to minimize it. In supervised learning tasks, we compare our results against the expected values. In unsupervised learning, we measure our success with related metrics. For instance, if we want clusters of data to be well defined, the metrics could be how similar the data points within one cluster are and how different the data points from two clusters are. In reinforcement learning, a program evaluates its moves, for example in a chess game, using some predefined function.
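The two kinds of measurement mentioned above can be sketched in a few lines. Mean squared error is one common choice for comparing results against expected values. The two cluster measures below (average pairwise distance within a cluster, and distance between cluster means) are simple stand-ins picked for this example. All data values are made up.

```python
def mean_squared_error(expected, predicted):
    """Supervised learning: average squared gap between expected and predicted values."""
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)


def mean_within_cluster_distance(cluster):
    """Unsupervised learning: average absolute distance over all pairs in one cluster."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def distance_between_clusters(cluster_a, cluster_b):
    """Absolute distance between the means of two clusters."""
    mean_a = sum(cluster_a) / len(cluster_a)
    mean_b = sum(cluster_b) / len(cluster_b)
    return abs(mean_a - mean_b)


# Hypothetical one-dimensional data.
error = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
cluster_a = [1.0, 2.0, 3.0]
cluster_b = [10.0, 11.0, 12.0]
within = mean_within_cluster_distance(cluster_a)
between = distance_between_clusters(cluster_a, cluster_b)

print(error, within, between)
```

A small within-cluster distance combined with a large between-cluster distance corresponds to the well-defined clusters the text describes.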
