
Python Machine Learning By Example: The easiest way to get into machine learning

Authors: Idris, Yuxi (Hayden) Liu
Rating: 4.3 (30 Ratings)
Paperback May 2017 254 pages 1st Edition
eBook
€22.99 €32.99
Paperback
€28.99 €41.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with Print?

  • Instant access to your digital eBook copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Python Machine Learning By Example

Exploring the 20 Newsgroups Dataset with Text Analysis Algorithms

We went through a number of fundamental machine learning concepts in the last chapter. We learned them the fun way, through analogies such as studying for exams and designing a driving schedule. As promised, starting from this chapter, as the second step of our learning journey, we will explore several important machine learning algorithms and techniques in detail. Beyond analogies, we will work through and solve real-world examples, which makes our journey more interesting. We start in this chapter with a classic natural language processing problem: newsgroups topic modeling. We will gain hands-on experience in working with text data, especially how to convert words and phrases into machine-readable values. We will be tackling the project in an unsupervised learning manner, using clustering algorithms, including k-means clustering...

What is NLP?

The 20 newsgroups dataset is composed of text taken from newsgroup articles, as its name implies. It was originally collected by Ken Lang and is now widely used for experiments in text applications of machine learning techniques, specifically natural language processing techniques.

Natural language processing (NLP) is a significant subfield of machine learning that deals with the interactions between machine (computer) and human (natural) languages. Natural languages are not limited to speech and conversation; they can be written or signed as well. The data for NLP tasks can take different forms, for example, text from social media posts, web pages, and even medical prescriptions, or audio from voice mail, commands to control systems, and even a favorite piece of music or movie. Nowadays, NLP is broadly involved in our daily lives: we cannot live without machine translation; weather forecast scripts are...

Touring powerful NLP libraries in Python

After a short list of real-world applications of NLP, we will now tour the essential stack of Python NLP libraries. These packages handle a wide range of NLP tasks, including those mentioned above as well as others such as sentiment analysis, text classification, named entity recognition, and many more.

The most famous NLP libraries in Python include the Natural Language Toolkit (NLTK), Gensim, and TextBlob. The scikit-learn library also has NLP-related features. NLTK (http://www.nltk.org/) was originally developed for educational purposes and is now widely used in industry as well. There is a saying that you can't talk about NLP without mentioning NLTK. It is the most famous and leading platform for building Python-based NLP applications. We can install it simply by running the sudo pip install -U nltk command in Terminal.

NLTK comes with over 50 collections of...

The newsgroups data

The first project in this book is about the 20 newsgroups dataset found in scikit-learn. The data contains approximately 20,000 posts across 20 online newsgroups. A newsgroup is a place on the Internet where you can ask and answer questions about a certain topic. The data is already split into training and test sets; the cutoff point is a certain date. The original data comes from http://qwone.com/~jason/20Newsgroups/. The 20 different newsgroups are listed as follows:

  • comp.graphics
  • comp.os.ms-windows.misc
  • comp.sys.ibm.pc.hardware
  • comp.sys.mac.hardware
  • comp.windows.x
  • rec.autos
  • rec.motorcycles
  • rec.sport.baseball
  • rec.sport.hockey
  • sci.crypt
  • sci.electronics
  • sci.med
  • sci.space
  • misc.forsale
  • talk.politics.misc
  • talk.politics.guns
  • talk.politics.mideast
  • talk.religion.misc
  • alt.atheism
  • soc.religion.christian

All the documents in the dataset are in English. And from the newsgroup names, you can deduce the topics...

Getting the data

It is possible to download the data manually from the original website or from many online repositories. However, there are also many versions of the dataset: some are cleaned in a certain way and some are in raw form. To avoid confusion, it is best to use a consistent acquisition method. The scikit-learn library provides a utility function for loading the dataset. Once the dataset is downloaded, it is automatically cached, so we won't need to download the same dataset twice. In most cases, caching the dataset, especially a relatively small one, is considered good practice. Other Python libraries also provide download utilities, but not all of them implement automatic caching. This is another reason why we love scikit-learn.

To load the data, we can import the loader function for the 20 newsgroups data as follows:

>>> from sklearn.datasets import fetch_20newsgroups  

Then we can download...
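The download step can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact call used in the book: the helper name load_newsgroups is hypothetical, the subset values 'train', 'test', and 'all' are the ones accepted by fetch_20newsgroups, and the import sits inside the function so that the snippet can be loaded even where scikit-learn is not installed. The first call needs network access; later calls should hit the local cache.

```python
def load_newsgroups(subset="train"):
    """Return the 20 newsgroups data for one subset (hypothetical helper).

    subset must be 'train', 'test', or 'all', matching the values
    accepted by scikit-learn's fetch_20newsgroups.
    """
    if subset not in ("train", "test", "all"):
        raise ValueError(f"unknown subset: {subset!r}")
    # Imported lazily so that defining this helper does not require
    # scikit-learn; calling it does.
    from sklearn.datasets import fetch_20newsgroups
    # Downloads on first use, then reads from scikit-learn's data cache.
    return fetch_20newsgroups(subset=subset)
```

Calling groups = load_newsgroups() would then give the dictionary-like data object discussed in the next section.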

Thinking about features

After we download the 20 newsgroups data by whatever means we prefer, the data object called groups is available in the program. The data object is in the form of a key-value dictionary. Its keys are as follows:

>>> groups.keys()
dict_keys(['description', 'target_names', 'target', 'filenames',
'DESCR', 'data'])

The target_names key gives the newsgroups names:

>>> groups['target_names']
['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space'...

Visualization

It's good to visualize the data to get a general idea of how it is structured, what possible issues may arise, and whether there are any irregularities that we have to take care of.

In the context of multiple topics or categories, it is important to know the distribution of topics. A uniform class distribution is the easiest to deal with because there are no under-represented or over-represented categories. However, we frequently have a skewed distribution with one or more categories dominating. Here we use the seaborn package (https://seaborn.pydata.org/) to compute the histogram of categories and plot it with the matplotlib package (https://matplotlib.org/). We can install both packages via pip. Now let's display the distribution of the classes as follows:

>>> import seaborn as sns
>>> sns.distplot(groups.target)
<matplotlib.axes._subplots.AxesSubplot object...
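The same class distribution can also be checked without any plotting library. The following is a minimal sketch using only the standard library; the function name class_distribution and the toy labels are hypothetical, and with the real data one would pass groups.target and groups['target_names'] instead. Note also that distplot has been deprecated in newer seaborn releases, so a library-independent count can be a useful cross-check.

```python
from collections import Counter

def class_distribution(targets, target_names):
    """Map each class name to the number of documents carrying that label."""
    # int() also handles NumPy integer labels such as those in groups.target
    counts = Counter(int(t) for t in targets)
    return {name: counts.get(i, 0) for i, name in enumerate(target_names)}

# Hypothetical toy labels standing in for groups.target
toy_targets = [0, 1, 1, 2, 2, 2]
toy_names = ["alt.atheism", "comp.graphics", "sci.space"]

# Print a small text histogram, one row per class
for name, count in class_distribution(toy_targets, toy_names).items():
    print(f"{name:15s} {'#' * count} ({count})")
```

A roughly equal count per class indicates a uniform distribution; a few large counts indicate the skew described above.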

Data preprocessing

We see items that are obviously not words, such as 00 and 000. Maybe we should ignore items that contain only digits. However, 0d and 0t are also not words. We also see items such as __, so maybe we should only allow items that consist only of letters. The posts contain names such as andrew as well. We can filter names with the Names corpus from NLTK we just worked with. Of course, with every filter we apply, we have to make sure that we don't lose information. Finally, we see words that are very similar, such as try and trying, and word and words.
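The two filters discussed above (letters only, and no personal names) can be sketched as follows. This is an illustrative sketch: the function keep_token is hypothetical, and the small toy_names set merely stands in for the NLTK Names corpus.

```python
def keep_token(token, names=frozenset()):
    """Keep tokens made only of letters that are not known first names."""
    return token.isalpha() and token.lower() not in names

# Hypothetical stand-in for the NLTK Names corpus
toy_names = {"andrew"}

tokens = ["00", "000", "0d", "0t", "__", "andrew", "try", "trying", "word", "words"]
cleaned = [t for t in tokens if keep_token(t, toy_names)]
print(cleaned)  # ['try', 'trying', 'word', 'words']
```

The letters-only test already removes 00, 0d, and __ in one step, so a separate digits-only check is not needed.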

We have two basic strategies to deal with words from the same root: stemming and lemmatization. Stemming is the quicker and dirtier approach. It involves chopping off letters where necessary; for example, 'words' becomes 'word' after stemming. The result of stemming doesn't have to be a valid word. Lemmatizing, on the other...
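To make the idea of chopping off letters concrete, here is a deliberately crude suffix stripper. It is a toy written for illustration, not the Porter algorithm or any stemmer shipped with NLTK; the function name crude_stem and its suffix list are assumptions.

```python
def crude_stem(word, suffixes=("ing", "es", "s")):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["words", "word", "trying", "try", "houses"]:
    print(w, "->", crude_stem(w))
```

Here words and trying collapse onto word and try, while houses becomes hous, which shows that a stem need not be a valid word.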

Clustering

Clustering divides a dataset into clusters. This is an unsupervised learning task since we typically don't have any labels. In most realistic cases, the complexity is so high that we are not able to find the best division into clusters; however, we can usually find a decent approximation. Cluster analysis requires a distance function, which indicates how close items are to each other. A common distance is the Euclidean distance, which is the distance as the crow flies. Another common distance is the taxicab distance, which measures distance in city blocks. Clustering was first used in the 1930s by social science researchers without modern computers.
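The two distances mentioned above can be written down in a few lines. This is a small sketch using only the standard library; the function names are chosen here for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def taxicab(a, b):
    """City-block distance: the sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(taxicab((0, 0), (3, 4)))    # 7
```

For the same pair of points the taxicab distance is never smaller than the Euclidean one, because the city-block route cannot cut corners.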

Clustering can be hard or soft. In hard clustering, an item belongs to only one cluster, while in soft clustering, an item can belong to multiple clusters with varying probabilities. In this book, I have used only the hard clustering method.
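The difference between hard and soft assignment can be illustrated for a single point and a set of fixed cluster centers. This is a sketch under stated assumptions: the exponential weighting of squared distances is just one common way to obtain soft memberships, chosen here for illustration, and the function names are hypothetical.

```python
import math

def squared_distances(point, centroids):
    """Squared Euclidean distance from the point to every centroid."""
    return [math.dist(point, c) ** 2 for c in centroids]

def hard_assign(point, centroids):
    """Hard clustering: index of the single nearest centroid."""
    d2 = squared_distances(point, centroids)
    return d2.index(min(d2))

def soft_assign(point, centroids):
    """Soft clustering: membership weights that sum to 1."""
    d2 = squared_distances(point, centroids)
    nearest = min(d2)
    # Subtracting the smallest distance keeps the exponentials from underflowing
    scores = [math.exp(-(d - nearest)) for d in d2]
    total = sum(scores)
    return [s / total for s in scores]

centroids = [(0.0, 0.0), (4.0, 0.0)]
print(hard_assign((1.0, 0.0), centroids))  # 0
print(soft_assign((2.0, 0.0), centroids))  # [0.5, 0.5]
```

A point halfway between two centers gets equal soft membership in both, whereas a hard assignment has to pick exactly one of them.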

We can...

Topic modeling

Topics in natural language processing don't exactly match the dictionary definition; they correspond more to a nebulous statistical concept. We speak of topic models and of probability distributions of words linked to topics, as we know them. When we read a text, we expect certain words appearing in the title or the body of the text to capture the semantic context of the document. An article about Python programming will have words such as class and function, while a story about snakes will have words such as eggs and afraid. Documents usually have multiple topics; for instance, this section is about topic models and non-negative matrix factorization, which we will discuss shortly. We can, therefore, define an additive model for topics by assigning different weights to topics.

One of the topic modeling algorithms is non-negative matrix factorization (NMF). This algorithm factorizes a matrix into...
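In its standard formulation, NMF approximates a non-negative matrix V by the product of two smaller non-negative matrices W and H. The sketch below uses the classic multiplicative update rules for the squared-error objective with NumPy; it is an illustration under those assumptions, not necessarily the algorithm the book or scikit-learn uses, and the function name nmf, the toy matrix, and the parameter defaults are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative V (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random starting point
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy document-term counts: rows are documents, columns are terms
V = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)

W, H = nmf(V, rank=2)
print(np.round(W @ H, 2))  # reconstruction of V from two "topics"
```

Under this reading, each row of H can be seen as a topic expressed as weights over terms, and each row of W gives the non-negative weights with which a document mixes those topics, which matches the additive model described above.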

Summary

In this chapter, we acquired the fundamental concepts of NLP as an important subfield of machine learning, including tokenization, stemming and lemmatization, and POS tagging. We also explored three powerful NLP packages and carried out some common tasks using NLTK. Then we continued with the main project, newsgroups topic modeling. We started by extracting features with tokenization techniques as well as stemming and lemmatization. We then went through clustering and implementations of k-means clustering and non-negative matrix factorization for topic modeling. We gained hands-on experience in working with text data and tackling topic modeling problems in an unsupervised learning manner. We briefly mentioned the corpora resources available in NLTK. It would be a great idea to apply what we've learned to some of the corpora. What topics can you extract from the Shakespeare corpus?

...

Key benefits

  • Learn the fundamentals of machine learning and build your own intelligent applications
  • Master the art of building your own machine learning systems with this example-based practical guide
  • Work with important classification and regression algorithms and other machine learning techniques

Description

Data science and machine learning are some of the top buzzwords in the technical world today. A resurging interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever. This book is your entry point to machine learning. It starts with an introduction to machine learning and the Python language and shows you how to complete the setup. Moving ahead, you will learn all the important concepts, such as exploratory data analysis, data preprocessing, feature extraction, data visualization and clustering, classification, regression, and model performance evaluation. With the help of the various projects included, you will find it intriguing to learn the mechanics of several important machine learning algorithms; they are no longer as obscure as they once seemed. You will also be guided step by step to build your own models from scratch. Toward the end, you will gather a broad picture of the machine learning ecosystem and the best practices of applying machine learning techniques. Through this book, you will learn to tackle data-driven problems and implement your solutions with the powerful yet simple language, Python. Interesting and easy-to-follow examples, such as news topic classification, spam email detection, online ad click-through prediction, and stock price forecasting, will keep you glued till you reach your goal.

Who is this book for?

This book is for anyone interested in entering the data science stream with machine learning. Basic familiarity with Python is assumed.

What you will learn

  • Exploit the power of Python to handle data extraction, manipulation, and exploration techniques
  • Use Python to visualize data spread across multiple dimensions and extract useful features
  • Dive deep into the world of analytics to predict situations correctly
  • Implement machine learning classification and regression algorithms from scratch in Python
  • Be amazed to see the algorithms in action
  • Evaluate the performance of a machine learning model and optimize it
  • Solve interesting real-world problems using machine learning and Python as the journey unfolds

Product Details

Publication date: May 31, 2017
Length: 254 pages
Edition: 1st
Language: English
ISBN-13: 9781783553112


Packt Subscriptions

See our plans and pricing

€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together


Artificial Intelligence with Python
€41.99
Python Machine Learning, Second Edition
€32.99
Python Machine Learning By Example
€28.99 €41.99
Total 103.97 116.97 13.00 saved

Table of Contents

8 Chapters

  1. Getting Started with Python and Machine Learning
  2. Exploring the 20 Newsgroups Dataset with Text Analysis Algorithms
  3. Spam Email Detection with Naive Bayes
  4. News Topic Classification with Support Vector Machine
  5. Click-Through Prediction with Tree-Based Algorithms
  6. Click-Through Prediction with Logistic Regression
  7. Stock Price Prediction with Regression Algorithms
  8. Best Practices

Customer reviews

Rating distribution
4.3 (30 Ratings)
5 star 70%
4 star 13.3%
3 star 3.3%
2 star 3.3%
1 star 10%

Top Reviews

Durga Prasad Pattanayak, Nov 20, 2018 (5 stars)
The book is too good for readers who has the knowledge of python and want to learn Machine learning.
Amazon Verified review

Amazon Customer, Oct 26, 2018 (5 stars)
I started reading the book after I got it and already in love with it. What a book this is!
Amazon Verified review

GRaj, Sep 23, 2018 (5 stars)
Very good book.
Amazon Verified review

AMIT KUMAR, Sep 05, 2017 (5 stars)
Excellent book for someone starting to explore Machine Learning. The author's own personal experience in ML is penned into this wonderful book. Make no mistake, it has plenty of codes to support the theory.
Amazon Verified review

Amazon Customer, Oct 14, 2019 (5 stars)
an example is good and easy to explain
Amazon Verified review

FAQs

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to recipient countries outside the EU27, customs duty or localized taxes may apply. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my custom duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact [email protected] with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at [email protected] using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect); otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on [email protected] with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on [email protected] within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on [email protected] who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective or incorrect).
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team on [email protected] within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal