Python Natural Language Processing: Advanced machine learning and deep learning techniques for natural language processing

Practical Understanding of a Corpus and Dataset

In this chapter, we'll explore the first building block of natural language processing. We will cover the following topics to get a practical understanding of a corpus or dataset:

  • What is a corpus?
  • Why do we need a corpus?
  • Understanding corpus analysis
  • Understanding types of data attributes
  • Exploring different file formats of datasets
  • Resources for accessing free corpora
  • Preparing datasets for NLP applications
  • Developing a web scraping application

What is a corpus?

Applications related to natural language processing are built using a huge amount of data. In layman's terms, a large collection of data is called a corpus. More formally and technically, a corpus can be defined as follows:

A corpus is a collection of written or spoken natural language material, stored on a computer, and used to find out how language is used. More precisely, a corpus is a systematic, computerized collection of authentic language that is used for linguistic analysis as well as corpus analysis. The plural of corpus is corpora.

In order to develop NLP applications, we need a corpus of written or spoken natural language material. We use this material or data as input and try to find out the facts that can help us develop NLP applications. Sometimes, NLP applications use a single corpus as the input...

Why do we need a corpus?

In any NLP application, we need data, or a corpus, to build NLP tools and applications. A corpus is the most critical and basic building block of any NLP-related application. It provides us with quantitative data that is used to build NLP applications. We can also use some part of the data to test and challenge our ideas and intuitions about the language. The challenges in creating a corpus for NLP applications are as follows:

  • Deciding the type of data we need in order to solve the problem statement
  • Availability of data
  • Quality of the data
  • Adequacy of the data in terms of amount

Now you may want to know the details of all the preceding points; for that, I will take an example that can help you understand them easily. Consider that you want to make an NLP tool that understands...

Understanding corpus analysis

In this section, we will first understand what corpus analysis is. After this, we will briefly touch upon speech analysis. We will also understand how we can analyze a text corpus for different NLP applications. At the end, we will do some practical corpus analysis on a text corpus. Let's begin!

Corpus analysis can be defined as a methodology for pursuing in-depth investigations of linguistic concepts as grounded in the context of authentic and communicative situations. Here, we are talking about digitally stored language corpora, which are made available for access, retrieval, and analysis via computer.
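
As a minimal sketch of what a first pass at text corpus analysis can look like, the following example counts tokens and vocabulary size and finds the most frequent words. The two-sentence toy corpus and the simple regular-expression tokenizer are assumptions made for illustration; they are not the book's own code, and a real analysis would normally start from a much larger corpus.

```python
import re
from collections import Counter

# Toy corpus: a hypothetical stand-in for a real, much larger corpus.
toy_corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog barks, and the fox runs away.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Flatten all documents into a single list of tokens.
tokens = [tok for doc in toy_corpus for tok in tokenize(doc)]

# Word frequency distribution over the whole corpus.
freq = Counter(tokens)

total_tokens = len(tokens)    # number of running words
vocabulary_size = len(freq)   # number of distinct words

print("Total tokens:", total_tokens)
print("Vocabulary size:", vocabulary_size)
print("Most common words:", freq.most_common(3))
```

Counts like these are only a starting point, but they already show how a digitally stored corpus can be accessed and summarized by a program.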

Corpus analysis for speech data requires a phonetic analysis of each of the data instances. Apart from phonetic analysis, we also need to do conversation analysis, which gives us an idea of how social interaction happens in day...

Understanding types of data attributes

Now let's focus on what kind of data attributes can appear in the corpus. Figure 2.3 provides you with details about the different types of data attributes:

Figure 2.3: Types of data attributes

I want to give some examples of the different types of corpora. The examples are generalized so that you can understand the different types of data attributes.

Categorical or qualitative data attributes

Categorical or qualitative data attributes are as follows:

  • These kinds of data attributes are more descriptive
  • Examples include our written notes, the corpora provided by nltk, and a corpus that records different breeds of dogs, such as collie, shepherd, and terrier

There are two sub-types...

Exploring different file formats for corpora

Corpora can be in many different formats. In practice, we can use the following file formats, all of which are generally used to store features that we will later feed into our machine learning algorithms. Practical work with these file formats is covered from Chapter 4, Preprocessing, onward. The file formats are as follows:

  • .txt: This format is basically given to us as a raw dataset. The gutenberg corpus is one example. Some real-life applications need parallel corpora. Suppose you want to build grammar correction software similar to Grammarly; you will then need a parallel corpus.
  • .csv: This kind of file format is generally given to us if we are participating in hackathons or on Kaggle. We use this file format to save our features, which we will...
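
To make the idea of saving features in a .csv file concrete, here is a small sketch that writes a few feature rows and reads them back with Python's standard csv module. The column names (text, token_count, label) and the rows themselves are hypothetical, and an in-memory buffer stands in for a file on disk so that the example is self-contained.

```python
import csv
import io

# Hypothetical feature rows; the column names are illustrative only.
rows = [
    {"text": "I love this book", "token_count": 4, "label": "positive"},
    {"text": "Too many mistakes", "token_count": 3, "label": "negative"},
]

# Write to an in-memory buffer so the example runs without touching the disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "token_count", "label"])
writer.writeheader()
writer.writerows(rows)

# Read the same data back, one dictionary per row.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# The csv module returns every field as a string, so numeric
# features have to be converted back explicitly.
for row in loaded:
    row["token_count"] = int(row["token_count"])

print(loaded)
```

With a real dataset, you would replace the buffer with a file opened via open("features.csv", newline=""), where the file name is again only an example.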

Resources for accessing free corpora

Getting the corpus is a challenging task, but in this section, I will provide you with some of the links from which you can download a free corpus and use it to build NLP applications.

The nltk library provides some built-in corpora. To list all the corpus names, execute the following commands:

    import nltk.corpus

    # In an interactive Python shell, the result is echoed automatically
    dir(nltk.corpus)

    # In a script or an IDE such as PyCharm, print the result explicitly
    print(dir(nltk.corpus))


In Figure 2.2, you can see the output of the preceding code; the highlighted part indicates the names of the corpora that are already installed:

Figure 2.2: List of all available corpora in nltk

If you want to use an IDE to develop an NLP application using Python, you can use the PyCharm community version. You can follow its installation steps at the following URL: https://github.com/jalajthanaki/NLPython/blob/master/ch2/Pycharm_installation_guide...

Preparing a dataset for NLP applications

In this section, we will look at the basic steps that can help you prepare a dataset for NLP or any data science applications. There are basically three steps for preparing your dataset, given as follows:

  • Selecting data
  • Preprocessing data
  • Transforming data
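
The three steps above can be sketched as three small functions. This is only an illustration under stated assumptions: the records, the lang field used for selection, the cleaning rules, and the bag-of-words transformation are all hypothetical choices, not the book's own pipeline.

```python
import re
from collections import Counter

# Hypothetical raw records; the "lang" field is an illustrative attribute.
raw_records = [
    {"lang": "en", "text": "NLP is FUN!"},
    {"lang": "fr", "text": "Bonjour le monde"},
    {"lang": "en", "text": "Corpus analysis is fun, too."},
]


def select(records, lang="en"):
    """Selecting data: keep only the texts relevant to the problem."""
    return [r["text"] for r in records if r["lang"] == lang]


def preprocess(text):
    """Preprocessing data: lowercase, drop non-letters, normalize spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def transform(text):
    """Transforming data: turn cleaned text into bag-of-words counts."""
    return Counter(text.split())


selected = select(raw_records)
cleaned = [preprocess(t) for t in selected]
features = [transform(t) for t in cleaned]

print(cleaned)
print(features)
```

Keeping each step in its own function makes it easy to swap in a different selection rule or transformation later without touching the rest of the pipeline.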

Selecting data

Suppose you work for one of the world's tech giants, such as Google, Apple, or Facebook. Then you could easily get a large amount of data. But if you are instead doing independent research or learning some NLP concepts, how and from where can you get a dataset? First, decide what kind of dataset you need for the NLP application that you want to develop. Also, consider the end...

Web scraping

To develop a web scraping tool, we can use libraries such as beautifulsoup and scrapy. Here, I'm giving some of the basic code for web scraping.

Take a look at the code snippet in Figure 2.6, which is used to develop a basic web scraper using beautifulsoup:

Figure 2.6: Basic web scraper tool using beautifulsoup

Figure 2.7 shows the output:

Figure 2.7: Output of basic web scraper using beautifulsoup

You can find the installation guide for beautifulsoup and scrapy at this link:

https://github.com/jalajthanaki/NLPython/blob/master/ch2/Chapter_2_Installation_Commands.txt.

You can find the code at this link:

https://github.com/jalajthanaki/NLPython/blob/master/ch2/2_2_Basic_webscraping_byusing_beautifulsuop.py.

If you get any warnings while running the script, don't worry; the script will still run.
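
The code from Figure 2.6 is not reproduced in this preview, so the following is a rough stand-in rather than the author's script. To keep it runnable without installing anything, it uses Python's standard html.parser module instead of beautifulsoup, and it parses a hypothetical inline HTML string instead of fetching a live page. The idea is the same: walk the markup and pull out the parts you care about, here the page title and the link targets.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph.</p>
    <a href="https://example.com/a">Link A</a>
    <a href="/relative/b">Link B</a>
  </body>
</html>
"""


class LinkAndTitleParser(HTMLParser):
    """Collect the page title and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


parser = LinkAndTitleParser()
parser.feed(SAMPLE_HTML)

print("Title:", parser.title)
print("Links:", parser.links)
```

With beautifulsoup installed, the same extraction is usually shorter, which is one reason the chapter uses it for the real scraper.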

Now, let's do some web scraping...

Summary

In this chapter, we saw that a corpus is the basic building block for NLP applications. We also got an idea about the different types of corpora and their data attributes. We touched upon the practical analysis aspects of a corpus. We used the nltk API to make corpus analysis easy.

In the next chapter, we will address the basic and effective aspects of natural language using linguistic concepts such as parts of speech, lexical items, and tokenization, which will further help us in preprocessing and feature engineering.


Key benefits

  • Implement Machine Learning and Deep Learning techniques for efficient natural language processing
  • Get started with NLTK and implement NLP in your applications with ease
  • Understand and interpret human languages with the power of text analysis via Python

Description

This book starts off by laying the foundation for Natural Language Processing and why Python is one of the best options to build an NLP-based expert system with advantages such as Community support, availability of frameworks and so on. Later it gives you a better understanding of available free forms of corpus and different types of dataset. After this, you will know how to choose a dataset for natural language processing applications and find the right NLP techniques to process sentences in datasets and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them. During the course of the book, you will explore the semantic as well as syntactic analysis of text. You will understand how to solve various ambiguities in processing human language and will come across various scenarios while performing text analysis. You will learn the very basics of getting the environment ready for natural language processing, move on to the initial setup, and then quickly understand sentences and language parts. You will learn the power of Machine Learning and Deep Learning to extract information from text data. By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.

Who is this book for?

This book is intended for Python developers who wish to start with natural language processing and want to make their applications smarter by implementing NLP in them.

What you will learn

  • Focus on Python programming paradigms, which are used to develop NLP applications
  • Understand corpus analysis and different types of data attributes
  • Learn NLP using Python libraries such as NLTK, Polyglot, SpaCy, Stanford CoreNLP, and so on
  • Learn about feature extraction and feature selection as part of feature engineering
  • Explore the advantages of vectorization in Deep Learning
  • Get a better understanding of the architecture of a rule-based system
  • Optimize and fine-tune Supervised and Unsupervised Machine Learning algorithms for NLP problems
  • Identify Deep Learning techniques for Natural Language Processing and Natural Language Generation problems

Product Details

Publication date: Jul 31, 2017
Length: 486 pages
Edition: 1st
Language: English
ISBN-13: 9781787121423

Table of Contents

12 Chapters

  1. Introduction
  2. Practical Understanding of a Corpus and Dataset
  3. Understanding the Structure of a Sentences
  4. Preprocessing
  5. Feature Engineering and NLP Algorithms
  6. Advanced Feature Engineering and NLP Algorithms
  7. Rule-Based System for NLP
  8. Machine Learning for NLP Problems
  9. Deep Learning for NLU and NLG Problems
  10. Advanced Tools
  11. How to Improve Your NLP Skills
  12. Installation Guide

Customer reviews

Rating distribution

3.6 out of 5 (5 ratings)
5 star 60%
4 star 0%
3 star 0%
2 star 20%
1 star 20%

Mattia, May 13, 2018 (5 out of 5, Amazon verified review)
The best thing is that nothing is taken for granted. The author is able to break the content into pieces in a way that makes reading pleasant and smooth. Very useful code.

Amazon Customer, Jan 13, 2018 (5 out of 5, Amazon verified review)
Just loved this book. It makes very few assumptions about the reader in terms of background and is a quick start guide if you have the basics of programming clear. Loved the way the content is structured and the effort to explain things in simple terms. Most examples are so relatable that they make the understanding of concepts very clear. Would have loved it if a chapter in the beginning was dedicated to some important terms in natural language processing, making it even simpler for a newbie to connect faster. For anybody who wants to understand NLP and has basic programming skills, this is the book to read. Loved it!

pavan, May 02, 2019 (5 out of 5, Amazon verified review)
Great book. Helps a lot in gaining good knowledge of NLP techniques and how to implement them in Python.

Liang Yi, Jan 21, 2018 (2 out of 5, Amazon verified review)
The book is very unreadable. There are many mistakes. The author wrote a lot of useless stuff but did not explain enough about important things like parsers and NER. I am disappointed.

N. Vadulam, Feb 23, 2018 (1 out of 5, Amazon verified review)
This book uses Python 2.7. It is obsolete, even though the publication date is shown as 2017. Look elsewhere.