
Python Natural Language Processing: Advanced machine learning and deep learning techniques for natural language processing

Jalaj Thanaki
eBook Jul 2017 486 pages 1st Edition

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Practical Understanding of a Corpus and Dataset

In this chapter, we'll explore the first building block of natural language processing. We are going to cover the following topics to get a practical understanding of a corpus or dataset:

  • What is a corpus?
  • Why do we need a corpus?
  • Understanding corpus analysis
  • Understanding types of data attributes
  • Exploring different file formats of datasets
  • Resources for accessing free corpora
  • Preparing datasets for NLP applications
  • Developing the web scraping application

What is a corpus?

NLP applications are built using large amounts of data. In layman's terms, a large collection of data is called a corpus. More formally, a corpus can be defined as follows:

A corpus is a collection of written or spoken natural language material, stored on a computer and used to find out how language is used. More precisely, a corpus is a systematic, computerized collection of authentic language that is used for linguistic analysis as well as corpus analysis. The plural of corpus is corpora.

In order to develop NLP applications, we need a corpus of written or spoken natural language material. We use this material as input data and try to find facts that can help us develop NLP applications. Sometimes, NLP applications use a single corpus as the input...

Why do we need a corpus?

In any NLP application, we need data, or a corpus, to build NLP tools and applications. A corpus is the most critical and basic building block of any NLP-related application. It provides us with quantitative data that is used to build NLP applications. We can also use part of the data to test and challenge our ideas and intuitions about the language. The challenges in creating a corpus for NLP applications are as follows:

  • Deciding the type of data we need in order to solve the problem statement
  • Availability of data
  • Quality of the data
  • Adequacy of the data in terms of amount

Now you may want to know the details of each of the preceding challenges; for that, I will take an example that can help you understand them easily. Consider that you want to make an NLP tool that understands...

Understanding corpus analysis

In this section, we will first understand what corpus analysis is. After this, we will briefly touch upon speech analysis. We will also understand how we can analyze a text corpus for different NLP applications. At the end, we will do some practical corpus analysis on a text corpus. Let's begin!

Corpus analysis can be defined as a methodology for pursuing in-depth investigations of linguistic concepts as grounded in the context of authentic and communicative situations. Here, we are talking about digitally stored language corpora, which are made available for access, retrieval, and analysis via computer.

Corpus analysis for speech data requires phonetic analysis of each data instance. Apart from phonetic analysis, we also need to do conversation analysis, which gives us an idea of how social interaction happens in day...
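
As a rough illustration of what basic text corpus analysis can look like, the following sketch counts tokens, vocabulary size, and word frequencies. It is not the book's own code: the three-sentence corpus and the helper names tokenize and corpus_statistics are invented for this example, and only the Python standard library is used.

```python
import re
from collections import Counter

# Hypothetical three-document corpus, invented for this illustration.
corpus = [
    "The dog chased the cat.",
    "The cat climbed the tree.",
    "A dog barked at the tree.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def corpus_statistics(documents):
    """Return the token count, the vocabulary size, and the word frequencies."""
    tokens = [tok for doc in documents for tok in tokenize(doc)]
    frequencies = Counter(tokens)
    return len(tokens), len(frequencies), frequencies


total_tokens, vocabulary_size, frequencies = corpus_statistics(corpus)
print(total_tokens, vocabulary_size)
print(frequencies.most_common(3))
```

On a real corpus, the same counts give a first impression of its size and of which words dominate it.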

Understanding types of data attributes

Now let's focus on what kind of data attributes can appear in the corpus. Figure 2.3 provides you with details about the different types of data attributes:

Figure 2.3: Types of data attributes

I want to give some examples of the different types of corpora. The examples are generalized, so that you can understand the different types of data attributes.

Categorical or qualitative data attributes

Categorical or qualitative data attributes are as follows:

  • These kinds of data attributes are more descriptive
  • Examples are our written notes, corpora provided by nltk, and a corpus that records different breeds of dogs, such as collie, shepherd, and terrier

There are two sub-types...

Exploring different file formats for corpora

Corpora can be in many different formats. In practice, we can use the following file formats. All of them are generally used to store features, which we will feed into our machine learning algorithms later. Practical work with these file formats is covered from Chapter 4, Preprocessing onward. The file formats are as follows:

  • .txt: This format is basically given to us as a raw dataset. The gutenberg corpus is one example. Some real-life applications need parallel corpora. Suppose you want to make grammar correction software similar to Grammarly; then you will need a parallel corpus.
  • .csv: This kind of file format is generally given to us if we are participating in hackathons or on Kaggle. We use this file format to save our features, which we will...
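
To make the .csv case concrete, here is a minimal sketch of saving features to CSV and reading them back with Python's standard csv module. The column names and rows are invented for illustration, and an in-memory buffer stands in for a file on disk so that the example is self-contained.

```python
import csv
import io

# Hypothetical feature rows; the column names are invented for illustration.
rows = [
    {"text": "good movie", "word_count": 2, "label": "positive"},
    {"text": "terrible plot, bad acting", "word_count": 4, "label": "negative"},
]

# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "word_count", "label"])
writer.writeheader()
writer.writerows(rows)

# Read the rows back; the csv module returns every value as a string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded[1]["text"], loaded[1]["word_count"])
```

Note that the comma inside the second text value survives the round trip because the writer quotes it, and that numeric features come back as strings and need converting before use.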

Resources for accessing free corpora

Getting a corpus is a challenging task, but in this section, I will provide you with some links from which you can download a free corpus and use it to build NLP applications.

The nltk library provides some built-in corpora. To list all the corpus names, execute the following commands:

    import nltk.corpus
    dir(nltk.corpus)         # in the interactive Python shell
    print(dir(nltk.corpus))  # in a script, for example when run from the PyCharm IDE
  

In Figure 2.2, you can see the output of the preceding code; the highlighted part indicates the names of the corpora that are already installed:

Figure 2.2: List of all available corpora in nltk

If you want to use an IDE to develop an NLP application using Python, you can use the PyCharm community version. You can follow its installation steps at the following URL: https://github.com/jalajthanaki/NLPython/blob/master/ch2/Pycharm_installation_guide...

Preparing a dataset for NLP applications

In this section, we will look at the basic steps that can help you prepare a dataset for NLP or any data science applications. There are basically three steps for preparing your dataset, given as follows:

  • Selecting data
  • Preprocessing data
  • Transforming data
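
The three steps can be sketched in a few lines of Python. This is only one plausible reading of them, not the book's implementation: the sample records, the language filter, and the function names select, preprocess, and transform are all assumptions made for illustration.

```python
import re

# Hypothetical raw records, invented for this illustration.
raw_records = [
    {"lang": "en", "text": "  NLP is FUN!  "},
    {"lang": "fr", "text": "Bonjour le monde"},
    {"lang": "en", "text": "Corpora are collections of text."},
]


def select(records, lang="en"):
    """Selecting: keep only the records relevant to the task."""
    return [r["text"] for r in records if r["lang"] == lang]


def preprocess(text):
    """Preprocessing: trim whitespace, lowercase, and drop punctuation."""
    return re.sub(r"[^a-z\s]", "", text.strip().lower())


def transform(text):
    """Transforming: turn the cleaned text into a list of tokens."""
    return text.split()


dataset = [transform(preprocess(t)) for t in select(raw_records)]
print(dataset)
```

Keeping each step in its own function makes it easy to swap in a different selection rule or cleaning routine later.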

Selecting data

Suppose you are working with a tech giant such as Google, Apple, or Facebook. Then you could easily get a large amount of data. But if you are not working with such a company, and are instead doing independent research or learning some NLP concepts, how and from where can you get a dataset? First, decide what kind of dataset you need as per the NLP application that you want to develop. Also, consider the end...

Web scraping

To develop a web scraping tool, we can use libraries such as beautifulsoup and scrapy. Here, I'm giving some of the basic code for web scraping.

Take a look at the code snippet in Figure 2.6, which is used to develop a basic web scraper using beautifulsoup:

Figure 2.6: Basic web scraper tool using beautifulsoup

Figure 2.7 shows the output:

Figure 2.7: Output of basic web scraper using beautifulsoup

You can find the installation guide for beautifulsoup and scrapy at this link:

https://github.com/jalajthanaki/NLPython/blob/master/ch2/Chapter_2_Installation_Commands.txt.

You can find the code at this link:

https://github.com/jalajthanaki/NLPython/blob/master/ch2/2_2_Basic_webscraping_byusing_beautifulsuop.py.

If you get any warnings while running the script, don't worry; they can be ignored.
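
Because the code in Figure 2.6 is not reproduced here, the following is a rough sketch of the core idea of a basic scraper: parsing HTML and collecting its links. It is not the author's script and it does not use beautifulsoup; Python's standard html.parser module is swapped in so that the example runs without installing anything. The LinkCollector class and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Start recording when an anchor tag opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Store the link once its anchor tag closes.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content. A real scraper would download the page first,
# for example with urllib.request.urlopen(url).read().decode("utf-8").
html = """
<html><body>
  <h1>Free corpora</h1>
  <a href="https://example.com/a">Corpus A</a>
  <a href="https://example.com/b">Corpus <b>B</b></a>
</body></html>
"""

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```

With beautifulsoup installed, the same task is usually shorter, since the library builds a searchable tree for you instead of requiring hand-written tag handlers.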

Now, let's do some web scraping...

Summary

In this chapter, we saw that a corpus is the basic building block for NLP applications. We also got an idea about the different types of corpora and their data attributes. We touched upon the practical analysis aspects of a corpus. We used the nltk API to make corpus analysis easy.

In the next chapter, we will address the basic and effective aspects of natural language using linguistic concepts such as parts of speech, lexical items, and tokenization, which will further help us in preprocessing and feature engineering.


Key benefits

  • Implement Machine Learning and Deep Learning techniques for efficient natural language processing
  • Get started with NLTK and implement NLP in your applications with ease
  • Understand and interpret human languages with the power of text analysis via Python

Description

This book starts off by laying the foundation for Natural Language Processing and why Python is one of the best options to build an NLP-based expert system with advantages such as Community support, availability of frameworks and so on. Later it gives you a better understanding of available free forms of corpus and different types of dataset. After this, you will know how to choose a dataset for natural language processing applications and find the right NLP techniques to process sentences in datasets and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them. During the course of the book, you will explore the semantic as well as syntactic analysis of text. You will understand how to solve various ambiguities in processing human language and will come across various scenarios while performing text analysis. You will learn the very basics of getting the environment ready for natural language processing, move on to the initial setup, and then quickly understand sentences and language parts. You will learn the power of Machine Learning and Deep Learning to extract information from text data. By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.

Who is this book for?

This book is intended for Python developers who wish to start with natural language processing and want to make their applications smarter by implementing NLP in them.

What you will learn

  • Focus on Python programming paradigms, which are used to develop NLP applications
  • Understand corpus analysis and different types of data attributes
  • Learn NLP using Python libraries such as NLTK, Polyglot, SpaCy, Stanford CoreNLP and so on
  • Learn about Feature Extraction and Feature Selection as part of Feature Engineering
  • Explore the advantages of vectorization in Deep Learning
  • Get a better understanding of the architecture of a rule-based system
  • Optimize and fine-tune Supervised and Unsupervised Machine Learning algorithms for NLP problems
  • Identify Deep Learning techniques for Natural Language Processing and Natural Language Generation problems

Product Details

Publication date: Jul 31, 2017
Length: 486 pages
Edition: 1st
Language: English
ISBN-13: 9781787285521



Table of Contents

12 Chapters
Introduction
Practical Understanding of a Corpus and Dataset
Understanding the Structure of a Sentences
Preprocessing
Feature Engineering and NLP Algorithms
Advanced Feature Engineering and NLP Algorithms
Rule-Based System for NLP
Machine Learning for NLP Problems
Deep Learning for NLU and NLG Problems
Advanced Tools
How to Improve Your NLP Skills
Installation Guide

Customer reviews

Rating distribution
3.6 out of 5 (5 Ratings)
5 star 60%
4 star 0%
3 star 0%
2 star 20%
1 star 20%
Mattia May 13, 2018
Rating: 5/5
The best thing is that nothing is taken for granted. The author is able to break the content into pieces in a way that makes reading pleasant and smooth. Very useful code.
Amazon Verified review
Amazon Customer Jan 13, 2018
Rating: 5/5
Just loved this book. It makes very few assumptions about the reader in terms of background and quick start guide if you have basics of programming clear. Loved the way the content is structured and the effort to explain things in simple terms. Most examples are so relatable that its make the understanding of concepts very clear. Would have loved if a chapter int he beginning was dedicated to some important terms in natural language processing making it even more simple to a newbie to connect faster.For anybody who wants to understand NLP and has basic programming skills, this is the book to read. Loved it!
Amazon Verified review
pavan May 02, 2019
Rating: 5/5
great book. helps a lot in gaining good knowledge on NLP techniques and how to implement in python
Amazon Verified review
Liang Yi Jan 21, 2018
Rating: 2/5
The book is very unreadable. There are many mistakes. The author wrote many useless stuff whereas explained not enough on important things like parser, NER. I am disappointed.
Amazon Verified review
N. Vadulam Feb 23, 2018
Rating: 1/5
This book uses Python 2.7. It is obsolete, even though the publication date is shown as 2017.Look elsewhere.
Amazon Verified review
