Hands-On Natural Language Processing with PyTorch 1.x

Build smart, AI-driven linguistic applications using deep learning and NLP techniques

Product type Paperback
Published in Jul 2020
Publisher Packt
ISBN-13 9781789802740
Length 276 pages
Edition 1st Edition
Author: Thomas Dop
Table of Contents (14 chapters)

Preface
1. Section 1: Essentials of PyTorch 1.x for NLP
2. Chapter 1: Fundamentals of Machine Learning and Deep Learning
3. Chapter 2: Getting Started with PyTorch 1.x for NLP
4. Section 2: Fundamentals of Natural Language Processing
5. Chapter 3: NLP and Text Embeddings
6. Chapter 4: Text Preprocessing, Stemming, and Lemmatization
7. Section 3: Real-World NLP Applications Using PyTorch 1.x
8. Chapter 5: Recurrent Neural Networks and Sentiment Analysis
9. Chapter 6: Convolutional Neural Networks for Text Classification
10. Chapter 7: Text Translation Using Sequence-to-Sequence Neural Networks
11. Chapter 8: Building a Chatbot Using Attention-Based Neural Networks
12. Chapter 9: The Road Ahead
13. Other Books You May Enjoy

Chapter 4: Text Preprocessing, Stemming, and Lemmatization

Textual data can be gathered from many different sources and takes many different forms. Text can be tidy and readable or raw and messy, and it can come in many different styles and formats. In this chapter, we'll look at how to preprocess this data so that it is converted into a standard format before it reaches our NLP models.

Stemming and lemmatization, like tokenization, are forms of NLP preprocessing. However, unlike tokenization, which splits a document into individual words, stemming and lemmatization attempt to reduce those words further to their lexical roots. For example, almost any verb in English takes several different forms, depending on tense:

He jumped

He is jumping

He jumps

While all these words are different, they all relate to the same root word – jump. Stemming and lemmatization are both techniques we can use to reduce word variations...
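
To make the idea concrete, here is a deliberately naive sketch of suffix stripping applied to the three sentences above. It is not the method used later in this chapter, and it is not the Porter algorithm or any library's stemmer: the `SUFFIXES` list and the `tokenize` and `naive_stem` helpers are hypothetical names introduced purely for illustration, and the rules would fail on many real English words.

```python
import re

# Hypothetical suffix list, for illustration only.
# Real stemmers use far richer, carefully ordered rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


sentences = ["He jumped", "He is jumping", "He jumps"]

# Tokenize each sentence, then reduce every token to its (naive) stem.
stems = [[naive_stem(token) for token in tokenize(s)] for s in sentences]

for sentence, stemmed in zip(sentences, stems):
    print(sentence, "->", stemmed)
```

Under these toy rules, jumped, jumping, and jumps all reduce to jump, while short words such as he and is are left untouched by the minimum-length check.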
