You're reading from Hands-On Python Natural Language Processing: Explore tools and techniques to analyze and process text with a view to building real-world NLP applications

Product type Paperback

Published in Jun 2020

Publisher Packt

ISBN-13 9781838989590

Length 316 pages

Edition 1st Edition

Authors (2):

Mayank Rasu

Aman Kedia

Table of Contents (16) Chapters

Preface

1. Section 1: Introduction

2. Understanding the Basics of NLP

3. NLP Using Python

4. Section 2: Natural Language Representation and Mathematics

5. Building Your NLP Vocabulary

6. Transforming Text into Data Structures

7. Word Embeddings and Distance Measurements for Text

8. Exploring Sentence-, Document-, and Character-Level Embeddings

9. Section 3: NLP and Learning

10. Identifying Patterns in Text Using Machine Learning

11. From Human Neurons to Artificial Neurons for Understanding Text

12. Applying Convolutions to Text

13. Capturing Temporal Relationships in Text

14. State of the Art in NLP

15. Other Books You May Enjoy

Exploring the Bag-of-Words architecture

An intuitive way to represent a document is by the frequency of the words it contains. This is exactly what the Bag-of-Words (BoW) approach does.

In Chapter 3, Building Your NLP Vocabulary, we saw how to build a vocabulary from a list of sentences. Vocabulary building is a prerequisite to the BoW methodology. Once the vocabulary is available, each sentence can be represented as a vector whose length equals the size of the vocabulary. Each entry in the vector corresponds to a term in the vocabulary, and its value is the frequency of that term in the sentence under consideration. The lower limit for this value is 0, indicating that the vocabulary term does not occur in the sentence concerned.
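The mapping from sentences to count vectors described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the book's own implementation: the helper names `build_vocabulary` and `bow_vector`, the two sample sentences, and the choice of lowercasing plus whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map each distinct lowercase token to a fixed index (sorted for determinism)."""
    # Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    tokens = {token for sentence in sentences for token in sentence.lower().split()}
    return {term: index for index, term in enumerate(sorted(tokens))}


def bow_vector(sentence, vocabulary):
    """Return a vector with one entry per vocabulary term, holding that term's count."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocabulary)  # every entry starts at the lower limit, 0
    for term, index in vocabulary.items():
        vector[index] = counts[term]  # Counter returns 0 for absent terms
    return vector


# Hypothetical sample sentences, chosen only for illustration.
sentences = ["the cat sat on the mat", "the dog sat"]
vocabulary = build_vocabulary(sentences)
vectors = [bow_vector(sentence, vocabulary) for sentence in sentences]

print(vocabulary)
for sentence, vector in zip(sentences, vectors):
    print(sentence, "->", vector)
```

Each vector has exactly as many entries as the vocabulary has terms, and a term that does not occur in a sentence contributes a 0 at its index.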

What would be the upper limit for the entry in the vector?

Think!

Well, that could possibly be the frequency of the occurrence...

Authors (2)

Aman Kedia is a data enthusiast and lifelong learner. He is an avid believer in Artificial Intelligence (AI) and the algorithms supporting it. He has worked on state-of-the-art problems in Natural Language Processing (NLP), encompassing resume matching and digital assistants, among others. He has worked at Oracle and SAP, trying to solve problems leveraging advancements in AI. He has four published research papers in the domain of AI.

Mayank Rasu is the author of the book Hands-On Natural Language Processing with Python. He has more than 12 years of global experience as a data scientist and quantitative analyst in the investment banking domain. He has worked at the intersection of finance and technology and has developed and deployed AI-based applications in the finance domain, including sentiment analyzers, robotic process automation, and deep learning-based document reviewers. Mayank is also an educator and has trained and mentored working professionals on applied AI.