You're reading from Natural Language Processing with Java Cookbook: Over 70 recipes to create linguistic and language translation applications using Java libraries

Product type Paperback

Published in Apr 2019

Publisher Packt

ISBN-13 9781789801156

Length 386 pages

Edition 1st Edition

Languages

Java

Tools

Deeplearning4j

Concepts

Mobile Application Development

Authors (2):

Richard M. Reese

Richard M Reese

Table of Contents (14) Chapters

Preface

1. Preparing Text for Analysis and Tokenization

2. Isolating Sentences within a Document

3. Performing Name Entity Recognition

4. Detecting POS Using Neural Networks

5. Performing Text Classification

6. Finding Relationships within Text

7. Language Identification and Translation

8. Identifying Semantic Similarities within Text

9. Common Text Processing and Generation Tasks

10. Extracting Data for Use in NLP Analysis

11. Creating a Chatbot

12. Installation and Configuration

13. Other Books You May Enjoy

Extracting Data for Use in NLP Analysis

Most NLP tasks are concerned with the analysis of data. In this chapter, we will illustrate several approaches to acquiring data from multiple sources. This includes processing data from HTML pages and from PDF, Word, and Excel documents. Each of these techniques involves connecting to a data source and then extracting the data from that source. For complex documents, such as Wikipedia articles or Word documents, we must choose which type of data we want to retrieve.

For example, with an HTML document, we may be interested in the actual text and possibly the HTML markup. For a document containing a table of contents, we may want to process that information separately. To extract text from a Wikipedia article, we treat it as an HTML document.
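To make the distinction between text and markup concrete, here is a minimal sketch that pulls the visible text out of an HTML string using only the Java standard library. It is not the chapter's recipe, which is not shown in this excerpt. The class name `HtmlTextSketch` and the method `extractText` are hypothetical, and the regular-expression approach is an assumption chosen to keep the example self-contained. It will not cope with malformed markup, comments containing `>`, or the full set of HTML entities, so a dedicated HTML parser is likely the better choice for real pages.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: a naive text extractor, not the book's code.
public class HtmlTextSketch {

    // Matches a whole <script>...</script> or <style>...</style> element,
    // whose content is code or CSS rather than readable text.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");

    // Matches any remaining tag, such as <p> or </b>.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String extractText(String html) {
        // Replace markup with spaces so words in adjacent elements stay separate.
        String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");

        // Decode a handful of common entities (assumption: enough for a demo).
        // "&amp;" is decoded last so that "&amp;lt;" is not decoded twice.
        String decoded = noTags.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");

        // Collapse runs of whitespace left behind by the removed markup.
        return WHITESPACE.matcher(decoded).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Sample</title>"
                + "<style>p{color:red}</style></head>"
                + "<body><h1>NLP &amp; Java</h1>"
                + "<p>Text to <b>analyze</b> here</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

Running `main` prints `Sample NLP & Java Text to analyze here`. The title text is kept along with the body text. If only the body is wanted, or if the markup itself is of interest, the choice of what to retrieve has to be made explicitly, which is the decision the paragraph above describes.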

These recipes are an introduction to the topic. Most of these data sources...

Authors (2)

Richard M Reese

Richard M. Reese has worked in both industry and academia. For 17 years, he worked in the telephone and aerospace industries, serving in several capacities, including research and development, software development, supervision, and training. He currently teaches at Tarleton State University, where he has the opportunity to apply his years of industry experience to enhance his teaching. Richard has written several Java books and a C pointer book. He uses a concise and easy-to-follow approach to the topics at hand. His Java books have addressed EJB 3.1, updates to Java 7 and 8, certification, jMonkeyEngine, natural language processing, functional programming, networks, and data science.

Richard M. Reese

Richard Reese has worked in industry and academia for the past 29 years. For 10 years, he provided software development support at Lockheed and at one point developed a C-based network application. He was a contract instructor providing software training to industry for 5 years. Richard is currently an Associate Professor at Tarleton State University in Stephenville, Texas. Richard is the author of various books and video courses, including Natural Language Processing with Java, Java for Data Science, and Getting Started with Natural Language Processing in Java.
