Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Natural Language Processing with AWS AI Services
Natural Language Processing with AWS AI Services

Natural Language Processing with AWS AI Services: Derive strategic insights from unstructured data with Amazon Textract and Amazon Comprehend

Arrow left icon
Profile Icon M Profile Icon Premkumar Rangarajan
Arrow right icon
Free Trial
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (21 Ratings)
Paperback Nov 2021 508 pages 1st Edition
eBook
Can$38.99 Can$55.99
Paperback
Can$69.99
Subscription
Free Trial
Arrow left icon
Profile Icon M Profile Icon Premkumar Rangarajan
Arrow right icon
Free Trial
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (21 Ratings)
Paperback Nov 2021 508 pages 1st Edition
eBook
Can$38.99 Can$55.99
Paperback
Can$69.99
Subscription
Free Trial
eBook
Can$38.99 Can$55.99
Paperback
Can$69.99
Subscription
Free Trial

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Natural Language Processing with AWS AI Services

Chapter 1: NLP in the Business Context and Introduction to AWS AI Services

Natural language processing, or NLP, is quite popular in the scientific community, but the value of using this Artificial Intelligence (AI) technique to gain business benefits is not immediately obvious to mainstream users. Our focus will be to raise awareness and educate you on the business context of NLP, provide examples of the proliferation of data in unstructured text, and show how NLP can help derive meaningful insights to inform strategic decisions within an enterprise.

In this introductory chapter, we will be establishing the basic context to familiarize you with some of the underlying concepts of AI and Machine Learning (ML), the types of challenges that NLP can help solve, common pitfalls when building NLP solutions, and how NLP works and what it's really good at doing, with examples.

In this chapter, we will cover the following:

  • Introducing NLP
  • Overcoming the challenges in building NLP solutions
  • Understanding why NLP is becoming mainstream
  • Introducing the AWS ML stack

Introducing NLP

Language is as old as civilization itself and no other communication tool is as effective as the spoken or written word. In their childhood days, the authors were enamored with The Arabian Nights, a centuries-old collection of stories from India, Persia, and Arabia. In one famous story, Ali Baba and the Forty Thieves, Ali Baba is a poor man who discovers a thieves' den containing hordes of treasure hidden in a cave that can only be opened by saying the magic words open sesame. In the authors' experience, this was the first recollection of a voice-activated application. Though purely a work of fiction, it was indeed an inspiration to explore the art of the possible.

Recently, in the last two decades, the popularity of the internet and the proliferation of smart devices has fueled significant technological advancements in digital communications. In parallel, the long-running research to develop AI made rapid strides with the advent of ML. Arthur Lee Samuel was the first to coin the term machine learning, in 1959, and helped make it mainstream in the field of computer science by creating a checkers playing program that demonstrated how computers can be taught.

The concept that machines can be taught to mimic human cognition, though, was popularized a little earlier in 1950 by Alan Turing in his paper Computing Machinery and Intelligence. This paper introduced the Turing Test, a variation of a common party game of the time. The purpose of the test was for an interpreter to ask questions and compare responses from a human participant and a computer. The trick was that the interpreter was not aware which was which, considering all three were isolated in different rooms. If the interpreter was unable to differentiate the two participants because the responses matched closely, the Turing Test had successfully validated that the computer possessed AI.

Of course, the field of AI has progressed leaps and bounds since then, largely due to the success of ML algorithms in solving real-world problems. An algorithm, at its simplest, is a programmatic function that converts inputs to outputs based on conditions. In contradiction to regular programmable algorithms, ML algorithms have learned the ability to alter their processing based on the data they encounter. There are different ML algorithms to choose from based on requirements, for example, Extreme Gradient Boosting (XGBoost), a popular algorithm for regression and classification problems, Exponential Smoothing (ETS), for statistical time series forecasting, Single Shot MultiBox Detector (SSD), for computer vision problems, and Latent Dirichlet Allocation (LDA), for topic modeling in NLP problems.

For more complex problems, ML has evolved into deep learning with the introduction of Artificial Neural Networks (ANNs), which have the ability to solve highly challenging tasks by learning from massive volumes of data. For example, AWS DeepComposer (https://aws.amazon.com/deepcomposer/), an ML service from Amazon Web Services (AWS), educates developers with music as a medium of instruction. One of the ML models that DeepComposer uses is trained with a type of neural network called the Convolutional Neural Network (CNN) to create new and unique musical compositions from a simple input melody using AutoRegressive (AR) techniques:

Figure 1.1 – Composing music with AWS DeepComposer and ML

Figure 1.1 – Composing music with AWS DeepComposer and ML

A piano roll is an image representation of music, and AR-CNN considers music generation as a sequence of these piano roll images:

Figure 1.2 – Piano roll representation of music

Figure 1.2 – Piano roll representation of music

While there is broad adoption of ML across organizations of all sizes and industries spurred by the democratization of advanced technologies, the potential to solve many types of problems, and the breadth and depth of capabilities in AWS, ML is only a subset of what is possible today with AI. According to one report (https://www.gartner.com/en/newsroom/press-releases/2019-01-21-gartner-survey-shows-37-percent-of-organizations-have, accessed on March 23, 2021), AI adoption grew by 270% in the period 2015 to 2019. And it is continuing to grow at a rapid pace. AI is no longer a peripheral technology only available to those enterprises that have the economic resources to afford high-performance computers. Today, AI is a mainstream option for organizations looking to add cognitive intelligence to their applications to accelerate business value. For example, ExxonMobil in partnership with Amazon created an innovative and efficient way for customers to pay at gas stations. The Alexa pay for gas skill uses the car's Alexa-enabled device or your smartphone's Alexa app to communicate with the gas pump to manage the payment. The authors paid a visit to a local ExxonMobil gas station to try it out, and it was an awesome experience. For more details, please refer to https://www.exxon.com/en/amazon-alexa-pay-for-gas.

AI addresses a broad spectrum of tasks similar to human intelligence, both sensory and cognitive. Typically, these are grouped into categories, for example, computer vision (mimics human vision), NLP (mimics human speech, writing, and auditory processes), conversational interfaces (such as chatbots, mimics dialogue-based interactions), and personalization (mimics human intuition). For example, C-SPAN, a broadcaster that reports on proceedings at the US Senate and the House of Representatives, uses Amazon Rekognition (a computer vision-based image and video analysis service) to tag who is speaking/on camera at each time. With Amazon Rekognition, C-SPAN was able to index twice as much content compared to what they were doing previously. In addition, AWS offers AI services for intelligent search, forecasting, fraud detection, anomaly detection, predictive maintenance, and much more, which is why AWS was named the leader in the first Gartner Magic Quadrant for Cloud AI.

While language is inherently structured and well defined, the usage or interpretation of language is subjective, and may inadvertently cause an unintended influence that you need to be cognizant of when building natural language solutions. Consider, for example, the Telephone Game, which shows how conversations are involuntarily embellished, resulting in an entirely different version compared to how it began. Each participant repeats exactly what they think they heard, but not what they actually heard. It is fun when played as a party game but may have more serious repercussions in real life. Computers, too, will repeat what they heard, based on how their underlying ML model interprets language.

To understand how small incremental changes can completely change the meaning, let's look at another popular game: Word Ladder (https://en.wikipedia.org/wiki/Word_ladder). The objective is to convert one word into a different word, often one with the opposite meaning, in as few steps as possible with only one letter in the word changing in one step.

An example is illustrated in the following table:

Figure 1.3 – The Word Ladder game

Figure 1.3 – The Word Ladder game

Adapting AI to work with natural language resulted in a group of capabilities that primarily deal with computational emulation of cognitive and sensory processes associated with human speech and text. There are two main categories that applications can be grouped into:

  • Natural Language Understanding (NLU), for voice-based applications such as Amazon Alexa, and speech-to-text/text-to-speech conversions
  • NLP, for the interpretation of context-based insights from text

With NLU, applications that hitherto needed multiple and sometimes cumbersome interfaces, such as a screen, keyboard, and a mouse to enable computer-to-human interactions, can work as efficiently with just voice.

In Stanley Kubrick's 1968 movie 2001: A Space Odyssey, (spoiler alert!!) an artificially intelligent computer known as the HAL 9000 uses vision and voice to interact with the humans on board, and in the course of the movie, develops a personality, does not accept when it is in error, and attempts to kill the humans when it discovers their plot to shut it down. Fast forward to now, 20 years after the future depicted in the movie, and we have made significant progress in language understanding and processing, but not to the extreme extent of the dramatization of the plot elements used in the movie, thankfully.

Now that we have a good understanding of the context in which NLP has developed and how it can be used, let's try examining some of the common challenges you might face while developing NLP solutions.

Overcoming the challenges in building NLP solutions

We read earlier that the main difference between the algorithms used for regular programming and those used for ML is the ability of ML algorithms to modify their processing based on the input data fed to them. In the NLP context, as in other areas of ML, these differences add significant value and accelerate enterprise business outcomes. Consider, for example, a book publishing organization that needs to create an intelligent search capability displaying book recommendations to users based on topics of interest they enter.

In a traditional world, you would need multiple teams to go through the entire book collection, read books individually, identify keywords, phrases, topics, and other relevant information, create an index to associate book titles, authors, and genres to these keywords, and link this with the search capability. This is a massive effort that takes months or years to set up based on the size of the collection, the number of people, and their skill levels, and the accuracy of the index is prone to human error. As books are updated to newer editions, and new books are added or removed, this effort would have to be repeated incrementally. This is also a significant cost and time investment that may deter many unless that time and those resources have already been budgeted for.

To bring in a semblance of automation in our previous example, we need the ability to digitize text from documents. However, this is not the only requirement, as we are interested in deriving context-based insights from the books to power a recommendations index for a reader. And if we are talking about, for example, a publishing house such as Packt, with 7,500+ books in its collection, we need a solution that not only scales to process large numbers of pages, but also understands relationships in text, and provides interpretations based on semantics, grammar, word tokenization, and language to create smart indexes. We will cover a detailed walkthrough of this solution, along with code samples and demo videos, in Chapter 5, Creating NLP Search.

Today's enterprises are grappling with leveraging meaningful insights from their data primarily due to the pace at which it is growing. Until a decade or so, most organizations used relational databases for all their data management needs, and some still do even today. This was fine because the data volume need was in single-digit terabytes or less. In the last few years, the technology landscape has witnessed a significant upheaval with smartphones becoming ubiquitous, the large-scale proliferation of connected devices (in the billions), the ability to dynamically scale infrastructure in size and into new geographies, and storage and compute costs becoming cheaper due to the democratization offered by the cloud. All of this means applications get used more often, have much larger user bases, more processing power, and capabilities, can accelerate their pace of innovation with faster go-to-market cycles, and as a result, have a need to store and manage petabytes of data. This, coupled with application users demanding faster response times and higher throughput, has put a strain on the performance of relational databases, fueling a move toward purpose-built databases such as Amazon DynamoDB, a key-value and document database that delivers single-digit millisecond latency at any scale.

While this move signals a positive trend, what is more interesting is how enterprises utilize this data to gain strategic insights. After all, data is only as useful as the information we can glean from it. We see many organizations, while accepting the benefits of purpose-built tools, implementing these changes in silos. So, there are varying levels of maturity in properly harnessing the advantages of data. Some departments use an S3 data lake (https://aws.amazon.com/products/storage/data-lake-storage/) to source data from disparate sources and run ML to derive context-based insights, others are consolidating their data in purpose-built databases, while the rest are still using relational databases for all their needs.

You can see a basic explanation of the main components of a data lake in the following Figure 1.5, An example of an Amazon S3 data lake:

Figure 1.4 – An example of an Amazon S3 data lake

Figure 1.4 – An example of an Amazon S3 data lake

Let's see how NLP can continue to add business value in this situation by referring back to our book publishing example. Suppose we successfully built our smart indexing solution, and now we need to update it with book reviews received via Twitter feeds. The searchable index should provide book recommendations based on review sentiment (for example, don't recommend a book if reviews are negative > 50% in the last 3 months). Traditionally, business insights are generated by running a suite of reports on behemoth data warehouses that collect, mine, and organize data into marts and dimensions. A tweet may not even be under consideration as a data source. These days, things have changed and mining social media data is an important aspect of generating insights. Setting up business rules to examine every tweet is a time-consuming and compute-intensive task. Furthermore, since a tweet is unstructured text, a slight change in semantics may impact the effectiveness of the solution.

Now, if you consider model training, the infrastructure required to build accurate NLP models typically uses the deep learning architecture called Transformers (please see https://www.packtpub.com/product/transformers-for-natural-language-processing/9781800565791) that use sequence-to-sequence processing without needing to process the tokens in order, resulting in a higher degree of parallelization. Transformer model families use billions of parameters with the training architecture using clusters of instances for distributed learning, which adds to time and costs.

AWS offers AI services that allow you, with just a few lines of code, to add NLP to your applications for the sentiment analysis of unstructured text at an almost limitless scale and immediately take advantage of the immense potential waiting to be discovered in unstructured text. We will cover AWS AI services in more detail from Chapter 2, Introducing Amazon Textract, onward.

In this section, we reviewed some challenges organizations encounter when building NLP solutions, such as complexities in digitizing paper-based text, understanding patterns from structured and unstructured data, and how resource-intensive these solutions can be. Let's now understand why NLP is an important mainstream technology for enterprises today.

Understanding why NLP is becoming mainstream

According to this report (https://www.marketsandmarkets.com/Market-Reports/natural-language-processing-nlp-825.html, accessed on March 23, 2021), the global NLP market is expected to grow to USD 35.1 billion by 2026, at a Compound Annual Growth Rate (CAGR) of 20.3% during the forecast period. This is not surprising considering the impact ML is making across every industry (such as finance, retail, manufacturing, energy, utilities, real estate, healthcare, and so on) in organizations of every size, primarily driven by the advent of cloud computing and the economies of scale available.

This article about Emergence Cycle (https://blogs.gartner.com/anthony_bradley/2020/10/07/announcing-gartners-new-emergence-cycle-research-for-ai/), a research into emerging technologies in NLP (based on patents submitted, and looking at technology still in labs or recently released), shows the most mature usage of NLP is multimedia content analysis. This trend is true based on our experience, and content analysis to gain strategic insights is a common NLP requirement based on our discussions with a number of organizations across industries:

Figure 1.5 – Gartner's NLP Emergence Cycle 2020

Figure 1.5 – Gartner's NLP Emergence Cycle 2020

For example, in 2020, when the world was struggling with the effects of the pandemic, a number of organizations adopted AI and specifically NLP to power predictions on the virus spread patterns, assimilate knowledge on virus behavior and vaccine research, and monitor the effectiveness of safety measures, to name a few. In April 2020, AWS launched an NLP-powered search site called https://cord19.aws/ using an AWS AI service called Amazon Kendra (https://aws.amazon.com/kendra/). The site provides an easy interface to search the COVID-19 Open Research Dataset using natural language questions. As the dataset is constantly updated based on the latest research on COVID-19, CORD-19 Search, due to its support for NLP, makes it easy to navigate this ever-expanding collection of research documents and find precise answers to questions. The search results provide not only specific text that contains the answer to the question but also the original body of text in which these answers are located:

Figure 1.6 – CORD-19 search results

Figure 1.6 – CORD-19 search results

Fred Hutchinson Cancer Research Center is a research institute focused on curing cancer by 2025. Matthew Trunnell, Chief Information Officer of Fred Hutchinson Cancer Research Center, has said the following:

"The process of developing clinical trials and connecting them with the right patients requires research teams to sift through and label mountains of unstructured clinical record data. Amazon Comprehend Medical will reduce this time burden from hours to seconds. This is a vital step toward getting researchers rapid access to the information they need when they need it so they can find actionable insights to advance lifesaving therapies for patients."

For more details and usage examples of Amazon Comprehend and Amazon Comprehend Medical, please refer to Chapter 3, Introducing Amazon Comprehend.

So, how can AI and NLP help us cure cancer or prepare for a pandemic? It's about recognizing patterns where none seem to exist. Unstructured text, such as documents, social media posts, and email messages, is similar to the treasure waiting in Ali Baba's cave. To understand why, let's briefly look at how NLP works.

NLP models train by learning what are called word embeddings, which are vector representations of words in large collections of documents. These embeddings capture semantic relationships and word distributions in documents, thereby helping to map the context of a word based on its relationship to other words in the document. The two common training architectures for learning word embeddings are Skip-gram and Continuous Bag of Words (CBOW). In Skip-gram, the embeddings of the input word are used to derive the distribution of the related words to predict the context, and in CBOW, the embeddings of the related words are used to predict the word in the middle. Both are neural network-based architectures and work well for context-based analytics use cases.

Now that we understand the basics of NLP (analyzing patterns in text by converting words to their vector representations), when we look at training models using text data from disparate data sources, unique insights are often derived due to patterns that emerge that previously appeared hidden when looked at within a narrower context, because we are using numbers to find relationships in text. For example, The Rubber Episode in the Amazon Prime TV show This Giant Beast That Is The Global Economy shows how a fungal disease has the potential to devastate the global economy, even though at first it might appear there is no link between the two. According to the US National Library of Medicine, natural rubber accounts for 40% of the world's consumption, and the South American Leaf Blight (SALB) fungal disease has the potential to spread worldwide and severely inhibit rubber production. Airplanes can't land without rubber, and its uses are so myriad that it would have unprecedented implications on the economy. This an example of a pattern that ML and NLP models are so good at finding specific items of interest across vast text corpora.

Before AWS and cloud computing revolutionized access to advanced technologies, setting up NLP models for text analytics was challenging to say the least. The most common reasons were as follows:

  • Lack of skills: Expertise in identifying data, feature engineering, building models, training, and tuning are all tasks that require a unique combination of skills, including software engineering, mathematics, statistics, and data engineering, that only a few practitioners have.
  • Initial infrastructure setup cost: ML training is an iterative process, often requiring a trial-and-error approach to tune the models to get the desired accuracy. Further training and inference may require GPU acceleration based on the volume of data and the number of requests, requiring a high initial investment.
  • Scalability with the current on-premises environment: Running ML training and inference from on-premises servers constrains the elasticity required to scale compute and storage based on model size, data volumes, and inference throughput needed. For example, training large-scale transformer models may require massively parallel clusters, and capacity planning for such scenarios is challenging.
  • Availability of tools to help orchestrate the various moving parts of NLP training: As mentioned before, the ML workflow comprises many tasks, such as data discovery, feature engineering, algorithm selection, model building, which includes training and fine-tuning the models several times, and finally deploying those models into production. Furthermore, getting an accurate model is a highly iterative process. Each of these tasks requires purpose-built tools and expertise to achieve the level of efficiency needed for good models, which is not easy.

Not anymore. The AWS AI services for natural language capabilities enable adding speech and text intelligence to applications using API calls rather than needing to develop and train models. NLU services provide the ability to convert speech to text with Amazon Transcribe (https://aws.amazon.com/transcribe/) or text to speech with Amazon Polly (https://aws.amazon.com/polly/). For NLP requirements, Amazon Textract (https://aws.amazon.com/textract/) enables applications to read and process handwritten and printed text from images and PDF documents, and with Amazon Comprehend (https://aws.amazon.com/comprehend/), applications can quickly analyze text and find insights and relationships with no prior ML training. For example, Assent, a supply chain data management company, used Amazon Textract to read forms, tables, and free-form text, and Amazon Comprehend to derive business-specific entities and values from the text. In this book, we will be walking you through how to use these services for some popular workflows. For more details, please refer to Chapter 4, Automating Document Processing Workflows.

In this section, we saw some examples of NLP's significance in solving real-world challenges, and what exactly it means. We understood that finding patterns in data can bring new meaning to light, and NLP models are very good at deriving these patterns. We then reviewed some technology challenges in NLP implementations and saw a brief overview of the AWS AI services. In the next section, we will introduce the AWS ML stack, and provide a brief overview of each of the layers.

Introducing the AWS ML stack

The AWS ML services and features are organized into three layers of the stack, keeping in mind that some developers and data scientists are expert ML practitioners who are comfortable working with ML frameworks, algorithms, and infrastructure to build, train, and deploy models.

For these experts, the bottom layer of the AWS ML stack offers powerful CPU and GPU compute instances (the https://aws.amazon.com/ec2/instance-types/p4/ instances offer the highest performance for ML training in the cloud today), support for major ML frameworks including TensorFlow, PyTorch, and MXNet, which customers can use to build models with Amazon SageMaker as a managed experience, or using deep learning AMIs and containers on Amazon EC2 instances.

You can see the three layers of the AWS ML stack in the next figure. For more details, please refer to https://aws.amazon.com/machine-learning/infrastructure/:

To make ML more accessible and expansive, at the middle layer of the stack, Amazon SageMaker is a fully managed ML platform that removes the undifferentiated heavy lifting at each step of the ML process. Launched in 2018, SageMaker is one of the fastest-growing services in AWS history and is built on Amazon's two decades of experience in building real-world ML applications. With SageMaker Studio, developers and data scientists have the first fully integrated development environment designed specifically for ML. To learn how to build ML models using Amazon SageMaker, refer to Julien Simon's book, Learn Amazon SageMaker, also published by Packt (https://www.packtpub.com/product/learn-amazon-sagemaker/9781800208919):

Figure 1.7 – A tabular list of Amazon SageMaker features for each step of the ML workflow

Figure 1.7 – A tabular list of Amazon SageMaker features for each step of the ML workflow

For customers who are not interested in dealing with models and training, at the top layer of the stack, the AWS AI services provide pre-trained models with easy integration by means of API endpoints for common ML use cases including speech, text, vision, recommendations, and anomaly detection:

Figure 1.8 – AWS AI services

Figure 1.8 – AWS AI services

Alright, it's time that we started getting technical. Now that we understand how cloud computing played a major role in bringing ML and AI to the mainstream and how adding NLP to your application can accelerate business outcomes, let's deep dive into the NLP services Amazon Textract for document analysis and Amazon Comprehend for advanced text analytics.

Ready? Let's go!!

Summary

In this chapter, we introduced NLP by tracing the origins of AI, how it evolved over the last few decades, and how the application of AI became mainstream with the significant advances made with ML algorithms. We reviewed some examples of these algorithms, along with an example of how they can be used. We then pivoted to AI trends and saw how AI adoption grew exponentially over the last few years and has become a key technology in accelerating enterprise business value.

We read a cool example of how ExxonMobil uses Alexa at their gas stations and delved into how AI was created to mimic human cognition, and the broad categories of their applicability, such as text, speech, and vision. We saw how AI in natural language has two main areas of usage NLU for voice-based uses and NLP for deriving insights from text.

In analyzing how enterprises are building NLP models today, we reviewed some of the common challenges and how to mitigate them, such as digitizing paper-based text, collecting data from disparate sources, and understanding patterns in data, and how resource-intensive these solutions can be.

We then reviewed NLP industry trends and market segmentation and saw with an example how important NLP was and still continues to be during the pandemic. We dove deep into the philosophy of NLP and realized it was all about converting text to numerical representations and understanding the underlying patterns to decipher new meanings. We looked at an example of this pattern with how SALB could impact the global economy.

Finally, we reviewed the technology implications in setting up NLP training and the associated challenges. We reviewed the three layers of the AWS ML stack and introduced AWS AI services that provided pre-built models and ready-made intelligence.

In the next chapter, we will introduce Amazon Textract, a fully managed ML service that can read both printed and handwritten text from images and PDFs without having to train or build models and can be used without the need for ML skills. We will cover the features of Amazon Textract, what its functions are, what business challenges it was created to solve, what types of user requirements it can be applied to, and how easy it is to integrate Amazon Textract with other AWS services such as AWS Lambda for building business applications.

Further reading

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Get to grips with AWS AI services for NLP and find out how to use them to gain strategic insights
  • Run Python code to use Amazon Textract and Amazon Comprehend to accelerate business outcomes
  • Understand how you can integrate human-in-the-loop for custom NLP use cases with Amazon A2I

Description

Natural language processing (NLP) uses machine learning to extract information from unstructured data. This book will help you to move quickly from business questions to high-performance models in production. To start with, you'll understand the importance of NLP in today’s business applications and learn the features of Amazon Comprehend and Amazon Textract to build NLP models using Python and Jupyter Notebooks. The book then shows you how to integrate AI in applications for accelerating business outcomes with just a few lines of code. Throughout the book, you'll cover use cases such as smart text search, setting up compliance and controls when processing confidential documents, real-time text analytics, and much more to understand various NLP scenarios. You'll deploy and monitor scalable NLP models in production for real-time and batch requirements. As you advance, you'll explore strategies for including humans in the loop for different purposes in a document processing workflow. Moreover, you'll learn best practices for auto-scaling your NLP inference for enterprise traffic. Whether you're new to ML or an experienced practitioner, by the end of this NLP book, you'll have the confidence to use AWS AI services to build powerful NLP applications.

Who is this book for?

If you're an NLP developer or data scientist looking to get started with AWS AI services to implement various NLP scenarios quickly, this book is for you. It will show you how easy it is to integrate AI in applications with just a few lines of code. A basic understanding of machine learning (ML) concepts is necessary to understand the concepts covered. Experience with Jupyter notebooks and Python will be helpful.

What you will learn

  • Automate various NLP workflows on AWS to accelerate business outcomes
  • Use Amazon Textract for text, tables, and handwriting recognition from images and PDF files
  • Gain insights from unstructured text in the form of sentiment analysis, topic modeling, and more using Amazon Comprehend
  • Set up end-to-end document processing pipelines to understand the role of humans in the loop
  • Develop NLP-based intelligent search solutions with just a few lines of code
  • Create both real-time and batch document processing pipelines using Python

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Nov 26, 2021
Length: 508 pages
Edition : 1st
Language : English
ISBN-13 : 9781801812535
Category :
Languages :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Nov 26, 2021
Length: 508 pages
Edition : 1st
Language : English
ISBN-13 : 9781801812535
Category :
Languages :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just Can$6 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just Can$6 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total Can$ 201.97
Natural Language Processing with AWS AI Services
Can$69.99
Learn Amazon SageMaker
Can$61.99
Machine Learning with Amazon SageMaker Cookbook
Can$69.99
Total Can$ 201.97 Stars icon
Banner background image

Table of Contents

22 Chapters
Section 1:Introduction to AWS AI NLP Services Chevron down icon Chevron up icon
Chapter 1: NLP in the Business Context and Introduction to AWS AI Services Chevron down icon Chevron up icon
Chapter 2: Introducing Amazon Textract Chevron down icon Chevron up icon
Chapter 3: Introducing Amazon Comprehend Chevron down icon Chevron up icon
Section 2: Using NLP to Accelerate Business Outcomes Chevron down icon Chevron up icon
Chapter 4: Automating Document Processing Workflows Chevron down icon Chevron up icon
Chapter 5: Creating NLP Search Chevron down icon Chevron up icon
Chapter 6: Using NLP to Improve Customer Service Efficiency Chevron down icon Chevron up icon
Chapter 7: Understanding the Voice of Your Customer Analytics Chevron down icon Chevron up icon
Chapter 8: Leveraging NLP to Monetize Your Media Content Chevron down icon Chevron up icon
Chapter 9: Extracting Metadata from Financial Documents Chevron down icon Chevron up icon
Chapter 10: Reducing Localization Costs with Machine Translation Chevron down icon Chevron up icon
Chapter 11: Using Chatbots for Querying Documents Chevron down icon Chevron up icon
Chapter 12: AI and NLP in Healthcare Chevron down icon Chevron up icon
Section 3: Improving NLP Models in Production Chevron down icon Chevron up icon
Chapter 13: Improving the Accuracy of Document Processing Workflows Chevron down icon Chevron up icon
Chapter 14: Auditing Named Entity Recognition Workflows Chevron down icon Chevron up icon
Chapter 15: Classifying Documents and Setting up Human in the Loop for Active Learning Chevron down icon Chevron up icon
Chapter 16: Improving the Accuracy of PDF Batch Processing Chevron down icon Chevron up icon
Chapter 17: Visualizing Insights from Handwritten Content Chevron down icon Chevron up icon
Chapter 18: Building Secure, Reliable, and Efficient NLP Solutions Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(21 Ratings)
5 star 95.2%
4 star 4.8%
3 star 0%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Hitesh Hinduja Sep 29, 2022
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is one of the books where you would want to read more and more. The author has explained concepts in a simplified way with significant amount of practical demos. It also gives you a strong business sense before starting any chapter and then deep dives into how NLP techniques can be used in AWS to solve the problem. Must read book for all
Amazon Verified review Amazon
JANGWON KIM Dec 05, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
What I like the most about this book is three: easy read to follow, covering how to build end to end solutions, and good explanation about AWS AI services.I'm working on healthcare AI domain, which is a part of what this book help for.
Amazon Verified review Amazon
Wrick T Dec 02, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
A very practical approach to natural language processing and solving business problems.The authors took a very practical, engaging and intuitive approach with examples, tips and well-structured content. Examples clearly outline solutions to practical business problem using AWS services, which is very helpful for the readers. Highly recommended read if you are starting your journey implementing natural language processing in the AWS Cloud.
Amazon Verified review Amazon
Amit Lodh Dec 29, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book is targeted for NLP practitioners looking to use AWS AI services to create solutions for business outcomes. The authors, experienced practitioners themselves, do an outstanding job in taking the reader through the details of AWS AI services for building NLP solutions. The readers will get a lot of value from the details from the AWS console, visual examples and the varied perspectives of building NLP solutions. I especially appreciated the examples of NLP addressing different business needs such as enterprise search solutions, improving customer service efficiency, automating document processing, monetizing media content and data extraction for financial services and healthcare. Authors also cover the production aspects of NLP solutions, such as security, reliability and efficiency. Utilizing the untapped immense value from the unstructured data is going to be critical for realizing value of digital transformation in the area of business model transformation. NLP is going to be key for that. If you are a technical practitioner, you will want to read this book to add value to the business outcome. If you are a business or technical leader, leading the charge of digital transformation in your organization, you should read the book to understand the value NLP and extraction of information from unstructured document in accelerating your business outcome.
Amazon Verified review Amazon
Viral Shah Dec 21, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Anyone who is new to AI/ML space, I would highly recommend to read this book. Both the authors have done fantastic job to explain the concepts in a lucid manner. Book is full of relevant examples which makes every chapter interesting. This book really helps to get up to speed on this newer concepts, and you won't regret reading this book !
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.