Natural Language Processing with TensorFlow: Teach language to machines using Python's deep learning library

Chapter 1. Introduction to Natural Language Processing

Natural Language Processing (NLP) is an important tool for understanding and processing the immense volume of unstructured data in today's world. Recently, deep learning has been widely adopted for many NLP tasks because of the remarkable performance that deep learning algorithms have shown in a plethora of challenging tasks, such as image classification, speech recognition, and realistic text generation. TensorFlow, in turn, is one of the most intuitive and efficient deep learning frameworks currently in existence. This book will enable aspiring deep learning developers to handle massive amounts of data using NLP and TensorFlow.

In this chapter, we will provide an introduction to NLP and to the rest of the book. We will answer the question, "What is Natural Language Processing?" and look at some of its most important uses. We will then consider the traditional approaches and the more recent deep learning-based approaches to NLP, including a Fully-Connected Neural Network (FCNN). Finally, we will conclude with an overview of the rest of the book and the technical tools we will be using.

What is Natural Language Processing?

According to IBM, 2.5 exabytes (1 exabyte = 1,000,000,000 gigabytes) of data were generated every day in 2017, and this figure is growing as this book is being written. To put that into perspective, if all the human beings in the world were to process that data, each of us would have to process roughly 300 MB every day. A large fraction of this data is unstructured text and speech, as millions of emails and social media posts are created and phone calls made every day.
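
The per-person figure can be checked with quick arithmetic. The sketch below assumes a world population of about 7.5 billion, a number that is not stated in the text, so treat the result as a rough order-of-magnitude check only.

```python
# Rough check of the "about 300 MB per person per day" estimate.
BYTES_PER_EXABYTE = 10**18   # 1 exabyte = 1,000,000,000 gigabytes
BYTES_PER_MEGABYTE = 10**6
WORLD_POPULATION = 7.5e9     # assumption, not a figure from the text

daily_bytes = 2.5 * BYTES_PER_EXABYTE
per_person_mb = daily_bytes / WORLD_POPULATION / BYTES_PER_MEGABYTE

print(f"{per_person_mb:.0f} MB per person per day")
```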

These statistics provide a good basis for us to define what NLP is. Simply put, the goal of NLP is to make machines understand our spoken and written languages. Moreover, NLP is ubiquitous and is already a large part of human life. Virtual Assistants (VAs), such as Google Assistant, Cortana, and Apple Siri, are largely NLP systems. Numerous NLP tasks take place when one asks a VA, "Can you show me a good Italian restaurant nearby?" First, the VA needs to convert the utterance to text (that is, speech-to-text). Next, it must understand the semantics of the request (for example, the user is looking for a good restaurant serving Italian cuisine) and formulate a structured request (for example, cuisine = Italian, rating = 3-5, distance < 10 km). Then, the VA must search for restaurants, filtering by location and cuisine, and sort the results by the ratings received. To calculate an overall rating for a restaurant, a good NLP system may look at both the rating and the text description provided by each user. Finally, once the user is at the restaurant, the VA might assist the user by translating various menu items from Italian to English. This example shows that NLP has become an integral part of human life.
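
To make the "structured request" step concrete, here is a minimal sketch of what a VA might do once the utterance has been reduced to cuisine = Italian, rating = 3-5, distance < 10 km. The `Restaurant` class, the sample data, and the `search` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    cuisine: str
    rating: float        # average user rating on a 0-5 scale
    distance_km: float


# Hypothetical data; a real assistant would query a places database.
RESTAURANTS = [
    Restaurant("Trattoria Uno", "Italian", 4.6, 2.5),
    Restaurant("Sushi Go", "Japanese", 4.8, 1.0),
    Restaurant("Pasta Far Away", "Italian", 4.9, 25.0),
    Restaurant("Pizza Corner", "Italian", 3.4, 0.8),
    Restaurant("Soggy Noodle", "Italian", 2.1, 0.5),
]


def search(restaurants, cuisine, min_rating, max_distance_km):
    """Filter by cuisine, rating, and distance, then sort by rating."""
    matches = [
        r for r in restaurants
        if r.cuisine == cuisine
        and r.rating >= min_rating
        and r.distance_km < max_distance_km
    ]
    return sorted(matches, key=lambda r: r.rating, reverse=True)


results = search(RESTAURANTS, cuisine="Italian", min_rating=3.0, max_distance_km=10.0)
names = [r.name for r in results]
print(names)
```

The hard NLP work lies in getting from the spoken question to the three filter values; the search itself is ordinary data processing.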

It should be understood that NLP is an extremely challenging field of research, as words and semantics have a highly complex nonlinear relationship, and it is even more difficult to capture this information as a robust numerical representation. To make matters worse, each language has its own grammar, syntax, and vocabulary. Therefore, processing textual data involves various complex tasks such as text parsing (for example, tokenization and stemming), morphological analysis, word sense disambiguation, and understanding the underlying grammatical structure of a language. For example, in the two sentences I went to the bank and I walked along the river bank, the word bank has two entirely different meanings. To distinguish (or disambiguate) the word bank, we need to understand the context in which the word is being used. Machine learning has become a key enabler for NLP, helping to accomplish the aforementioned tasks through machines.
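
As a rough illustration of using context to disambiguate bank, the following sketch scores each sense by how many of its cue words appear in the sentence. This is a simplified overlap heuristic in the spirit of the Lesk algorithm, not a method described in this chapter, and the sense inventory `SENSES` is hand-written and hypothetical.

```python
import re

# Hypothetical, hand-written cue words for two senses of "bank".
SENSES = {
    "financial institution": {"money", "account", "deposit", "loan", "cash"},
    "river side": {"river", "water", "shore", "stream", "fishing"},
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    return max(scores, key=scores.get)


print(disambiguate("I walked along the river bank"))
print(disambiguate("I went to the bank to deposit money"))
```

A sentence with no cue words at all, such as I went to the bank, produces a tie between the senses, which shows how fragile such hand-built rules are.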

Tasks of Natural Language Processing

NLP has a multitude of real-world applications, and a good NLP system is one that performs many NLP tasks. When you search for today's weather on Google or use Google Translate to find out how to say, "How are you?" in French, you rely on a subset of such tasks in NLP. We will list some of the most ubiquitous tasks here, and this book covers most of these tasks:

  • Tokenization: Tokenization is the task of separating a text corpus into atomic units (for example, words). Although it may seem trivial, tokenization is an important task. For example, in the Japanese language, words are not delimited by spaces or punctuation marks.
  • Word-sense Disambiguation (WSD): WSD is the task of identifying the correct meaning of a word. For example, in the sentences, The dog barked at the mailman, and Tree bark is sometimes used as a medicine, the word bark has two different meanings. WSD is critical for tasks such as question answering.
  • Named Entity Recognition (NER): NER attempts to extract entities (for example, person, location, and organization) from a given body of text or a text corpus. For example, the sentence John gave Mary two apples at school on Monday will be transformed to [John]name gave [Mary]name [two]number apples at [school]organization on [Monday]time. NER is an important topic in fields such as information retrieval and knowledge representation.
  • Part-of-Speech (PoS) tagging: PoS tagging is the task of assigning words to their respective parts of speech. It can either be basic tags such as noun, verb, adjective, adverb, and preposition, or it can be granular such as proper noun, common noun, phrasal verb, verb, and so on.
  • Sentence/Synopsis classification: Sentence or synopsis (for example, movie reviews) classification has many use cases such as spam detection, news article classification (for example, political, technology, and sport), and product review ratings (that is, positive or negative). This is achieved by training a classification model with labeled data (that is, reviews annotated by humans, with either a positive or negative label).
  • Language generation: In language generation, a learning model (for example, a neural network) is trained with text corpora (large collections of textual documents) to predict the new text that follows. For example, language generation can output an entirely new science fiction story by using existing science fiction stories for training.
  • Question Answering (QA): QA techniques possess a high commercial value, and such techniques are found at the foundation of chatbots and VAs (for example, Google Assistant and Apple Siri). Chatbots have been adopted by many companies for customer support. Chatbots can be used to answer and resolve straightforward customer concerns (for example, changing a customer's monthly mobile plan), which can be solved without human intervention. QA touches upon many other aspects of NLP, such as information retrieval and knowledge representation. Consequently, all this makes developing a QA system very difficult.
  • Machine Translation (MT): MT is the task of transforming a sentence or phrase from a source language (for example, German) to a target language (for example, English). This is a very challenging task, as different languages have highly different morphological structures, which means that it is not a one-to-one transformation. Furthermore, word-to-word relationships between languages can be one-to-many, one-to-one, many-to-one, or many-to-many. This is known as the word alignment problem in MT literature.
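
To show what the output of one of these tasks looks like in practice, here is a toy sketch that reproduces the NER example from the list using a plain dictionary lookup. The `GAZETTEER` table is hypothetical and hand-written; real NER systems learn to label entities from context instead of relying on a fixed word list.

```python
# Hypothetical lookup table mapping known tokens to entity types.
GAZETTEER = {
    "John": "name",
    "Mary": "name",
    "two": "number",
    "school": "organization",
    "Monday": "time",
}


def tag_entities(sentence):
    """Wrap every token found in the gazetteer as [token]type."""
    tagged = []
    for token in sentence.split():
        label = GAZETTEER.get(token)
        tagged.append(f"[{token}]{label}" if label else token)
    return " ".join(tagged)


tagged_sentence = tag_entities("John gave Mary two apples at school on Monday")
print(tagged_sentence)
```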

Finally, to develop a system that can assist a human in day-to-day tasks (for example, VA or a chatbot) many of these tasks need to be performed together. As we saw in the previous example where the user asks, "Can you show me a good Italian restaurant nearby?" several different NLP tasks, such as speech-to-text conversion, semantic and sentiment analyses, question answering, and machine translation, need to be completed. In Figure 1.1, we provide a hierarchical taxonomy of different NLP tasks categorized into several different types. We first have two broad categories: analysis (analyzing existing text) and generation (generating new text) tasks. Then we divide analysis into three different categories: syntactic (language structure-based tasks), semantic (meaning-based tasks), and pragmatic (open problems difficult to solve):

Figure 1.1: A taxonomy of the popular tasks of NLP categorized under broader categories

Having understood the various tasks in NLP, let us now move on to understand how we can solve these tasks with the help of machines.

The traditional approach to Natural Language Processing

The traditional or classical approach to solving NLP tasks is a sequential flow of several key steps, and it is a statistical approach. When we take a closer look at a traditional NLP learning model, we see a set of distinct tasks taking place: preprocessing data by removing unwanted data, feature engineering to get good numerical representations of textual data, learning with machine learning algorithms with the aid of training data, and predicting outputs for novel, unfamiliar data. Of these, feature engineering was the most time-consuming and crucial step for obtaining good performance on a given NLP task.

Understanding the traditional approach

The traditional approach to solving NLP tasks involves a collection of distinct subtasks. First, the text corpora need to be preprocessed, focusing on reducing the vocabulary and distractions. By distractions, I refer to the things that distract the algorithm (for example, punctuation marks and stop words) from capturing the vital linguistic information required for the task.

Next come several feature engineering steps. The main objective of feature engineering is to make the learning easier for the algorithms. Often the features are hand-engineered and biased toward the human understanding of a language. Feature engineering was of the utmost importance for classical NLP algorithms, and consequently, the best performing systems often had the best engineered features. For example, for a sentiment classification task, you can represent a sentence with a parse tree and assign positive, negative, or neutral labels to each node/subtree in the tree to classify that sentence as positive or negative. Additionally, the feature engineering phase can use external resources such as WordNet (a lexical database) to develop better features. We will soon look at a simple feature engineering technique known as bag-of-words.

Next, the learning algorithm learns to perform well at the given task using the obtained features and optionally, the external resources. For example, for a text summarization task, a thesaurus that contains synonyms of words can be a good external resource. Finally, prediction occurs. Prediction is straightforward, where you will feed a new input and obtain the predicted label by forwarding the input through the learning model. The entire process of the traditional approach is depicted in Figure 1.2:

Figure 1.2: The general approach of classical NLP

Example – generating football game summaries

To gain an in-depth understanding of the traditional approach to NLP, let's consider a task of automatic text generation from the statistics of a game of football. We have several sets of game statistics (for example, score, penalties, and yellow cards) and the corresponding articles generated for that game by a journalist, as the training data. Let's also assume that for a given game, we have a mapping from each statistical parameter to the most relevant phrase of the summary for that parameter. Our task here is that, given a new game, we need to generate a natural looking summary about the game. Of course, this can be as simple as finding the best-matching statistics for the new game from the training data and retrieving the corresponding summary. However, there are more sophisticated and elegant ways of generating text.

If we were to incorporate machine learning to generate natural language, a sequence of operations such as preprocessing the text, tokenization, feature engineering, learning, and prediction is likely to be performed.

Preprocessing the text involves operations, such as stemming (for example, converting listened to listen) and removing punctuation (for example, ! and ;), in order to reduce the vocabulary (that is, features), thus reducing the memory requirement. It is important to understand that stemming is not a trivial operation. It might appear that stemming is a simple operation that relies on a simple set of rules such as removing ed from a verb (for example, the stemmed result of listened is listen); however, it requires more than a simple rule base to develop a good stemming algorithm, as stemming certain words can be tricky (for example, the stemmed result of argued is argue). In addition, the effort required for proper stemming can vary in complexity for other languages.
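
The following sketch shows why a single rule is not enough. The `naive_stem` function is a deliberately simplistic, hypothetical stemmer that only strips a trailing ed; it is not how production stemmers work.

```python
def naive_stem(word):
    """Strip a trailing 'ed' from longer words (a deliberately naive rule)."""
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    return word


for word in ["listened", "walked", "argued"]:
    print(word, "->", naive_stem(word))
```

The rule handles listened and walked, but turns argued into argu instead of argue, so a practical stemmer needs many more rules and exceptions.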

Tokenization is another preprocessing step that might need to be performed. Tokenization is the process of dividing a corpus into small entities (for example, words). This might appear trivial for a language such as English, as the words are separated by spaces; however, this is not the case for certain languages such as Thai, Japanese, and Chinese, as words in these languages are not consistently delimited.
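
As a minimal sketch, an English tokenizer can be approximated with one regular expression, while the same whitespace-based idea fails on Japanese text. The `tokenize` function and the sample sentences are illustrative assumptions, not part of any library.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # A run of letters, digits, or apostrophes is one word token;
    # any other non-space character becomes a token of its own.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


english_tokens = tokenize("Bob went to the market, didn't he?")
print(english_tokens)

# Splitting on whitespace finds no word boundaries in Japanese text.
japanese_pieces = "私は市場に行きました".split()
print(len(japanese_pieces))
```

For languages without explicit delimiters, tokenization is itself a learning problem and usually requires a dictionary or a trained segmentation model.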

Feature engineering is used to transform raw text data into a numerical format that a model can be trained on, for example, by converting text into a bag-of-words representation or by using the n-gram representation, both of which are discussed below. However, remember that state-of-the-art classical models rely on much more sophisticated feature engineering techniques.

The following are some of the feature engineering techniques:

Bag-of-words: This is a feature engineering technique that creates feature representations based on the word occurrence frequency. For example, let's consider the following sentences:

  • Bob went to the market to buy some flowers
  • Bob bought the flowers to give to Mary

The vocabulary for these two sentences would be:

["Bob", "went", "to", "the", "market", "buy", "some", "flowers", "bought", "give", "Mary"]

Next, we will create a feature vector of size V (vocabulary size) for each sentence showing how many times each word in the vocabulary appears in the sentence. In this example, the feature vectors for the sentences would respectively be as follows:

[1, 1, 2, 1, 1, 1, 1, 1, 0, 0, 0]

[1, 0, 2, 1, 0, 0, 0, 1, 1, 1, 1]

A crucial limitation of the bag-of-words method is that it loses contextual information as the order of words is no longer preserved.
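
The bag-of-words vectors above can be reproduced with a few lines of plain Python. The helper names `build_vocabulary` and `bag_of_words` are chosen for this sketch only.

```python
sentences = [
    "Bob went to the market to buy some flowers",
    "Bob bought the flowers to give to Mary",
]


def build_vocabulary(sentences):
    """Collect unique words in order of first appearance."""
    vocabulary = []
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    words = sentence.split()
    return [words.count(entry) for entry in vocabulary]


vocabulary = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocabulary) for s in sentences]
print(vocabulary)
print(vectors[0])
print(vectors[1])

# Swapping Bob and Mary changes the meaning but not the vector,
# because word order is discarded.
swapped = bag_of_words("Mary bought the flowers to give to Bob", vocabulary)
print(swapped == vectors[1])
```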

n-gram: This is another feature engineering technique that breaks down text into smaller components consisting of n letters (or words). For example, 2-gram would break the text into two-letter (or two-word) entities. For example, consider this sentence:

Bob went to the market to buy some flowers

The letter level n-gram decomposition for this sentence is as follows:

["Bo", "ob", "b ", " w", "we", "en", ..., "me", "e "," f", "fl", "lo", "ow", "we", "er", "rs"]

The word-based n-gram decomposition is this:

["Bob went", "went to", "to the", "the market", ..., "to buy", "buy some", "some flowers"]

The advantage of the letter-level representation is that the vocabulary will be significantly smaller than if we were to use words as features for large corpora.
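
Both decompositions can be sketched with a sliding window over characters or words. The function names `char_ngrams` and `word_ngrams` are illustrative.

```python
def char_ngrams(text, n=2):
    """Slide a window of n characters over the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n=2):
    """Slide a window of n words over the text."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "Bob went to the market to buy some flowers"
letter_bigrams = char_ngrams(sentence)
word_bigrams = word_ngrams(sentence)

print(letter_bigrams[:6], "...", letter_bigrams[-3:])
print(word_bigrams)
```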

Next, we need to structure our data to be able to feed it into a learning model. For example, we will have data tuples of the form, (statistic, a phrase explaining the statistic) as follows:

Total goals = 4, "The game was tied with 2 goals for each team at the end of the first half"

Team 1 = Manchester United, "The game was between Manchester United and Barcelona"

Team 1 goals = 5, "Manchester United managed to get 5 goals"

The learning process may comprise three submodules: a Hidden Markov Model (HMM), a sentence planner, and a discourse planner. In our example, an HMM might learn the morphological structure and grammatical properties of the language by analyzing the corpus of related phrases. More specifically, we will concatenate each phrase in our dataset to form a sequence, where the first element is the statistic followed by the phrase explaining it. Then, we will train an HMM by asking it to predict the next word, given the current sequence. Concretely, we will first input the statistic to the HMM and then get the prediction made by the HMM; then, we will concatenate the last prediction to the current sequence and ask the HMM to give another prediction, and so on. This will enable the HMM to output meaningful phrases, given statistics.
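
The predict-append-repeat loop can be sketched with a much simpler stand-in. The code below uses a first-order Markov chain over the observed words (no hidden states, so it is not an HMM proper), trained on the three example tuples. The names `train`, `generate`, and the `END` marker are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

END = "<end>"  # hypothetical end-of-phrase marker

# Training tuples from the text: (statistic, phrase explaining it).
training_data = [
    ("Total goals = 4",
     "The game was tied with 2 goals for each team at the end of the first half"),
    ("Team 1 = Manchester United",
     "The game was between Manchester United and Barcelona"),
    ("Team 1 goals = 5",
     "Manchester United managed to get 5 goals"),
]


def train(data):
    """Count which token follows which, with the statistic as first token."""
    transitions = defaultdict(Counter)
    for statistic, phrase in data:
        sequence = [statistic] + phrase.split() + [END]
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    return transitions


def generate(statistic, transitions, max_words=12):
    """Greedily append the most frequent next word until END or the cap."""
    words, current = [], statistic
    while len(words) < max_words:
        if current not in transitions:
            break
        current = transitions[current].most_common(1)[0][0]
        if current == END:
            break
        words.append(current)
    return " ".join(words)


model = train(training_data)
print(generate("Team 1 goals = 5", model))
```

Because each word depends only on the word before it, greedy generation can drift from one training phrase into another or loop until it hits the length cap, which is one reason a richer model and further planning steps are needed.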

Next, we can have a sentence planner that corrects any linguistic mistakes (for example, morphological or grammatical) that we might have in the phrases. For example, a sentence planner might correct the phrase I go house to I go home; it can use a database of rules that contains the correct way of conveying meanings (for example, the need for a preposition between a verb and the word house).

Now we can generate a set of phrases for a given set of statistics using an HMM. Then, we need to aggregate these phrases in such a way that an essay made from the collection of phrases is human readable and flows correctly. For example, consider the three phrases, Player 10 of the Barcelona team scored a goal in the second half, Barcelona played against Manchester United, and Player 3 from Manchester United got a yellow card in the first half; having these sentences in this order does not make much sense. We would like to have them in this order: Barcelona played against Manchester United, Player 3 from Manchester United got a yellow card in the first half, and Player 10 of the Barcelona team scored a goal in the second half. To do this, we use a discourse planner; discourse planners can order and structure a set of messages that need to be conveyed.
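
A discourse planner can be arbitrarily sophisticated; the sketch below only illustrates the ordering idea with one hand-written rule. The message records, their `type` and `half` fields, and the `plan_discourse` function are all hypothetical.

```python
# Hypothetical message records; "half" is 0 for facts about the whole game.
messages = [
    {"type": "goal", "half": 2,
     "text": "Player 10 of the Barcelona team scored a goal in the second half"},
    {"type": "fixture", "half": 0,
     "text": "Barcelona played against Manchester United"},
    {"type": "card", "half": 1,
     "text": "Player 3 from Manchester United got a yellow card in the first half"},
]


def plan_discourse(messages):
    """Put the fixture announcement first, then events in chronological order."""
    return sorted(messages, key=lambda m: (m["type"] != "fixture", m["half"]))


ordered_texts = [m["text"] for m in plan_discourse(messages)]
for text in ordered_texts:
    print(text)
```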

Now we can get a set of arbitrary test statistics and obtain an essay explaining the statistics by following the preceding workflow, which is depicted in Figure 1.3:

Figure 1.3: A step from a classical approach example of solving a language modelling task

Here, it is important to note that this is a very high level explanation that only covers the main general-purpose components that are most likely to be included in the traditional way of NLP. The details can largely vary according to the particular application we are interested in solving. For example, additional application-specific crucial components might be needed for certain tasks (a rule base and an alignment model in machine translation). However, in this book, we do not stress about such details as the main objective here is to discuss more modern ways of natural language processing.

Drawbacks of the traditional approach

Let's list several key drawbacks of the traditional approach as this would lay a good foundation for discussing the motivation for deep learning:

  • The preprocessing steps used in traditional NLP force a trade-off, sacrificing potentially useful information embedded in the text (for example, punctuation and tense information) in order to make the learning feasible by reducing the vocabulary. Though preprocessing is still used in modern deep-learning-based solutions, it is not as crucial as for the traditional NLP workflow due to the large representational capacity of deep networks.
  • Feature engineering needs to be performed by hand. In order to design a reliable system, good features need to be devised. This process can be very tedious, as different feature spaces need to be extensively explored. Additionally, in order to effectively explore robust features, domain expertise is required, which can be scarce for certain NLP tasks.
  • Various external resources are needed for the approach to perform well, and there are not many freely available ones. Such external resources often consist of manually created information stored in large databases. Creating one for a particular task can take several years, depending on the complexity of the task (for example, a machine translation rule base).

The deep learning approach to Natural Language Processing

I think it is safe to say that deep learning revolutionized machine learning, especially in fields such as computer vision, speech recognition, and of course, NLP. Deep models created a wave of paradigm shifts in many of the fields in machine learning, as deep models learned rich features from raw data instead of using limited human-engineered features. This consequently made the pesky and expensive feature engineering obsolete. With this, deep models made the traditional workflow more efficient, as deep models perform feature learning and task learning simultaneously. Moreover, due to the massive number of parameters (that is, weights) in a deep model, it can encompass significantly more features than a human would have engineered. However, deep models are considered a black box due to the poor interpretability of the model. For example, understanding how deep models learn features for a given problem, and what those features are, still remains an open problem.

A deep model is essentially an artificial neural network that has an input layer, many interconnected hidden layers in the middle, and finally, an output layer (for example, a classifier or a regressor). As you can see, this forms an end-to-end model from raw data to predictions. These hidden layers in the middle give the power to deep models as they are responsible for learning the good features from raw data, eventually succeeding at the task at hand.

History of deep learning

Let's briefly discuss the roots of deep learning and how the field evolved to be a very promising technique for machine learning. In 1960, Hubel and Wiesel performed an interesting experiment and discovered that a cat's visual cortex is made of simple and complex cells, and that these cells are organized in a hierarchical form. Also, these cells react differently to different stimuli. For example, simple cells are activated by variously oriented edges, while complex cells are insensitive to spatial variations (for example, the exact position of the edge). This kindled the motivation for replicating a similar behavior in machines, giving rise to the concept of deep learning.

In the years that followed, neural networks gained the attention of many researchers. In 1965, a neural network trained by a method known as the Group Method of Data Handling (GMDH), and based on the famous Perceptron by Rosenblatt, was introduced by Ivakhnenko and others. Later, in 1979, Fukushima introduced the Neocognitron, which laid the base for one of the most famous variants of deep models: Convolutional Neural Networks. Unlike perceptrons, which always took in a 1D input, a neocognitron was able to process 2D inputs using convolution operations.

Artificial neural networks backpropagate the error signal to optimize the network parameters by computing a Jacobian matrix from one layer to the layer before it. However, the problem of vanishing gradients strictly limited the potential number of layers (depth) of the neural network. The vanishing gradients phenomenon refers to the gradients of layers closer to the inputs becoming very small. This happens because the chain rule is applied to compute the gradients (the Jacobian matrix) of lower layer weights, which multiplies many small factors together. This in turn limited the plausible maximum depth of classical neural networks.
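
A back-of-the-envelope calculation shows how quickly these products shrink. The sketch below assumes sigmoid activations and, purely for simplicity, a weight of 1.0 and the same pre-activation value at every layer; `gradient_scale` is a hypothetical helper, not a measurement of any real network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_scale(depth, z=0.0, weight=1.0):
    """Product of the per-layer chain-rule factors across `depth` layers."""
    return (weight * sigmoid_grad(z)) ** depth


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case for the sigmoid, ten layers scale the gradient by less than one millionth, so the weights near the input barely change during training.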

Then in 2006, it was found that pretraining a deep neural network by minimizing the reconstruction error (obtained by trying to compress the input to a lower dimensionality and then reconstructing it back into the original dimensionality) for each layer of the network provides a good starting point for the weights of the neural network; this allows a consistent flow of gradients from the output layer to the input layer. This essentially allowed neural network models to have more layers without the ill-effects of the vanishing gradient. Also, these deeper models were able to surpass traditional machine learning models in many tasks, mostly in computer vision (for example, test accuracy for the MNIST hand-written digit dataset). With this breakthrough, deep learning became the buzzword in the machine learning community.

Things started gaining momentum in 2012, when AlexNet (a deep convolutional neural network created by Alex Krizhevsky (http://www.cs.toronto.edu/~kriz/), Ilya Sutskever (http://www.cs.toronto.edu/~ilya/), and Geoff Hinton) won the Large Scale Visual Recognition Challenge (LSVRC) 2012 with an error decrease of 10% from the previous best. During this time, advances were made in speech recognition, wherein state-of-the-art speech recognition accuracies were reported using deep neural networks. Furthermore, people began realizing that Graphics Processing Units (GPUs) enable more parallelism, which allows for faster training of larger and deeper networks compared with Central Processing Units (CPUs).

Deep models were further improved with better model initialization techniques (for example, Xavier initialization), making the time-consuming pretraining redundant. Also, better nonlinear activation functions, such as Rectified Linear Units (ReLUs), were introduced, which alleviated the ill-effects of the vanishing gradient in deeper models. Better optimization (or learning) techniques, such as Adam, automatically tweaked individual learning rates of each parameter among the millions of parameters that we have in the neural network model, which rewrote the state-of-the-art performance in many different fields of machine learning, such as object classification and speech recognition. These advancements also allowed neural network models to have large numbers of hidden layers. The ability to increase the number of hidden layers (that is, to make the neural networks deep) is one of the primary contributors to the significantly better performance of neural network models compared with other machine learning models. Furthermore, better intermediate regularizers, such as batch normalization layers, have improved the performance of deep nets for many tasks.

Later, even deeper models such as ResNets, Highway Nets, and Ladder Nets were introduced, which had hundreds of layers and billions of parameters. It was possible to have such an enormous number of layers with the help of various empirically and theoretically inspired techniques. For example, ResNets use shortcut connections to connect layers that are far apart, which minimizes the diminishing of gradients, layer to layer, as discussed earlier.

The current state of deep learning and NLP

Many different deep models have seen the light since their inception in the early 2000s. Even though they share a resemblance, such as all of them using nonlinear transformations of the inputs and parameters, the details can vary vastly. For example, a Convolutional Neural Network (CNN) can learn from two-dimensional data (for example, RGB images) as it is, while a multilayer perceptron model requires the input to be unwrapped to a one-dimensional vector, causing loss of important spatial information.

When processing text, as one of the most intuitive interpretations of text is to perceive it as a sequence of characters, the learning model should be able to do time-series modelling, thus requiring the memory of the past. To understand this, think of a language modelling task; the next word for the word cat should be different from the next word for the word climbed. One such popular model that encompasses this ability is known as a Recurrent Neural Network (RNN). We will see in Chapter 6, Recurrent Neural Networks how exactly RNNs achieve this by going through interactive exercises.

It should be noted that memory is not a trivial operation that is inherent to a learning model; on the contrary, ways of persisting memory should be carefully designed. Also, the term memory should not be confused with the learned weights of a non-sequential deep network, which looks only at the current input, whereas a sequential model (for example, an RNN) looks at both the learned weights and the previous elements of the sequence to predict the next output.
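
The difference between learned weights and memory can be seen in a toy recurrent cell with a single scalar state. The weights `w_x` and `w_h` below are fixed, hypothetical values chosen for illustration; nothing is trained here.

```python
import math


def feed_forward(x, w_x=1.0):
    """Non-sequential model: the output depends on the current input only."""
    return math.tanh(w_x * x)


def rnn_step(x, h_prev, w_x=1.0, w_h=0.5):
    """Sequential model: the output also depends on the previous state."""
    return math.tanh(w_x * x + w_h * h_prev)


def run_rnn(sequence):
    h = 0.0  # the state starts empty
    for x in sequence:
        h = rnn_step(x, h)
    return h


# Two sequences that end with the same input but differ in their history.
state_a = run_rnn([1.0, 0.0, 1.0])
state_b = run_rnn([-1.0, 0.0, 1.0])

print(feed_forward(1.0), feed_forward(1.0))  # identical outputs
print(state_a, state_b)                      # different outputs
```

With identical weights and an identical final input, the recurrent cell still produces different outputs because its state carries information from earlier elements of the sequence.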

One prominent drawback of RNNs is that they cannot remember more than a few (approximately 7) time steps, thus lacking long-term memory. Long Short-Term Memory (LSTM) networks are an extension of RNNs that encapsulate long-term memory. Therefore, LSTMs are often preferred over standard RNNs nowadays. We will peek under the hood in Chapter 7, Long Short-Term Memory Networks to understand them better.

In summary, we can mainly separate deep networks into two categories: the non-sequential models that deal with only a single input at a time for both training and prediction (for example, image classification) and the sequential models that cope with sequences of inputs of arbitrary length (for example, text generation where a single word is a single input). Then we can categorize non-sequential (also called feed-forward) models into deep networks (fewer than approximately 20 layers) and very deep networks (which can have hundreds of layers or more). The sequential models are categorized into short-term memory models (for example, RNNs), which can only memorize short-term patterns, and long-term memory models, which can memorize longer patterns. In Figure 1.4, we outline the discussed taxonomy. You are not expected to understand these different deep learning models fully at this point; the figure only illustrates the diversity of deep learning models:

Figure 1.4: A general taxonomy of the most commonly used deep learning methods, categorized into several classes

Understanding a simple deep model – a Fully-Connected Neural Network

Now let's have a closer look at a deep neural network in order to gain a better understanding. Although there are numerous different variants of deep models, let's look at one of the earliest models (dating back to the 1950s and 1960s), known as a Fully-Connected Neural Network (FCNN), sometimes called a multilayer perceptron. Figure 1.5 depicts a standard three-layered FCNN.

The goal of an FCNN is to map an input (for example, an image or a sentence) to a certain label or annotation (for example, the object category for images). This is achieved by using an input $x$ to compute $h$, a hidden representation of $x$, using a transformation such as $h = \sigma(Wx + b)$; here, $W$ and $b$ are the weights and bias of the FCNN, respectively, and $\sigma$ is the sigmoid activation function. Next, a classifier (for example, a softmax classifier) is placed on top of the FCNN, which makes it possible to leverage the learned features in the hidden layers to classify inputs. The classifier is essentially a part of the FCNN: yet another layer with its own weights, $W_s$, and bias, $b_s$. We can then calculate the final output of the FCNN as $\text{output} = \text{softmax}(W_s h + b_s)$. A softmax classifier provides a normalized representation of the scores output by the classifier layer; the label is considered to be the output node with the highest softmax value. With this, we can define a classification loss that is calculated as the difference between the predicted output label and the actual output label. An example of such a loss function is the mean squared loss. You don't have to worry if you don't understand the actual intricacies of the loss function; we will discuss quite a few of them in later chapters. Next, the neural network parameters, $W$, $b$, $W_s$, and $b_s$, are optimized using a standard stochastic optimizer (for example, stochastic gradient descent) to reduce the classification loss over all the inputs. Figure 1.5 depicts the process explained in this paragraph for a three-layer FCNN. We will walk through the details of how to use such a model for NLP tasks, step by step, in Chapter 3, Word2vec – Learning Word Embeddings.
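
The forward pass described above can be written out in plain Python. This is a minimal sketch: the parameter values for `W`, `b`, `W_s`, and `b_s` are made up for illustration, and no training takes place.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, W, b, W_s, b_s):
    """h = sigmoid(W x + b), then output = softmax(W_s h + b_s)."""
    h = [sigmoid(z + bias) for z, bias in zip(matvec(W, x), b)]
    scores = [z + bias for z, bias in zip(matvec(W_s, h), b_s)]
    return softmax(scores)


def mean_squared_loss(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical parameters: 3 inputs, 2 hidden units, 2 output classes.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
W_s = [[1.0, -1.0], [-1.0, 1.0]]
b_s = [0.0, 0.0]

probabilities = forward([1.0, 2.0, 3.0], W, b, W_s, b_s)
label = probabilities.index(max(probabilities))
loss = mean_squared_loss(probabilities, [0.0, 1.0])  # one-hot target for class 1

print(probabilities, label, loss)
```

Training would repeatedly adjust the four parameter sets to lower this loss over all inputs, which is the job of the stochastic optimizer.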

Figure 1.5: An example of a Fully Connected Neural Network (FCNN)
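
The computation described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: it uses a single hidden layer rather than the three layers in Figure 1.5, the parameter values are made-up toy numbers, and the helper names (affine, fcnn_forward, mean_squared_loss) are hypothetical and not taken from the book's code.

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, vector, bias):
    """Compute W * x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fcnn_forward(x, W, b, W_s, b_s):
    """h = sigma(W * x + b), then output = softmax(W_s * h + b_s)."""
    h = [sigmoid(z) for z in affine(W, x, b)]
    return softmax(affine(W_s, h, b_s))

def mean_squared_loss(predicted, target):
    """Mean of the squared differences between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Hypothetical toy parameters: 3 inputs, 2 hidden units, 2 output classes.
x = [1.0, 0.5, -1.0]
W = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
b = [0.0, 0.1]
W_s = [[1.0, -1.0], [-1.0, 1.0]]
b_s = [0.0, 0.0]

output = fcnn_forward(x, W, b, W_s, b_s)
predicted_label = output.index(max(output))   # node with the highest softmax value
loss = mean_squared_loss(output, [0.0, 1.0])  # target: one-hot vector for class 1
print(output, predicted_label, loss)
```

In a real model, the optimizer would repeatedly adjust W, b, W_s, and b_s to lower this loss; the sketch only shows the forward computation and the loss value.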

Let's look at an example of how to use a neural network for a sentiment analysis task. Suppose we have a dataset where each input is a sentence expressing a positive or negative opinion about a movie, paired with a label saying whether the sentence is actually positive (1) or negative (0). We are then given a test dataset of single-sentence movie reviews, and our task is to classify these new sentences as positive or negative.

It is possible to use a neural network (which can be deep or shallow, depending on the difficulty of the task) for this task by adhering to the following workflow:

  1. Tokenize the sentence by words
  2. Pad the sentences with a special token if necessary, to bring all sentences to a fixed length
  3. Convert the sentences into a numerical representation (for example, Bag-of-Words representation)
  4. Feed the numerical inputs to the neural network and predict the output (positive or negative)
  5. Optimize the neural network using a desired loss function
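
Steps 1 to 3 of this workflow can be sketched in plain Python as follows. This is a rough illustration, not the book's implementation: the padding token name, the whitespace tokenizer, the fixed length of 5, and the two example reviews are all assumptions made for the sketch.

```python
PAD = "<pad>"  # hypothetical special token used for padding

def tokenize(sentence):
    """Step 1: split a sentence into lowercase word tokens."""
    return sentence.lower().split()

def pad(tokens, length):
    """Step 2: pad (or truncate) a token list to a fixed length."""
    return (tokens + [PAD] * length)[:length]

def build_vocabulary(token_lists):
    """Assign an index to every distinct word, skipping the padding token."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token != PAD and token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def bag_of_words(tokens, vocab):
    """Step 3: count how often each vocabulary word occurs in the sentence."""
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] += 1
    return vector

# Made-up toy data: two reviews with their sentiment labels.
reviews = ["A great great movie", "A boring movie"]
labels = [1, 0]

tokenized = [pad(tokenize(review), 5) for review in reviews]
vocab = build_vocabulary(tokenized)
features = [bag_of_words(tokens, vocab) for tokens in tokenized]
print(features)
```

The resulting feature vectors, together with the labels, are what steps 4 and 5 would feed to the neural network and its loss function.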

The roadmap – beyond this chapter

This section delineates the details of the rest of the book; it's brief, but has informative details about what each chapter of the book covers. In this book, we will be looking at numerous exciting fields of NLP, from algorithms that find word similarities without any sort of annotated data, to algorithms that can write a story by themselves.

Starting from the next chapter, we will dive into the details of several popular and interesting NLP tasks. To help you gain in-depth knowledge and to make the learning interactive, various exercises are also provided. We will use Python and TensorFlow, an open-source library for distributed numerical computations, for all the implementations. TensorFlow encapsulates advanced technicalities such as optimizing your code for GPUs using Compute Unified Device Architecture (CUDA), which can be challenging. Furthermore, TensorFlow provides built-in functions for implementing deep learning algorithms, for example, activations, stochastic optimization methods, and convolutions, making everyone's life easier.

We will embark on a journey that covers many hot topics of NLP and how they perform, while using TensorFlow to see the state-of-the-art algorithms in action. This is what we will look at in this book:

  • Chapter 2, Understanding TensorFlow, provides you with a sound guide to understanding how to write client programs and run them in TensorFlow. This is especially important if you are new to TensorFlow, because TensorFlow behaves differently from a traditional coding language such as Python. This chapter will first offer an in-depth explanation of how TensorFlow executes a client. This will help you to understand the TensorFlow execution workflow and feel comfortable with TensorFlow terminology. Next, the chapter will walk you through various elements of a TensorFlow client, such as defining variables, defining operations/functions, feeding inputs to an algorithm, and obtaining the results. Finally, we will discuss how all this knowledge of TensorFlow can be used to implement a moderately complex neural network to classify images of hand-written digits.
  • Chapter 3, Word2vec – Learning Word Embeddings. The objective of this chapter is to introduce Word2vec, a method to learn numerical representations of words that reflect the semantics of the words. But before diving straight into the Word2vec techniques, we will first discuss some classical approaches used to represent word semantics. One of the early approaches was to rely on WordNet, a large lexical database. WordNet can be used to measure the semantic similarity between different words. However, maintaining such a large lexical database is costly. Therefore, there exist other, simpler representation techniques, such as one-hot-encoded representations and the term frequency-inverse document frequency (TF-IDF) method, that don't rely on external resources. Following this, we will move on to the modern way of learning word vectors known as Word2vec, where we use a neural network to learn word representations. We will discuss two popular Word2vec techniques: the skip-gram and continuous bag-of-words (CBOW) models.
  • Chapter 4, Advanced Word2vec. We will start this chapter with several comparisons, including a comparison between the skip-gram and CBOW algorithms to see if there is a clear-cut winner. Then we will discuss several extensions that have been introduced to the original Word2vec techniques over the course of the past few years. For example, ignoring common words in the text that occur with high probability, such as "the" and "a", improves the performance of the Word2vec models. On the other hand, the Word2vec model only considers the local context of a word and ignores the global statistics of the entire corpus. Consequently, we will discuss a word embedding learning technique known as GloVe, which incorporates both global and local statistics in finding word vectors.
  • Chapter 5, Sentence Classification with Convolutional Neural Networks, introduces you to convolutional neural networks (CNNs). Convolutional networks are a powerful family of deep models that can leverage the spatial structure of an input to learn from data. In other words, a CNN can process images in their two-dimensional form, whereas a multilayer perceptron needs the image to be unwrapped into a one-dimensional vector. We will first discuss in detail the various operations that take place in CNNs, such as the convolution and pooling operations. Then we will see an example where we will learn to classify hand-written digit images with a CNN. Then we will transition into an application of CNNs in NLP. Specifically, we will investigate how to apply a CNN to classify sentences, where the task is to classify whether a sentence is about a person, location, object, and so on.
  • Chapter 6, Recurrent Neural Networks, focuses on introducing recurrent neural networks (RNNs) and using RNNs for language generation. RNNs are different from feed-forward neural networks (for example, CNNs) as RNNs have memory. The memory is stored as a continuously updated system state. We will start with a representation of a feed-forward neural network and modify that representation to learn from sequences of data instead of individual data points. This process will transform the feed-forward network into an RNN. This will be followed by a technical description of the exact equations used for computations within the RNN. Next, we will discuss the optimization process that is used to update the RNN's weights. Thereafter, we will go through different types of RNNs, such as one-to-one RNNs and one-to-many RNNs. We will then walk through an exciting application of RNNs, where the RNN will learn to tell new stories by learning from a corpus of existing stories. We achieve this by training the RNN to predict the next word given the preceding sequence of words of the story. Finally, we will discuss a variant of standard RNNs, which we call RNN-CF (RNN with contextual features), and will compare it with the standard RNN to see which one performs better.
  • Chapter 7, Long Short-Term Memory Networks, discusses LSTMs by initially providing a solid intuition of how these models work and progressively diving into the technical details needed to implement them on your own. Standard RNNs suffer from a crucial limitation: the inability to persist long-term memory. However, advanced RNN models (for example, long short-term memory cells (LSTMs) and gated recurrent units (GRUs)) have been proposed, which can remember sequences for a large number of time steps. We will also examine how exactly LSTMs alleviate the problem of persisting long-term memory (this is known as the vanishing gradient problem). We will then discuss several techniques that can be used to improve LSTM models further, such as predicting several time steps ahead at once and reading sequences both forward and backward. Finally, we will discuss several variants of LSTM models, such as GRUs and LSTMs with peephole connections.
  • Chapter 8, Applications of LSTM – Generating Text, explains how to implement the LSTMs, GRUs, and LSTMs with peephole connections discussed in Chapter 7, Long Short-Term Memory Networks. Furthermore, we will compare the performance of these extensions both qualitatively and quantitatively. We will also discuss how to implement some of the extensions examined in Chapter 7, Long Short-Term Memory Networks, such as predicting several time steps ahead (known as beam search) and using word vectors as inputs instead of one-hot-encoded representations. Finally, we will discuss how we can use the RNN API, a sublibrary of TensorFlow that simplifies the implementation of recurrent models.
  • Chapter 9, Applications of LSTM – Image Caption Generation, looks at another exciting application, where the model learns how to generate captions (that is, descriptions) for images using an LSTM and a CNN. This application is interesting because it shows us how to combine two different types of models as well as how to learn with multimodal data (for example, images and text). The specific way to achieve this is to first learn image representations (similar to word vectors) with the CNN and train the LSTM by feeding that image vector followed by the words of the description of the image as a sequence. We will first discuss how we can use a pretrained CNN to obtain the image representations. Then we will discuss how to learn the word embeddings. Next we will discuss how to feed the image vectors along with word embeddings to train the LSTM. This is followed by a description of different evaluation metrics that exist for evaluating image captioning systems. Afterwards, we will evaluate the captions generated by our model, both qualitatively and quantitatively. We will conclude the chapter with a guide of how to implement the same system using the TensorFlow RNN API.
  • Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation. Machine translation has gained a lot of attention due to both the necessity of automating translation and the inherent difficulty of the task. We will start the chapter with a brief historical flashback of how machine translation was implemented in the early days. This discussion ends with an introduction to neural machine translation (NMT) systems. We will see how well current NMT systems are doing compared to old systems (such as statistical machine translation systems), which will motivate us to learn about NMT systems. Afterwards, we will discuss the intuition behind the design of NMT systems and continue with the technical details. Then we will discuss the evaluation metric we use to evaluate our system. Following this, we will investigate how we can implement a German to English translator from scratch. Next, we will learn about ways to improve NMT systems. We will look at one of those extensions in detail, called the attention mechanism. The attention mechanism has become essential in sequence-to-sequence learning problems. Finally, we will compare the performance improvement obtained with the attention mechanism and analyze the reasons behind the performance gain. This chapter concludes with a section on how the same concept of NMT systems can be extended to implement chatbots. Chatbots are systems that can communicate with humans and are used to fulfill various customer requests.
  • Chapter 11, Current Trends and the Future of Natural Language Processing. Natural language processing has branched out to a vast spectrum of different tasks. Here we will discuss some of the current trends in NLP and the developments we can expect in the future. We will first discuss various word embedding extensions that have emerged recently. We will also look at the implementation of one such embedding learning technique, known as tv-embeddings. Next, we will examine various trends growing in the field of neural machine translation. Then we will look at how NLP is combined with other fields, such as computer vision and reinforcement learning, to solve some interesting problems, such as teaching computer agents to communicate by devising their own language. Another booming area these days is artificial general intelligence, which is about developing a single system that can do multiple tasks (classify images, translate text, caption images, and so on). We will investigate several such systems. Afterwards, we will talk about the introduction of NLP into mining social media. We will conclude this chapter with some of the newly emerging tasks (for example, language grounding – developing common sense NLP systems) and new models (for example, phased LSTMs).
  • Appendix, Mathematical Foundations and Advanced TensorFlow, will introduce the reader to various mathematical data structures (for example, matrices) and operations (for example, matrix inverse). We will also discuss several important concepts in probability. We will then introduce Keras, a high-level library that uses TensorFlow underneath. Keras makes implementing neural networks simpler by hiding some of the details in TensorFlow, which some might find challenging. Concretely, we will see how we can implement a CNN with Keras, to get a feel for how to use Keras. Next, we will discuss how we can use the seq2seq library in TensorFlow to implement a neural machine translation system with much less code than we used in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation. Finally, we will walk you through a guide on how to use TensorBoard to visualize word embeddings. TensorBoard is a handy visualization tool that is shipped with TensorFlow. It can be used to visualize and monitor various variables in your TensorFlow client.
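
As a small taste of the simpler word representations mentioned under Chapter 3, the following sketch computes one-hot vectors and TF-IDF weights in plain Python. It is only an illustration under assumptions: the two toy documents are made up, the function names are hypothetical, and the TF-IDF formula used here (term frequency times log(N / document frequency)) is one common variant among several.

```python
import math

def one_hot(word, vocabulary):
    """Vector with a 1 at the word's position and 0 everywhere else."""
    return [1 if entry == word else 0 for entry in vocabulary]

def tf_idf(documents):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    n_docs = len(documents)
    vocabulary = sorted({word for doc in documents for word in doc})
    # Document frequency: in how many documents each word appears.
    doc_freq = {word: sum(1 for doc in documents if word in doc)
                for word in vocabulary}
    vectors = []
    for doc in documents:
        vectors.append([
            (doc.count(word) / len(doc)) * math.log(n_docs / doc_freq[word])
            for word in vocabulary
        ])
    return vocabulary, vectors

# Made-up toy corpus of two tokenized documents.
docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
vocabulary, vectors = tf_idf(docs)
print(vocabulary)
print(one_hot("cat", vocabulary))
print(vectors[0])
```

Note how "the", which appears in every document, receives a weight of zero, while words unique to one document receive a positive weight. Neither representation captures similarity between words, which is the gap Word2vec is meant to address.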

Introduction to the technical tools

In this section, you will be introduced to the technical tools that will be used in the exercises of the following chapters. First, we will present a brief introduction to the main tools provided. Next, we will present a coarse guide on how to install each tool along with hyperlinks to detailed guides provided by the official websites. Additionally, we will share tips on how to make sure that the tools were installed properly.

Description of the tools

We will use Python as the coding/scripting language. Python is a very versatile, easy-to-set-up coding language that is heavily used by the scientific community. Additionally, there are numerous scientific libraries available for Python, catering to areas ranging from deep learning to probabilistic inference to data visualization. TensorFlow is one such library that is well known in the deep learning community, providing many basic and advanced operations that are useful for deep learning. Next, we will use Jupyter notebooks in all our exercises, as they provide a more interactive environment for coding compared to using an IDE. We will also use scikit-learn, another popular machine learning toolkit for Python, for various miscellaneous purposes such as data preprocessing. Another library we will be using for various text-related operations is NLTK, the Python natural language toolkit. Finally, we will use matplotlib for data visualization.

Installing Python and scikit-learn

Python is hassle-free to install in any of the commonly used operating systems such as Windows, macOS, or Linux. We will use Anaconda to set up Python, as it does all the laborious work for setting up Python as well as the essential libraries.

To install Anaconda, follow these steps:

  1. Download Anaconda from https://www.continuum.io/downloads
  2. Select the appropriate OS and download Python 3.5
  3. Install Anaconda by following the instructions at https://docs.continuum.io/anaconda/install/

To check whether Anaconda was properly installed, follow these steps:

  1. Open a Terminal window (Command Prompt in Windows)
  2. Now, run the following command:
    conda --version
    

If installed properly, the version of the current Anaconda distribution should be shown in Terminal. Next, install scikit-learn by following the instructions at http://scikit-learn.org/stable/install.html, NLTK from https://www.nltk.org/install.html, and Matplotlib from https://matplotlib.org/users/installing.html.

Installing Jupyter Notebook

You can install Jupyter Notebook by following the instructions at http://jupyter.readthedocs.io/en/latest/install.html.

To check whether Jupyter Notebook is properly installed, follow these steps:

  1. Open a Terminal window
  2. Run this command:
    jupyter notebook
    

    You should be presented with a new browser window that looks like Figure 1.6:

    Figure 1.6. Jupyter Notebook installed successfully

Installing TensorFlow

Follow the instructions given at https://www.tensorflow.org/install/ under the Installing with Anaconda subsection to install TensorFlow. We will use TensorFlow 1.8.x throughout all the exercises.

When providing the tfBinaryURL as asked in the instructions, make sure that you provide a TensorFlow 1.8.x version. We stress this because the API has undergone many changes compared to previous TensorFlow versions.

To check whether TensorFlow installed properly, follow these steps:

  1. Open Command Prompt in Windows or Terminal in Linux or macOS.
  2. Type python to enter the Python environment. You should now see the Python version right below. Make sure that you are using Python 3.
  3. Next, enter the following commands:
    import tensorflow as tf
    print(tf.__version__)

If all went well, you should not have any errors (there might be warnings if your computer does not have a dedicated GPU, but you can ignore them) and the TensorFlow version 1.8.x should be shown.
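
If you prefer to check all the libraries in one go, a short script along the following lines may help. It is a convenience sketch rather than part of the official installation guides: the function names are hypothetical, and it assumes the usual import names (tensorflow, sklearn for scikit-learn, nltk, and matplotlib).

```python
import importlib

def installed_version(module_name):
    """Return the module's version string, 'unknown' if it has none,
    or None if the module cannot be imported at all."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any failure during import is treated as "not usable".
        return None
    return getattr(module, "__version__", "unknown")

def check_tools(module_names):
    """Map each module name to its version (or None if unavailable)."""
    return {name: installed_version(name) for name in module_names}

if __name__ == "__main__":
    tools = ["tensorflow", "sklearn", "nltk", "matplotlib"]
    for name, version in check_tools(tools).items():
        print(name, "->", version if version is not None else "not installed")
```

For the exercises in this book, the line for tensorflow should report a 1.8.x version.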

Note

Many cloud-based computational platforms are also available, where you can set up your own machine with various customizations (operating system, GPU card type, number of GPU cards, and so on). Many people are migrating to such cloud-based services due to the following benefits:

  • More customization options
  • Less maintenance effort
  • No infrastructure requirements

Several popular cloud-based computational platforms are as follows:

Summary

In this chapter, we broadly explored NLP to get an impression of the kind of tasks involved in building a good NLP-based system. First, we explained why we need NLP and then discussed various tasks of NLP to generally understand the objective of each task and how difficult it is to succeed at these tasks.

Next, we looked at the classical approach to solving NLP tasks and went into the details of the workflow using an example of generating sports summaries for football games. We saw that the traditional approach usually involves cumbersome and tedious feature engineering. For example, in order to check the correctness of a generated phrase, we might need to generate a parse tree for that phrase. Next, we discussed the paradigm shift that transpired with deep learning and saw how deep learning made the feature engineering step obsolete. We started with a bit of time-travelling to go back to the inception of deep learning and artificial neural networks and worked our way to the massive modern networks with hundreds of hidden layers. Afterward, we walked through a simple example illustrating a deep model, a multilayer perceptron, to understand the mathematical wizardry taking place in such a model (on the surface, of course!).

With a good foundation in both traditional and modern ways of approaching NLP, we then discussed the roadmap of the topics we will be covering in the book, from learning word embeddings to mighty LSTMs, and from generating captions for images to neural machine translators! Finally, we set up our environment by installing Python, scikit-learn, Jupyter Notebook, and TensorFlow.

In the next chapter, you will learn the basics of TensorFlow. By the end of the chapter, you should be comfortable with writing a simple algorithm that can take some input, transform the input through a defined function and output the result.


Key benefits

  • Focuses on more efficient natural language processing using TensorFlow
  • Covers NLP as a field in its own right to improve understanding for choosing TensorFlow tools and other deep learning approaches
  • Provides choices for how to process and evaluate large unstructured text datasets
  • Learn to apply the TensorFlow toolbox to specific tasks in the most interesting field in artificial intelligence

Description

Natural language processing (NLP) supplies the majority of data available to deep learning applications, while TensorFlow is the most important deep learning framework currently available. Natural Language Processing with TensorFlow brings TensorFlow and NLP together to give you invaluable tools to work with the immense volume of unstructured data in today’s data streams, and to apply these tools to specific NLP tasks. Thushan Ganegedara starts by giving you a grounding in NLP and TensorFlow basics. You'll then learn how to use Word2vec, including advanced extensions, to create word embeddings that turn sequences of words into vectors accessible to deep learning algorithms. Chapters on classical deep learning algorithms, like convolutional neural networks (CNNs) and recurrent neural networks (RNNs), demonstrate important NLP tasks such as sentence classification and language generation. You will learn how to apply high-performance RNN models, like long short-term memory (LSTM) cells, to NLP tasks. You will also explore neural machine translation and implement a neural machine translator. After reading this book, you will have an understanding of NLP and the skills to apply TensorFlow in deep learning NLP applications and to perform specific NLP tasks.

Who is this book for?

This book is for Python developers with a strong interest in deep learning who want to learn how to leverage TensorFlow to simplify NLP tasks. Fundamental Python skills are assumed, as well as some knowledge of machine learning and undergraduate-level calculus and linear algebra. No previous natural language processing experience is required, although some background in NLP or computational linguistics will be helpful.

What you will learn

  • Core concepts of NLP and various approaches to natural language processing
  • How to solve NLP tasks by applying TensorFlow functions to create neural networks
  • Strategies to process large amounts of data into word representations that can be used by deep learning applications
  • Techniques for performing sentence classification and language generation using CNNs and RNNs
  • About employing state-of-the-art advanced RNNs, like long short-term memory, to solve complex text generation tasks
  • How to write automatic translation programs and implement an actual neural machine translator from scratch
  • The trends and innovations that are paving the future in NLP

Product Details

Publication date: May 31, 2018
Length: 472 pages
Edition: 1st
Language: English
ISBN-13: 9781788478311

Table of Contents

13 Chapters
1. Introduction to Natural Language Processing
2. Understanding TensorFlow
3. Word2vec – Learning Word Embeddings
4. Advanced Word2vec
5. Sentence Classification with Convolutional Neural Networks
6. Recurrent Neural Networks
7. Long Short-Term Memory Networks
8. Applications of LSTM – Generating Text
9. Applications of LSTM – Image Caption Generation
10. Sequence-to-Sequence Learning – Neural Machine Translation
11. Current Trends and the Future of Natural Language Processing
A. Mathematical Foundations and Advanced TensorFlow
Index

Customer reviews

Rating: 4.5 out of 5 (10 ratings)
5 star 80%
4 star 10%
3 star 0%
2 star 0%
1 star 10%

Victor, Oct 07, 2018 (5 out of 5)
With experience in Language Natural Processing I went into this book to introduce myself into TensorFlow. Definitely a good decision. The code is quite easy to follow, examples are useful and well explained. I recommend it.
Amazon verified review

Deepal Bandaranayake, Aug 27, 2018 (5 out of 5)
Natural language processing with Tensorflow is a very well-written book that gives a strong introduction to novel deep learning based NLP systems. With this book I've learned about word vectors, text generation, machine translation which are hot topics flying around at the moment. The book really dives into the details of implementing various NLP systems scanning through various TensorFlow functions involved in a modularized easy-to-follow manner. Each chapter is accompanied with Jupyter notebooks which again provide the full picture of the system end-to-end. If I'm to pick one particular example I liked in the book, I'd say it's the way the author describes the functioning of LSTMs. The author really brings the reader to his world and walk the reader through a easy-to-digest analogy of how an LSTM might operation, without much focus on mathematics. With that graceful entrance, he then continue to explain the LSTM in a mathematical perspective, which I found quite impressive. In conclusion, I genuinely enjoyed the book and think the book is a bang for bucks! I wouldn't hesitate this to another ML enthusiast looking for a good practical view of things!
Amazon verified review

Tishan, Jun 20, 2018 (5 out of 5)
This book welcomes the reader to the modern deep learning based natural language processing techniques in a very progressive manner. Furthermore, the book often visualize elusive concepts with simpler explanations and colourful examples. For example, I liked the way Thushan discusses basics of TensorFlow and illustrates the workflow with a colourful example. Moreover, the book touches upon a multitude of NLP applications, providing a very diverse practical exposure to the current NLP solutions. I found the approach with more weight on the practicality and application, Thushan takes very appealing, in understanding the mechanics of various methods.
Amazon verified review

Amazon Customer, Sep 10, 2018 (5 out of 5)
The content covered in this book is excellent!! But binding quality of the publisher is very very poor. This happened for two books I purchased from Packet.
Amazon verified review

Shirley, Jun 29, 2019 (5 out of 5)
it's useful~
Amazon verified review

FAQs

What is the delivery time and cost of a print book?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time will start printing on the next business day, so the estimated delivery times also start from the next day. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, will begin printing on the second business day after the order. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is customs duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. These are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duty or localized taxes may apply. These are charged by the recipient country and must be paid by the customer; they are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors such as the total invoice amount, dimensions such as weight, and other criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, you will have to pay an additional import tax of 19% to the courier service to receive your package. On a declared value of $50, that is $9.50.
  • If you live in Turkey and the declared value of your ordered items is over €22, you will have to pay an additional import tax of 18% to the courier service to receive your package. On a declared value of €22, that is €3.96.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing it. Simply contact [email protected] with your order details or payment transaction ID. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you, you can contact us at [email protected] once you receive it and use the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact the Customer Relations Team at [email protected] with the order number and issue details, as explained below:

  1. If you ordered an item (eBook, Video or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at [email protected] within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e. during download), you should contact the Customer Relations Team at [email protected] within 14 days of purchase, and they will be able to resolve the issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance that your printed book arrives damaged or with a material defect, contact our Customer Relations Team at [email protected] within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on applicable laws and regulations). A localized VAT fee is charged only to our European and UK customers on the eBooks, videos and subscriptions that they buy. GST is charged to Indian customers for eBook and video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal