Summary
In this chapter, we explored the foundations of NLP. We began by looking at how to handle real-world text data and explored some preprocessing techniques, using tools such as Beautiful Soup, requests, and regular expressions. Then, we unpacked ideas such as tokenization, sequencing, and the use of word embeddings to transform text data into vector representations that not only preserve the sequential order of the text but also capture the relationships between words. We took this a step further by building a sentiment analysis classifier using the Yelp Polarity dataset from TensorFlow Datasets. Finally, we performed a series of experiments with different hyperparameters in a bid to improve our base model’s performance and overcome overfitting.
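As a quick refresher on tokenization and sequencing, here is a minimal pure-Python sketch. It is not the code used in this chapter: the helper names `build_vocab` and `texts_to_padded_sequences`, the `<OOV>` token, the sample reviews, and the choice to pad and truncate at the end of each sequence are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(texts, num_words=None, oov_token="<OOV>"):
    """Map each word to an integer ID, most frequent words first.

    ID 0 is reserved for padding and ID 1 for out-of-vocabulary words.
    (Hypothetical helper, written only for this illustration.)
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {oov_token: 1}
    for word, _ in counts.most_common(num_words):
        vocab[word] = len(vocab) + 1
    return vocab


def texts_to_padded_sequences(texts, vocab, maxlen, oov_token="<OOV>"):
    """Convert texts to fixed-length lists of word IDs.

    Sequences longer than maxlen are cut at the end; shorter ones are
    filled with zeros at the end (an assumed convention).
    """
    oov_id = vocab[oov_token]
    padded = []
    for text in texts:
        ids = [vocab.get(word, oov_id) for word in text.lower().split()]
        ids = ids[:maxlen]
        padded.append(ids + [0] * (maxlen - len(ids)))
    return padded


# Made-up sample reviews, used only to exercise the helpers.
reviews = ["the food was great", "the service was slow"]
vocab = build_vocab(reviews)

# "pizza" was never seen while building the vocabulary, so it maps to <OOV>.
sequences = texts_to_padded_sequences(reviews + ["great pizza"], vocab, maxlen=5)
print(sequences)
```

Each review becomes a list of integers of the same length, which is the kind of input an embedding layer expects. The word order is kept in the sequence, while the embedding layer is what learns the relationships between words.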
In the next chapter, we will introduce Recurrent Neural Networks (RNNs) and see how they work differently from the DNN we used in this chapter. We will put RNNs to the test by building a new classifier with...