4. Deep Learning for Text – Embeddings
Overview
In this chapter, we will begin our foray into Natural Language Processing. We will start by using the Natural Language Toolkit (NLTK) to preprocess raw text data, tokenizing the text and removing punctuation and stop words. As we progress through this chapter, we will implement classical approaches to text representation, such as one-hot encoding and TF-IDF. This chapter then demonstrates the power of word embeddings and explains the popular deep learning-based approaches to generating them. We will use the Skip-gram and Continuous Bag of Words (CBOW) algorithms to train our own word embeddings, explore their properties and the different parameters of the algorithms, and generate vectors for phrases. By the end of this chapter, you will be able to handle text data and work with word embeddings, both from pre-trained models and from models you train yourself.
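To make this concrete before we dive in, here is a minimal sketch of the kind of pipeline the chapter builds up to, assuming NLTK and gensim are installed. The sample sentence and the parameter values (vector_size, window) are illustrative choices for this sketch, not the chapter's own settings:

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from gensim.models import Word2Vec

# One-time downloads of the tokenizer model and the stop word list.
nltk.download('punkt')
nltk.download('stopwords')

raw_text = "Word embeddings map words to dense vectors that capture meaning."

# Preprocessing: tokenize, lowercase, and drop punctuation and stop words.
tokens = word_tokenize(raw_text.lower())
stop_words = set(stopwords.words('english'))
cleaned = [t for t in tokens
           if t not in stop_words and t not in string.punctuation]

# Train a tiny Word2Vec model on the cleaned tokens.
# sg=1 selects Skip-gram; sg=0 selects Continuous Bag of Words (CBOW).
# A real corpus would contain many tokenized sentences, not just one.
model = Word2Vec([cleaned], vector_size=100, window=5, min_count=1, sg=1)

# Look up the learned 100-dimensional vector for a word.
print(model.wv['embeddings'])
```

Each of these steps, from tokenization through choosing between Skip-gram and CBOW, is unpacked in detail in the sections that follow.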