Text preprocessing
NLP is an exciting and evolving field that lies at the intersection of computer science and linguistics. It gives computers the ability to understand, analyze, interpret, and generate text data. However, working with text data presents a unique set of challenges that differ from those of the tabular and image data we worked with in the earlier sections of this book. Figure 10.1 gives a high-level overview of some of the inherent challenges that text data presents. Let’s drill into them and see what they are and how they cause problems when building deep learning models with text data.
Figure 10.1 – The challenges presented by text data
Text data in its natural form is unstructured, and that is only the first of the properties that set it apart from the other types of data we have worked with. Let's illustrate some of the issues by looking at these two sentences – “The house next to ours is beautiful...