Chapter 8. Text Mining and Natural Language Processing
Natural language processing (NLP) is ubiquitous today in applications such as mobile apps, e-commerce websites, e-mail, news websites, and more. Detecting spam in e-mails, categorizing e-mails, synthesizing speech, categorizing news, searching for and recommending products, and analyzing sentiment about brands on social media are all different aspects of NLP and of mining text for information.
There has been an exponential increase in textual digital information, in the form of web pages, e-books, SMS messages, documents of various formats, e-mails, and social media messages such as tweets and Facebook posts. Its volume now runs into exabytes (an exabyte is 10^18 bytes). Historically, the earliest foundational work, which relied on automata and probabilistic modeling, began in the 1950s. The 1970s saw developments such as stochastic modeling, Markov modeling, and syntactic parsing, but their progress was limited...