The Handbook of NLP with Gensim: Leverage topic modeling to uncover hidden patterns, themes, and valuable insights within textual data

Product type Paperback

Published in Oct 2023

Publisher Packt

ISBN-13 9781803244945

Length 310 pages

Edition 1st Edition

Author (1):

Chris Kuo

Preface

1. Part 1: NLP Basics

2. Chapter 1: Introduction to NLP

3. Chapter 2: Text Representation

4. Chapter 3: Text Wrangling and Preprocessing

5. Part 2: Latent Semantic Analysis/Latent Semantic Indexing

6. Chapter 4: Latent Semantic Analysis with scikit-learn

7. Chapter 5: Cosine Similarity

8. Chapter 6: Latent Semantic Indexing with Gensim

9. Part 3: Word2Vec and Doc2Vec

10. Chapter 7: Using Word2Vec

11. Chapter 8: Doc2Vec with Gensim

12. Part 4: Topic Modeling with Latent Dirichlet Allocation

13. Chapter 9: Understanding Discrete Distributions

14. Chapter 10: Latent Dirichlet Allocation

15. Chapter 11: LDA Modeling

16. Chapter 12: LDA Visualization

17. Chapter 13: The Ensemble LDA for Model Stability

18. Part 5: Comparison and Applications

19. Chapter 14: LDA and BERTopic

20. Chapter 15: Real-World Use Cases

21. Assessments

22. Index

23. Other Books You May Enjoy

Chapter 14 – LDA and BERTopic

BERT (Bidirectional Encoder Representations from Transformers) builds on the Transformer encoder by learning from the words both before and after each word, so it captures context and word order better. This helps it handle difficult cases such as jokes or words with multiple meanings, which makes it effective across many kinds of text, from chat messages to books. BERT removes the unidirectionality constraint of left-to-right Transformer language models by training with a masked language model (MLM) objective, which randomly masks some of the input tokens. Because those tokens are hidden, the model has to predict the original vocabulary ID of each masked token from the context on both sides of it.
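
The masking step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's actual preprocessing: it splits on whitespace instead of using WordPiece tokens, always substitutes `[MASK]` (the original BERT procedure selects about 15% of tokens and replaces only 80% of those with `[MASK]`), and the helper name `mask_tokens` and the sample sentence are made up for this example.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly hide tokens and remember what was hidden.

    Returns (masked, labels). labels[i] is the original token when
    position i was masked, and None otherwise. A real MLM is trained
    to recover labels[i] from the unmasked tokens on both sides.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Hypothetical sample sentence with an ambiguous word ("bank").
tokens = "the bank raised interest rates after the river bank flooded".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
print(masked)
print(labels)
```

The `labels` list is what makes the task self-supervised: the training targets come from the text itself, so no manual annotation is needed.
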
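BERTopic consists of five modules: BERT (document embeddings), UMAP (dimensionality reduction), HDBSCAN (clustering), c-TF-IDF (class-based TF-IDF for topic representation), and MMR (Maximal Marginal Relevance, for diversifying topic keywords).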
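
As a rough illustration of how these five modules fit together, the sketch below wires them up with the open-source `bertopic` package. It assumes `bertopic`, `sentence-transformers`, `umap-learn`, `hdbscan`, and `scikit-learn` are installed. The function name `build_topic_model`, the embedding model name, and every hyperparameter value are illustrative assumptions, not settings taken from the book. The imports sit inside the function so the heavy dependencies load only when a model is actually built.

```python
def build_topic_model():
    """Assemble a BERTopic model from explicitly configured sub-models."""
    from bertopic import BERTopic
    from bertopic.representation import MaximalMarginalRelevance
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        # 1. Embeddings: a BERT-family sentence encoder (assumed model name).
        embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
        # 2. UMAP: reduce the embeddings to a few dimensions.
        umap_model=UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                        metric="cosine", random_state=42),
        # 3. HDBSCAN: cluster the reduced embeddings into topics.
        hdbscan_model=HDBSCAN(min_cluster_size=15, metric="euclidean",
                              prediction_data=True),
        # 4. c-TF-IDF: count words per cluster, then weight them per class.
        vectorizer_model=CountVectorizer(stop_words="english"),
        ctfidf_model=ClassTfidfTransformer(),
        # 5. MMR: diversify the keywords chosen for each topic.
        representation_model=MaximalMarginalRelevance(diversity=0.3),
    )
```

With a list of strings `docs`, the model would be fitted with `topics, probs = build_topic_model().fit_transform(docs)`. Passing each sub-model explicitly makes it clear which stage a given setting belongs to, and lets one stage be swapped without touching the others.
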
UMAP stands for Uniform Manifold Approximation and Projection. It is a dimensionality reduction technique that turns high-dimensional data into a low-dimensional representation that is easier to visualize and cluster. Imagine you have a big puzzle with lots of pieces (data points), and you want to arrange them on a board so that similar pieces are close together...
