Mastering Transformers

You're reading from Mastering Transformers Build state-of-the-art models from scratch with advanced natural language processing techniques

Product type Paperback

Published in Sep 2021

Publisher Packt

ISBN-13 9781801077651

Length 374 pages

Edition 1st Edition

Languages

Python

Tools

TensorFlow

Concepts

Mobile Application Development

Authors (2):

Savaş Yıldırım

Meysam Asgari-Chenaghlu

Table of Contents (16) Chapters

Preface

1. Section 1: Introduction – Recent Developments in the Field, Installations, and Hello World Applications

2. Chapter 1: From Bag-of-Words to the Transformer

3. Chapter 2: A Hands-On Introduction to the Subject

4. Section 2: Transformer Models – From Autoencoding to Autoregressive Models

5. Chapter 3: Autoencoding Language Models

6. Chapter 4: Autoregressive and Other Language Models

7. Chapter 5: Fine-Tuning Language Models for Text Classification

8. Chapter 6: Fine-Tuning Language Models for Token Classification

9. Chapter 7: Text Representation

10. Section 3: Advanced Topics

11. Chapter 8: Working with Efficient Transformers

12. Chapter 9: Cross-Lingual and Multilingual Language Modeling

13. Chapter 10: Serving Transformer Models

14. Chapter 11: Attention Visualization and Experiment Tracking

15. Other Books You May Enjoy

Autoencoding language model training for any language

We have discussed how BERT works and how to use the pretrained version provided by the HuggingFace repository. In this section, you will learn how to use the HuggingFace library to train your own BERT.

Before we start, it is essential to have good training data for the language modeling. This data is called the corpus, and it is normally a very large collection of text (sometimes preprocessed and cleaned). This unlabeled corpus must be appropriate for the use case you want your language model trained for; for example, if you want a BERT model specialized for the English language, the corpus should consist of English text. Although there are many large, good datasets, such as Common Crawl (https://commoncrawl.org/), we would prefer a small one for faster training.
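As a rough illustration of what "preprocessed and cleaned" can mean in practice, the following minimal sketch turns a handful of raw documents into a plain-text corpus with one document per line. The sample texts, the cleaning rules (dropping HTML line-break tags, collapsing whitespace, skipping empty documents), and the names `clean_document` and `build_corpus` are all assumptions made for this example; they are not taken from the book.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical raw documents standing in for a real corpus.
RAW_DOCS = [
    "A wonderful little film.<br /><br />The acting   is superb.",
    "   ",
    "Terrible pacing,\nbut the soundtrack is great.",
]

# Assumed cleaning rules for this sketch only.
_BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)
_WHITESPACE = re.compile(r"\s+")


def clean_document(text: str) -> str:
    """Replace HTML line-break tags with spaces and collapse whitespace."""
    text = _BR_TAG.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def build_corpus(docs, out_path: Path) -> int:
    """Write one cleaned, non-empty document per line; return lines written."""
    written = 0
    with out_path.open("w", encoding="utf-8") as fh:
        for doc in docs:
            cleaned = clean_document(doc)
            if cleaned:  # skip documents that are empty after cleaning
                fh.write(cleaned + "\n")
                written += 1
    return written


if __name__ == "__main__":
    out = Path(tempfile.gettempdir()) / "corpus.txt"
    print(build_corpus(RAW_DOCS, out), "documents written to", out)
```

A one-document-per-line text file is a common input format for tokenizer and language model training scripts, which is why the sketch writes the corpus that way; a real pipeline would likely need further, corpus-specific cleaning.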

The IMDB dataset of 50K movie reviews (available at https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews) is a large dataset...

Authors (2)

Savaş Yıldırım

Savaş Yıldırım graduated from the Istanbul Technical University Department of Computer Engineering and holds a Ph.D. degree in Natural Language Processing (NLP). Currently, he is an associate professor at the Istanbul Bilgi University, Turkey, and is a visiting researcher at the Ryerson University, Canada. He is a proactive lecturer and researcher with more than 20 years of experience teaching courses on machine learning, deep learning, and NLP. He has significantly contributed to the Turkish NLP community by developing a lot of open source software and resources. He also provides comprehensive consultancy to AI companies on their R&D projects. In his spare time, he writes and directs short films, and enjoys practicing yoga.

Meysam Asgari-Chenaghlu

Meysam Asgari-Chenaghlu is an AI manager at Carbon Consulting and is also a Ph.D. candidate at the University of Tabriz. He has been a consultant for Turkey's leading telecommunication and banking companies. He has also worked on various projects, including natural language understanding and semantic search.
