Encoders and decoders
Now, I’d like to briefly introduce you to two key topics that you’ll see throughout the discussion of transformer-based models: encoders and decoders. Let’s establish some basic intuition to help you understand what they are all about. An encoder is simply a computational graph (or neural network, function, or object, depending on your background) that takes an input from a larger feature space and returns an object in a smaller feature space. We hope, and demonstrate computationally, that the encoder learns what is most essential about the provided input data.
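To make this concrete, here is a minimal sketch of an encoder as a function that compresses a larger feature space into a smaller one. It is not taken from any model discussed here; the use of PyTorch and the layer sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# A toy encoder: maps a 512-dimensional input down to a
# 64-dimensional latent representation (sizes are arbitrary).
encoder = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
)

x = torch.randn(8, 512)  # a batch of 8 inputs with 512 features each
z = encoder(x)           # compressed representation, shape (8, 64)
print(z.shape)           # torch.Size([8, 64])
```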
Typically, in large language and vision models, the encoder itself is composed of a number of multi-head self-attention blocks. This means that in a transformer-based model, an encoder is usually a stack of self-attention steps that learn what is most essential about the input data and pass this on to the downstream model. Let’s look at a quick visual:
Figure 1.1 – Encoders and decoders
Intuitively, as you can see in the preceding figure, the encoder starts with a larger input space and iteratively compresses it into a smaller latent space. In the case of classification, this is just a classification head with one output allotted to each class. In the case of masked language modeling, encoders are stacked on top of each other to better predict the tokens that replace the masks. This means the encoders output an embedding, or numerical representation, of each masked token, and after prediction, the tokenizer is reused to translate the predicted token back into natural language.
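As a quick hands-on illustration of masked language modeling with an encoder-only model, the sketch below assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is prescribed by the text above.

```python
from transformers import pipeline

# Encoder-only masked language modeling: BERT's stacked encoders
# produce an embedding for the [MASK] position, and the tokenizer
# maps the predicted token back into natural language.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```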
One of the earliest large language models, BERT, is an encoder-only model. Most other models in the BERT family, such as DeBERTa, DistilBERT, and RoBERTa, also use encoder-only architectures. Decoders operate in exactly the reverse direction, starting with a compressed representation and iteratively expanding it back into a larger feature space. Encoders and decoders can also be combined, as in the original Transformer, to solve text-to-text problems such as machine translation.
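Conversely, a decoder-only model starts from a short prompt and expands it into a longer sequence. The following sketch again assumes the transformers library, with the gpt2 checkpoint chosen purely for illustration.

```python
from transformers import pipeline

# Decoder-only generation: GPT-2 starts from a compressed context
# (the prompt) and iteratively expands it into a longer sequence.
generator = pipeline("text-generation", model="gpt2")

result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```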
To make it easier, here’s a short table summarizing the three types of self-attention blocks we’ve looked at: encoders, decoders, and their combination.
| Size of inputs and outputs | Type of self-attention blocks | Machine learning tasks | Example models |
| --- | --- | --- | --- |
| Long to short | Encoder | Classification, any dense representation | BERT, DeBERTa, DistilBERT, RoBERTa, XLM, ALBERT, CLIP, VL-BERT, Vision Transformer |
| Short to long | Decoder | Generation, summarization, question answering, any sparse representation | GPT, GPT-2, GPT-Neo, GPT-J, ChatGPT, GPT-4, BLOOM, OPT |
| Equal | Encoder-decoder | Machine translation, style translation | T5, BART, BigBird, FLAN-T5, Stable Diffusion |
Table 1.3 – Encoders, decoders, and their combination
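Rounding out the table, an encoder-decoder model such as T5 handles text-to-text tasks like machine translation. The sketch below assumes the transformers library and the t5-small checkpoint, used here only as an example.

```python
from transformers import pipeline

# Encoder-decoder: the encoder compresses the English source sentence,
# and the decoder expands that representation into German text.
translator = pipeline("translation_en_to_de", model="t5-small")

result = translator("Encoders and decoders can be combined to translate text.")
print(result[0]["translation_text"])
```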
Now that you have a better understanding of encoders, decoders, and the models built from them, let’s close out the chapter with a quick recap of all the concepts you just learned about.