You're reading from Generative AI with LangChain Build large language model (LLM) apps with Python, ChatGPT, and other LLMs

Product type Paperback

Published in Dec 2023

Publisher Packt

ISBN-13 9781835083468

Length 368 pages

Edition 1st Edition

Languages

Python

Tools

ChatGPT

Concepts

Artificial Intelligence

Author (1):

Ben Auffarth

Table of Contents (13) Chapters

Preface

1. What Is Generative AI?

2. LangChain for LLM Apps

3. Getting Started with LangChain

4. Building Capable Assistants

5. Building a Chatbot Like ChatGPT

6. Developing Software with Generative AI

7. LLMs for Data Science

8. Customizing LLMs and Their Output

9. Generative AI in Production

10. The Future of Generative Models

11. Other Books You May Enjoy

12. Index

What can AI do in other domains?

Generative AI models have demonstrated impressive capabilities across modalities such as sound, music, video, and 3D shapes. In the audio domain, models can synthesize natural speech, generate original music compositions, and even mimic a speaker’s voice and the patterns of rhythm and sound (prosody).

Automatic Speech Recognition (ASR) systems convert spoken language into text. For video, AI systems can create photorealistic footage from text prompts and perform sophisticated edits such as object removal. 3D models have learned to reconstruct scenes from images and generate intricate objects from textual descriptions.

There are many types of generative models, handling different data modalities across various domains, as shown in the following table:

Model Type	Input	Output	Examples
Text-to-Text	Text	Text	Mixtral, GPT-4, Claude 3, Gemini
Text-to-Image	Text	Images	DALL-E 2, DALL-E 3, Stable Diffusion, Imagen
Text-to-Audio	Text	Audio	Jukebox, AudioLM, MusicGen
Text-to-Video	Text	Video	Sora
Image-to-Text	Images	Text	CLIP
Image-to-Image	Images	Images	Super-resolution, style transfer, inpainting
Text-to-Code	Text	Code	AlphaCode, Codex
Video-to-Audio	Video	Audio	Soundify
Text-to-Math	Text	Mathematical Expressions	ChatGPT, Claude
Text-to-Scientific	Text	Scientific Output	Minerva, Galactica
Algorithm Discovery	Text/Data	Algorithms	AlphaTensor
Multimodal Input	Text, Images	Text, Images	GPT-4V

Table 1.1: Models for audio, video, and other domains
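
The table can also be read as a small lookup structure keyed by modality. The following is a minimal sketch, not code from the book: the `ModelType` dataclass and the `find_by_modality` helper are hypothetical names introduced here for illustration, and only a handful of rows from Table 1.1 are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelType:
    """One row of Table 1.1 (hypothetical structure, for illustration only)."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    examples: tuple[str, ...]


# A few rows copied from Table 1.1; this list is deliberately not exhaustive.
MODEL_TYPES = [
    ModelType("Text-to-Text", ("Text",), ("Text",),
              ("Mixtral", "GPT-4", "Claude 3", "Gemini")),
    ModelType("Text-to-Audio", ("Text",), ("Audio",),
              ("Jukebox", "AudioLM", "MusicGen")),
    ModelType("Text-to-Video", ("Text",), ("Video",), ("Sora",)),
    ModelType("Video-to-Audio", ("Video",), ("Audio",), ("Soundify",)),
    ModelType("Multimodal Input", ("Text", "Images"), ("Text", "Images"),
              ("GPT-4V",)),
]


def find_by_modality(input_modality=None, output_modality=None):
    """Return names of model types matching the given input/output modality.

    A filter left as None is ignored.
    """
    return [
        m.name
        for m in MODEL_TYPES
        if (input_modality is None or input_modality in m.inputs)
        and (output_modality is None or output_modality in m.outputs)
    ]


audio_producers = find_by_modality(output_modality="Audio")
text_consumers = find_by_modality(input_modality="Text")
print(audio_producers)
print(text_consumers)
```

Querying by output modality makes the point of the table concrete: several different input modalities (here text and video) can lead to the same kind of output.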

Many more combinations of modalities exist; these are just some that I have come across. Text itself has subcategories: text-to-math models generate mathematical expressions from text, an area where models such as ChatGPT and Claude shine, and text-to-code models, such as AlphaCode and Codex, generate programming code from text. A few models specialize in scientific text, such as Minerva and Galactica, or in algorithm discovery, such as AlphaTensor.

A few models work with several modalities for input or output. One example with multimodal input is OpenAI’s GPT-4V (GPT-4 with vision), released in September 2023, which takes both text and images and offers better Optical Character Recognition (OCR) than previous versions for reading text from images. Images can be translated into descriptive words, to which text filters are then applied. This mitigates the risk of generating unconstrained image captions.
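
The caption-then-filter idea can be sketched in a few lines. This is only an illustration of the general pattern under stated assumptions, not a description of how GPT-4V is implemented: `describe_image` is a hypothetical stand-in that returns a fixed caption instead of calling a vision model, and `BLOCKED_TERMS` is a made-up blocklist.

```python
import re

# Hypothetical blocklist; a real system would apply a much richer policy.
BLOCKED_TERMS = {"password", "ssn"}


def describe_image(image_bytes: bytes) -> str:
    """Stand-in for a vision model: returns a fixed caption for illustration."""
    return "A sticky note showing a password next to a laptop"


def filter_caption(caption: str, blocked=BLOCKED_TERMS) -> str:
    """Replace blocked words (whole words, case-insensitive) with a placeholder."""
    def redact(match):
        word = match.group(0)
        return "[redacted]" if word.lower() in blocked else word

    # Visit every alphabetic word and redact only those on the blocklist.
    return re.sub(r"[A-Za-z]+", redact, caption)


caption = describe_image(b"")          # step 1: image -> descriptive words
safe_caption = filter_caption(caption)  # step 2: apply text filters
print(safe_caption)
```

Because the filter operates on text rather than pixels, the same filtering logic can be reused for any model that turns its input into words first.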

As the table shows, text is a common input modality that can be converted into various outputs, such as images, audio, and video. The outputs can also be converted back into text or kept within the same modality. Together, these models enable a diverse range of capabilities across modalities and domains. LLMs have driven rapid progress in text-focused domains and are the main focus of this book; however, we’ll also occasionally look at other models, text-to-image in particular. These models typically use a Transformer architecture trained on massive datasets via self-supervised learning.

Underlying many of these innovations are advances in deep generative architectures such as GANs, diffusion models, and transformers. AI labs at Google, OpenAI, Meta, and DeepMind are at the forefront of this work.
