Generative AI with LangChain

You're reading from Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs

Product type: Paperback
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781835083468
Length: 368 pages
Edition: 1st Edition
Author: Ben Auffarth
Table of Contents (13 chapters)

Preface
1. What Is Generative AI?
2. LangChain for LLM Apps
3. Getting Started with LangChain
4. Building Capable Assistants
5. Building a Chatbot Like ChatGPT
6. Developing Software with Generative AI
7. LLMs for Data Science
8. Customizing LLMs and Their Output
9. Generative AI in Production
10. The Future of Generative Models
11. Other Books You May Enjoy
12. Index

What can AI do in other domains?

Generative AI models have demonstrated impressive capabilities across modalities such as sound, music, video, and 3D shapes. In the audio domain, models can synthesize natural speech, generate original music compositions, and even mimic a speaker’s voice and the patterns of rhythm and sound (prosody).

Speech-to-text systems, also known as Automatic Speech Recognition (ASR), convert spoken language into text. For video, AI systems can create photorealistic footage from text prompts and perform sophisticated editing such as object removal. In 3D, models have learned to reconstruct scenes from images and generate intricate objects from textual descriptions.

There are many types of generative models, handling different data modalities across various domains, as shown in the following table:

| Model Type | Input | Output | Examples |
|---|---|---|---|
| Text-to-Text | Text | Text | Mixtral, GPT-4, Claude 3, Gemini |
| Text-to-Image | Text | Images | DALL-E 2, Stable Diffusion, Imagen |
| Text-to-Audio | Text | Audio | Jukebox, AudioLM, MusicGen |
| Text-to-Video | Text | Video | Sora |
| Image-to-Text | Images | Text | CLIP, DALL-E 3 |
| Image-to-Image | Images | Images | Super-resolution, style transfer, inpainting |
| Text-to-Code | Text | Code | AlphaCode, Codex |
| Video-to-Audio | Video | Audio | Soundify |
| Text-to-Math | Text | Mathematical Expressions | ChatGPT, Claude |
| Text-to-Scientific | Text | Scientific Output | Minerva, Galactica |
| Algorithm Discovery | Text/Data | Algorithms | AlphaTensor |
| Multimodal Input | Text, Images | Text, Images | GPT-4V |

Table 1.1: Models for audio, video, and other domains

There are many more combinations of modalities to consider; these are just some that I have come across. We could also consider subcategories of text, such as text-to-math, which generates mathematical expressions from text and where models such as ChatGPT and Claude shine, or text-to-code, which generates programming code from text, as AlphaCode and Codex do. A few models specialize in scientific text, such as Minerva and Galactica, or in algorithm discovery, such as AlphaTensor.

A few models work with several modalities for input or output. One example of multimodal input is OpenAI’s GPT-4V model (GPT-4 with vision), released in September 2023, which takes both text and images and comes with better Optical Character Recognition (OCR) than previous versions to read text from images. Images can be translated into descriptive words, and text filters are then applied to that description. This mitigates the risk of generating unconstrained image captions.
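The caption-then-filter idea described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not OpenAI's implementation: the `stub_captioner` function, the `BLOCKED_TERMS` set, and the `[filtered]` placeholder are all hypothetical names invented for this example, and a real system would call an actual vision model and apply a much richer policy.

```python
from typing import Callable

# Hypothetical blocklist, assumed for illustration only.
BLOCKED_TERMS = frozenset({"password", "ssn"})


def stub_captioner(image_bytes: bytes) -> str:
    """Stand-in for a vision model: always returns the same caption."""
    return "a sticky note showing a password next to a laptop"


def filter_caption(caption: str, blocked: frozenset = BLOCKED_TERMS) -> str:
    """Replace blocked words with a placeholder and keep all other words."""
    cleaned = []
    for word in caption.split():
        # Compare case-insensitively and ignore trailing punctuation.
        key = word.lower().strip(".,!?")
        cleaned.append("[filtered]" if key in blocked else word)
    return " ".join(cleaned)


def describe_image(
    image_bytes: bytes,
    captioner: Callable[[bytes], str] = stub_captioner,
) -> str:
    """Step 1: translate the image into words. Step 2: filter the text."""
    return filter_caption(captioner(image_bytes))


if __name__ == "__main__":
    print(describe_image(b""))
```

Because the filter operates on text rather than pixels, the same check can be reused regardless of which captioning model sits in front of it.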

As the table shows, text is a common input modality that can be converted into various outputs, such as images, audio, and video. The outputs can also be converted back into text or kept within the same modality. LLMs have driven rapid progress in text-focused domains, and they enable a diverse range of capabilities across modalities and domains. They typically use a Transformer architecture trained on massive datasets via self-supervised learning. The LLM categories are the main focus of this book; however, we’ll also occasionally look at other models, text-to-image in particular.
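To make the observation about text as an input modality concrete, the rows of Table 1.1 can be transcribed into a small Python structure and counted. This is a minimal sketch: the lowercase modality labels (for example `"math"` and `"scientific"`) are shorthand chosen for this example, and the "Text/Data" row is treated as accepting both text and data.

```python
from collections import Counter

# (model type, input modalities, output modalities), transcribed from Table 1.1.
# The lowercase labels are shorthand assumed for this sketch.
MODEL_TYPES = [
    ("Text-to-Text", {"text"}, {"text"}),
    ("Text-to-Image", {"text"}, {"image"}),
    ("Text-to-Audio", {"text"}, {"audio"}),
    ("Text-to-Video", {"text"}, {"video"}),
    ("Image-to-Text", {"image"}, {"text"}),
    ("Image-to-Image", {"image"}, {"image"}),
    ("Text-to-Code", {"text"}, {"code"}),
    ("Video-to-Audio", {"video"}, {"audio"}),
    ("Text-to-Math", {"text"}, {"math"}),
    ("Text-to-Scientific", {"text"}, {"scientific"}),
    ("Algorithm Discovery", {"text", "data"}, {"algorithm"}),
    ("Multimodal Input", {"text", "image"}, {"text", "image"}),
]


def types_accepting(modality: str) -> list:
    """Return the model types whose inputs include the given modality."""
    return [name for name, inputs, _ in MODEL_TYPES if modality in inputs]


# Count how many model types accept each input modality.
input_counts = Counter(
    modality for _, inputs, _ in MODEL_TYPES for modality in inputs
)

if __name__ == "__main__":
    for modality, count in input_counts.most_common():
        print(f"{modality}: {count}")
```

Counting this way, nine of the twelve rows accept text as an input, compared with three for images.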

Underlying many of these innovations are advances in deep generative architectures like GANs, diffusion models, and Transformers. AI labs at Google, OpenAI, Meta, and DeepMind are leading the way in innovation.
