Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Generative AI Application Integration Patterns
Generative AI Application Integration Patterns

Generative AI Application Integration Patterns: Integrate large language models into your applications

Arrow left icon
Profile Icon Juan Pablo Bustos Profile Icon Luis Lopez Soria
Arrow right icon
$34.98 $49.99
Paperback Sep 2024 218 pages 1st Edition
eBook
$27.98 $39.99
Paperback
$34.98 $49.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon Juan Pablo Bustos Profile Icon Luis Lopez Soria
Arrow right icon
$34.98 $49.99
Paperback Sep 2024 218 pages 1st Edition
eBook
$27.98 $39.99
Paperback
$34.98 $49.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$27.98 $39.99
Paperback
$34.98 $49.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Generative AI Application Integration Patterns

Introduction to Generative AI Patterns

This chapter provides an overview of key concepts, techniques, and integration patterns related to generative AI that will empower you to harness these capabilities in real-world applications.

We will provide an overview of generative AI architectures, such as transformers and diffusion models, which are the basis for these generative models to produce text, images, audio, and more. You’ll get a brief introduction to specialized training techniques, like pre-training and prompt engineering, that upgrade basic language models into creative powerhouses.

Understanding the relentless pace of innovation in this space is critical due to new models and ethical considerations emerging constantly. We’ll introduce strategies for experimenting rapidly while ensuring responsible, transparent development.

The chapter also introduces common integration patterns for connecting generative AI into practical workflows. Whether crafting chatbots that leverage models in real time or performing batch enrichment of data, we will introduce prototyping blueprints to jumpstart building AI-powered systems.

By the end, you will have a one-thousand-foot view of which generative AI models are available, why experimentation is important, and how these integration patterns can help create value for your organization leveraging generative AI.

In a nutshell, the following main topics will be covered:

  • Interacting with AI
  • Predictive AI vs generative AI use case ideation
  • A change in the paradigm
  • General generative AI concepts
  • Introduction to generative AI integration patterns

From AI predictions to generative AI

The intent of this section is to provide a brief overview of artificial intelligence, highlighting our initial experiences with it. In the early 2000s, AI started to become more tangible for consumers. For example, in 2001, Google introduced the “Did you mean?” feature (https://blog.google/intl/en-mena/product-updates/explore-get-answers/25-biggest-moments-in-search-from-helpful-images-to-ai/), which suggests spelling corrections. This was one of Google’s first applications of machine learning and one of the early AI features that the general public got to experience on a large scale.

Over the following years, AI systems became more sophisticated, especially in areas like computer vision, speech-to-text conversion, and text-to-speech synthesis. Working in the telecom industry helped me witness the innovation driven by speech-to-text in particular. Integrating speech-to-text capabilities into interactive voice response (IVR) systems led to better user experiences by allowing people to speak their requests rather than punch numbers into a keypad. For example, you could be calling a bank where you would be welcomed by a message asking you to say “balance” to check your balance, “open account” in order to open an account, etc. Nowadays we are seeing more and more implementations of AI, simplifying more complex and time-consuming tasks.

The exponential increase in available computing power, paired with the massive datasets needed to train machine learning models, unleashed new AI capabilities. In the 2010s, AI started matching and even surpassing human performance on certain tightly defined tasks like image classification.

The advent of generative AI has reignited interest and innovation in the AI field, introducing new approaches for exploring use cases and system integration. Models like Gemini, PaLM, Claude, DALL-E, OpenAI GPT, and Stable Diffusion showcase the ability of AI systems to generate synthetic text, images, and other media. The outputs exhibit creativity and imagination that capture the public’s attention. However, the powerful capabilities of generative models also highlight new challenges around system design and responsible deployment. There is a need to rethink integration patterns and architecture to support safe, robust, and cost-effective implementations. Specifically, issues around security, bias, toxicity, and misinformation must be addressed through techniques like dataset filtering, human-in-the-loop systems, enhanced monitoring, and immediate remediation.

As generative AI continues maturing, best practices and governance frameworks must evolve in tandem. Industry leaders have formed partnerships like the Content Authenticity Initiative to develop technical standards and policy guidance around the responsible development of the next iteration of AI. This technology’s incredible potential, from accelerating drug discovery to envisioning new products, can only be realized through a commitment to transparency, ethics, and human rights. Constructive collaboration that balances innovation with caution is imperative.

Generative AI marks an inflection point for the field. The ripples from this groundswell of creative possibility are just beginning to reach organizations and communities. Maintaining an open, evidence-driven dialogue around not just capabilities but also challenges lays a foundation for AI deployment that empowers people, unlocks new utility, and earns widespread trust.

We are witnessing an unprecedented democratization of generative AI capabilities through publicly accessible APIs from established companies like Google, Meta, and Amazon, and startups such as Anthropic, Mistral AI, Stability AI, and OpenAI. The table below summarizes several leading models that provide versatile foundations for natural language and image generation.

Just a few years ago, developing with generative AI required specialized expertise in deep learning and access to vast computational resources. Now, models like Gemini, Claude, GPT-4, DALL-E, and Stable Diffusion can be accessed via simple API calls at near-zero cost. The bar for experimentation has never been lower.

This commoditization has sparked an explosion of new applications leveraging these pre-trained models – from creative tools for content generation to process automation solutions infused with AI. Expect integrations with generative foundations across all industries in the coming months and years.

Models are becoming more knowledgeable, with broader capabilities and reasoning that will reduce hallucinations and increase accuracy across model responses. Multimodality is also gaining traction, with models able to ingest and generate content across text, images, audio, video, and 3D scenes. In terms of scalability, model size and context windows continue expanding exponentially; for example, Google’s Gemini 1.5 now supports a context window of 1 million tokens.

Overall, the outlook points to a future where generative AI will become deeply integrated into most technologies. These models introduce new efficiencies and automation potential and inspire creativity across nearly every industry imaginable.

The table below highlights some of the most popular LLMs and their providers. The purpose of the table is to highlight the vast number of options available on the market at the time of writing this book. We expect this table to quickly become outdated by the time of publication and highly encourage readers to dive deep into the model providers’ websites to stay up to date with any new launches.

Model

Provider

Landing Page

Gemini

Google

https://deepmind.google/technologies/gemini

Claude

Anthropic

https://claude.ai/

ChatGPT

OpenAI

https://openai.com/blog/chatgpt

Stable Diffusion

Stability AI

https://stability.ai/

Mistral

Mistral AI

https://mistral.ai/

LLaMA

Meta

https://llama.meta.com/

Table 1.1: Overview of popular LLMs and their providers

Predictive AI vs generative AI use case ideation

Predictive AI refers to systems that analyze data to identify patterns and make forecasts or classifications about future events. In contrast, generative AI models create new synthetic content like images, text, or code based on the patterns gleaned from their training data. For example, with predictive AI, you can confidently identify if an image contains a cat or not, whereas with generative AI you can create an image of a cat from a text prompt, modify an existing image to include a cat where there was none, or generate a creative text blurb about a cat.

Product innovation focused on AI involves various phases of the product development lifecycle. With the emergence of generative AI, the paradigm has shifted away from initially needing to compile training data to train traditional ML models and toward leveraging flexible pre-trained models.

Foundational models like Google’s PaLM 2 and Gemini, OpenAI’s GPT and DALL-E, and Stable Diffusion provide broad foundations enabling rapid prototype development. Their versatile capabilities lower the barrier for experimenting with novel AI applications.

Where previously data curation and model training from scratch could take months before assessing viability, now proof-of-concept generation is possible within days without the need to fine-tune a foundation model.

This generative approach facilitates more iterative concept validation. After quickly building an initial prototype powered by the baseline model, developers can then collect niche training data and perform knowledge transfer via techniques like distillation to customize later versions; we will deep dive into the concept of distillation later in the book. The model’s primary foundation contains already encoded patterns useful for kickstarting and for iterations of new models.

In contrast, the predictive modeling approach requires upfront data gathering and training before any application testing. This more linear progression limits early-stage flexibility. However, predictive systems can efficiently learn specialized correlations and achieve a high level of confidence inference metrics once substantial data exists.

Leveraging versatile generative foundations supports rapid prototyping and use case exploration. But, later, custom predictive modeling boosts performance on narrow tasks with sufficient data. Blending these AI approaches capitalizes on their complementary strengths throughout the model deployment lifecycle.

Beyond the basic use – prompt engineering – of a foundational model, several auxiliary, more complex techniques can enhance its capabilities. Examples include Chain-of-Thought (CoT) and ReAct, which empower the model to not only reason about a situation but also define and evaluate a course of action.

ReAct, presented in the paper ReAct: Synergizing Reasoning and Acting in Language Models (https://arxiv.org/abs/2210.03629), addresses the current disconnect between LLMs’ language understanding and their ability to make decisions. While LLMs excel at tasks like comprehension and question answering, their reasoning and action-taking skills (for example, generating action plans or adapting to unforeseen situations) are often treated separately.

ReAct bridges this gap by prompting LLMs to generate both “reasoning traces,” detailing the model’s thought process, and task-specific actions in an interleaved manner. This tight coupling allows the model to leverage reasoning for planning, execution monitoring, and error handling, while simultaneously using actions to gather additional information from external sources like knowledge bases or environments. This integrated approach demonstrably improves LLM performance in both language and decision-making tasks.

For example, in question-answering and fact-verification tasks, ReAct combats common issues like hallucination and error propagation by utilizing a simple Wikipedia API. This interaction allows the model to generate more transparent and trustworthy solutions compared to methods lacking reasoning or action components. LLM hallucinations are defined as content generated that seems plausible yet factually unsupported. There are various papers that aim to address this phenomenon. For example, A survey of Hallucination in Large Language Models – Principles, Taxonomy, Challenges, and Open Questions deep dives into an approach to not only identify but also mitigate hallucinations. Another good example of a mitigation technique is covered in the paper Chain-of-Verification Reduces Hallucination in Large Language Models (https://arxiv.org/pdf/2309.11495.pdf). At the time of writing this book, hallucinations are a very rapidly changing field.

Both CoT and ReAct rely on prompting: feeding the LLM with carefully crafted instructions that guide its thought process. CoT, as presented in the paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (https://arxiv.org/abs/2201.11903), focuses on building a chain of reasoning steps, mimicking human thinking. Imagine prompting the model with: “I want to bake a cake. First, I need flour. Where can I find some?” The model responds with a potential source, like your pantry. This back-and-forth continues, building a logical chain of actions and decisions.

ReAct takes things a step further, integrating action into the reasoning loop. Think of it as a dynamic dance between thought and action. The LLM not only reasons about the situation but also interacts with the world, fetching information or taking concrete steps, and then updates its reasoning based on the results. It’s like the model simultaneously planning a trip and checking maps to adjust the route if it hits a roadblock.

This powerful synergy between reasoning and action unlocks a new realm of possibilities for LLMs. CoT and ReAct tackle challenges like error propagation (jumping to the wrong conclusions based on faulty assumptions) by allowing the model to trace its logic and correct course. They also improve transparency, making the LLM’s thought process clear and understandable.

In other words, large language models (LLMs) are like brilliant linguists, adept at understanding and generating text. But when it comes to real-world tasks demanding reasoning and action, they often stumble. Here’s where techniques like CoT and ReAct enter the scene, transforming LLMs into reasoning powerhouses.

Imagine an LLM helping diagnose a complex disease. CoT could guide it through a logical chain of symptoms and examinations, while ReAct could prompt it to consult medical databases or run simulations. This not only leads to more accurate diagnoses but also enables doctors to understand the LLM’s reasoning, fostering trust and collaboration.

These futuristic applications are what drive us to keep building and investing in this technology, which is very exciting. Before we dive deep into the patterns that are needed to leverage generative AI technology to generate business value, let’s take a step back and look at some initial concepts.

A change in the paradigm

It feels like eons ago in tech years, but let’s rewind just a couple of years, back when if you were embarking on solving an AI problem, you couldn’t default to utilizing a pre-trained model through the web or a managed endpoint. The process was meticulous – you’d have to first clearly define the specific use case, identify what data you had available and could collect to train a custom model, select the appropriate algorithm and model architecture, train the model using specialized hardware and software, and validate if the outputs would actually help solve the task at hand. If all went well, you would have a model that would take a predefined input and also provide a predefined output.

The paradigm profoundly shifted with the advent of LLMs and large multimodal models. Suddenly, you could access a pre-trained model with billions of parameters and start experimenting right off the bat with these versatile foundational models where the inputs and outputs are dynamic in nature. After tinkering around, you’d then evaluate if any fine-tuning is necessary to adapt the model to your needs, rather than pre-training an entire model from scratch. And spoiler alert – in most cases, chances are you won’t even need to fine-tune a foundational model.

Another key shift relates to the early belief that one model would outperform all others and solve all tasks. However, the model itself is just the engine; you still need an entire ecosystem packaged together to provide a complete solution. Foundational models have certainly demonstrated some incredible capabilities beyond initial expectations. But we also observe that certain models are better suited for certain tasks. And running the same prompt through other models can produce very different outputs depending on the underlying model’s training datasets and architecture.

So, the new experimental path often focuses first on prompt engineering, response evaluation, and then fine-tuning the foundational model if gaps exist. This contrasts sharply with the previous flow of data prep, training, and experimentation before you could get your hands dirty. The bar to start creating with AI has never been lower.

In the following sections, we will explore the difference between the development lifecycle of predictive AI and generative AI use cases. In each section, we have provided a high-level visual representation of a simplified development lifecycle and an explanation of the thought process behind each approach.

Predictive AI use case development – simplified lifecycle

Figure 1.1: Predictive AI use case development simplified lifecycle

Let’s dive into the process of developing a predictive AI model first. Everything starts with a good use case, and ROI (return on investment) is top of mind when evaluating AI use cases. Think about pain points in your business or industry that could be solved by predicting an outcome. It is very important to always keep an eye on feasibility – whether you can procure the data you need, etc.

Once you’ve landed on a compelling value-driven use case, next up is picking algorithms. You’ve got endless options here – decision trees, neural nets, regressions, random forests, and on and on. It is very important not to be swayed by the bias for the latest and greatest and to focus on the core requirements of your data and use case to narrow the options down. You can always switch it up or add additional experiments as you iterate through your testing.

With a plan in place, now it is time to get your hands dirty with the data. Identifying sources, cleaning things up, and carrying out feature engineering is an art and, more often than not, the key to improving your model’s results. There is no shortcut for rigor here, unfortunately! Garbage in, garbage out, as they say. But once you’ve wrangled datasets you can rely on, then comes the fun part.

It’s time to work with your model. Define your evaluation process upfront, split data wisely, and start training various configurations. Don’t forget to monitor and tune based on validation performance. Then, once you’ve got your golden model, implement robust serving infrastructure so it scales without a hitch.

But wait, not so fast! Testing doesn’t end when models are in production. Collect performance data continuously, monitor for concept drifts, and retrain when needed. A solid predictive model requires ongoing feedback mechanisms, as shown via the arrow connecting Model Enhancement to Testing in Figure 1.1. There is no such thing as set and forget in this space.

Generative AI use case development – simplified lifecycle

Figure 1.2: Generative AI use case development simplified lifecycle

The process of generative AI use case development is similar but not the same as in predictive AI; there are some common steps, but the order of tasks is different.

The first step is the ideation of potential use cases. This selection needs to be balanced with business needs as satisfying them is our main objective.

With a clear problem definition in place, extensive analysis of published model benchmarks often informs the selection of a robust foundational model best suited for the task. In this step, it is worth asking ourselves the question is this use case better suited for a predictive model?

As foundational models provide capabilities out of the box, initial testing comes as a step early in the process. A structured testing methodology helps reveal innate strengths, weaknesses, and quirks of a specific model. Both quantitative metrics and qualitative human evaluations fuel iterative improvement throughout the full development lifecycle.

The next step is to move to the art of prompt engineering. Prompting is the mechanism used to interact with LLMs. Techniques like chain-of-thought prompting, skeleton prompts, and retrieval augmentation build guardrails enabling more consistent, logical outputs.

If gaps remain after prompt optimization, model enhancement via fine-tuning and distillation offers a precision tool to adapt models closer to the target task.

In rare cases, pretraining a fully custom model from scratch is warranted when no existing model can viably serve the use case. However, it is important to keep in mind that due to the massive data requirements posed by model retraining, this task won’t be suitable for most use cases and teams; retraining a foundational model requires an extensive amount of data and processing power that makes the process unpractical from a financial and technical perspective.

Above all, the interplay between evaluation and model improvement underscores the deeply empirical nature of advancing generative AI responsibly. Testing often reveals that better solutions come from creativity in problem framing rather than pure technological advances alone.

Figure 1.3: Predictive and generative AI development lifecycle side-by-side comparison

As we can see from the preceding figure, the development lifecycle is an iterative process that enables us to realize value from a given use case and technology type. Across the rest of this chapter and this book, we are going to focus on generative AI general concepts, some that are going to be familiar if you are experienced in predictive AI and others that are specific to this new field in AI.

General generative AI concepts

When integrating generative AI into practical applications, it is important to have an understanding of concepts such as model architecture and training. In this section, we cover an overview of prominent concepts, including transformers, diffusion models, pre-training, and prompt engineering, that enable systems to generate impressively accurate text, images, audio, and more.

Understanding these core concepts will equip you to make informed decisions when selecting foundations for your use cases. However, putting models into production requires further architectural considerations. We will be highlighting these decision points in the rest of the chapters in the book and in practical examples.

Generative AI model architectures

Generative AI models are based on specialized neural network architectures optimized for generative tasks. The two more widely known models are transformers and diffusion models.

Transformer models are not a new concept. They were first introduced by Google in a 2017 paper called Attention Is All You Need (https://arxiv.org/pdf/1706.03762.pdf). The paper explains the Transformer neural network architecture, which is entirely based on attention mechanisms using the encoder and decoder concepts. This architecture enables models to identify relationships across an input text. By having these relationships, the model predicts the next token, leveraging its previous prediction as an input, creating this recursive loop to generate new content.

Diffusion models have drawn considerable interest as generative models due to their foundation in the physical processes of non-equilibrium thermodynamics. In physics, diffusion refers to the motion of particles from areas of high concentration to low concentration over time. Diffusion models try to mimic this concept in their training process. These models are trained through two phases: the forward diffusion process adds “noise” to the original training data, followed by a reverse conditioning process, which then learns how to remove noise in the reverse diffusion process. By learning this process, these models can produce samples by starting from pure noise and letting the reverse diffusion model clear away unnecessary “noise” and preserving the desired “generated” content.

Other types of deep learning architectures, such as Generative Adversarial Networks (GANs), allow you to generate synthetic data based on existing data. GANs are useful because they leverage two models: one to generate a synthetic output and another one that tries to predict if this output is real or fake.

Through this iterative process, we can generate data that is indistinguishable from the real data but different enough to be used to enhance our training datasets. Another example of data generation architectures is Variational Autoencoders (VAEs), which use an encoder-decoder approach to generate new data samples resembling their training datasets.

Techniques available to optimize foundational models

There are several techniques used to develop and optimize foundational models that have driven significant gains in AI capabilities, some of which are more complex than others from a technical and monetary perspective:

  • Pre-training refers to fully training a model on a large dataset. It allows models to learn very broad representations from billions of data points, which help the model adapt to other closely related tasks. Popular methods include contrastive self-supervised pre-training on unlabeled data and pre-training on vast supervised data like the internet.
  • Fine-tuning adapts a pre-trained model’s already learned feature representations to perform a specific task. This only tunes some higher-level model layers rather than training from scratch. On the other hand, adapter tuning equips models with small, lightweight adapters that can rapidly tune to new tasks without interfering with existing capabilities. These pluggable adapters give a parameter-efficient way of accumulating knowledge across multiple tasks by learning task-specific behaviors while reusing the bulk of model weights. They help mitigate forgetting previous tasks and simplify personalization. For example, models may first be pre-trained on billions of text webpages to acquire general linguistic knowledge, before being fine-tuned on more domain-specific datasets for question answering, classification, etc.
  • Distillation uses a “teacher” model to train a smaller “student” model to reproduce the performance of the larger pre-trained model at a lower cost and latency. Quantizing and compressing large models into efficient forms for deployments also helps optimize performance and cost.

The combination of comprehensive pre-training followed by specialized fine-tuning, adapter tuning, and portable distillation has enabled unprecedented versatility of deep learning across domains. Each approach smartly reuses and transfers available knowledge, enabling the customization and scaling of generative AI.

Techniques to augment your foundational model responses

In addition to architecture and training advances, progress in generative AI has been fueled by innovations in how these models are augmented by external data at inference time.

Prompt engineering tunes the text prompts provided to models to steer their generation quality, capabilities, and properties. Well-designed prompts guide the model to produce the desired output format, reduce ambiguity, and provide helpful contextual constraints. This allows simpler model architectures to solve complex problems by encoding human knowledge into the prompts.

Retrieval augmented generation, also known as RAG, enhances text generation through efficient retrieval of relevant knowledge from external stores. Models receive contextual pieces of information as “context” to be considered as additional input before generating its output. Grounding LLMs (large language models) refers to providing model-specific factual knowledge rather than just model parameters, enabling more accurate, knowledgeable, and specific language generation.

Together, these approaches augment basic predictive language models to become far more versatile, robust, and scalable. They reduce brittleness via tight integration of human knowledge and grounded learning rather than just statistical patterns. RAG handles the breadth and real-time retrieval of information, prompts provide depth and rules to the desired outputs, and grounding binds them to reality. We would highly encourage readers to get familiar with this topic, as it is an industry best practice to perform RAG and to ground your model to prevent it from hallucinating. A good start is the following paper: Retrieval-Augmented Generation for Large Language Models: A Survey (https://arxiv.org/pdf/2312.10997).

Constant evolution across the generative AI space

The generative AI space is characterized by relentless innovation and rapid advancement across model architectures, applications, and ethical considerations. As soon as one method or architecture shows promising results, hundreds of competing and complementary approaches emerge to push capabilities even further. Transformers gave way to BERT, which was outpaced by GPT-3, soon rivaled by image synthesizers like DALL-E, and now GPT-4 and Gemini are competing for the top spot. All of which happened in the past few years.

Meanwhile, we are seeing new modalities like audio, video, and 3D scene generation gaining vast popularity and usability. On the business front, new services are launched monthly, targeting media and entertainment, finance, healthcare, art, code, music, and more. However, considerations around ethics, access control, and legalities are key in order to maintain public trust.

One breakthrough enables several more, and each unlocks added potential. This self-fueling cycle arises from the very nature of AI – its ability to recursively assist innovation. The only certainty is that the field will look very different within months, not years. Maintaining awareness, responsiveness, and responsibility is critical amid this constant evolution.

Introducing generative AI integration patterns

Let’s now assume you already have a promising use case in mind. As I’m sure you would agree, clearly defining the use case is critical before proceeding further. You’ve already identified which foundational model provides acceptable performance for your needs. So now you’re starting to consider how GenAI fits into the application development process.

At a high level, there are two main workflows for integrating applications with GenAI. One is real time, where you’ll typically interact with an end user or AI agent, providing responses as prompts come in. The second is batch processing, where requirements are bundled up and processed in groups (batches).

A prime example of a real-time workflow would be a chatbot. Here, prompts from the user are processed and then sent to the model and the responses are returned immediately, as you need to consume the outputs without delay. On the other hand, consider a data enrichment use case for batch processing. You could collect multiple data points over time for later consumption after being enriched by the model in batches.

In this book, we will explore these integration patterns through practical examples. This will help you to obtain hands-on experience with GenAI-driven applications and allow you to integrate these patterns in your own use cases.

By “integration pattern,” we refer to a standardized architectural approach for incorporating a technology into your application or system. In this context, integration patterns provide proven methods for connecting generative AI models to real-world software.

There are a few key reasons why we need integration patterns when working with generative AI:

  • Time savings: Following established patterns allows developers to avoid reinventing the wheel for common integration challenges. This accelerates time to value.
  • Improving quality: Leveraging best practices encoded in integration patterns leads to more robust, production-grade integrations. Things like scalability, security, and reliability are top of mind.
  • Reducing risk: Well-defined integration patterns enable developers to mitigate risks around performance, costs, and other pitfalls that can emerge when integrating new technologies.

Overall, integration patterns deliver templates and guardrails, so developers don’t have to start integration efforts from scratch. By relying on proven blueprints, readers can integrate generative AI more efficiently while avoiding common mistakes. This speeds up development cycles significantly and sets integrations up for long-term success.

Summary

In this chapter, we covered an overview of key concepts, techniques, and integration patterns related to generative AI. You now have a high-level background on prominent generative AI model architectures like transformers and diffusion models, as well as various methods for developing and enhancing these models, covering pre-training, fine-tuning, adapter tuning, distillation, prompt engineering, retrieval augmented generation, and grounding.

We discussed how rapid innovation in generative AI leads to constant evolution, with new models and capabilities emerging at a fast pace. It emphasizes the need to keep pace with progress while ensuring ethical, responsible development.

Finally, we introduced common integration patterns for connecting generative AI to real-world applications, considering real-time use cases like chatbots as well as batch processing for data enrichment. Real examples were provided to demonstrate workflows for integrating generative models into production systems.

Innovation in AI has a very fast pace, demanding constant awareness, swift experimentation, and a responsible approach to harnessing the latest advances. This is particularly evident in the field of generative AI, where we’re witnessing a paradigm shift in AI-powered applications that allows for faster experimentation and development.

A wide array of techniques has emerged to enhance models’ capabilities and efficiency. These include pre-training, adapter tuning, distillation, and prompt engineering, each offering unique advantages in different scenarios. When it comes to integrating these AI models into practical applications, key patterns have emerged for both real-time workflows, such as chatbots, and batch processing tasks like data enrichment.

The art of crafting well-designed prompts has become crucial in constraining and steering model outputs effectively. Additionally, techniques like retrieval augmentation and grounding have proven invaluable in improving the accuracy of AI-generated content. The potential in blending predictive and generative approaches is a very interesting space. This combination leverages the strengths of both methodologies, allowing for custom modeling where sufficient data exists while utilizing generative foundations to enable rapid prototyping and innovation.

These core concepts empower informed decision-making when architecting generative AI systems. The integration patterns offer blueprints for connecting models to practical applications across diverse domains.

Harnessing the power of LLMs begins with identifying the right use cases where they can drive value for your business. In the next chapter, we will present a framework and examples for categorizing LLM use cases based on projected business value.

In the next chapter, we will explore identifying use cases that can be solved with Generative AI.

Join our community on Discord

Join our community’s Discord space for discussions with the authors and other readers:

https://packt.link/genpat

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Get familiar with the most important tools and concepts used in real scenarios to design GenAI apps
  • Interact with GenAI models to tailor model behavior to minimize hallucinations
  • Get acquainted with a variety of strategies and an easy to follow 4 step frameworks for integrating GenAI into applications

Description

Explore the transformative potential of GenAI in the application development lifecycle. Through concrete examples, you will go through the process of ideation and integration, understanding the tradeoffs and the decision points when integrating GenAI. With recent advances in models like Google Gemini, Anthropic Claude, DALL-E and GPT-4o, this timely resource will help you harness these technologies through proven design patterns. We then delve into the practical applications of GenAI, identifying common use cases and applying design patterns to address real-world challenges. From summarization and metadata extraction to intent classification and question answering, each chapter offers practical examples and blueprints for leveraging GenAI across diverse domains and tasks. You will learn how to fine-tune models for specific applications, progressing from basic prompting to sophisticated strategies such as retrieval augmented generation (RAG) and chain of thought. Additionally, we provide end-to-end guidance on operationalizing models, including data prep, training, deployment, and monitoring. We also focus on responsible and ethical development techniques for transparency, auditing, and governance as crucial design patterns.

Who is this book for?

This book is not an introduction to AI/ML or Python. It offers practical guides for designing, building, and deploying GenAI applications in production. While all readers are welcome, those who benefit most include: Developer engineers with foundational tech knowledge Software architects seeking best practices and design patterns Professionals using ML for data science, research, etc., who want a deeper understanding of Generative AI Technical product managers with a software development background This concise focus ensures practical, actionable insights for experienced professionals

What you will learn

  • Concepts of GenAI: pre-training, fine-tuning, prompt engineering, and RAG
  • Framework for integrating AI: entry points, prompt pre-processing, inference, post-processing, and presentation
  • Patterns for batch and real-time integration
  • Code samples for metadata extraction, summarization, intent classification, question-answering with RAG, and more
  • Ethical use: bias mitigation, data privacy, and monitoring
  • Deployment and hosting options for GenAI models
Estimated delivery fee Deliver to Taiwan

Standard delivery 10 - 13 business days

$12.95

Premium delivery 5 - 8 business days

$45.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Sep 05, 2024
Length: 218 pages
Edition : 1st
Language : English
ISBN-13 : 9781835887608
Category :
Languages :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to Taiwan

Standard delivery 10 - 13 business days

$12.95

Premium delivery 5 - 8 business days

$45.95
(Includes tracking information)

Product Details

Publication date : Sep 05, 2024
Length: 218 pages
Edition : 1st
Language : English
ISBN-13 : 9781835887608
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 98.96 142.97 44.01 saved
Generative AI Application Integration Patterns
$34.98 $49.99
Python Natural Language Processing Cookbook
$30.99 $44.99
AI-Assisted Programming for Web and Machine Learning
$32.99 $47.99
Total $ 98.96 142.97 44.01 saved Stars icon
Banner background image

Table of Contents

12 Chapters
Introduction to Generative AI Patterns Chevron down icon Chevron up icon
Identifying Generative AI Use Cases Chevron down icon Chevron up icon
Designing Patterns for Interacting with Generative AI Chevron down icon Chevron up icon
Generative AI Batch and Real-Time Integration Patterns Chevron down icon Chevron up icon
Integration Pattern: Batch Metadata Extraction Chevron down icon Chevron up icon
Integration Pattern: Batch Summarization Chevron down icon Chevron up icon
Integration Pattern: Real-Time Intent Classification Chevron down icon Chevron up icon
Integration Pattern: Real-Time Retrieval Augmented Generation Chevron down icon Chevron up icon
Operationalizing Generative AI Integration Patterns Chevron down icon Chevron up icon
Embedding Responsible AI into Your GenAI Applications Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact [email protected] with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at [email protected] using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on [email protected] with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on [email protected] within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on [email protected] who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on [email protected] within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela