Summary
In this chapter, we discussed the importance of evaluating GenAI applications and learned the difference between pairwise and pointwise evaluators. We also learned how to use LangChain's native evaluators, how to use LangSmith to debug our applications' performance, and how to use Vertex AI's evaluation capabilities with LangChain.
As we've highlighted, in our view one of the most important (and often underappreciated) steps to take before moving to production is building a robust evaluation pipeline.
In the next chapter, we'll look at further aspects of preparing your GenAI application for production deployment and putting it in front of real users.