Building a custom RAG application
As we’ve discussed previously, a RAG application consists of two parts: retrieval and generation. Let’s start with retrieval and look at the important interfaces that LangChain exposes:
langchain_core.documents.base.Document: It’s a Serializable object that holds content and metadata. The content is the text of the document (that the downstream chain is dealing with), and the metadata is a collection of key-value pairs that can be associated with the document (you can put filename, size, and any other attributes you’d like there):
from langchain_core.documents import Document
doc = Document(page_content="my page", metadata={"source_id": "example.pdf", "page": 1})
print(doc.page_content)
langchain_core.retrievers.BaseRetriever: It’s an abstract class for retrievers. When you invoke a retriever, it returns a list of Document objects...