Ingesting Documents
This chapter explores the document parsing capabilities available in Google Cloud and LangChain. Document ingestion is an integral part of RAG applications: it builds the knowledge base that is used to answer questions and generate text.
We’ll explain how to use LangChain document loaders to parse documents in different formats, and examine both pre-trained and custom-built Document AI parsers. We’ll also cover the out-of-the-box techniques that Vertex AI Agent Builder offers for ingesting content from external sources.
We will discuss the following main topics:
- Ingesting documents with LangChain
- Document chunking
- Parsing documents with Document AI
- Ingesting data with Vertex AI Agent Builder