Ingesting documents with LangChain
The primary way to ingest documents for a generative AI application is through the DocumentLoader interface provided by LangChain.
Document loaders extract data from various sources and transform it into instances of LangChain’s Document class. As discussed in previous chapters, a Document encapsulates both text content and associated metadata. Because every source is reduced to this common representation, diverse data types can be integrated, from simple text files and web page content to transcripts of YouTube videos.
Each document loader features a load method for eagerly loading data as documents from a specified source and, optionally, a lazy_load method for deferred loading, which reduces memory usage [1] when dealing with large document collections.
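The difference between the two methods can be illustrated with a minimal sketch. To keep it runnable without any installed packages, it does not import LangChain: the Document dataclass below is a simplified stand-in for LangChain's class, and LineLoader, its path and encoding parameters, and the demo helper are hypothetical names invented for this illustration. In a real project you would typically import Document from langchain_core.documents and subclass BaseLoader from langchain_core.document_loaders instead.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, List


@dataclass
class Document:
    """Simplified stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical loader: one Document per non-empty line of a text file."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        # In this sketch, all configuration is passed to the constructor.
        self.path = Path(path)
        self.encoding = encoding

    def lazy_load(self) -> Iterator[Document]:
        # Generator: reads and yields one line at a time, so the whole
        # file never has to be held in memory as Document objects.
        with self.path.open(encoding=self.encoding) as handle:
            for number, line in enumerate(handle, start=1):
                text = line.strip()
                if text:
                    yield Document(
                        page_content=text,
                        metadata={"source": str(self.path), "line": number},
                    )

    def load(self) -> List[Document]:
        # Eager variant: materialize everything lazy_load() produces.
        return list(self.lazy_load())


def demo() -> List[Document]:
    """Write a small sample file and load it eagerly."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "notes.txt"
        sample.write_text("first note\n\nsecond note\n", encoding="utf-8")
        return LineLoader(str(sample)).load()


if __name__ == "__main__":
    for doc in demo():
        print(doc.page_content, doc.metadata["line"])
```

Defining load in terms of lazy_load keeps a single code path for reading the source. Callers that only need to stream through the documents can iterate over lazy_load() directly, while callers that need random access can pay the memory cost of load().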
Note that neither the load nor the lazy_load method accepts additional arguments. This applies to all DocumentLoader subclasses, as configuration parameters must be provided...